Assigning Intra-European Ancestry to Identical-by-Descent Segments using a Large Database of Self-Reported Ancestry. J. M. Macpherson, B. T. Naughton, C. B. Do, J. Y. Tung, J. L. Mountain. 23andMe.com, Mountain View, CA.

   To assign ancestry to a given genomic region, traditional autosomal ancestry analyses rely on probabilistic models of haplotype frequencies. This approach has been successful in individuals with ancestry from widely separated geographic regions, for example in admixture mapping studies of African-American and Latino populations. However, it has difficulty discriminating between haplotypes from more closely related populations. Here we introduce a method for autosomal ancestry assignment using identical-by-descent (IBD) segments from a large database of individuals of European ancestry who have themselves provided information about their own, their parents', and their grandparents' ancestry. The method is based on the idea that, if an individual shares an IBD segment with an individual of uniform ancestry from a given country, the segment likely derives from that country. The method is frequently able to identify correctly the European countries of origin of segments in individuals of known ancestry, which suggests its use in identifying the origin of segments in individuals of unknown ancestry. To guard against erroneous or misleading ancestry information, we use a procedure based on principal components analysis to filter the dataset. We examined the concordance of the method's results with the individuals' own self-reported ancestry information; depending on the country of origin, the method correctly identifies the European country of origin 55% to 85% of the time, and the European region of origin 65% to 100% of the time. We also explore the accuracy of the method in Ashkenazi Jewish individuals, finding 85% concordance in individuals with self-reported Ashkenazi Jewish ancestry. We conclude by analyzing how this method's coverage and accuracy depend on database size and mean population IBD sharing.
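   The core assignment rule can be illustrated with a minimal sketch. It is not the authors' implementation: the names IBDSegment, uniform_country, assign_segment_ancestry, and concordance are hypothetical, "uniform ancestry" is assumed here to mean that all four self-reported grandparents come from the same country, and the PCA-based filtering step is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IBDSegment:
    """An IBD segment shared with one database individual (hypothetical layout)."""
    chrom: str
    start: int
    end: int
    relative_id: str


def uniform_country(grandparent_countries):
    """Return the country if all four grandparents share it, else None.

    Assumption: 'uniform ancestry' means four matching grandparent reports.
    """
    if len(grandparent_countries) == 4 and len(set(grandparent_countries)) == 1:
        return grandparent_countries[0]
    return None


def assign_segment_ancestry(segments, grandparents_by_id):
    """Label each segment with the country of its uniform-ancestry sharer.

    Segments shared with individuals of mixed or unknown ancestry stay
    unlabeled, so coverage depends on how many informative sharers exist.
    """
    labels = {}
    for seg in segments:
        reports = grandparents_by_id.get(seg.relative_id, [])
        country = uniform_country(reports)
        if country is not None:
            labels[seg] = country
    return labels


def concordance(labels, self_reported_country):
    """Fraction of labeled segments that match the individual's own report."""
    if not labels:
        return None
    hits = sum(1 for c in labels.values() if c == self_reported_country)
    return hits / len(labels)


# Toy data, invented purely for illustration.
segments = [
    IBDSegment("1", 1_000_000, 6_000_000, "a"),
    IBDSegment("2", 500_000, 4_000_000, "b"),
    IBDSegment("7", 2_000_000, 9_000_000, "c"),
]
grandparents_by_id = {
    "a": ["Ireland"] * 4,
    "b": ["Italy", "Italy", "France", "Italy"],  # mixed: not informative
    "c": ["Poland"] * 4,
}
labels = assign_segment_ancestry(segments, grandparents_by_id)
print(labels)
print(concordance(labels, "Ireland"))
```

   In this sketch, adding more uniform-ancestry individuals to the database can only increase the number of labeled segments, which is one simple way to see why coverage would depend on database size.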

