DOI: 10.1145/3041021.3054728 (WWW conference proceedings)
research-article

AncestryAI: A Tool for Exploring Computationally Inferred Family Trees

Published: 03 April 2017

ABSTRACT

Many people are excited to discover their ancestors and thus decide to take up genealogy. However, the process of finding the ancestors is often very laborious since it involves comparing a large number of historical birth records and trying to manually match the people mentioned in them. We have developed AncestryAI, an open-source tool for automatically linking historical records and exploring the resulting family trees. We introduce a record-linkage method for computing the probabilities of the candidate matches, which allows the users to either directly identify the next ancestor or narrow down the search. We also propose an efficient layout algorithm for drawing and navigating genealogical graphs. The tool is additionally used to crowdsource training and evaluation data so as to improve the matching algorithm. Our objective is to build a large genealogical graph, which could be used to resolve various interesting questions in the areas of computational social science, genetics, and evolutionary studies. The tool is openly available at: http://emalmi.kapsi.fi/ancestryai/.
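
The abstract does not spell out how the match probabilities are computed, so the following is only a minimal sketch of the general idea, not AncestryAI's actual method. It assumes each candidate ancestor record already has a numeric match score (higher means more similar), normalizes the scores into probabilities with a softmax, and then either proposes a single match or returns a shortlist to narrow down the search. The function names, the threshold of 0.9, and the shortlist size are all hypothetical choices made for illustration.

```python
import math


def match_probabilities(scores):
    """Turn per-candidate match scores into probabilities that sum to 1.

    `scores` maps a candidate record id to a real-valued score.
    The softmax normalization is an assumption of this sketch.
    """
    if not scores:
        return {}
    top = max(scores.values())
    # Subtract the maximum before exponentiating for numerical stability.
    weights = {cand: math.exp(s - top) for cand, s in scores.items()}
    total = sum(weights.values())
    return {cand: w / total for cand, w in weights.items()}


def suggest(probabilities, accept_threshold=0.9, shortlist_size=3):
    """Propose one match if a candidate dominates, else a ranked shortlist."""
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    if ranked and probabilities[ranked[0]] >= accept_threshold:
        return ("match", ranked[0])
    return ("shortlist", ranked[:shortlist_size])
```

With one clearly dominant score, `suggest` returns that candidate as the proposed ancestor; when several candidates score similarly, it returns the top few so that a user can choose among them.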

