DOI: 10.1145/1146847.1146865
Article

CiteSeerχ: a scalable autonomous scientific digital library

Published: 30 May 2006

ABSTRACT

CiteSeer is a scientific literature digital library and search engine that automatically crawls and indexes scientific documents in the fields of computer and information science. Since its inception in 1997, CiteSeer has grown to index over 730,000 documents and serves over 800,000 requests daily, pushing the limits of the current system's capabilities. In addition, CiteSeer's monolithic architecture complicates system maintenance and reduces the system's flexibility with respect to new feature development, algorithm updates, and system interoperability. In this paper, we discuss the problems of the current CiteSeer architecture and propose a new architecture for a next-generation CiteSeer application. The new architecture is based on modular web services and pluggable service components. Preliminary results based on a prototype system show that the new architecture enhances flexibility, scalability, and performance for CiteSeer. In addition, new services in development for the next-generation CiteSeer system are discussed.

Published in

InfoScale '06: Proceedings of the 1st international conference on Scalable information systems
May 2006
512 pages
ISBN: 1595934286
DOI: 10.1145/1146847

Copyright © 2006 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

InfoScale '06 Paper Acceptance Rate: 33 of 91 submissions, 36%
Overall Acceptance Rate: 33 of 91 submissions, 36%
