ABSTRACT
CiteSeer is a scientific literature digital library and search engine that automatically crawls and indexes scientific documents in the fields of computer and information science. Since its inception in 1997, CiteSeer has grown to index over 730,000 documents and serves over 800,000 requests daily, pushing the limits of the current system's capabilities. In addition, CiteSeer's monolithic architecture complicates system maintenance and limits the system's flexibility with respect to new feature development, algorithm updates, and interoperability. In this paper, we discuss the problems of the current CiteSeer architecture and propose a new architecture for a next-generation CiteSeer application. The new architecture is based on modular web services and pluggable service components. Preliminary results from a prototype system show that the new architecture enhances flexibility, scalability, and performance for CiteSeer. We also discuss new services in development for the next-generation CiteSeer system.
Index Terms
- CiteSeerχ: a scalable autonomous scientific digital library