Correspondence as the Primary Measure of Quality for Web Archives: A Grounded Theory Study

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12246)

Abstract

Creating an archived website that is as close as possible to the original, live website remains one of the most difficult challenges in web archiving. Failing to capture a website adequately can leave an incomplete historical record or, worse, no evidence that the site ever existed. This paper presents a grounded theory of quality for web archives, developed using data from web archivists. To build it, I analysed support tickets submitted by clients of the Internet Archive’s Archive-It (AIT), a subscription-based web archiving service that helps organisations build and manage their own web archives. In total, 305 tickets comprising 2544 interactions were analysed. The resulting theory comprises three dimensions of quality in a web archive: correspondence, relevance, and archivability. Correspondence, defined as the degree of similarity or resemblance between the original website and the archived website, is the most important facet of quality in web archives and the main focus of this work. The paper’s contribution is to present the first theory created specifically for web archives, laying the groundwork for future theoretical developments in the field. Furthermore, the theory is human-centred and grounded in how users and creators of web archives perceive their quality. By clarifying the notion of quality in a web archive, this research will benefit web archivists and cultural heritage institutions.

Author information

Corresponding author

Correspondence to Brenda Reyes Ayala.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Ayala, B.R. (2020). Correspondence as the Primary Measure of Quality for Web Archives: A Grounded Theory Study. In: Hall, M., Merčun, T., Risse, T., Duchateau, F. (eds) Digital Libraries for Open Knowledge. TPDL 2020. Lecture Notes in Computer Science, vol 12246. Springer, Cham. https://doi.org/10.1007/978-3-030-54956-5_6

  • DOI: https://doi.org/10.1007/978-3-030-54956-5_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-54955-8

  • Online ISBN: 978-3-030-54956-5

  • eBook Packages: Computer Science, Computer Science (R0)
