Abstract
Creating an archived website that is as close as possible to the original, live website remains one of the most difficult challenges in web archiving. Failing to capture a website adequately might leave an incomplete historical record or, worse, no evidence that the site ever existed. This paper presents a grounded theory of quality for web archives, developed from data produced by web archivists. To build this theory, I analysed support tickets submitted by clients of the Internet Archive’s Archive-It (AIT), a subscription-based web archiving service that helps organisations build and manage their own web archives. In total, 305 tickets, comprising 2,544 interactions, were analysed. The resulting theory comprises three dimensions of quality in a web archive: correspondence, relevance, and archivability. Correspondence, defined as the degree of similarity or resemblance between the original website and the archived website, is the most important facet of quality in web archives and is the main focus of this work. The paper contributes the first theory created specifically for web archives and lays the groundwork for future theoretical developments in the field. Furthermore, the theory is human-centred: it is grounded in how users and creators of web archives perceive their quality. By clarifying the notion of quality in a web archive, this research will benefit web archivists and cultural heritage institutions.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Ayala, B.R. (2020). Correspondence as the Primary Measure of Quality for Web Archives: A Grounded Theory Study. In: Hall, M., Merčun, T., Risse, T., Duchateau, F. (eds) Digital Libraries for Open Knowledge. TPDL 2020. Lecture Notes in Computer Science, vol 12246. Springer, Cham. https://doi.org/10.1007/978-3-030-54956-5_6
DOI: https://doi.org/10.1007/978-3-030-54956-5_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-54955-8
Online ISBN: 978-3-030-54956-5
eBook Packages: Computer Science, Computer Science (R0)