Text Analysis of Archival Finding Aids: Collection Scoping and Beyond

Anne Bahde; Cara Key

doi:10.5860/ital.v43i4.17065

Authors

Anne Bahde Oregon State University
Cara Key Oregon State University

DOI:

https://doi.org/10.5860/ital.v43i4.17065

Keywords:

text analysis, archives, machine learning

Abstract

Archival repositories must be strategic and selective in deciding what collections they will acquire and steward. Careful collection stewards balance many factors, including ongoing resource needs and future research use. They ensure new acquisitions build upon existing topical strengths in the repository’s holdings and reassess these existing strengths regularly through multiple lenses. In this study, we examine the suitability of text analysis as a method for analyzing collection scope strengths across a repository’s physical archival holdings. We apply a tool for text analysis called Leximancer to analyze a corpus of archival finding aids to explore topical coverage. Leximancer results were highly aligned with the baseline subject heading analysis that we performed, but the concepts, themes, and co-occurring topic pairs surfaced by Leximancer suggest areas of collection strength and potential focus for new acquisitions. We discuss the potential applications of text analysis for internal library use including collection development, as well as potential implications for wider description, discovery, and access. Text analysis can accurately surface topical strengths and directly lead to insights that can inform future acquisition decisions and archival collection development policies.

