From Digital Library to Open Datasets

Embracing a "Collections as Data" Framework

This article discusses the growing "collections as data" movement within the fields of digital libraries and digital humanities. Faculty at the University of Utah's Marriott Library are developing a collections as data strategy by building on the library's existing Digital Library and Digital Matters programs. Drawing on a range of digital collections, the authors explore both small- and large-scale approaches to developing open datasets. Five case studies chronicling this strategy are reviewed, and the resulting datasets are tested with digital humanities methods such as text mining, topic modeling, and GIS (geographic information systems).
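To give a concrete sense of what "testing a dataset with text mining" can mean at its simplest, here is a minimal sketch that counts the most frequent words across a set of plain-text documents, such as transcripts exported from a digital collection. It is an illustration only, not the authors' workflow: the function name `term_frequencies`, the tiny stopword list, and the sample sentences are hypothetical, and a real project would use a fuller stopword list and proper tokenization.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list (illustration only).
STOPWORDS = {"the", "and", "of", "a", "to", "in", "was", "we", "i", "it", "my"}

def term_frequencies(documents, top_n=10):
    """Return the top_n (word, count) pairs across all documents.

    Text is lowercased and split into runs of letters/apostrophes;
    stopwords are skipped before counting.
    """
    counts = Counter()
    for text in documents:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)

if __name__ == "__main__":
    # Invented sample text, not drawn from any actual collection.
    sample = [
        "We worked in the coal mine for many years.",
        "My father came to the coal camp from Greece.",
    ]
    print(term_frequencies(sample, top_n=3))
```

Word counts like these are usually only a first pass; methods such as topic modeling build on the same tokenized text but model co-occurrence across documents rather than raw frequency.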

Author Biographies

Rachel Wittmann, Marriott Library, University of Utah

Digital Curation Librarian

Anna Neatrour, Marriott Library, University of Utah

Digital Initiatives Librarian

Rebekah Cummings, Marriott Library, University of Utah

Digital Matters Librarian

Jeremy Myntti, Marriott Library, University of Utah

Head, Digital Library Services

How to Cite

Wittmann, R., Neatrour, A., Cummings, R., & Myntti, J. (2019). From Digital Library to Open Datasets: Embracing a "Collections as Data" Framework. Information Technology and Libraries, 38(4), 49–61.