From Linked Open Data to Collections as Data
A Reproducible Framework Using Federated Queries
DOI: https://doi.org/10.5860/ital.v44i4.17432
Keywords: linked open data, collections as data, libraries, digital collections, reproducible framework, federated queries, cultural heritage
Abstract
Libraries are adopting Linked Open Data (LOD) and Collections as Data (CaD) approaches to present their collections as datasets for direct computational use. However, research on federated and reproducible access to these datasets is limited. This work develops a federated and reproducible approach for extracting CaD from LOD repositories. As examples, it uses data on two individual authors, Jorge Juan y Santacilia and María de Zayas y Sotomayor, and on multiple authors from the Spanish Golden Age movement (1492–1659). Federated and reproducible queries are run in Jupyter Notebooks against the Wikidata SPARQL public endpoint and three institutional LOD repositories. The data, covering either the works of a single author or the works of a specific movement, are exported in a format compatible with computational tools (e.g., CSV). The work also supports visualization of the queries. The results provide a valuable framework for both digital humanities researchers working with datasets and libraries aiming to present their collections as accessible data for computational analysis.
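To make the idea of a federated query exported to CSV more concrete, the following is a minimal sketch and not the authors' notebook code. It builds a SPARQL 1.1 query containing a SERVICE clause and flattens a response in the standard SPARQL JSON results format into CSV. The institutional endpoint URL, the function names, the Spanish language tag on the label, and the choice of `owl:sameAs`, `dcterms:creator`, and `dcterms:title` as linking and descriptive properties are all assumptions made for illustration; a real repository may model authors and works differently.

```python
import csv
import io

# Public Wikidata SPARQL endpoint, as named in the abstract.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"
# Hypothetical institutional endpoint: a placeholder, not a real repository.
INSTITUTION_ENDPOINT = "https://example.org/sparql"


def build_federated_query(author_label: str, remote_endpoint: str) -> str:
    """Return a SPARQL 1.1 query that matches an author by label and
    fetches that author's works from a remote endpoint via SERVICE.

    The vocabulary used here is an assumption for illustration only.
    """
    # Escape backslashes and double quotes so the label is a valid literal.
    safe_label = author_label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?author ?work ?title WHERE {{
  ?author rdfs:label "{safe_label}"@es .
  SERVICE <{remote_endpoint}> {{
    ?remoteAuthor owl:sameAs ?author .
    ?work dcterms:creator ?remoteAuthor ;
          dcterms:title ?title .
  }}
}}"""


def bindings_to_csv(results: dict) -> str:
    """Flatten a SPARQL JSON results document into CSV text.

    Columns follow head.vars; unbound variables become empty cells.
    """
    columns = results["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for binding in results["results"]["bindings"]:
        writer.writerow([binding.get(col, {}).get("value", "") for col in columns])
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the query text; sending it to an endpoint is left to the reader.
    print(build_federated_query("Jorge Juan y Santacilia", INSTITUTION_ENDPOINT))
```

The query string could be sent to an endpoint with any HTTP client or SPARQL library. Keeping query construction and CSV conversion as pure functions means both steps can be rerun and checked without network access, which is one way to support reproducibility.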
License
Copyright (c) 2025 Meltem Dişli, Giulia Osti, Gustavo Candela, Richard Zijdeman

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors that submit to Information Technology and Libraries agree to the Copyright Notice.