Digitization of Text Documents Using PDF/A
DOI:
https://doi.org/10.6017/ital.v37i1.9878
Abstract
The purpose of this article is to demonstrate a practical use case of the PDF/A file format for the digitization of textual documents, following the recommendation to use PDF/A as a preferred digitization file format. The authors show how to convert all the TIFF images of a document, together with their associated metadata, and combine them into a single PDF/A-2b file. Using open source software and real-life examples, the authors show readers how to convert TIFF images, extract associated metadata and ICC profiles, and validate the result against the newly released PDF/A validator. The generated PDF/A file is a self-contained and self-describing container that holds all the data from the digitization of textual materials, including page-level metadata and/or ICC profiles. Through theoretical analysis and empirical examples, the authors demonstrate that the PDF/A file format has many advantages over the traditionally preferred file formats, TIFF and JPEG 2000, for the digitization of textual documents.
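The abstract mentions extracting page-level metadata and ICC profiles from the TIFF masters before they are combined into a PDF/A file. The abstract does not name the authors' toolchain. As an illustration only, the following Python sketch reads a few tags from the first image file directory (IFD) of a classic TIFF using only the standard library. It assumes single-image, non-BigTIFF input. The helper names `read_first_ifd_tags` and `summarize_tags` are hypothetical, and so is the choice of tags. Tag 34675 is the standard TIFF tag for an embedded ICC profile, and tag 700 holds an XMP packet.

```python
import struct

# Tags to pull out (a hypothetical selection): baseline TIFF 6.0 text tags,
# the XMP packet tag (700), and the embedded ICC profile tag (34675).
TAGS_OF_INTEREST = {
    270: "ImageDescription",
    305: "Software",
    306: "DateTime",
    315: "Artist",
    700: "XMP",
    33432: "Copyright",
    34675: "ICCProfile",
}

# Size in bytes of one value of each TIFF field type, keyed by type code.
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1, 7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}


def read_first_ifd_tags(data: bytes) -> dict:
    """Return the raw bytes of selected tags from the first IFD of a classic TIFF."""
    byte_order = data[:2]
    if byte_order == b"II":
        endian = "<"  # little-endian ("Intel")
    elif byte_order == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    found = {}
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        entry = data[start:start + 12]
        tag, field_type, count = struct.unpack(endian + "HHI", entry[:8])
        if tag not in TAGS_OF_INTEREST or field_type not in TYPE_SIZES:
            continue
        size = TYPE_SIZES[field_type] * count
        if size <= 4:
            raw = entry[8:8 + size]  # small values are stored inline in the entry
        else:
            (value_offset,) = struct.unpack(endian + "I", entry[8:12])
            raw = data[value_offset:value_offset + size]
        found[TAGS_OF_INTEREST[tag]] = raw
    return found


def summarize_tags(tags: dict) -> dict:
    """Decode text tags and report the ICC profile only by its length."""
    summary = {}
    for name, raw in tags.items():
        if name == "ICCProfile":
            summary[name] = f"{len(raw)} bytes"
        else:
            summary[name] = raw.rstrip(b"\x00").decode("utf-8", "replace")
    return summary
```

For example, `read_first_ifd_tags(open("page001.tif", "rb").read())["ICCProfile"]` returns the raw profile bytes (the file name is hypothetical). Those bytes could be saved as an `.icc` file and reused when building the PDF/A file. Whatever tools perform the conversion, the resulting PDF/A-2b file would still need to be checked with a dedicated validator, as the abstract describes.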
Published
2018-03-19
How to Cite
Han, Y., & Wan, X. (2018). Digitization of Text Documents Using PDF/A. Information Technology and Libraries, 37(1), 52–64. https://doi.org/10.6017/ital.v37i1.9878
Section
Communications
License
Copyright (c) 2018 Information Technology and Libraries
This work is licensed under a Creative Commons Attribution 3.0 Unported License.