
Scanning Guidelines
Optical Character Recognition & Transcription
Database Management, Metadata & Controlled Vocabulary
Web Delivery


Scanning Guidelines

Although the Upper Arlington Public Library will be adding more photographs and other items from community sources to the UA Archives, the current collection consists primarily of pages and images from the Norwester magazines, Norwester yearbooks, and the Record of Proceedings of the Village (City) of Upper Arlington, Ohio. All Norwester magazine pages were scanned as 600 dpi TIFFs with 24-bit color. The 600 dpi setting exceeds most scanning benchmarks used by local history digitization programs; however, this higher resolution proved useful when isolating and enlarging individual Norwester magazine photographs or illustrations for onscreen display.

Due to the sheer volume of pages in the Norwester yearbooks and the Record of Proceedings of the Village (City) of Upper Arlington, Ohio, each of these pages was scanned at the industry benchmark of 300 dpi. In most instances a 24-bit color setting was selected to provide the most accurately representative image possible. This setting also has the advantage of clearly displaying any colored inks and cover images, as well as minimizing illegibility caused by stains or darkened paper in the original materials. However, because of the large quantity of pages in the Norwester yearbooks, some black-and-white yearbook pages were scanned using an 8-bit grayscale setting. An additional "de-screening" setting was also used for the Norwester yearbooks and magazines, since photographs within these publications were originally printed as halftones. This setting minimizes interference between the original halftone screen and the digital image grid, and with it the resulting moiré effect.
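
The storage trade-off between these settings can be estimated with simple arithmetic. The following is a minimal sketch, not a description of the library's workflow: it assumes a letter-size page of 8.5" x 11" (the actual page dimensions are not stated here) and counts only raw pixel data, ignoring TIFF headers and any compression.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw pixel-data size of a scan, ignoring file headers and compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size (hypothetical): 8.5 x 11 inches.
PAGE_W_IN, PAGE_H_IN = 8.5, 11

# The three resolution / bit-depth combinations described in the text.
settings = {
    "600 dpi, 24-bit color": (600, 24),
    "300 dpi, 24-bit color": (300, 24),
    "300 dpi, 8-bit grayscale": (300, 8),
}

for label, (dpi, bits) in settings.items():
    size = uncompressed_bytes(PAGE_W_IN, PAGE_H_IN, dpi, bits)
    print(f"{label}: {size / 1_000_000:.1f} MB")
```

Under these assumptions, doubling the resolution quadruples the pixel count, and dropping from 24-bit color to 8-bit grayscale cuts the raw size to one third.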

All scanning of individual photographs, Norwester magazines, and the Record of Proceedings of the Village (City) of Upper Arlington, Ohio, has been performed on library premises. Scanning of the Norwester yearbooks was subcontracted to OCLC (Online Computer Library Center). When scanning the Norwester magazines, an entire two-page spread was captured with each scan; after scanning was complete, each page was isolated by cropping. This procedure cut the required scanning time approximately in half. To accommodate two-page spreads and large-format items, an Epson Expression 1640XL 11" x 17" flatbed scanner was used. Adobe Photoshop was used for image manipulation, rotation, cropping, and creation of derivative JPEG images for onscreen display. These compressed, low-resolution JPEG images reduce file size and online loading time. Automation and batch processing features in Photoshop were also used whenever possible to save time and minimize the potential for error.
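
The spread-to-page cropping step can be illustrated with a short calculation. This is a sketch only: the cropping described above was done in Photoshop, and the function name, the `gutter_px` parameter, and the spread dimensions below are hypothetical. The boxes use the (left, upper, right, lower) convention, which is the same form accepted by Pillow's `Image.crop` if one wanted to apply them to actual image files.

```python
def split_spread(width_px, height_px, gutter_px=0):
    """Return (left, upper, right, lower) crop boxes for the left and right pages.

    gutter_px is an optional strip around the center fold to discard;
    half of it is removed from each page.
    """
    if gutter_px < 0 or gutter_px >= width_px:
        raise ValueError("gutter_px must be between 0 and the spread width")
    mid = width_px // 2
    half_gutter = gutter_px // 2
    left_box = (0, 0, mid - half_gutter, height_px)
    right_box = (mid + half_gutter, 0, width_px, height_px)
    return left_box, right_box


# Assumed example: a 17 x 11 inch spread scanned at 600 dpi.
spread_w, spread_h = 17 * 600, 11 * 600
left, right = split_spread(spread_w, spread_h)
print("left page box: ", left)
print("right page box:", right)
```

Because both pages come from one scan, a batch job only needs the spread dimensions to derive both crop boxes, which is what makes the two-pages-per-scan approach easy to automate.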



Optical Character Recognition & Transcription

After the scanning process was complete, each magazine, yearbook, and document page was processed using optical character recognition (OCR) software. During OCR, headlines, articles, and advertisements were automatically zoned and processed, and a text transcription file was created for each page. Most pages were processed using ABBYY FineReader during ingestion into the CONTENTdm database management software. These transcription files were then associated with each page’s image, appearing as the contents of an administrative metadata field, "Full Text". The contents of this field are included whenever a keyword search is performed, making documents within the UA Archives full-text searchable. In accordance with standard practices for large-scale document digitization, and because of the large volume of text generated, the "Full Text" field typically contains raw, uncorrected text. Supporting PDF files were also generated via CONTENTdm for printing and display purposes.
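
To show how a per-page text field makes keyword search possible, and what uncorrected OCR text implies for recall, here is a minimal inverted-index sketch. It does not represent how CONTENTdm implements search; the page identifiers and sample text are invented for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of page IDs whose text contains it."""
    index = defaultdict(set)
    for page_id, full_text in pages.items():
        for token in tokenize(full_text):
            index[token].add(page_id)
    return index


def search(index, query):
    """Return the page IDs that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical "Full Text" values; the second contains a typical OCR error.
pages = {
    "norwester-p001": "Garden Club meets Tuesday at the Village hall",
    "norwester-p002": "Vi11age council approves new street lights",
    "proceedings-p001": "Council of the Village of Upper Arlington met in regular session",
}

index = build_index(pages)
print(search(index, "village"))
print(search(index, "village council"))
```

In this sample, the page whose text reads "Vi11age" is not returned for the query "village". That is the practical cost of leaving OCR output uncorrected, which is an accepted trade-off at this scale.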



Database Management, Metadata & Controlled Vocabulary

For management of the UA Archives digital collection and associated metadata, CONTENTdm software was selected. Its features allow the storage and management of single images; compound documents, such as magazines and books; three-dimensional artifacts with multiple associated views; and multimedia files. In addition, cataloging of materials is flexible and supports use of the Dublin Core Metadata Initiative’s metadata scheme. The software provides easily customizable support for controlled vocabularies and includes the Library of Congress Thesaurus for Graphic Materials, widely used by the digitization community for subject categorization of local history collections. CONTENTdm is also Z39.50 compatible, OAI compliant, and allows the export of data to non-proprietary formats such as XML, SGML, and tab-delimited text.
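
As an illustration of what a Dublin Core record exported to XML might look like, the following sketch serializes a small record using the Dublin Core element namespace. It is not CONTENTdm's export format: the `record` wrapper element, the function name, and all field values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def record_to_xml(record):
    """Serialize a dict of Dublin Core element name -> list of values."""
    root = ET.Element("record")  # hypothetical wrapper element
    for element, values in record.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Invented sample values; repeated elements (such as subject) are allowed.
sample = {
    "title": ["Norwester, sample issue, page 1"],
    "subject": ["Gardens", "Clubs"],
    "type": ["Text"],
    "format": ["image/tiff"],
}

xml_text = record_to_xml(sample)
print(xml_text)
```

Keeping each element as a list of values reflects the fact that Dublin Core elements are repeatable, so a page can carry several subject terms.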

The Dublin Core metadata scheme was selected for the UA Archives collection since it is currently the most widely adopted digital collection metadata standard within the public library field. Some of the descriptive metadata elements displayed to users have been assigned more user-friendly names, while certain elements used for administrative and structural purposes are hidden from display. As previously mentioned, CONTENTdm provides customizable support for controlled vocabularies, which was used for several metadata elements. The Library of Congress Thesaurus for Graphic Materials is included in CONTENTdm and was used as the controlled vocabulary for the "Subject/s" element. Additional controlled vocabularies recommended by the Dublin Core Metadata Initiative were entered manually for other elements. Data entry for each item’s metadata elements was streamlined by creating templates that automatically fill in repeating values for related items, such as multiple pages from a magazine issue.
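
The template idea described above can be sketched as a simple merge of shared values with per-page values. This is an illustration under assumed field names and values, not the template mechanism CONTENTdm provides.

```python
def apply_template(template, page_values):
    """Combine shared template values with page-specific values.

    Page-specific values override the template where both define a field.
    """
    record = dict(template)
    record.update(page_values)
    return record


# Values shared by every page of one (hypothetical) magazine issue.
issue_template = {
    "Publisher": "Example Publisher",
    "Type": "Text",
    "Format": "image/tiff",
    "Language": "en",
}

# Only the fields that differ from page to page need to be entered.
issue_pages = [
    {"Title": "Sample issue, page 1", "Description": "Front cover"},
    {"Title": "Sample issue, page 2", "Description": "Table of contents"},
]

records = [apply_template(issue_template, page) for page in issue_pages]
for record in records:
    print(record)
```

The cataloger types the repeating values once per issue, so an error in a shared field is corrected in one place rather than on every page.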

However, not all cataloging processes can or should be streamlined. For instance, entries for the "Description" metadata element in the historical Norwester images collection are primarily culled from an image’s accompanying magazine article or advertisement, and additional research is then conducted to identify other sources of useful information about the subject.



Web Delivery

The Upper Arlington Public Library utilizes OCLC’s hosted version of CONTENTdm. Items and their associated metadata are uploaded to the appropriate UA Archives collection on OCLC’s server, which is also where CONTENTdm’s web-based search client resides. In 2011, the entire UA Archives website was redesigned and rebuilt to take advantage of improved technologies and to add social features, such as tagging, commenting, and sharing through various social media. Additional web pages have been created on OCLC’s server, allowing users to view an overall description of the UA Archives program, descriptions of each collection, technical information, and collection policies. Direct links to the Upper Arlington Public Library, the Upper Arlington Historical Society, the Upper Arlington City School District, the Upper Arlington Alumni Association, the City of Upper Arlington, and other partners are also included. Users may browse entire collections or access items directly through a Search box at the top of each page, which allows simple keyword searches across every collection and also provides access to more advanced search functions. For further assistance when searching for items, users are also encouraged to access the Help feature at the top of each page.
