Technical Information
Scanning Guidelines
Although the Upper Arlington Public Library will be adding more photographs and other items from community sources to the UA Archives, the current collection consists primarily of pages and images from Upper Arlington High School’s Norwester yearbooks and the Upper Arlington community’s Norwester magazine. For the Norwester magazines, each scan was captured as a 600 dpi TIFF with 24-bit color. The 600 dpi setting exceeds many scanning benchmarks typically used by local history digitization programs for photographs; however, this higher resolution proved useful when isolating and enlarging individual Norwester photographs or illustrations for onscreen display. Because of the sheer volume of pages in the Norwester yearbooks, each yearbook page was instead scanned at 300 dpi. The 24-bit color setting was selected to present the most accurately representative image possible. It also has the advantage of displaying the colored inks used for cover art and some advertisements, as well as minimizing illegibility caused by stains or darkened paper in the original materials. Again because of the large quantity of yearbook pages, however, most black and white yearbook pages were scanned using an 8-bit grayscale setting. An additional "de-screening" setting was also used, since photographs within the Norwester were originally printed as halftones. This setting minimizes any conflict between the original halftone screen and the digital image grid.
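To give a rough sense of why page volume influenced the choice of settings, the following sketch estimates the uncompressed size of a single scan under each set of settings described above. It is illustrative arithmetic only: the 11" x 17" scan area and the 8.5" x 11" yearbook page size are assumptions, and actual TIFF files will differ because of headers and any compression applied.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate size in bytes of the uncompressed raster data for one scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed full 11" x 17" scan at 600 dpi, 24-bit color (magazine settings).
magazine_spread = uncompressed_bytes(11, 17, 600, 24)

# Assumed 8.5" x 11" page at 300 dpi, 8-bit grayscale (yearbook settings).
yearbook_page = uncompressed_bytes(8.5, 11, 300, 8)

print(f"{magazine_spread / 1_000_000:.1f} MB")  # 202.0 MB
print(f"{yearbook_page / 1_000_000:.1f} MB")    # 8.4 MB
```

Under these assumptions, a color magazine scan holds roughly 24 times as much raster data as a grayscale yearbook page.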
For the Norwester magazines, all scanning was performed on library premises by library staff. An entire two-page spread was captured with each scan, and each page was isolated for individual cataloging by cropping after the scanning process was complete. This cut the required scanning time approximately in half. To accommodate the two-page spreads and other large format items, an Epson Expression 1640XL 11" x 17" flatbed scanner was used. Adobe Photoshop 7.0 was used for image manipulation, rotation, and cropping, and to create 96 dpi derivative JPEG images for onscreen display. These low-resolution, compressed JPEG images reduce file size and online loading time. Automation and batch processing features in Photoshop were also used whenever possible to save time and minimize the potential for error. For the Norwester yearbooks, all scanning was subcontracted to OCLC Online Computer Library Center, and all archival TIFF files were converted to derivative lossless JPEG 2000 images for onscreen display.
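The cropping and resizing steps described above were carried out in Photoshop. The sketch below only illustrates the underlying arithmetic: where a two-page spread would be cut, and how large a 96 dpi derivative of one page would be. The function names and the assumed spread size (17" wide by 11" high at 600 dpi) are hypothetical and not taken from the project's workflow.

```python
def split_spread(width_px, height_px):
    """Return (left, top, right, bottom) crop boxes for the two pages of a spread."""
    middle = width_px // 2
    left_page = (0, 0, middle, height_px)
    right_page = (middle, 0, width_px, height_px)
    return left_page, right_page


def derivative_size(box, source_dpi=600, target_dpi=96):
    """Pixel dimensions of a crop box after resampling to the target resolution."""
    left, top, right, bottom = box
    width = round((right - left) * target_dpi / source_dpi)
    height = round((bottom - top) * target_dpi / source_dpi)
    return width, height


# Assumed spread: 17" wide x 11" high, scanned at 600 dpi.
left_page, right_page = split_spread(17 * 600, 11 * 600)

print(left_page)                   # (0, 0, 5100, 6600)
print(right_page)                  # (5100, 0, 10200, 6600)
print(derivative_size(left_page))  # (816, 1056)
```

This simple version cuts exactly at the midpoint; in practice the gutter of a bound or folded issue may not sit at the center of the scan, so crop boxes may need manual adjustment.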
Optical Character Recognition & Transcription
After scanning, each Norwester magazine page was processed using the optical character recognition (OCR) software OmniPage Pro 12.0. During OCR, headlines, articles, and advertisements were automatically zoned and processed, and a text transcription file was created for each page. A member of the Upper Arlington Public Library staff then proofread the text files for each page. The goal of the proofreading process was to catch any OCR errors and to correct likely search terms where appropriate. Although transcription specifications for many local history digitization projects require transcriptions to match the original text exactly, members of the UA Archives program staff used brackets to insert corrections alongside the original wording. For example, several articles refer to the Grandview neighborhood as "Grand View," which would not be found by a keyword search on "Grandview." As a compromise, the corrected transcription "Grand View [Grandview]" still reflects the article’s original wording, but it also allows the article to be located by users searching for information on the Grandview neighborhood.
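The bracketed-correction convention can be sketched in a few lines. In the project these corrections were inserted by staff during proofreading; the function below is a hypothetical illustration of the convention itself, not a tool used by the UA Archives program, and the sample sentence is invented.

```python
import re


def annotate_variants(text, corrections):
    """Append a bracketed preferred form after each variant spelling.

    The original wording is kept; variants that are already followed by a
    bracketed correction are left alone, so the function can be re-run safely.
    """
    for variant, preferred in corrections.items():
        pattern = re.compile(r"\b" + re.escape(variant) + r"\b(?! \[)")
        text = pattern.sub(lambda m: f"{m.group(0)} [{preferred}]", text)
    return text


# Invented sample sentence using the variant spelling discussed above.
original = "A new store opened in Grand View last week."
annotated = annotate_variants(original, {"Grand View": "Grandview"})

print(annotated)
# A new store opened in Grand View [Grandview] last week.

# A simple keyword search now finds the page under the modern spelling.
print("grandview" in original.lower())   # False
print("grandview" in annotated.lower())  # True
```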
After verification and correction, the final transcription files were associated with each page’s image, appearing as the contents of an administrative metadata field, "Full Text." The contents of this field are available whenever a keyword search is performed, allowing documents within the UA Archives to be full text searchable. Although the OmniPage software can create files in a variety of formats, the final transcription file for each page was saved as plain ASCII text, since it is a non-proprietary format and thus less susceptible to obsolescence.
For the Norwester yearbooks, each page was processed using ABBYY FineReader optical character recognition during ingestion into the CONTENTdm database management software. In accordance with standard practices for yearbook collection digitization, and due to the large volume of text generated, the text files created by the OCR process for the Norwester yearbooks are raw and uncorrected. Supporting PDF files were also generated via CONTENTdm for printing and display purposes.
Database Management, Metadata & Controlled Vocabulary
For management of the UA Archives digital collection and associated metadata, CONTENTdm software was selected. Its features allow the storage and management of single images, compound documents such as magazines and books, three-dimensional artifacts with multiple associated views, and multimedia files. In addition, cataloging of materials is flexible and supports use of the Dublin Core Metadata Initiative’s metadata scheme. The software provides easily customizable support for controlled vocabularies and includes the Library of Congress Thesaurus for Graphic Materials, widely used throughout the digitization community for subject categorization of local history collections. CONTENTdm is also Z39.50 compatible, OAI compliant, and allows the export of data to non-proprietary formats such as XML, SGML, and tab-delimited text.
The Dublin Core metadata scheme was selected for the UA Archives collection, since it is currently the most widely adopted digital collection metadata standard within the public library field. Some of the descriptive metadata elements displayed to users have been assigned more user-friendly names, while certain elements used for administrative and structural purposes are hidden from display. As previously mentioned, CONTENTdm provides customizable support for controlled vocabularies, which was utilized for several metadata elements. The Library of Congress Thesaurus for Graphic Materials is included in CONTENTdm and was used as the controlled vocabulary for the "Subject/s" element. However, additional controlled vocabularies recommended by the Dublin Core Metadata Initiative were manually entered for other elements. The process of entering values for each item’s metadata elements was streamlined, as templates were created to automatically fill in repeating values for related items such as multiple pages from a magazine issue. ASCII text files from the OCR process were also imported for each page via the template to allow documents to be full text searchable through automated keyword indexing.
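The template approach can be pictured as merging a set of shared issue-level values with the values unique to each page. The sketch below is a generic stand-in and does not use CONTENTdm's own template feature; the field values are invented placeholders, and "Full Text" stands for the transcription field described earlier.

```python
def apply_template(template, pages):
    """Build one metadata record per page from shared and page-specific values."""
    records = []
    for page in pages:
        record = dict(template)  # copy the values shared by the whole issue
        record.update(page)      # page-specific values are added or override
        records.append(record)
    return records


# Hypothetical values shared by every page of one magazine issue.
issue_template = {
    "Source": "Norwester magazine (placeholder issue)",
    "Type": "Text",
    "Language": "en",
}

# Hypothetical page-level values, including the OCR transcription.
pages = [
    {"Title": "Page 1", "Full Text": "placeholder transcription for page 1"},
    {"Title": "Page 2", "Full Text": "placeholder transcription for page 2"},
]

records = apply_template(issue_template, pages)
for record in records:
    print(record["Title"], "|", record["Source"], "|", record["Type"])
```

Copying the template for each page keeps the shared values consistent across an issue while leaving the template itself unchanged for the next batch.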
However, not all cataloging processes can or should be streamlined. For instance, entries for the "Description" metadata element in the historical Norwester images collection are primarily culled from an image’s accompanying magazine article or advertisement, and additional research is then conducted to identify other sources of useful information about the subject.
Web Delivery
The Upper Arlington Public Library decided to initially purchase OCLC’s hosted version of CONTENTdm. In this scenario items and their associated metadata are uploaded to the appropriate UA Archives collection on OCLC’s server, which is also where CONTENTdm’s web-based search client resides. By downloading and altering specified web templates, the search interface can be somewhat customized. However, no additional web pages can be created on OCLC’s server, and more flexibility was desired. As a result, a separately hosted web interface was also created, which functions as the portal to the collection.
Through this portal, users can view an overall description of the UA Archives program, lists of program participants, collection development and maintenance policies, technical information, and featured items. Direct links to the Upper Arlington Public Library, the Upper Arlington Historical Society, the Upper Arlington City School District, and other partners are also included. Predefined search strings are presented in a Browse page to allow users to browse the UA Archives by topic, decade, or collection. Users may also access items directly through a Search page, which allows simple keyword searches across every collection and also provides access to advanced search functions.
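A Browse page built from predefined search strings amounts to a list of ready-made links, one per topic, decade, or collection. The sketch below shows one way such links could be generated. The base URL and the parameter names `field` and `term` are hypothetical placeholders and do not reflect CONTENTdm's actual query syntax.

```python
from urllib.parse import urlencode

# Hypothetical placeholder, not the real address of the search client.
BASE_URL = "https://example.org/search"


def build_browse_links(field, terms):
    """Map each browse term to a predefined search URL."""
    return {
        term: f"{BASE_URL}?{urlencode({'field': field, 'term': term})}"
        for term in terms
    }


decade_links = build_browse_links("decade", ["1920s", "1930s"])
for label, url in decade_links.items():
    print(label, url)
```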
Inside the collections, individual items, such as those in "Historical Maps" and "Historical Images," are presented with their descriptive metadata directly below the image. However, compound documents, such as the multi-page Norwester yearbooks and Norwester magazines, display an individual page image on the right-hand portion of the page, with a list of all pages in the issue appearing at the left. This list allows users to go directly to a specific page or leaf through the document one page at a time. The page image views at the right can be replaced with views of the document-level metadata or page-level metadata. Users can also take advantage of the "My Favorites" feature, which allows them to save selected items in their own customizable collection that can then be saved as a web page, used in a presentation, or e-mailed to family and friends. For further assistance when viewing items, users are also encouraged to access the Help feature within the collections.


