— CH. 1 · THE GLASS PLATE AND THE SPINE —

Book scanning

In 2011, the Internet Archive displayed its Scribe book scanner, one of a generation of camera-based machines that replaced older methods in which books were pressed onto flat glass plates. Some robotic scanners go further and use air suction to turn pages without human hands. In an ordinary commercial flatbed scanner, a light and optical array move underneath the glass. Manual book scanners extend the glass plate to the edge of the device so that the spine is easier to line up. With these tools, an operator must hold the book open and turn each page by hand. The process is slow and, over time, can damage fragile spines. Scanners capable of thousands of pages per hour can cost thousands of dollars, yet do-it-yourself models built for US$300 can reach 1,200 pages per hour. The move from manual labor toward robotic automation changed how libraries handle their collections.
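The throughput figure above can be turned into a rough time estimate. The sketch below assumes a steady scanning rate with no setup or page-handling overhead; the function name `scanning_hours` and the 300-page book are illustrative assumptions, not figures from any scanner vendor.

```python
def scanning_hours(total_pages: int, pages_per_hour: int) -> float:
    """Return the hours needed to scan total_pages at a steady rate."""
    if pages_per_hour <= 0:
        raise ValueError("pages_per_hour must be positive")
    return total_pages / pages_per_hour


# Hypothetical 300-page book on a DIY rig running at 1,200 pages per hour.
hours = scanning_hours(300, 1200)
print(f"{hours * 60:.0f} minutes")  # 15 minutes
```

Real sessions are slower than this idealized rate, since mounting the book and checking images take time the formula ignores.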
Project Gutenberg began in 1971 to build a digital library of free e-books. The Million Book Project followed around 2001 with similar goals of mass digitization. Google launched its own initiative in 2004 to scan books on a massive scale, and the Open Content Alliance followed in 2005 to build a universal library. In 2010, the total number of works that have appeared as books in human history was estimated at around 130 million. Organizations take one of three main paths to tackle this volume: outsourcing, in-house scanning with commercial book scanners, or in-house scanning with robotic solutions. Some ship books to India or China for low-cost processing. Others keep operations in-house for reasons of safety and control over the technology. The Internet Archive and Google both use overhead scanners and digital camera-based machines that are substantially faster than traditional flatbed scanning. These projects aim to make millions of titles searchable online for public use.
The National Archives of Australia recommends a resolution of 400 ppi for bound books and 600 ppi for rare or significant documents, where fine detail must be captured. The Federal Agencies Digitization Guidelines Initiative (FADGI) also calls for at least 400 ppi for archival materials. A V-shaped holder allows a book to be photographed without being laid flat against glass. This reduces the curvature distortion that appears in the gutter, the region of the page nearest the spine. Cutting off the binding converts a book into loose sheets that can be loaded into a standard automatic document feeder. This destructive method ruins a book's monetary value for collectors, although the loose pages are easier to scan. Unbound stacks fluff up and are exposed to more oxygen, which may speed deterioration. Placing weights on the unbound pages limits their exposure to air, and storage in appropriate containers further protects these fragile items from damage.
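To make the resolution recommendations above concrete, the following sketch computes the pixel dimensions and uncompressed size of one scanned page at 400 and 600 ppi. The 6 x 9 inch page and 24-bit color depth are assumptions chosen for illustration, and the helper names are hypothetical; real files are usually compressed and will be smaller.

```python
def scan_pixels(width_in: float, height_in: float, ppi: int):
    """Return (width, height) in pixels for a page scanned at ppi."""
    return round(width_in * ppi), round(height_in * ppi)


def raw_size_bytes(width_in: float, height_in: float, ppi: int,
                   bytes_per_pixel: int = 3) -> int:
    """Return the uncompressed image size (3 bytes/pixel = 24-bit color)."""
    width_px, height_px = scan_pixels(width_in, height_in, ppi)
    return width_px * height_px * bytes_per_pixel


# Assumed 6 x 9 inch page, scanned at the two recommended resolutions.
for ppi in (400, 600):
    w, h = scan_pixels(6, 9, ppi)
    megabytes = raw_size_bytes(6, 9, ppi) / 1e6
    print(f"{ppi} ppi: {w} x {h} px, {megabytes:.1f} MB uncompressed")
# 400 ppi: 2400 x 3600 px, 25.9 MB uncompressed
# 600 ppi: 3600 x 5400 px, 58.3 MB uncompressed
```

Because pixel count grows with the square of the resolution, moving from 400 to 600 ppi multiplies the raw data per page by 2.25.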
Because of legal restrictions, most scanned books available online are out of copyright. Google Books is an exception: it scans books still protected by copyright unless the publisher explicitly excludes them. The roughly 130 million works estimated to exist in 2010 are far more than any single organization can process. Projects such as Project Gutenberg focus primarily on out-of-copyright texts to avoid litigation. Commercial entities often rely on outsourcing to keep costs low while processing millions of pages. Robotic scanning solutions do not require human hands to touch the book. Google's machines have been reported to use infrared camera technology to detect the three-dimensional shape of each page and correct for it. Publishers retain the right to block specific titles from being digitized or made searchable. The tension between public access and intellectual property rights remains a central issue for digital libraries.
After scanning, software straightens and crops the page images before the final e-book is created. The images are commonly stored as DjVu, Portable Document Format (PDF), or Tag Image File Format (TIFF) files. Optical character recognition (OCR) software then converts the raw images into digital text in a format such as ASCII, which reduces file size and allows the text to be reformatted by other applications. Human proofreaders usually check the OCR output for errors. During capture, transparent plastic or glass sheets are pressed against the pages to flatten them. Some scanners use ultrasonic or photoelectric sensors to detect when two pages have been turned at once, so that no page is skipped. Reported speeds for such machines reach 2,900 pages per hour. High-speed hardware combined with careful manual review keeps large collections accurate.
Archival institutions recommend higher resolutions than text conversion alone requires. Basic scanning is adequate for producing readable text, but preservation work must capture more detail. Tiered approaches balance quality against practical constraints such as storage capacity and staff resources: institutions apply higher resolutions selectively to rare materials and use standard settings for common documents. Collaborative digitization projects publish best practices and work with regional partners to apply them. Wisconsin Heritage Online uses a wiki to build and share collaborative documentation. The Digital Library of Georgia presents more than one hundred digital collections from sixty institutions. Additional best-practice criteria have been established in the UK, Australia, and the European Union since the early twenty-first century.
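The tiered approach described above can be sketched as a simple rule plus a cost comparison. The example uses the 400 ppi and 600 ppi figures cited earlier in this chapter; the function names, the collection of 1,000 items, and the assumption that every page has the same physical size are illustrative only.

```python
STANDARD_PPI = 400  # recommendation cited earlier for bound books
RARE_PPI = 600      # recommendation cited earlier for rare or significant items


def choose_ppi(is_rare_or_significant: bool) -> int:
    """Pick a scanning resolution under a two-tier policy."""
    return RARE_PPI if is_rare_or_significant else STANDARD_PPI


def tiered_cost_ratio(rare_items: int, common_items: int) -> float:
    """Pixel volume of tiered scanning relative to scanning everything
    at RARE_PPI, assuming equal page sizes and page counts per item."""
    total = rare_items + common_items
    if total <= 0:
        raise ValueError("collection must contain at least one item")
    tiered = rare_items * RARE_PPI ** 2 + common_items * STANDARD_PPI ** 2
    uniform = total * RARE_PPI ** 2
    return tiered / uniform


# Hypothetical collection: 50 rare items and 950 common ones.
ratio = tiered_cost_ratio(50, 950)
print(f"Tiered scanning needs {ratio:.0%} of the pixels of uniform 600 ppi")
# Tiered scanning needs 47% of the pixels of uniform 600 ppi
```

The ratio measures raw pixel volume only; compressed storage and handling costs would shift the exact savings.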

Common questions

When did the Internet Archive display its Scribe book scanner?

The Internet Archive displayed its Scribe book scanner in 2011. Camera-based machines of this kind replaced older methods in which books were placed on flat glass plates.

What year did Project Gutenberg start its work to create a digital library of free eBooks?

Project Gutenberg started its work in 1971 to create a digital library of free e-books.

How many pages per hour can high-end robotic scanners process compared to do-it-yourself models?

Scanners capable of thousands of pages per hour can cost thousands of dollars. Do-it-yourself models built for US$300 can still reach 1,200 pages per hour.

What resolution does the National Archives of Australia suggest for bound books during the digitization process?

The National Archives of Australia suggests a resolution of 400 ppi for bound books during the digitization process. They recommend 600 ppi specifically for rare or significant documents to capture fine details.

Which file formats are commonly used to store scanned book images?

Common file formats include DjVu, Portable Document Format (PDF), and Tag Image File Format (TIFF). Optical character recognition software can then convert the images into digital text in a format such as ASCII.


