
Hoover Digital Collections 2: A user's guide
An introduction to the Hoover Institution Library & Archives' new site for viewing digital material
Welcome to the Hoover Institution Library & Archives’ new digital collections site, Digital Collections 2, now in beta release. This digital story is a guide to the site and how to use it.
What's in Digital Collections 2?
You may be familiar with our Digital Collections site. Digital Collections 2 is a new resource designed to provide access to Hoover's digitized archival collections, including audiovisual and born-digital material. We are in the process of migrating all our older digitized content from Digital Collections to Digital Collections 2, so that everything can be viewed in one place. Until we finish this process, our content is split between the two sites.
Currently, Digital Collections 2 contains archival material digitized since 2020, including the H. H. Kung papers and collections digitized for our Fanning the Flames and Bread + Medicine exhibits. We are continuously adding new material to the site. For a full breakdown of how to access Hoover collections online, including more information about what is available where, visit our Search the Collections page.
The landing page
Searching
Unless you’re navigating to a specific object—from a finding aid, web search, or online exhibit—you’ll probably start by searching the site. Here are some tips for searching.
The image viewer
Here's how to make the most of our online image viewer for digitized photographs, documents, and other image-based material.
The object page
In addition to the viewer, each object page contains key metadata, links, and download functions.
OCR and searchable text
Machine-enabled text recognition can make it easier for researchers to find what they are looking for without requiring archivists to create detailed descriptions of every item. For digitized documents and other image-based material, we use AI-assisted optical character recognition (OCR) and handwritten text recognition (HTR) tools to extract the text from the images. For time-based media, such as sound recordings and videos, we use speech-to-text tools. You can search these transcripts with the Digital Collections 2 full-text search tool (see the section on "Searching") or download them from the object page. The resulting transcripts are never perfect, but it's often possible to search for a name or other keyword that does not appear in the archival description of an object and find it in the full-text transcript.
We create transcripts using a variety of third-party tools, including ABBYY FineReader, Whisper, and Transkribus. The available tools are constantly evolving, and their accuracy is improving over time as more training data becomes available, as text recognition algorithms improve, and as we fine-tune our processes based on what works for different materials and languages. We make available the transcripts we have, even if they aren't very accurate, in the hope that they may be useful for researchers.
What doesn't work?
In testing OCR accuracy on different materials, we have found a few reliable indicators of lower output accuracy. We're sharing these so that you can keep them in mind as you work with our digital collections:
OCR and accessibility
We are committed to making our text-based collection materials accessible to all researchers. While we do not currently have the technology or the resources to create accurate transcriptions of every object we digitize, we will continue to work toward this goal. We will also work with individual researchers who require accommodations by pairing them with a research aide and/or by sourcing an accurate hand-keyed transcription of a particular object.
If you require an accommodation, please contact HILA. Stanford-affiliated researchers may also contact Stanford’s Office of Accessible Education.
A final note on OCR
Like most cultural heritage institutions, we use OCR tools that:
- are publicly available
- can be operated locally
- produce reasonable results for a wide range of languages
- fit easily into a mass digitization workflow
These requirements don’t always translate to the highest level of OCR accuracy for a particular object. You might get better results with the text recognition tools built into consumer apps such as Live Text (iOS 15 or later), Google Lens, or macOS’s Preview. Whenever possible, taking into account copyright and privacy restrictions, we make our digital collections images available for download so that you can study and analyze them with any tools at your disposal.
What’s next?
Digital Collections 2 is in beta, meaning we are still actively testing and developing it. Over the next one to two years, we will be rolling out new features, including:
- User accounts
- A virtual reading room for secure offsite access
- Improved integration of archival finding aids and collection hierarchy
- Support for audiovisual and born-digital materials
- Migration of content from digitalcollections.hoover.org, including automatic redirects for URLs referring to that site
If you notice any problems with the site, or if you have any questions about it, please contact the Library & Archives or fill out our feedback form.