Yiddish Book Center Awarded Grant from the National Endowment for the Humanities to Advance OCR Correction for Yiddish and Other Low-Resource Languages

(AMHERST, MA) January 10, 2024 — The Yiddish Book Center is pleased to announce that it has been awarded a $147,000 grant from the National Endowment for the Humanities (NEH) to support the next phase of its Yiddish optical character recognition (OCR) initiative. This groundbreaking project processes Yiddish books with OCR software so that images of their pages become searchable text. The Digital Humanities Advancement grant will fund an experimental system to detect errors in the OCR text. The Yiddish Book Center is working with partners at the Linguistic Data Consortium at the University of Pennsylvania to develop this software, and the two organizations will manage the project collaboratively. This multi-year project will result in a large set of corrected, searchable text that can be used for research in a wide range of areas. The methods employed here can be duplicated and used for similar research in other languages, particularly rare languages that have not yet been processed with OCR software.

The Yiddish Book Center recognized the need for researchers to be able to search the scanned Yiddish texts in its Steven Spielberg Digital Yiddish Library, an online collection of 12,000 Yiddish books. In 2018, the Center launched a beta version of its full-text search website, ocr.yiddishbookcenter.org, which now includes 10,000 searchable Yiddish works. This initiative marks the first time Yiddish works have been processed with optical character recognition software at this scale, and it has produced the largest collection of Yiddish OCR text in existence. The site averages 10,000 searches per month and is used by scholars, students, writers, and artists. The NEH grant will improve the OCR, thereby enhancing the search experience for users and providing valuable insights for future research into OCR for other languages. To further increase access to Yiddish titles, the Yiddish Book Center is leading a collaborative project with the New York Public Library, the National Library of Israel, and the YIVO Institute for Jewish Research to create a Universal Yiddish Library, which will make scanned Yiddish books from all four institutions searchable on a single website. The methods and lessons from this grant will be applied to the OCR text in this joint library, ultimately creating searchable access to at least 20,000 Yiddish books.

“We are grateful to the National Endowment for the Humanities for their support of this important project. For more than four decades, the Yiddish Book Center has been committed to recovering and creating access to Yiddish literature,” stated Susan Bronson, the Center’s executive director. “We have taken the lead in utilizing technology to safeguard and enhance searchability for Yiddish books. Thanks to this grant, we can continue to develop and refine our approach, creating new opportunities for scholars, students, and anyone with an interest in the literature, while also improving resources for other underserved languages.”

“I am thrilled that the Yiddish Book Center is receiving this incredible grant to help researchers decode and digitize Yiddish books. As an ardent supporter of the National Endowment for the Humanities in Congress, it’s so exciting to see things come full circle and watch important programs like this get critical NEH funding,” said Congressman Jim McGovern. “When we invest in uncovering history, as this grant will do, we learn more about the present, and just as importantly, the future. I believe this project will inspire, teach, and guide us in more ways than we can imagine.” 

PRESS: For additional information and images, please contact Rebecka McDougall, director of communications & marketing, at [email protected] or 413-256-4900, ext. 118.