Digital Open-Access Platform "CorDeep" Brings New Opportunities in Detecting Visual Elements in Historical Documents

Recent advances in object detection facilitated by deep learning have had positive impacts in fields ranging from medical diagnosis to autonomous driving. Historical research, however, has been yet to reap the benefits of these developments.CorDeep Logo

Bridging this gap, MPIWG researchers Jochen Büttner, Julius Martinetz, Hassan El-Hajj, and Matteo Valleriani, in cooperation with BIFOLD and MDPI, are excited to announce the new CorDeep machine-learning based web application. The open-access platform presents a distinctive effort to bridge the gap between traditional and computational approaches in historical studies.

CorDeep is trained on the Sphaera corpus, a collection of 359 early modern textbooks on geocentric cosmology. Containing around 78,000 pages, 30,000 visual elements—including illustrations of machines and instruments, geometric diagrams, and astronomical images—and 10,000 pages containing tables, it serves as an extensive visual dataset for CorDeep's development.

What’s Special about CorDeep?

Uniquely, CorDeep is able to extract visual elements from historical sources, and classify pages containing (alpha)numerical tables. Using an open-source YOLO (“You Only Look Once”) algorithm, this experimental web application locates and categorizes visual elements into categories including “Content Illustrations,” “Initials,” “Decorations,” and “Printers' Marks.” This, in turn, makes it possible to find and organize visual elements in historical documents quickly and accurately.

A picture of a page from a Sphaera corpus, marked up to show visual elements

Page that displays all four identified classes (“Content Illustrations,” “Initials,” “Decorations,” and “Printers' Marks”). Left: Österreichische Nationalbibliothek. Right: courtesy of the Library of the Max Planck Institute for the History of Science.

CorDeep also enables historians to produce large image datasets in .csv tables for research purposes and create new datasets for the algorithm. The platform thus enables historians to accurately detect patterns in visual culture, such as how visual language in science has evolved over time.

The innovative web service has also been developed with sustainability in mind. CorDeep includes a feature that indicates how much carbon dioxide is used in each dataset search. This has been implemented with the aim of contributing towards an understanding of how to adapt the system to be more energy efficient. The more successful CorDeep is, the higher the potential energy efficiency, if it becomes a preferred service for image data research and makes defunct the development of multiple other web platforms.

Try CorDeep

Access the digital CorDeep platform here and explore your own visual historical datasets.

Users are invited to send feedback to Matteo Valleriani—