JudaicaLink

Overview

Scholarly reference works such as encyclopedias, glossaries, and catalogs serve as guides to a scholarly domain and as anchor points and manifestations of scholarly work. On the web of Linked Data, they can play a key role in interlinking resources related to the concepts they describe. JudaicaLink supports the publication and interlinking of existing reference works on Jewish culture and history as Linked Data.

Here is the link to JudaicaLink.

FID Jewish Studies I to III

The “FID Jewish Studies” is an expert information service (Fachinformationsdienst) for the domain of Jewish studies. The service is developed at the University Library Johann Christian Senckenberg in Frankfurt. WISS takes care of the large-scale data integration and enrichment to contextualize the provided resources and to support innovative retrieval concepts.

Timeline

Phase 1: 2016 - 2019 (FID 1)

Staff: Dr. Maral Dadvar

During the first phase, we developed the base infrastructure for the knowledge graph:

A robust static data catalog (using the plain text formats Markdown, TOML and TTL). The data catalog is published using the Hugo static site generator.
A web frontend for search and data exploration built with Django. The frontend is available at https://labs.judaicalink.org (and integrated with the main static website).
Query and search capabilities based on Apache Fuseki and Elasticsearch.
Modeling of the graph data schema: https://ontology.judaicalink.org

During FID 1, many of the datasets in the current JudaicaLink were created in ETL processes, starting from heterogeneous sources such as Excel files, PDFs, scraped HTML pages and many more. All data was transformed into RDF Turtle (TTL) files, the canonical data format of JudaicaLink.

The main goal of data integration was to provide and link to data records of the German National Library (authority files such as the GND) and to integrate them via Entity Facts.
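To make the ETL and linking steps more concrete, here is a minimal sketch of how a tabular source record could be turned into Turtle with a link to a GND record. It is an illustration under stated assumptions, not the actual JudaicaLink pipeline: the base namespace, the column names (`slug`, `name`, `gnd`), the choice of `skos:prefLabel` and `owl:sameAs`, and the sample identifier are all hypothetical. Only the GND URI pattern `https://d-nb.info/gnd/<id>` follows the scheme used by the German National Library.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace for illustration; real JudaicaLink URIs may differ.
BASE = "http://example.org/judaicalink/"
# URI pattern for GND authority records.
GND = "https://d-nb.info/gnd/"

PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
)


def escape_literal(text):
    """Escape characters that are special inside a Turtle string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_turtle(row):
    """Turn one source record (a dict) into a Turtle description."""
    subject = "<" + BASE + quote(row["slug"], safe="") + ">"
    label = escape_literal(row["name"])
    parts = [f'{subject} skos:prefLabel "{label}"']
    # The GND column is optional; link only when an identifier is present.
    gnd_id = (row.get("gnd") or "").strip()
    if gnd_id:
        parts.append(f"    owl:sameAs <{GND}{gnd_id}>")
    return " ;\n".join(parts) + " .\n"


def csv_to_turtle(csv_text):
    """Convert CSV text with slug, name and gnd columns into a Turtle document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return PREFIXES + "\n" + "\n".join(row_to_turtle(r) for r in reader)


if __name__ == "__main__":
    # Made-up sample data; "000000000" is a placeholder, not a real GND ID.
    sample = 'slug,name,gnd\nexample-person,"Example, Person",000000000\nno-link,Unlinked Entry,\n'
    print(csv_to_turtle(sample))
```

In practice, an RDF library would typically handle serialization and escaping; the string-based approach here only keeps the sketch free of dependencies.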

[Figure: poster illustrating the basic concepts]

Phase 2: 2019 - 2022 (FID 2)

Staff: Dr. Marco Rovera, Benjamin Schnabel

During FID 2, the focus was on working with Compact Memory (CM), a newspaper archive mainly from the 19th century, with over one million digitized pages. As part of the FID project, OCR was added so that we could perform named entity recognition and full-text search over a large part of the corpus.

Phase 3: 2022 - 2025 (FID 3)

Staff: Benjamin Schnabel

The main goal of FID 3 was the migration of JudaicaLink to Frankfurt and its full integration with the FID portal. To harmonize the infrastructure with Frankfurt, we had to change our search backend from Elasticsearch to Solr. The whole infrastructure was containerized, and all processes had to be redefined and reimplemented to work in Frankfurt.

The rationale for this big shift is that FID funding is limited to at most 12 years, and once the funding ends, the service has to be maintained by the University Library in Frankfurt. We will therefore hand JudaicaLink over into long-term production. Nevertheless, we plan to keep contributing to JudaicaLink and will probably also keep JudaicaLink Labs in Mannheim for further research projects based on this knowledge graph.

Phase 4?

Yes, we plan to continue our collaboration with Frankfurt during the final phase of the FID funding. Details are not yet fully settled, so stay tuned for further information.