Design and prototype of a Help Desk System for EHRI: an Information Retrieval approach

User surveys of researchers and archivists carried out within the EHRI project show that historians often locate a significant share of their archival sources not through finding aids, catalogs and research guides (online or offline) but by talking directly to archivists.

When archival sources are widely dispersed, researchers face the additional problem of identifying the archival institution that can help them find useful material. This motivated the EHRI project to integrate an automatic help desk into the portal as a partial solution. The help desk should be able to interpret users' information needs and assess how relevant each institution is to a given request.
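The post does not say how the help desk computes relevance. As one illustration of an information retrieval approach, here is a minimal sketch that ranks institutions by TF-IDF cosine similarity between a free-text query and a short description of each institution. TF-IDF is a standard IR baseline swapped in here as an assumption, not the EHRI design; the function `rank_institutions` and the institution names and descriptions are hypothetical toy data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (no stemming or stop-word removal)."""
    return re.findall(r"\w+", text.lower())


def rank_institutions(query, descriptions):
    """Rank institutions by TF-IDF cosine similarity to a free-text query.

    Hypothetical sketch: `descriptions` maps an institution id to a text
    describing its holdings.
    """
    docs = {name: Counter(tokenize(text)) for name, text in descriptions.items()}
    n = len(docs)
    df = Counter(term for tf in docs.values() for term in tf)
    # Smoothed IDF: terms found in every description still keep some weight.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}

    def weigh(tf):
        # Query terms unknown to the collection get weight 0 and are ignored.
        return {t: c * idf.get(t, 0.0) for t, c in tf.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q = weigh(Counter(tokenize(query)))
    scores = {name: cosine(q, weigh(tf)) for name, tf in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical institutions, for illustration only.
descriptions = {
    "archive_a": "deportation lists and transport records from the Netherlands",
    "archive_b": "oral history interviews with survivors from Hungary",
    "archive_c": "photographs of ghettos in Poland",
}
ranking = rank_institutions("deportation transports from the Netherlands", descriptions)
print(ranking)
```

A real help desk would need more than bag-of-words matching (stemming, multilingual descriptions, structured metadata), but the sketch shows the basic shape: represent needs and institutions in the same vector space and sort by similarity.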

Categories: EHRI

Paper: Comparison of named entity recognition tools for raw OCR text

This short paper analyses an experiment comparing how well several Named Entity Recognition (NER) tools extract entities directly from the output of an optical character recognition (OCR) workflow. The authors describe how they first created a test set consisting of raw and corrected OCR output, manually annotated with people, locations, and organizations. They then ran each NER tool on both the raw and the corrected OCR output and measured precision, recall, and F1 score against the manual annotations.
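To make the evaluation measures concrete, here is a minimal sketch of entity-level precision, recall and F1. It assumes an entity counts as correct only on an exact match of (mention, type); the paper's actual matching criteria may differ, and the entities below are hypothetical examples, not the paper's data. With these inputs the raw output scores P = 1/2, R = 1/3, F1 = 0.4, and the corrected output scores 2/3 on all three.

```python
def precision_recall_f1(predicted, gold):
    """Entity-level scores using exact (mention, type) matching (an assumption)."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold annotations and tool output for one short passage.
gold = {("Anne Frank", "PER"), ("Amsterdam", "LOC"), ("Westerbork", "LOC")}
raw_ocr_output = {("Anne Frank", "PER"), ("Amstcrdam", "LOC")}  # OCR misread "e" as "c"
corrected_output = {("Anne Frank", "PER"), ("Amsterdam", "LOC"), ("Westerbork", "ORG")}

print(precision_recall_f1(raw_ocr_output, gold))
print(precision_recall_f1(corrected_output, gold))
```

The raw case shows why OCR noise hurts both measures at once: a single misread character turns a correct entity into a false positive and leaves the gold entity unmatched.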

Read the paper in the online proceedings of KONVENS 2012.

Categories: EHRI

Presentation: “Active Annotation of Corpora”

Presentation at the Text Analysis Seminar, Göttingen Center for Digital Humanities (GCDH).

Corpus annotation is a labor-intensive task that consumes considerable time and resources. Active annotation is a semi-automatic annotation procedure based on active learning. The goal of active learning is to make the human annotation process faster and easier. In active annotation, we use the models learned during the annotation process to find potential annotation errors and cases that are hard to annotate automatically with the features used by the learner. Analysing these cases makes it possible to extend and optimize the learner's feature set.
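As a rough illustration of how a model learned from the annotations can point back at them, here is a minimal sketch; it is not the method presented in the talk. The "learner" is only a per-token label distribution (the token itself is its single feature), evaluated leave-one-out so an item never votes for its own label. Confident disagreements with the annotator are flagged as potential errors; items with too little evidence or low confidence are flagged as hard cases. The names `review_annotations`, `min_conf` and `min_support`, and the sample data, are hypothetical.

```python
from collections import Counter, defaultdict


def review_annotations(items, min_conf=0.6, min_support=2):
    """Flag likely annotation errors and hard cases in (token, label) pairs.

    Hypothetical sketch: the model is a per-token label distribution,
    computed leave-one-out for each item.
    """
    counts = defaultdict(Counter)
    for token, label in items:
        counts[token][label] += 1

    suspected_errors, hard_cases = [], []
    for i, (token, gold) in enumerate(items):
        loo = Counter(counts[token])
        loo[gold] -= 1  # leave this item out of its own prediction
        support = sum(loo.values())
        if support < min_support:
            hard_cases.append(i)  # too little evidence with the current feature
            continue
        # Most frequent remaining label; ties broken by label name.
        label, freq = max(loo.items(), key=lambda kv: (kv[1], kv[0]))
        confidence = freq / support
        if confidence < min_conf:
            hard_cases.append(i)  # learner is uncertain with the current feature
        elif label != gold:
            suspected_errors.append(i)  # confident disagreement with the annotator
    return suspected_errors, hard_cases


sample = [
    ("Berlin", "LOC"), ("Berlin", "LOC"), ("Berlin", "LOC"),
    ("Berlin", "PER"),  # likely an annotation error
    ("Paris", "LOC"),   # seen only once
    ("Anne", "PER"), ("Anne", "LOC"),
]
print(review_annotations(sample))  # ([3], [4, 5, 6])
```

The hard cases are where the feature set falls short: a learner that sees only the token cannot separate them, which is the signal to add features such as capitalization or neighbouring words.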

Keywords: annotation of corpora, machine learning, semiautomatic annotation, statistical language modelling

Download presentation: GCDH_ALearning

Video of the EHRI meeting in Prague

Video of the project meeting about privacy and copyright in May 2011.

Categories: EHRI

The second issue of the EHRI Newsletter

This issue covers:

  1. EHRI in Prague
  2. Workshop on Privacy
  3. EHRI Fellowships in Holocaust Studies 2012
  4. EHRI is looking for Holocaust Researchers
  5. ‘Technothings’: A Presentation of EHRI in Athens
  6. An Information Database for EHRI
  7. EHRI Hosts International Workshops
  8. People in EHRI: Veerle Vanden Daelen

The issue can be downloaded as a PDF file.

Categories: EHRI

EHRI Fellowships in Holocaust Studies 2012

EHRI (European Holocaust Research Infrastructure) invites applications for its fellowship programme for 2012.

The EHRI fellowships are intended to support and stimulate Holocaust research by facilitating international access to key archives and collections related to the Holocaust. They are aimed at researchers and younger scholars, especially PhD candidates with limited resources. Candidates from Central and Eastern Europe are particularly encouraged to apply.

More information about the fellowships and the application procedure:

Categories: EHRI