This short paper analyses an experiment comparing the efficacy of several Named Entity Recognition (NER) tools at extracting entities directly from the output of an optical character recognition (OCR) workflow. The authors describe how they first created a set of test data consisting of raw and corrected OCR output, manually annotated with people, locations, and organizations. They then ran each of the NER tools against both the raw and the corrected OCR output and compared the precision, recall, and F1 score of each tool against the manually annotated data.
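As a rough illustration of the comparison described above, the following sketch computes precision, recall, and F1 for one tool's output against a manually annotated gold set. It assumes exact matching on (text, type) pairs, which may differ from the matching scheme the authors used; the function name `score_entities` and the sample entities are hypothetical.

```python
def score_entities(gold, predicted):
    """Return (precision, recall, f1) for predicted entities against gold.

    Assumption: an entity counts as correct only if both its text and its
    type match a gold entity exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold annotations and the output of one NER tool on raw OCR.
gold = {
    ("Anne Frank", "PER"),
    ("Amsterdam", "LOC"),
    ("Red Cross", "ORG"),
    ("Prague", "LOC"),
}
predicted = {
    ("Anne Frank", "PER"),
    ("Amsterdam", "LOC"),
    ("Arnsterdam", "LOC"),  # an OCR error turned into a spurious entity
}

precision, recall, f1 = score_entities(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Running the same function once on raw and once on corrected OCR output for each tool would give the kind of side-by-side figures the paper reports.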
Presentation at the Text Analysis Seminar.
Göttingen Center for Digital Humanities (GCDH)
Annotation of corpora is a labor-intensive, time- and resource-consuming task. Active annotation is a semiautomatic annotation procedure based on active learning. The goals of active learning are to speed up and simplify the human annotation process. In active annotation, we use the models learnt during the annotation process to find potential annotation errors and cases that are hard to annotate automatically with the features used by the learner. Analysing these cases makes it possible to extend and optimize the set of features used by the learner.
Keywords: annotation of corpora, machine learning, semiautomatic annotation, statistical language modelling
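The two uses of the learnt model mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the authors' procedure: it assumes the model outputs a probability distribution over labels for each item, uses least-confidence ranking to pick hard cases, and flags an item as a potential annotation error when the model confidently disagrees with the human label. The function names, the threshold, and the sample data are hypothetical.

```python
def least_confident(probabilities, k):
    """Return the indices of the k items whose top label probability is lowest."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: max(probabilities[i].values()),
    )
    return ranked[:k]


def suspicious_labels(probabilities, human_labels, threshold=0.9):
    """Return indices where the model confidently disagrees with the annotator."""
    flagged = []
    for i, (dist, label) in enumerate(zip(probabilities, human_labels)):
        predicted = max(dist, key=dist.get)
        if predicted != label and dist[predicted] >= threshold:
            flagged.append(i)
    return flagged


# Hypothetical model output for four tokens and the labels a human assigned.
probs = [
    {"PER": 0.95, "LOC": 0.05},
    {"PER": 0.55, "LOC": 0.45},
    {"PER": 0.10, "LOC": 0.90},
    {"PER": 0.50, "LOC": 0.50},
]
labels = ["PER", "PER", "PER", "LOC"]

hard_cases = least_confident(probs, 2)
possible_errors = suspicious_labels(probs, labels)
print(hard_cases, possible_errors)
```

Items returned by `least_confident` are candidates for closer inspection and for new features, while items returned by `suspicious_labels` are candidates for re-checking by the annotator.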
Video of the project meeting about privacy and copyright in May 2011.
This edition contains more information on:
- EHRI in Prague
- Workshop on Privacy
- EHRI Fellowships in Holocaust Studies 2012
- EHRI is looking for Holocaust Researchers
- ‘Technothings’: A Presentation of EHRI in Athens
- An Information Database for EHRI
- EHRI Hosts International Workshops
- People in EHRI: Veerle Vanden Daelen
You can download the PDF file from this link.
EHRI (European Holocaust Research Infrastructure) invites applications for its fellowship programme for 2012.
The EHRI fellowships are intended to support and stimulate Holocaust research by facilitating international access to key archives and collections related to the Holocaust. They are aimed at researchers and younger scholars, especially PhD candidates with limited resources. Candidates from Central and Eastern Europe are especially encouraged to apply.
More information about the fellowships and application procedure:
The European Holocaust Research Infrastructure (EHRI) intends to organize an international expert meeting to discuss the place of photography in historical research on the Holocaust. The meeting aims to combine the growing academic interest in photography with the increasing digitisation and opening up of photographic archives. The overall aim is to generate a creative exchange between researchers on the many aspects of the photographic representation of the Holocaust. Archivists of picture collections and e-scientists will be invited to discuss how EHRI can fulfill its purpose of creating a European research infrastructure, focussing on the photographic representation of the Holocaust.