BERT for Named Entity Recognition in Contemporary and Historic German
Kai Labusch, Clemens Neudecker and David Zellhöfer
We apply a pre-trained transformer-based representational language model, i.e. BERT (Devlin et al., 2018), to named entity recognition (NER) in contemporary and historical German text and observe state-of-the-art performance for both text categories. We further improve the recognition performance for historical German by unsupervised pre-training on a large corpus of historical German texts from the Berlin State Library and show that the best performance for historical German is obtained by unsupervised pre-training on historical German plus supervised pre-training with contemporary NER ground truth.
A Supervised Learning Approach for the Extraction of Sources and Targets from German Text
Michael Wiegand, Margarita Chikobava and Josef Ruppenhofer
We present the first systematic supervised learning approach for the extraction of opinion sources and targets on German language data. We present a wide range of features, in particular syntactic features and generalization features. We point out specific differences between opinion sources and targets. Moreover, we explain why implicit sources can be extracted even with fairly generic features. To ensure comparability, our classifier is trained and tested on the dataset of the STEPS shared task.
A Descriptive Analysis of a German Corpus Annotated with Opinion Sources and Targets
Michael Wiegand, Leonie Lapp and Josef Ruppenhofer
We present a descriptive analysis of the two datasets from the shared task on Source, Subjective Expression and Target Extraction from Political Speeches (STEPS), the only existing German dataset of its size for opinion role extraction. Our analysis discusses the individual properties of the three components (subjective expressions, sources and targets) and their relations to each other. Our observations should help practitioners and researchers when building a system to extract opinion roles from German data.
Automated Assessment of Language Proficiency on German Data
Edit Szügyi, Sören Etler, Andrew Beaton and Manfred Stede
The proficiency level of the learner is an important factor in various educational settings. To determine the appropriate language difficulty level, we classify texts written by learners of German into the proficiency levels A, B and C, as defined by the CEFR (Common European Framework of Reference for Languages), based on linguistic features extracted from the texts. Working on a combined data set of previously used corpora, we use both data-driven and theory-driven feature sets and determine the best-performing features. Our model achieves an accuracy of 82%, and the best-performing feature set contains features from all the theoretical groups, while each group on its own performs significantly above the random baseline.
A Probabilistic Morphology Model for German Lemmatization
Christian Wartena
Lemmatization is a central task in many NLP applications. Despite its importance, the number of (freely) available and easy-to-use tools for German is very limited. To fill this gap, we developed a simple lemmatizer that can be trained on any lemmatized corpus. For a full-form word, the tagger tries to find the sequence of tagged morphemes that is most likely to generate that word. From this tag sequence, we can easily derive the stem, the lemma and the part of speech (PoS) of the word. We show (i) that the quality of this approach is comparable to state-of-the-art methods and (ii) that we can improve the results of PoS tagging when we include the morphological analysis of each word.
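To make "the sequence of morphemes that is most likely to generate that word" concrete, here is a minimal sketch that segments a word with dynamic programming under a plain unigram morpheme model. The morpheme inventory, its probabilities and the function name `best_segmentation` are hypothetical; the paper's model also tags each morpheme, which this sketch omits, so it only illustrates the search for the most probable segmentation.

```python
import math

# Hypothetical morpheme probabilities; a real model would estimate
# these from a lemmatized training corpus.
MORPH_PROBS = {
    "kauf": 0.05, "haus": 0.04, "er": 0.10, "es": 0.06,
    "t": 0.08, "e": 0.09, "en": 0.09, "n": 0.02,
}

def best_segmentation(word, probs):
    """Return the most probable split of `word` into known morphemes
    under a unigram model, or None if no complete split exists."""
    n = len(word)
    # best[i] holds (log-probability, segmentation) for the prefix word[:i]
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            prev_score, prev_seg = best[start]
            if piece in probs and prev_seg is not None:
                score = prev_score + math.log(probs[piece])
                if score > best[end][0]:
                    best[end] = (score, prev_seg + [piece])
    return best[n][1]
```

For example, `best_segmentation("kaufen", MORPH_PROBS)` prefers `['kauf', 'en']` over `['kauf', 'e', 'n']`, because the extra low-probability morpheme lowers the product of probabilities.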
To Act Or Not To Act - Annotating and Classifying Email Regarding Necessary Action
Veronika Hintzen and Alexander Fraser
Every email user is aware of the problem of reacting to emails that require a time-sensitive action by the recipient while being overwhelmed by informational emails. We define a new classification problem to capture this distinction, create comprehensive annotation guidelines and carry out annotation. We implement a proof-of-concept classifier and discuss our future research, which will result in a tool that is usable in an everyday business environment.
AkkuBohrHammer vs. AkkuBohrhammer: Experiments towards the Evaluation of Compound Splitting Tools for General Language and Specific Domains
Anna Hätty, Ulrich Heid, Anna Moskvina, Julia Bettinger, Michael Dorna and Sabine Schulte im Walde
We present a comparative evaluation study for splitting German compounds that belong to general language or to a specific domain. For the domain, we focus on DIY ("do-it-yourself"). The study consists of two parts. First, we evaluate three tools for compound splitting in German, one based on lexicons and corpus frequencies and two based on language-independent statistical processing. We introduce the tools, discuss the data and the construction of a gold standard, and show first results for binary and ternary noun compounds, as well as for the handling of non-splittable items. In a second experiment, we post-train one of the splitters with text data from the DIY domain and evaluate the splitting performance on domain-specific compounds.
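As an illustration of how a splitter based on corpus frequencies can work, the sketch below implements one common heuristic: enumerate all splits into known parts (allowing an optional linking "s"), and choose the split whose parts have the highest geometric mean frequency, keeping the word whole if that scores best. The frequency table, `LINKING_ELEMENTS` and the function names are hypothetical, and this is not claimed to be the algorithm of any of the three evaluated tools.

```python
from math import prod

# Hypothetical lower-cased corpus frequencies.
FREQS = {
    "akku": 120, "bohrhammer": 40, "bohr": 60, "hammer": 300,
    "akkubohrhammer": 5, "arbeit": 500, "zeit": 800,
}

LINKING_ELEMENTS = ("", "s")  # optional linking element between parts

def candidate_splits(word, min_len=3):
    """Yield every way to split `word` into known parts (binary, ternary, ...)."""
    if word in FREQS:
        yield [word]
    for i in range(min_len, len(word) - min_len + 1):
        head, rest = word[:i], word[i:]
        if head not in FREQS:
            continue
        for link in LINKING_ELEMENTS:
            if link and not rest.startswith(link):
                continue
            # The linking element itself is dropped from the output parts.
            for tail in candidate_splits(rest[len(link):], min_len):
                yield [head] + tail

def split_compound(word):
    """Return the split whose parts have the highest geometric mean frequency."""
    word = word.lower()
    best, best_score = [word], 0.0  # unknown, unsplittable words stay whole
    for parts in candidate_splits(word):
        score = prod(FREQS[p] for p in parts) ** (1 / len(parts))
        if score > best_score:
            best, best_score = parts, score
    return best
```

With these toy counts, `split_compound("AkkuBohrHammer")` returns the ternary split `['akku', 'bohr', 'hammer']`, and `split_compound("Arbeitszeit")` returns `['arbeit', 'zeit']` after removing the linking "s".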
Neural classification with attention assessment of the implicit-association test OMT and prediction of subsequent academic success
Dirk Johannßen and Chris Biemann
Operant motives are unconscious intrinsic desires that can be measured by implicit methods, such as those employed by the Operant Motive Test (OMT). During the OMT, participants are asked to write freely associated texts in response to given questions and images. Trained psychologists label these textual answers with one of four motives. The identified motives allow psychologists to predict behavior, long-term development, and subsequent success. We use a long short-term memory (LSTM) neural network combined with an attention mechanism to classify OMT textual answers and achieve state-of-the-art performance, improving on previous work. When investigating tokens with high attention weights using the Linguistic Inquiry and Word Count (LIWC) tool, we find a weak connection between LIWC categories and the OMT theory. Lastly, we automatically annotate and count motives per participant and correlate these counts with academic grades, finding a weak correlation between certain motives and subsequent academic success.
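The attention step and the inspection of highly weighted tokens can be sketched as follows. This is a minimal pure-Python illustration assuming simple dot-product attention pooling over per-token hidden states against a query vector; the abstract does not specify the attention variant, and the hidden states, query and function names here are hypothetical stand-ins for a trained LSTM's outputs and parameters.

```python
import math

def attention_pool(hidden_states, query):
    """Dot-product attention pooling: score each token's hidden state against
    a (normally learned) query vector, softmax the scores, and return the
    weighted sum of hidden states together with the attention weights."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states))
              for d in range(len(query))]
    return pooled, weights

def top_attended(tokens, weights, k=2):
    """Return the k tokens with the highest attention weights,
    e.g. as input for a dictionary lookup such as LIWC."""
    ranked = sorted(zip(tokens, weights), key=lambda tw: tw[1], reverse=True)
    return [token for token, _ in ranked[:k]]
```

In a full classifier, `pooled` would be fed to an output layer that predicts one of the four motives, while `weights` are what one would inspect to find highly attended tokens.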
Towards Multimodal Emotion Recognition in German Speech Events in Cars using Transfer Learning
Deniz Cevher, Sebastian Zepf and Roman Klinger
The recognition of emotions by humans is a complex process that considers multiple interacting signals, such as facial expressions and both the prosody and the semantic content of utterances. With few exceptions, however, research on automatic emotion recognition is limited to a single modality. We describe an in-car experiment for emotion recognition from speech interactions in three modalities: the audio signal of a spoken interaction, the visual signal of the driver's face, and the manually transcribed content of the driver's utterances. We use off-the-shelf tools for emotion detection in audio and face and compare them to a neural transfer learning approach for emotion recognition from text that utilizes existing resources from other domains. We see that transfer learning enables models based on out-of-domain corpora to perform well. This method contributes up to 10 percentage points in F1, reaching up to 76 micro-averaged F1 across the emotions joy, annoyance and insecurity. Our findings also indicate that off-the-shelf tools analyzing face and audio are not yet ready for emotion detection in in-car speech interactions without further adjustments.
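Since the result above is reported as micro-averaged F1, a short sketch of that metric may help: true positives, false positives and false negatives are pooled over all emotion classes before precision and recall are computed. The function name and the toy predictions are hypothetical; only the three emotion labels come from the abstract, and the sketch assumes one label per utterance.

```python
def micro_f1(gold, predicted, labels):
    """Micro-averaged F1 over `labels`: pool TP, FP and FN across all
    classes first, then compute a single precision, recall and F1."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        for label in labels:
            if p == label and g == label:
                tp += 1
            elif p == label:
                fp += 1
            elif g == label:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

EMOTIONS = ["joy", "annoyance", "insecurity"]
```

For example, with gold labels `["joy", "annoyance", "insecurity", "joy"]` and predictions `["joy", "joy", "insecurity", "annoyance"]`, two of four utterances are correct and the micro-averaged F1 is 0.5; with exactly one label per item and all classes included, micro-F1 coincides with accuracy.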
Visualising and evaluating the effects of combining active learning with word embedding features
Maria Skeppstedt, Rafal Rzepka, Kenji Araki and Andreas Kerren
A tool that enables the use of active learning, as well as the incorporation of word embeddings, was evaluated for its ability to decrease the training data set size required for a named entity recognition model. Uncertainty-based active learning and the use of word embeddings led to very large performance improvements on small data sets for the entity categories PERSON and LOCATION. In contrast, the embedding features used were shown to be unsuitable for detecting entities belonging to the ORGANISATION category. The tool was also extended with functionality for visualising the usefulness of the active learning process and of the word embeddings used. The visualisations were able to indicate the performance differences between the entity categories, as well as differences with regard to the usefulness of the embedding features.
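Uncertainty-based active learning selects the unlabelled items the current model is least sure about and sends them to the annotator. The sketch below shows least-confidence sampling, one common uncertainty measure; whether the tool uses exactly this measure is an assumption, and the function name, the `predict_proba` callback and the toy probabilities are hypothetical.

```python
def least_confident(unlabelled, predict_proba, batch_size=2):
    """Uncertainty sampling: return the `batch_size` unlabelled items whose
    most probable label has the lowest probability under the current model.

    `predict_proba(item)` is assumed to return a dict {label: probability}.
    """
    def confidence(item):
        return max(predict_proba(item).values())
    return sorted(unlabelled, key=confidence)[:batch_size]

# Hypothetical model output for three candidate items.
TOY_PROBS = {
    "Berlin":  {"LOCATION": 0.90, "PERSON": 0.10},
    "Merkel":  {"PERSON": 0.55, "LOCATION": 0.45},
    "Siemens": {"ORGANISATION": 0.40, "LOCATION": 0.35, "PERSON": 0.25},
}
```

Here `least_confident(list(TOY_PROBS), TOY_PROBS.get)` returns `['Siemens', 'Merkel']`; in an active learning loop these would be annotated, added to the training set, and the model retrained before the next selection round.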
Creating Information-maximizing Natural Language Messages Through Image Captioning-Retrieval
Fabian Karl, Mikko Lauri and Chris Biemann
In this work, we propose the Image Captioning-Retrieval (ICR) problem, which frames the objective of language generation as information exchange. To solve the ICR problem, we design and implement an end-to-end neural network architecture that describes the content of images in natural language and retrieves them solely based on these generated descriptions. The main goal is to generate information-maximizing natural language messages. In a semi-supervised setting where caption generation is trained towards retrieval quality, we experimentally show a strong increase in message information content at the cost of some grammatical correctness in the generated descriptions.
German End-to-end Speech Recognition based on DeepSpeech
Aashish Agarwal and Torsten Zesch
While automatic speech recognition is an important task, freely available models are rare, especially for languages other than English. In this paper, we describe the process of training German models based on the Mozilla DeepSpeech architecture using publicly available data. We compare the resulting models with other available speech recognition services for German and find that we obtain comparable results. Acceptable performance under noisy conditions would, however, still require much more training data. We release our trained German models and also the training configurations.
Label Propagation of Polarity Lexica on Word Vectors
Harald Koppen and Ritavan
Semi-supervised learning (SSL) is an important research area in machine learning in which both labeled and unlabeled data are used to build a model. One of the big advantages of semi-supervised methods is that they are transparent and easy for humans to comprehend, unlike most deep learning techniques, which are black boxes. In this paper, we design a graph-based semi-supervised learning framework to detect sentiment polarity in word vectors trained on a German corpus. We study theoretical aspects of the task, empirically analyze a seminal label propagation algorithm (Zhu and Ghahramani, 2002) and suggest variants to improve classification performance. Additionally, we review the literature on graph construction for SSL and propose new methods to avoid hubs, i.e., vertices of high degree, which are harmful, as outlined by Ozaki et al. (2011).
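The core idea of label propagation on a word graph can be sketched as: every unlabelled node repeatedly takes the weighted average of its neighbours' label distributions, while seed words from the polarity lexicon stay clamped to their known labels. This is a simplified variant using a row-normalised weight matrix, not a faithful reimplementation of Zhu and Ghahramani's exact normalisation; the toy graph, seed words and function name are hypothetical, and in practice the edge weights would come from word-vector similarities.

```python
def label_propagation(weights, seeds, n_classes=2, iterations=100):
    """Iterative label propagation with clamped seeds.

    weights: symmetric n x n list of non-negative edge weights
    seeds:   dict {node index: class index} for labelled nodes
    Returns one class distribution per node.
    """
    n = len(weights)
    # Unlabelled nodes start uniform; seeds get one-hot distributions.
    dist = [[1.0 / n_classes] * n_classes for _ in range(n)]
    for node, label in seeds.items():
        dist[node] = [1.0 if c == label else 0.0 for c in range(n_classes)]
    for _ in range(iterations):
        new = []
        for i in range(n):
            total = sum(weights[i])
            if i in seeds or total == 0:
                new.append(dist[i])  # clamp seeds; keep isolated nodes unchanged
                continue
            new.append([sum(weights[i][j] * dist[j][c] for j in range(n)) / total
                        for c in range(n_classes)])
        dist = new
    return dist

# Toy chain graph: gut - prima - mies - schlecht (0 = positive, 1 = negative)
WORDS = ["gut", "prima", "mies", "schlecht"]
W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
```

With seeds `{0: 0, 3: 1}`, the result converges to a 2/3 positive share for "prima" and a 2/3 negative share for "mies", i.e. each unlabelled word leans towards the polarity of its nearer seed.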
Detecting the boundaries of sentence-like units in spoken German
Josef Ruppenhofer and Ines Rehbein
Automatic division of spoken language transcripts into sentence-like units is a challenging problem, caused by disfluencies, ungrammatical structures and the lack of punctuation. We present experiments on dividing up German spoken dialogues where we investigate the impact of task setup and data representation, encoding of context information as well as different model architectures for this task.
Dependency Trees for Greenlandic
Eckhard Bick
This paper presents a descriptive system for dependency structures in Greenlandic and proposes a method for implementing it using Constraint Grammar (CG) rules. Our approach aims at reconciling traditional dependency syntax with the polysynthetic morphology of Greenlandic by introducing a novel, morphologically informed tokenization model. For instance, verb-incorporated nominal arguments and adverbials are treated as clause-level constituents rather than morphemes. We discuss and evaluate our alternative tokenization in a cross-language perspective, arguing that the method allows the construction of more universal dependency trees, facilitating both lexical and syntactic transfer in a machine translation (MT) context.
Metaphor detection for German Poetry
Ines Reinig and Ines Rehbein
This paper presents first steps towards metaphor detection in German poetry, in particular in expressionist poems. We create a dataset with adjective-noun pairs extracted from expressionist poems, manually annotated for metaphoricity. We discuss the annotation process and present models and experiments for metaphor detection where we investigate the impact of context and the domain dependence of the models.
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings
Gregor Wiedemann, Steffen Remus, Avi Chawla and Chris Biemann
Contextualized word embeddings (CWE), such as those provided by ELMo (Peters et al., 2018), Flair NLP (Akbik et al., 2018), or BERT (Devlin et al., 2019), are a major recent innovation in NLP. CWEs provide semantic vector representations of words depending on their respective context. Their advantage over static word embeddings has been shown for a number of tasks, such as text classification, sequence tagging, or machine translation. Since vectors of the same word type can vary depending on the respective context, they implicitly provide a model for word sense disambiguation (WSD). We introduce a simple but effective approach to WSD using a nearest neighbor classification on CWEs. We compare the performance of different CWE models for the task and report improvements above the current state of the art for two standard WSD benchmark datasets. We further show that the pre-trained BERT model is able to place polysemic words into distinct 'sense' regions of the embedding space, while ELMo and Flair NLP do not seem to possess this ability.
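A nearest neighbor classifier over contextualized embeddings can be sketched in a few lines: compare the embedding of the ambiguous word in its context with the embeddings of sense-annotated training occurrences and take the (majority) sense of the closest ones. The sketch assumes cosine similarity; the two-dimensional vectors and sense labels are hypothetical stand-ins for real CWE vectors, which would be obtained from a model such as BERT.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_sense(query_vec, sense_examples, k=1):
    """Return the majority sense among the k training occurrences whose
    contextualized embeddings are most cosine-similar to `query_vec`.

    sense_examples: list of (embedding, sense_label) pairs
    """
    ranked = sorted(sense_examples, key=lambda ex: cosine(query_vec, ex[0]),
                    reverse=True)
    votes = Counter(sense for _, sense in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical embeddings of "bank" in sense-annotated training sentences.
EXAMPLES = [([1.0, 0.1], "finance"), ([0.9, 0.2], "finance"), ([0.1, 1.0], "river")]
```

For instance, `knn_sense([0.2, 0.9], EXAMPLES)` returns `'river'`; the approach only works well if, as the abstract reports for BERT, occurrences of the same sense form distinct regions of the embedding space.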
Determining Response-generating Contexts on Microblogging Platforms
Jennifer Fest, Arndt Heilmann, Oliver Hohlfeld, Stella Neumann, Jens Helge Reelfs, Marco Schmitt and Alina Vogelgesang
In recent years, the study of social media communities has come into the focus of research. One open but central question is which properties stimulate user interaction within communities and thus contribute to community building. In this paper, we take a first step towards answering this question by identifying features in the Jodel microblogging app that trigger user responses as one form of attention. Jodel is a geographically restricted app that allows users to post threads and comments anonymously. Because Jodel displays no user information, the posted content is the only trigger for user interaction, and language is the only means by which users can draw contextual inferences about their discourse partners. This heightened role of language makes Jodel a revealing baseline for investigating linguistic behavior on social media. To approach this issue, we conducted a sequence of lexico-grammatical analyses and subjected the quantitative results to various statistical tests. While a Principal Component Analysis did not show a significant difference between the grammatical structure of original posts with and without answers, a negative binomial regression model focusing on the interpersonal meta-function yielded significant results. A further analysis of these features in relation to shorter or longer response times showed significant results for the interrogative mood. Additionally, keyword analyses revealed significant differences between posts with and without answers. Our study provides a promising first step towards understanding the textual features that trigger user interaction and thereby community building, an unresolved problem of practical relevance to social network operation.
Extraction and Classification of Speech, Thought, and Writing in German Narrative Texts
Luise Schricker, Manfred Stede and Peer Trilcke
For various purposes of narrative text analysis, it is helpful to identify speech and thought events: material that is uttered or imagined by some protagonist. This task commonly distinguishes between direct and indirect speech, but we will also consider free indirect and reported speech here. Specifically, we build upon earlier work by Brunner (2015), who presented an annotated German corpus of narrative texts as well as an automatic analysis system. We propose a variety of extensions and are able to substantially improve on the original results for all four categories.
Argumentative Relation Classification as Plausibility Ranking
Juri Opitz
We formulate argumentative relation classification (support vs. attack) as a text-plausibility ranking task. To this end, we propose a simple reconstruction trick that enables us to build minimal pairs of plausible and implausible texts by simulating natural contexts in which two argumentative units are likely or unlikely to appear. We show that this method is competitive with previous work, although it is considerably simpler. In a recently introduced content-based version of the task, where contextual discourse clues are hidden, the approach offers a performance increase of more than 10% macro F1. With respect to the scarce attack class, the method achieves a large increase in precision while the incurred loss in recall is small or even nonexistent.
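The ranking formulation can be illustrated as follows: build one candidate text per relation from the two argumentative units, score each text with a plausibility model, and predict the relation whose text ranks highest. The connective templates, the function name and the toy scorer below are hypothetical and are not the paper's reconstruction trick; in practice `plausibility` would be a trained model, for instance a language model score.

```python
# Hypothetical templates that read naturally for each relation.
TEMPLATES = {
    "support": "{claim} This is because {premise}",
    "attack": "{claim} However, {premise}",
}

def classify_relation(claim, premise, plausibility):
    """Build one candidate text per relation and return the relation whose
    text the `plausibility` scorer (higher = more plausible) ranks highest."""
    candidates = {rel: tpl.format(claim=claim, premise=premise)
                  for rel, tpl in TEMPLATES.items()}
    return max(candidates, key=lambda rel: plausibility(candidates[rel]))
```

Because the label is read off from a ranking, any scorer that can compare two texts can be plugged in without training a dedicated support/attack classifier.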
Predicting Semantic Labels of Text Regions in Heterogeneous Document Images
Somtochukwu Enendu, Johannes Scholtes, Jeroen Smeets, Djoerd Hiemstra and Mariet Theune
This paper describes the use of sequence labeling methods to predict the semantic labels of text regions extracted from heterogeneous electronic documents, using features related to each semantic label. In this study, we construct a novel dataset consisting of real-world documents from multiple domains. We test the performance of the methods on this dataset and offer a novel investigation into the influence of textual features on performance across multiple domains. The results show that the neural network method slightly outperforms the Conditional Random Field method when only limited training data are available. Regarding generalizability, our experiments show that including textual features improves performance.
Evaluating Off-the-Shelf NLP Tools for German
Katrin Ortmann, Adam Roussel and Stefanie Dipper
It is not always easy to keep track of which tools are currently available for a particular annotation task, nor is it obvious how the provided models will perform on a given data set. In this contribution, we provide an overview of the tools available for the automatic annotation of German-language text. We evaluate fifteen free and open-source NLP tools for the linguistic annotation of German, looking at the fundamental NLP tasks of sentence segmentation, tokenization, POS tagging, morphological analysis, lemmatization, and dependency parsing. To get an idea of how well the systems' performance generalizes across domains, we compiled our test corpus from various non-standard domains. All systems in our study are evaluated not only with respect to accuracy, but also with respect to the computational resources they require.