Selecting features for domain-independent named entity recognition

Maksim Tkachenko, Andrey Simanovsky; Proceedings of KONVENS 2012 (Main track: poster presentations), pp. 248-253, September 2012.

Abstract

We propose a domain adaptation method for supervised named entity recognition (NER). Our NER system uses conditional random fields; we rank and filter the features for a new, unknown domain based on the means of the weights learned on known domains. In experiments on English texts from the OntoNotes version 4 benchmark, the method performs statistically significantly better with a small number of features and converges to the maximum F1-measure faster than conventional feature selection (information gain). We also compare it with using the weights learned on a mixture of known domains.
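
The ranking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each known domain yields one weight per feature, that features are ranked by the mean absolute weight, and that a feature missing from a domain counts as zero there. The function names, feature names, and weight values are all hypothetical.

```python
from collections import defaultdict


def rank_features_by_mean_weight(domain_weights):
    """Rank features by mean absolute weight across known domains.

    domain_weights: list of dicts, one per known domain, mapping a
    feature name to the weight a model learned for it on that domain.
    Assumption: a feature absent from a domain contributes zero.
    """
    totals = defaultdict(float)
    for weights in domain_weights:
        for feature, weight in weights.items():
            totals[feature] += abs(weight)
    n_domains = len(domain_weights)
    means = {feature: total / n_domains for feature, total in totals.items()}
    # Highest mean first; ties are broken alphabetically for determinism.
    return sorted(means, key=lambda feature: (-means[feature], feature))


def select_top_k(domain_weights, k):
    """Keep only the k highest-ranked features for the new domain."""
    return rank_features_by_mean_weight(domain_weights)[:k]


# Hypothetical weights from two known domains.
news = {"is_capitalized": 1.8, "suffix=ing": -0.2, "prev_word=Mr.": 1.1}
web = {"is_capitalized": 1.2, "suffix=ing": 0.1, "in_gazetteer": 0.9}

selected = select_top_k([news, web], k=2)
print(selected)
```

A model for the new domain would then be trained using only the selected features. Whether to average signed or absolute weights, and how to aggregate weights across labels, are design choices that the abstract does not specify.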
