A machine learning approach to multilingual proper name recognition

Posted on: 1997-09-11
Degree: Ph.D.
Type: Thesis
University: University of Southern California
Candidate: Gallippi, Anthony Frank
GTID: 2468390014481772
Subject: Language
Abstract/Summary:
The development of natural language processing (NLP) systems that perform machine translation (MT) and information retrieval (IR) has highlighted the need for automatic recognition of proper names. Although various name recognizers have been developed, they are too limited: some recognize only one class of proper names, and all are language-specific. This thesis develops an approach to multilingual name recognition that allows a system optimized for one language to be ported to another with little additional effort or resources. It first identifies a core set of linguistic features that are useful for name recognition in most languages. When porting to a new language, these features must be converted, partly by hand and partly by means of on-line lists; machine learning techniques then build decision trees that map the features to name classes. A system initially optimized for English has been successfully ported to Spanish and Japanese, and its performance is comparable to that of the best known systems in existence today, each of which recognizes names in only one language. The multilingual issues addressed include porting effort, required resources, and performance. This work opens the way for future research on improving performance, reducing the human effort required, and further exploring language-based phenomena. It represents a new application of learning theory.
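The abstract says only that decision trees map linguistic features to name classes; it names neither the induction algorithm nor the features. The following is a minimal sketch under stated assumptions: it uses an ID3-style learner (information gain over discrete features), and the four binary token features, the class labels, and the toy training data are all hypothetical illustrations, not the thesis's feature set, data, or results.

```python
from collections import Counter
from math import log2

# Hypothetical, language-independent token features (assumption, not from
# the thesis). Porting would mean recomputing these values for a new
# language, e.g. from a Spanish first-name list; the learner is unchanged.
FEATURES = ["capitalized", "in_first_name_list",
            "followed_by_corp_suffix", "in_place_list"]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def majority(labels):
    """Most frequent label (ties go to the first one seen)."""
    return Counter(labels).most_common(1)[0][0]


def best_feature(rows, labels, features):
    """Return the feature with the highest information gain, or None."""
    base = entropy(labels)
    best, best_gain = None, 0.0
    for f in features:
        remainder = 0.0
        for value in {row[f] for row in rows}:
            subset = [lab for row, lab in zip(rows, labels) if row[f] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        if base - remainder > best_gain:
            best, best_gain = f, base - remainder
    return best


def build_tree(rows, labels, features):
    """ID3-style induction. A leaf is a label; an inner node is
    (feature, {value: subtree}, majority_label_fallback)."""
    if len(set(labels)) == 1:
        return labels[0]
    f = best_feature(rows, labels, features)
    if f is None:  # no feature reduces entropy: use the majority class
        return majority(labels)
    rest = [g for g in features if g != f]
    branches = {}
    for value in {row[f] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[f] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], rest)
    return (f, branches, majority(labels))


def classify(tree, row):
    """Walk the tree; an unseen feature value falls back to the node majority."""
    while isinstance(tree, tuple):
        f, branches, fallback = tree
        value = row.get(f)
        if value not in branches:
            return fallback
        tree = branches[value]
    return tree


def token(cap, first, corp, place):
    """Build a feature dict for one token (hypothetical helper)."""
    return dict(zip(FEATURES, (cap, first, corp, place)))


# Toy training set, invented for illustration only.
train = [
    (token(True, True, False, False), "PERSON"),        # "John"
    (token(True, True, False, False), "PERSON"),        # "Maria"
    (token(True, False, True, False), "ORGANIZATION"),  # "Acme" (Inc.)
    (token(True, False, True, False), "ORGANIZATION"),  # "Sony" (Corp.)
    (token(True, False, False, True), "LOCATION"),      # "Madrid"
    (token(True, False, False, True), "LOCATION"),      # "Tokyo"
    (token(False, False, False, False), "NONE"),        # "the"
    (token(False, False, False, False), "NONE"),        # "report"
]
tree = build_tree([r for r, _ in train], [lab for _, lab in train], FEATURES)

print(classify(tree, token(True, False, False, True)))  # unseen "Osaka" → LOCATION
```

In this sketch only the feature values are language-dependent. Porting to another language would mean recomputing them (for example, from new name and place lists) and retraining, while `build_tree` and `classify` stay the same. That mirrors, in a simplified form, the porting idea the abstract describes.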
Keywords/Search Tags: Language, Machine, Name, Multilingual, Proper, Recognition