Chinese Organization Names Recognition Based On Support Vector Machine

Posted on: 2008-02-01 | Degree: Master | Type: Thesis
Country: China | Candidate: X Chen | Full Text: PDF
GTID: 2178360212476035 | Subject: Computer software and theory
Abstract/Summary:
Chinese organization name recognition is a fundamental task in Chinese information processing and an important subtask of Named Entity Recognition (NER). Named entities include the names of people, locations, and organizations, as well as numbers and similar expressions. Organization names account for a large share of all named entities and are also the most difficult category to recognize. NER, including organization name recognition, also underpins many Natural Language Processing (NLP) tasks such as information extraction, machine translation, and information retrieval, which makes it especially important.

Machine learning methods have long been applied to NLP tasks and are widely used. Support Vector Machine (SVM) is a relatively recent machine learning method grounded in Statistical Learning Theory. Because it follows the Structural Risk Minimization principle, SVM has achieved high generalization performance in many pattern recognition tasks, especially those with limited training data. In recent years, SVM has been applied to many NLP tasks, such as text classification, shallow parsing, and Chinese proper noun recognition, with satisfactory results reported.

In this thesis, we introduce an SVM-based method for Chinese organization name recognition: we model recognition on segmented text as a classification problem and apply SVM to solve it. In the training phase, we incorporate an Active Learning strategy and make training an incremental process in order to reduce annotation cost. We designed a series of experiments to test the effectiveness of our method and the differences in outcome that different feature vectors and sample selection methods...
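The abstract does not state the features, kernel, or sample selection criterion used, so the following is only a minimal sketch of the general idea: candidate spans from segmented text are classified as organization or not by an SVM, and an incremental active learning loop asks for labels only on the candidates the current model is least sure about. Everything concrete here is an assumption made for illustration: the lexicons `ORG_SUFFIXES` and `LOCATIONS`, the three toy features, the linear Pegasos-style trainer (a stand-in for whatever SVM solver the thesis used), and margin-based uncertainty sampling as the selection rule.

```python
import random

# Hypothetical lexicons, for illustration only (not taken from the thesis).
ORG_SUFFIXES = {"大学", "公司", "银行", "研究所"}
LOCATIONS = {"上海", "北京", "中国"}

def featurize(words):
    """Map a candidate span of segmented words to a toy feature vector."""
    return [
        1.0 if words[-1] in ORG_SUFFIXES else 0.0,  # ends with an organization suffix
        1.0 if words[0] in LOCATIONS else 0.0,      # starts with a location word
        1.0,                                        # constant bias term
    ]

def decision(w, x):
    """Signed score of x under the linear model w; positive means 'organization'."""
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Pegasos-style stochastic sub-gradient descent on hinge loss plus L2 penalty."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                    # decaying step size
            violated = y[i] * decision(w, X[i]) < 1.0
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization step)
            if violated:                              # hinge-loss step on margin violations
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def active_learn(X, oracle, seed_idx, batch_size=2, rounds=2):
    """Incrementally grow the labeled set with the least certain pool samples."""
    labeled = list(seed_idx)
    queried = []
    for _ in range(rounds):
        w = train_linear_svm([X[i] for i in labeled], [oracle(i) for i in labeled])
        pool = [i for i in range(len(X)) if i not in labeled]
        if not pool:
            break
        # Smallest |score| = closest to the separating hyperplane = most uncertain.
        batch = sorted(pool, key=lambda i: abs(decision(w, X[i])))[:batch_size]
        queried.append(batch)
        labeled.extend(batch)
    w = train_linear_svm([X[i] for i in labeled], [oracle(i) for i in labeled])
    return w, labeled, queried

# Toy candidates: (segmented words, label), +1 = organization name, -1 = not.
candidates = [
    (["上海", "交通", "大学"], +1),
    (["北京", "银行"], +1),
    (["微软", "公司"], +1),
    (["中国", "科学", "研究所"], +1),
    (["今天", "天气"], -1),
    (["上海", "很", "大"], -1),
    (["我们", "学习"], -1),
    (["北京", "的", "冬天"], -1),
]
X = [featurize(words) for words, _ in candidates]
y = [label for _, label in candidates]

# The oracle stands in for a human annotator; only queried items get labeled.
w, labeled, queried = active_learn(X, oracle=lambda i: y[i], seed_idx=[0, 4])
for (words, label), x in zip(candidates, X):
    print("".join(words), label, round(decision(w, x), 2))
```

In this toy run the loop starts from one labeled positive and one labeled negative, and each round requests labels for two more candidates. The annotation saving comes from labeling only what the loop queries rather than the whole pool.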
Keywords/Search Tags: Chinese Organization Names Recognition, Support Vector Machine, Active Learning