Study On Chinese Named Entity Recognition

Posted on: 2005-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Wang
GTID: 2168360152965439
Subject: Computer software and theory
Abstract/Summary:
Natural Language Processing (NLP) is an interesting and challenging subject. Its main task is to build computational models that can simulate human language cognition. However, computers still cannot match human intelligence in this respect. The unknown word recognition problem can be considered the most important reason for the slow development of NLP. This thesis presents a new person name recognition algorithm based on part-of-speech information. In addition, the Chinese person name recognition problem is studied further using Support Vector Machines (SVM). To improve the recognition of unknown words, the algorithms proposed here are to be applied in the "Automatic Chinese Word Segmentation System" under development in our team.

The thesis first introduces NLP. Chinese text has no natural delimiter between words, which is one of the biggest differences between Chinese and English in NLP research. An automatic segmentation step is therefore required before the input text can be processed further. As automatic segmentation has developed, more and more researchers have focused on the unknown word recognition problem.

The main part of the thesis covers person name recognition methods. Person names are taken as the entry point for named entity recognition. After an analysis of the advantages and disadvantages of other recognition approaches, a person name recognition algorithm based on part-of-speech detection is presented. The algorithm, which integrates a statistical model with semantic rules, is able to recognize Chinese person names effectively. Moreover, SVM, a machine learning method, is introduced to study the features of Chinese person name recognition. With this statistical classification technique, it is shown that SVM is also a feasible way to recognize Chinese person names.

The final part presents the experimental results, reported according to the evaluation parameters, together with the corresponding conclusions. The thesis closes with a concise summary and suggestions for future research.
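
The segmentation and unknown-word problems described in the abstract can be illustrated with a small sketch. The code below is not the algorithm proposed in the thesis; it is plain forward maximum matching, a common dictionary-based baseline, run against a toy dictionary. The dictionary contents, the sample sentence, and the placeholder name 王小明 are all assumptions made for illustration.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching over a word dictionary.

    At each position the longest dictionary word is taken; if nothing
    matches, a single character is emitted as its own token.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionary (hypothetical); it deliberately contains no person names.
TOY_DICTIONARY = {"研究", "自然", "语言", "自然语言", "处理"}

# "Wang Xiaoming studies natural language processing", written without spaces.
sentence = "王小明研究自然语言处理"
tokens = forward_max_match(sentence, TOY_DICTIONARY)
print(tokens)
```

Because the name 王小明 is not in the dictionary, the segmenter breaks it into three single-character tokens, while the known words are recovered intact. Runs of leftover single characters like this are the kind of span that a dedicated person name recognizer would need to reassemble.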
Keywords/Search Tags: Natural Language Processing, Word Segmentation, Unknown Word, Statistical Model, Support Vector Machine