A Study of Khmer Named Entity Recognition Based on Constrained Conditional Random Fields

Posted on: 2017-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: S H Huang
Full Text: PDF
GTID: 2358330488965632
Subject: Computer technology
Abstract/Summary:
Khmer named entity recognition is an important part of Khmer information processing and a basic task underlying text classification, machine translation, question answering, and information extraction. From the perspective of language analysis, named entity recognition belongs to unknown word recognition within lexical analysis. Because languages differ greatly in grammar and semantics, mature English named entity recognition techniques cannot be transplanted directly to Khmer. To advance research on Khmer information processing, this thesis studies the construction of a Khmer entity corpus and a named entity recognition model based on constrained conditional random fields (CRF). The main work is as follows.

A Khmer entity-annotated corpus is constructed using constrained CRF. First, a constrained CRF with proper-noun constraints is used for word segmentation and part-of-speech tagging, yielding a high-quality corpus that contains a large number of entities. Then a constrained CRF with entity constraints is used to recognize named entities. After manual correction, the tagged entities are added to the user dictionary and a new round of constrained CRF recognition is carried out. Repeating this process iteratively yields a large entity corpus.

A new named entity recognition method is proposed, based on a constrained CRF that incorporates Khmer entity features. Khmer entity features are introduced into the CRF model to construct the constrained CRF, which is then used to recognize Khmer named entities. In several groups of comparative experiments, the constrained CRF outperforms the standard CRF at named entity recognition.

Based on the research above, a prototype Khmer named entity recognition system is developed, which can serve as a platform for further study of Khmer natural language processing.
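The abstract does not say how the constraints enter the CRF. One common reading is that tokens matched by the user dictionary are restricted to certain labels at decoding time. The sketch below illustrates that idea only, under stated assumptions: the scores are toy values rather than a trained model, and the function name `constrained_viterbi`, the `allowed` constraint format, and the O/B/I label set are all hypothetical and not taken from the thesis.

```python
from typing import Dict, List, Sequence, Set

NEG_INF = float("-inf")


def constrained_viterbi(
    emissions: Sequence[Sequence[float]],    # emissions[t][y]: score of label y at token t
    transitions: Sequence[Sequence[float]],  # transitions[p][y]: score of moving from label p to y
    allowed: Dict[int, Set[int]],            # token index -> labels permitted there (assumed format)
) -> List[int]:
    """Return the best label path that respects the per-token constraints."""
    n_steps = len(emissions)
    n_labels = len(transitions)
    if n_steps == 0:
        return []

    def emit(t: int, y: int) -> float:
        # A label forbidden by a constraint gets score -inf, so no path can use it.
        if t in allowed and y not in allowed[t]:
            return NEG_INF
        return emissions[t][y]

    score = [emit(0, y) for y in range(n_labels)]
    backptr: List[List[int]] = []
    for t in range(1, n_steps):
        new_score, pointers = [], []
        for y in range(n_labels):
            best_prev = max(range(n_labels), key=lambda p: score[p] + transitions[p][y])
            new_score.append(score[best_prev] + transitions[best_prev][y] + emit(t, y))
            pointers.append(best_prev)
        score = new_score
        backptr.append(pointers)

    # Trace the best path back from the highest-scoring final label.
    path = [max(range(n_labels), key=lambda y: score[y])]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy example with labels O=0, B=1, I=2. The emission scores favor O everywhere.
O, B, I = 0, 1, 2
emissions = [[2.0, 0.0, 0.0], [2.0, 1.0, 0.0], [2.0, 0.0, 1.0]]
transitions = [[0.0] * 3 for _ in range(3)]

unconstrained = constrained_viterbi(emissions, transitions, allowed={})
# Suppose a dictionary entry covers tokens 1 and 2, forcing them to B and I.
constrained = constrained_viterbi(emissions, transitions, allowed={1: {B}, 2: {I}})
```

In this toy case the unconstrained decoder labels every token O, while the dictionary constraint forces the entity span to be recovered. In the iterative corpus construction described above, each round of manual correction would enlarge the dictionary and therefore the set of constrained positions in the next round.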
Keywords/Search Tags: Constrained Conditional Random Fields, Khmer, entity corpus, entity feature, named entity recognition