A Study On Cambodian Word Method Based On Conditional Random Field

Posted on: 2015-10-14
Degree: Master
Type: Thesis
Country: China
Candidate: H S Pan
GTID: 2208330431476601
Subject: Computer technology
Abstract/Summary:
Khmer lexical analysis is a foundational task in Khmer information processing, and its results directly affect the performance of downstream Khmer information processing applications. Because languages differ, the traditional lexical analysis techniques developed for Chinese and English cannot be applied directly to Khmer. To enrich the theory and application of Khmer lexical analysis and to provide basic support for Khmer information processing, this thesis studies the construction of a Khmer word segmentation model, a Khmer POS tagging model, and a Khmer named entity recognition model: (1) A Khmer word segmentation and POS tagging method based on the CRFs model is proposed. First, we take the character cluster as the unit of granularity and define feature templates from context information and Khmer word features in order to segment Khmer sentences; we then use this information to define the feature templates for POS tagging. (2) A named entity recognition method based on traditional feature information and Khmer-specific entity features is proposed. According to the structure of the different entity types, entities with fixed structures, such as time and numeric expressions, are recognized with matching rules, while entities with complex structures, such as person names, place names, and organization names, are recognized with the CRFs algorithm, using features such as style and POS together with the Khmer entity features. (3) We design and implement a prototype Khmer lexical analysis system.
Keywords/Search Tags:Khmer, word segmentation, POS tagging, named entity recognition, CRFs
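
The segmentation step in (1) can be illustrated with a minimal sketch of how a context-window feature template over character clusters might look. This is not the thesis's actual template: the B/M/E/S position tags, the window size, the feature names, and the function names `words_to_tags` and `cluster_features` are all assumptions made for illustration, and the placeholder strings `k1`, `k2`, ... stand in for real Khmer character clusters.

```python
from typing import Dict, List

PAD = "<PAD>"  # assumed padding symbol for positions outside the sentence


def words_to_tags(words: List[List[str]]) -> List[str]:
    """Turn a segmented sentence (each word is a list of clusters)
    into one B/M/E/S position tag per cluster."""
    tags: List[str] = []
    for clusters in words:
        if not clusters:
            continue  # ignore empty words
        if len(clusters) == 1:
            tags.append("S")  # single-cluster word
        else:
            tags.extend(["B"] + ["M"] * (len(clusters) - 2) + ["E"])
    return tags


def cluster_features(clusters: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Build context features for the cluster at position i:
    unigrams in a +/- window, plus the two bigrams touching position i."""

    def at(j: int) -> str:
        return clusters[j] if 0 <= j < len(clusters) else PAD

    feats: Dict[str, str] = {"bias": "1"}
    for offset in range(-window, window + 1):
        feats[f"c[{offset}]"] = at(i + offset)
    feats["c[-1]|c[0]"] = at(i - 1) + "|" + at(i)
    feats["c[0]|c[1]"] = at(i) + "|" + at(i + 1)
    return feats


# Placeholder clusters: a three-cluster word, a one-cluster word, a two-cluster word.
segmented = [["k1", "k2", "k3"], ["k4"], ["k5", "k6"]]
flat = [c for word in segmented for c in word]
X = [cluster_features(flat, i) for i in range(len(flat))]
y = words_to_tags(segmented)
```

Here `X` holds one feature dictionary per cluster and `y` the matching tag sequence, which is the general shape of training data a linear-chain CRF learner expects. Khmer-specific word features and the POS tagging templates described in the abstract would be added as further dictionary entries.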