Research On Identification Of Kazakh Basic Noun Phrase Based On Maximum Entropy

Posted on: 2012-06-08 | Degree: Master | Type: Thesis
Country: China | Candidate: R N Sun
GTID: 2178330335486018 | Subject: Computer application technology
Abstract/Summary:
The basic noun phrase of the Kazakh language (the Kazakh Base Noun Phrase, or BaseNP) is an important phrase type. Identifying BaseNPs simplifies sentence structure and reduces the difficulty of subsequent syntactic analysis, information retrieval, and text processing. It can also help advance research in minority-language linguistics and translation theory.

This thesis first describes how the annotated corpus required by the maximum entropy method was built. First, an annotation standard for Kazakh BaseNPs was studied. Second, to reduce the workload of building the annotated corpus, a program implementing the summarized structural rules of Kazakh BaseNPs performed a preliminary BaseNP annotation of the corpus, and the results were then manually corrected and supplemented. This completed the annotated corpus.

The second part describes how the maximum entropy method is used to identify Kazakh BaseNPs. First, based on the characteristics of Kazakh BaseNP classification and identification, a feature space for the maximum entropy model is proposed, built from Kazakh words, parts of speech, and affixes. Second, a frequency-threshold feature selection algorithm automatically obtains the set of features that are effective for recognition. Third, the model parameters are estimated with a common iterative algorithm. Finally, the trained model scores the candidate results, and the one with the highest confidence is output as the recognition result.

Experimental results show that the maximum entropy approach to Kazakh BaseNP identification is effective: on a corpus drawn from the Kazakh-language edition of Xinjiang Daily, accuracy in a closed test reaches 94%.
The proposed method also shows strong generalization ability and could be extended to identify other types of Kazakh phrases.
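The corpus-building step applies hand-written BaseNP structure rules before manual correction. The thesis's actual rules and tagset are not given in the abstract, so the following is only a minimal Python sketch under stated assumptions: a hypothetical coarse tagset (N, A, M, D, P, V), B/I/O output labels, and a single toy rule that relies on Kazakh being head-final. Under that rule, a BaseNP is a maximal run of nouns, adjectives, numerals and determiners that ends in a noun. The function name `pre_annotate` is illustrative.

```python
from typing import List, Tuple

# Hypothetical coarse POS tags, NOT the thesis's tagset:
# N noun, A adjective, M numeral, D determiner, P pronoun, V verb.
MODIFIER_TAGS = {"N", "A", "M", "D"}  # tags allowed inside a toy BaseNP
HEAD_TAGS = {"N"}                      # Kazakh is head-final: the head is the last token


def pre_annotate(tagged: List[Tuple[str, str]]) -> List[str]:
    """Label each (word, POS) token B/I/O with one toy structure rule:
    a BaseNP is a maximal run of MODIFIER_TAGS tokens, trimmed on the
    right until it ends in a HEAD_TAGS token."""
    labels = ["O"] * len(tagged)
    n = len(tagged)
    i = 0
    while i < n:
        # Extend the longest run of candidate NP tokens starting at i.
        j = i
        while j < n and tagged[j][1] in MODIFIER_TAGS:
            j += 1
        # Trim trailing non-head tokens so the phrase ends in a noun.
        end = j
        while end > i and tagged[end - 1][1] not in HEAD_TAGS:
            end -= 1
        if end > i:
            labels[i] = "B"
            for k in range(i + 1, end):
                labels[k] = "I"
            i = end
        else:
            # No noun head in this run: leave it as O and move past it.
            i = max(j, i + 1)
    return labels


# "Мен жаңа кітап оқыдым" (I read a new book) with hypothetical tags.
sentence = [("Мен", "P"), ("жаңа", "A"), ("кітап", "N"), ("оқыдым", "V")]
print(pre_annotate(sentence))  # ['O', 'B', 'I', 'O']
```

A rule like this only reduces annotation effort. As the abstract describes, the pre-annotated corpus was then corrected and supplemented by hand.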
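The recognition pipeline in the abstract (a feature space over words, parts of speech and affixes; frequency-threshold feature selection; iterative parameter estimation; highest-probability output) can be sketched as a per-token B/I/O maximum entropy classifier. The abstract only says "a common iterative algorithm", so this sketch assumes Generalized Iterative Scaling (GIS). The feature templates, the use of word-final character strings as a stand-in for real affix analysis, the label set, and all names (`context`, `make_data`, `MaxEnt`) are hypothetical illustrations, not the thesis's implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Sentence = List[Tuple[str, str]]  # (word, POS) pairs
Event = Tuple[List[str], str]     # (context predicates, gold label)


def context(sent: Sentence, i: int) -> List[str]:
    """Hypothetical feature templates for token i: word, POS, neighbouring POS, affix."""
    word, pos = sent[i]
    prev_pos = sent[i - 1][1] if i > 0 else "<s>"
    next_pos = sent[i + 1][1] if i + 1 < len(sent) else "</s>"
    return ["w=" + word, "t=" + pos, "t-1=" + prev_pos, "t+1=" + next_pos,
            "suf2=" + word[-2:], "suf3=" + word[-3:]]  # word endings stand in for real affixes


def make_data(corpus: Sequence[Tuple[Sentence, List[str]]]) -> List[Event]:
    return [(context(s, i), labels[i]) for s, labels in corpus for i in range(len(s))]


class MaxEnt:
    def __init__(self, threshold: int = 1):
        self.threshold = threshold                   # minimum count to keep a feature
        self.index: Dict[Tuple[str, str], int] = {}  # (predicate, label) -> feature id
        self.labels: List[str] = []
        self.weights: List[float] = []
        self.C = 1          # GIS constant
        self.w_corr = 0.0   # weight of the GIS correction feature

    def _active(self, preds: List[str], label: str) -> List[int]:
        return [self.index[(p, label)] for p in preds if (p, label) in self.index]

    def _probs(self, preds: List[str]) -> Dict[str, float]:
        """p(y | x) = exp(sum of active weights + correction term) / Z(x)."""
        scores = {}
        for y in self.labels:
            act = self._active(preds, y)
            scores[y] = sum(self.weights[k] for k in act) + self.w_corr * (self.C - len(act))
        top = max(scores.values())
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def train(self, data: Sequence[Event], iterations: int = 100) -> None:
        n = len(data)
        self.labels = sorted({y for _, y in data})
        # 1) Frequency-threshold feature selection over (predicate, label) pairs.
        counts = Counter((p, y) for preds, y in data for p in preds)
        for pair, c in counts.items():
            if c >= self.threshold:
                self.index[pair] = len(self.index)
        self.weights = [0.0] * len(self.index)
        # 2) GIS needs a constant feature total C; taking max + 1 keeps the
        #    correction feature >= 1 on every training event.
        self.C = 1 + max(len(self._active(preds, y)) for preds, _ in data for y in self.labels)
        emp = [0.0] * len(self.index)
        emp_corr = 0.0
        for preds, y in data:
            act = self._active(preds, y)
            for k in act:
                emp[k] += 1.0 / n
            emp_corr += (self.C - len(act)) / n
        # 3) Generalized Iterative Scaling: lambda += log(empirical / model) / C.
        for _ in range(iterations):
            model = [0.0] * len(self.index)
            model_corr = 0.0
            for preds, _ in data:
                for y, p in self._probs(preds).items():
                    act = self._active(preds, y)
                    for k in act:
                        model[k] += p / n
                    model_corr += p * (self.C - len(act)) / n
            for k in range(len(self.weights)):
                self.weights[k] += math.log(emp[k] / model[k]) / self.C
            self.w_corr += math.log(emp_corr / model_corr) / self.C

    def predict(self, preds: List[str]) -> str:
        # 4) Output the label with the highest conditional probability.
        probs = self._probs(preds)
        return max(probs, key=probs.get)


# Toy usage: two hand-labelled sentences with hypothetical tags and B/I/O labels.
corpus = [
    ([("Мен", "P"), ("жаңа", "A"), ("кітап", "N"), ("оқыдым", "V")], ["O", "B", "I", "O"]),
    ([("екі", "M"), ("үлкен", "A"), ("үй", "N"), ("салынды", "V")], ["B", "I", "I", "O"]),
]
model = MaxEnt(threshold=1)
model.train(make_data(corpus), iterations=100)
print([model.predict(context(corpus[0][0], i)) for i in range(4)])
```

Taking C one larger than strictly necessary slows GIS slightly but guarantees a positive correction value on every training event, so no empirical expectation is zero. The threshold value used in the thesis is not stated; a fuller system would likely also add previous-label features and check that decoded B/I/O sequences are well formed.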
Keywords/Search Tags: base noun phrase identification, maximum entropy, Kazakh, natural language processing