Approach To Uyghur Stemmer Using Combination Of Multi-Strategies

Posted on: 2017-05-01 | Degree: Master | Type: Thesis
Country: China | Candidate: D Y G L A N W E Sai
GTID: 2308330503484327 | Subject: Information and Communication Engineering
Abstract/Summary:
Uyghur is an agglutinative language with complex morphology: the form and meaning of each word depend on its suffixes, which also determine the word's role in a sentence. Correctly splitting a word into its stem and suffixes therefore captures the word's full meaning. So far, however, Uyghur word stem segmentation still leaves considerable room for improvement. Based on the morphological constraints of Uyghur words, we propose two stem segmentation models.

The first is a Uyghur stemmer that combines multiple strategies. A stem extraction method based on rules and statistics reaches an accuracy of 95% and serves as the reference (baseline) segmentation system. To address the ambiguity and over-segmentation that occur in this reference system, we introduce two additional sources of information: the part of speech and the context of the stem. Experimental results show that the part-of-speech feature and the stem context information each improve Uyghur word stem segmentation over the baseline, raising accuracy to 95.19% and 96.60%, respectively.

The second is a Uyghur word stemming method based on affix probability features. This model reaches an accuracy of 94%; introducing stem-suffix weights to further improve performance raises the accuracy to 95.69%. Although this method performs well on Uyghur stemming, the relationships between stems and suffixes, and between stems and compound suffixes, need further study to improve stemming accuracy.
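The abstract does not give the models' formulas. Purely as an illustration, the Python sketch below shows one way an affix-probability stemmer with a stem-suffix weight might score candidate splits: it enumerates every stem plus suffix-chain split of a word and keeps the one with the highest log score. Everything here is a hypothetical assumption rather than the thesis's method: the Latin-transliterated suffix inventory, all probabilities and weights, the floor probability for unknown stems, the log-linear score, the `lam` parameter, and applying the weight only to the stem's first suffix. Vowel harmony and stem-final sound changes are ignored.

```python
import math

# Hypothetical toy suffix probabilities P(suffix); a real system would
# estimate these from a segmented Uyghur corpus.
SUFFIX_PROB = {
    "lar": 0.20, "ler": 0.20,  # plural
    "da": 0.10, "de": 0.10,    # locative
    "ning": 0.08,              # genitive
    "ni": 0.08,                # accusative
    "im": 0.15,                # 1st person singular possessive
}

# Hypothetical toy stem probabilities; unseen stems get a small floor value.
STEM_PROB = {"kitab": 0.4, "al": 0.2, "alim": 0.02}
UNKNOWN_STEM_PROB = 1e-4
MIN_STEM_LEN = 2

# Hypothetical stem-suffix weights: >1 favours a pairing, <1 penalises it
# (here ("al", "im") stands for a spurious over-segmentation of "alim").
# Pairs not listed default to 1.0, i.e. no effect on the score.
STEM_SUFFIX_WEIGHT = {("kitab", "lar"): 2.0, ("al", "im"): 0.1}


def suffix_chains(rest):
    """Yield every way to split `rest` into a sequence of known suffixes."""
    if not rest:
        yield []
        return
    for suffix in SUFFIX_PROB:
        if rest.startswith(suffix):
            for tail in suffix_chains(rest[len(suffix):]):
                yield [suffix] + tail


def score(stem, suffixes, lam):
    """Log score: stem prob + suffix probs + lam * log(stem-suffix weight)."""
    total = math.log(STEM_PROB.get(stem, UNKNOWN_STEM_PROB))
    total += sum(math.log(SUFFIX_PROB[s]) for s in suffixes)
    if suffixes:
        weight = STEM_SUFFIX_WEIGHT.get((stem, suffixes[0]), 1.0)
        total += lam * math.log(weight)
    return total


def segment(word, lam=1.0):
    """Return the highest-scoring (stem, [suffixes]) split of `word`."""
    best = None
    for i in range(MIN_STEM_LEN, len(word) + 1):
        stem, rest = word[:i], word[i:]
        for chain in suffix_chains(rest):
            candidate = (score(stem, chain, lam), stem, chain)
            if best is None or candidate[0] > best[0]:
                best = candidate
    if best is None:  # word shorter than MIN_STEM_LEN: leave it unsplit
        return word, []
    return best[1], best[2]


if __name__ == "__main__":
    print(segment("kitablarda"))     # ('kitab', ['lar', 'da'])
    print(segment("alim"))           # ('alim', [])
    print(segment("alim", lam=0.0))  # ('al', ['im'])
```

With `lam=0.0` the toy model over-segments "alim" into "al" + "im"; the penalising stem-suffix weight is what keeps it whole, which mirrors in miniature the kind of correction the stem-suffix weight is meant to provide. Exhaustive enumeration is adequate for short words and a small suffix inventory; a larger system would more likely use dynamic programming over a segmentation lattice.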
Keywords/Search Tags: Morphology, Stem segmentation, N-gram model, Part of speech, Context information