Research on Lao Word Segmentation and Part-of-Speech Tagging Methods

Posted on: 2017-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: B Yang
GTID: 2358330488964841
Subject: Computer technology
Abstract/Summary:
Word segmentation and part-of-speech (POS) tagging are the foundation of named entity recognition, dependency parsing, word sense disambiguation and semantic role labeling, and they are also used in text indexing, text classification and corpus processing. Research on word segmentation and POS tagging is therefore worthwhile. Lao is a low-resource language: research on Chinese and English is abundant, but research on Lao is scarce. Because languages differ, existing methods cannot be applied directly to Lao word segmentation and POS tagging. Using a small corpus and drawing on the characteristics of Lao words, this thesis studies Lao word segmentation and POS tagging. The main work is as follows:

(1) We analyze the characteristics of Lao, including word-structure features, lexical features and grammatical features, summarize them, and incorporate them into Lao word segmentation and POS tagging.

(2) We propose a Lao word segmentation method based on syllable-level maximum length matching. The method first splits the text into syllables, then matches the syllables against a dictionary using maximum length matching. The segmentation result is then corrected with an error dictionary and rules: word sequences are matched against the error dictionary and regular expressions to correct some of the words. This improves both the efficiency and the accuracy of Lao word segmentation.

(3) Corpus resources for POS tagging are also scarce, so supervised methods cannot be applied directly. We therefore propose a semi-supervised POS tagging method. A small tagged corpus is used to train a hidden Markov model (HMM), and each sentence is decoded twice, once with a forward Viterbi algorithm and once with a backward Viterbi algorithm.
If the two decoding results agree, the tags are accepted as correct; otherwise, word co-occurrence rules are used to recompute them. In addition, the tagging of unknown words is improved by computing word similarity. Finally, these steps are combined into an iterative HMM tagging process. This method achieves good results on Lao POS tagging.
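The syllable-level maximum length matching and error-dictionary correction described in item (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes the text has already been split into syllables (the Lao syllabification rules are not shown), that the dictionary is a plain set of words, and that the error dictionary maps a mis-segmented word sequence to its corrected sequence. The function names, the `max_len` limit of 4 syllables and the placeholder syllables "a", "b", "c", "d" are hypothetical, and the regular-expression correction step is omitted.

```python
from typing import Dict, List, Sequence, Set, Tuple


def max_match_syllables(syllables: Sequence[str], lexicon: Set[str],
                        max_len: int = 4) -> List[str]:
    """Forward maximum length matching over syllables rather than characters.

    At each position, take the longest run of up to max_len syllables whose
    concatenation is in the lexicon; fall back to a single syllable.
    (max_len = 4 is a hypothetical default, not taken from the thesis.)
    """
    words: List[str] = []
    i = 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + n])
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


def apply_error_dict(words: List[str],
                     fixes: Dict[Tuple[str, ...], Tuple[str, ...]]) -> List[str]:
    """Replace known mis-segmented word sequences with their corrections.

    Hypothetical error-dictionary format: tuple of wrong words -> tuple of
    corrected words. Scans left to right, trying the longest pattern first.
    """
    longest = max((len(k) for k in fixes), default=0)
    out: List[str] = []
    i = 0
    while i < len(words):
        for n in range(min(longest, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in fixes:
                out.extend(fixes[key])
                i += n
                break
        else:
            # No error pattern starts here; keep the word unchanged.
            out.append(words[i])
            i += 1
    return out
```

With the dictionary {"ab", "abc", "cd"}, greedy matching over ["a", "b", "c", "d"] produces ["abc", "d"]; an error-dictionary entry mapping ("abc", "d") to ("ab", "cd") repairs it. Matching whole syllables keeps every candidate word aligned to syllable boundaries, which is presumably why the method segments into syllables first.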
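The twice-decoding check in item (3) can be sketched as below. The abstract does not define the backward Viterbi pass; this sketch assumes it means running Viterbi right to left with a second HMM whose start and transition probabilities were estimated on reversed sentences. The model layout (dictionaries of probabilities), the `FLOOR` smoothing constant and the function names are hypothetical, training is not shown, and positions where the two passes disagree are only flagged, since the word co-occurrence rules used to re-tag them are not specified.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (start[tag], trans[prev][tag], emit[tag][word]) as plain probabilities.
# Assumption: in the backward model, "prev" is the tag of the word to the right.
Model = Tuple[Dict[str, float],
              Dict[str, Dict[str, float]],
              Dict[str, Dict[str, float]]]

FLOOR = 1e-8  # hypothetical floor so unseen events get a small non-zero probability


def _lp(p: float) -> float:
    return math.log(max(p, FLOOR))


def viterbi(words: Sequence[str], tags: Sequence[str], model: Model) -> List[str]:
    """Most probable tag sequence for `words` under a first-order HMM (log space)."""
    if not words:
        return []
    start, trans, emit = model

    def t_lp(a: str, b: str) -> float:
        return _lp(trans.get(a, {}).get(b, 0.0))

    def e_lp(t: str, w: str) -> float:
        return _lp(emit.get(t, {}).get(w, 0.0))

    # score[t]: best log-probability of a tag path ending in t at the current word
    score = {t: _lp(start.get(t, 0.0)) + e_lp(t, words[0]) for t in tags}
    backptrs: List[Dict[str, str]] = []
    for w in words[1:]:
        new_score: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for t in tags:
            prev = max(tags, key=lambda s: score[s] + t_lp(s, t))
            new_score[t] = score[prev] + t_lp(prev, t) + e_lp(t, w)
            ptr[t] = prev
        score = new_score
        backptrs.append(ptr)
    # Follow back-pointers from the best final tag, then restore left-to-right order.
    path = [max(tags, key=lambda t: score[t])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


def two_pass_tag(words: Sequence[str], tags: Sequence[str],
                 fwd: Model, bwd: Model) -> Tuple[List[str], List[bool]]:
    """Decode left-to-right and right-to-left; report where the two passes agree."""
    forward = viterbi(words, tags, fwd)
    # Backward pass: decode the reversed sentence with the reversed-direction
    # model, then flip the result back into left-to-right order.
    backward = viterbi(list(reversed(words)), tags, bwd)[::-1]
    agree = [f == b for f, b in zip(forward, backward)]
    # False entries mark words whose tags would be recomputed (rules not shown).
    return forward, agree
```

Returning an agreement mask rather than a merged tag sequence keeps the re-tagging step separate, so whatever co-occurrence rules or word-similarity heuristics are used can be applied only to the flagged positions.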
Keywords/Search Tags: Word Segmentation, Maximum Length Matching, Hidden Markov Model, Part-of-Speech Tagging, Semi-supervised Learning