Research On Lao Word Segmentation And Part-of-speech Tagging Method

Posted on:2017-04-30

Degree:Master

Type:Thesis

Country:China

Candidate:B Yang

Full Text:PDF

GTID:2358330488964841

Subject:Computer technology

Abstract/Summary:

Word segmentation and part-of-speech (POS) tagging are the basis of named entity recognition, dependency parsing, word sense disambiguation, and semantic role labeling, and they are also used in text indexing, text classification, and corpus processing. Research on word segmentation and POS tagging is therefore worthwhile. Lao is a low-resource language: little research has been done on it, whereas research on Chinese and English is abundant. Because languages differ, existing methods cannot be applied directly to Lao word segmentation and POS tagging. Using a small corpus and drawing on the characteristics of Lao words, this work studies Lao word segmentation and POS tagging. The main work is as follows:

(1) We analyze the characteristics of Lao, including word structure features, word features, and grammatical features, summarize them, and integrate them into Lao word segmentation and POS tagging.

(2) We propose a Lao word segmentation method based on syllable-level maximum length matching. The method first segments the text into syllables and then matches the syllables against a dictionary using maximum length matching. The segmentation result is then corrected with an error dictionary and rules: the word sequence is matched against the error dictionary and regular expressions to correct some of the words. This improves the efficiency and accuracy of Lao word segmentation.

(3) Corpus resources for POS tagging are also scarce, so supervised methods cannot be applied directly. We propose a semi-supervised method for POS tagging. A Hidden Markov Model is trained on a small tagged corpus, and during tagging each sentence is decoded twice, once with the forward Viterbi algorithm and once with the backward Viterbi algorithm. If the two results agree, the labels are accepted as correct; otherwise, word co-occurrence rules are used to compute them again. Meanwhile, the tagging of unknown words is improved by computing word similarity. Finally, an iterative Hidden Markov Model tagging process is obtained. The method achieves good results on Lao POS tagging.
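
The matching step in item (2) can be illustrated with a minimal sketch of greedy forward maximum length matching over syllables. This is not the thesis implementation: the function name `max_match`, the `max_len` limit, and the romanized toy syllables and lexicon are all assumptions made for illustration. Syllable splitting is assumed to have been done already, and the error-dictionary and rule-based correction is not shown.

```python
from typing import List, Set


def max_match(syllables: List[str], lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum length matching over a syllable sequence.

    `max_len` (hypothetical parameter) caps how many syllables one word may span.
    """
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, shrinking by one syllable each time.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + n])
            # A single syllable is always accepted so the scan keeps advancing.
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


# Toy romanized placeholders, not real Lao data.
toy_lexicon = {"sabai", "sabaidee", "khop", "khopchai"}
toy_syllables = ["sa", "bai", "dee", "khop", "chai", "lai"]
segmented = max_match(toy_syllables, toy_lexicon)
print(segmented)
```

Matching on syllables rather than characters keeps the number of candidates per position small, since a word can only start and end on a syllable boundary.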
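
The two-pass decoding in item (3) can be sketched as follows. The abstract does not say how the backward pass is built; this sketch assumes it is a second HMM trained on reversed tag sequences and decoded right to left. The add-alpha smoothing, the function names, and the toy English corpus are likewise assumptions. Positions where the two passes disagree are marked `None`; the co-occurrence rules, unknown-word similarity, and iterative retraining described in the abstract are not shown.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sents, alpha=1.0):
    """Estimate add-alpha smoothed log-probabilities from (word, tag) sequences."""
    tags = sorted({t for sent in tagged_sents for _, t in sent})
    vocab = {w for sent in tagged_sents for w, _ in sent}
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sent in tagged_sents:
        start[sent[0][1]] += 1
        for (_, prev), (_, cur) in zip(sent, sent[1:]):
            trans[prev][cur] += 1
        for w, t in sent:
            emit[t][w] += 1

    def logp(counter, key, n_outcomes):
        total = sum(counter.values())
        return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))

    return {
        "tags": tags,
        "start": lambda t: logp(start, t, len(tags)),
        "trans": lambda p, c: logp(trans[p], c, len(tags)),
        # One extra outcome reserves probability mass for unseen words.
        "emit": lambda t, w: logp(emit[t], w, len(vocab) + 1),
    }


def viterbi(model, words):
    """Return the most probable tag sequence for `words` under `model`."""
    tags = model["tags"]
    score = {t: model["start"](t) + model["emit"](t, words[0]) for t in tags}
    backptrs = []
    for w in words[1:]:
        new_score, ptr = {}, {}
        for cur in tags:
            best = max(tags, key=lambda p: score[p] + model["trans"](p, cur))
            new_score[cur] = score[best] + model["trans"](best, cur) + model["emit"](cur, w)
            ptr[cur] = best
        score = new_score
        backptrs.append(ptr)
    path = [max(tags, key=lambda t: score[t])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


def bidirectional_tags(tagged_sents, words):
    """Decode left to right and right to left; keep tags only where both agree."""
    fwd = train_hmm(tagged_sents)
    bwd = train_hmm([sent[::-1] for sent in tagged_sents])  # assumed backward model
    left_to_right = viterbi(fwd, words)
    right_to_left = viterbi(bwd, words[::-1])[::-1]
    # None marks a disagreement that a later step would have to resolve.
    return [a if a == b else None for a, b in zip(left_to_right, right_to_left)]


# Toy English placeholder corpus, not real Lao data.
toy_corpus = [
    [("dog", "N"), ("runs", "V")],
    [("cat", "N"), ("sleeps", "V")],
    [("dog", "N"), ("sleeps", "V")],
]
result = bidirectional_tags(toy_corpus, ["cat", "runs"])
print(result)
```

Treating agreement between the two passes as a confidence signal lets a small tagged corpus be extended only with labels that both models support.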

Keywords/Search Tags:

Word Segmentation, Maximum Length Matching, Hidden Markov Model, Part-of-speech Tagging, Semi-supervised Learning

Related items

1	Research On Chinese Part-of-speech Tagging Based On Semi Hidden Markov Model
2	Research On The Learning Of Integrating Chinese Word Segmentation With Part-of-Speech Tagging And Domain Adaption Approach
3	Study On Disambiguation Algorithm For Chinese Word Segmentation
4	The Research Of Part-of-speech Tagging Based On Hidden Markov Model
5	Research And Implementation Of Chinese Lexical Analysis Technology
6	Research On The Application Of Semi-supervised Learning In Natural Language Processing
7	Research On Chinese Word Segmentation And Part-of-speech Tagging Based On Deep Learning Methods
8	HMM-based Chinese Part-of-Speech Tagging And Improvement
9	The Effect Of Part Of Speech On Chinese Word Segmentation
10	BiLSTM And CNN Based Joint Model For Chinese Word Segmentation And Part-of-speech Tagging