Research On Lao Language Part-of-speech Tagging With Multiple Features

Posted on:2021-02-17

Degree:Master

Type:Thesis

Country:China

Candidate:X J Wang

GTID:2438330620480346

Subject:Computer technology

Abstract/Summary:

Lao online texts contain a large amount of information related to public opinion. How to extract valuable information from these data has become one of the research focuses of natural language processing, but there has been little research on Lao natural language processing either in China or abroad. As one of China's neighbors, Laos is an important ally in the Belt and Road Initiative, but the needs of language exchange between the two countries have not yet been met. Part-of-speech (POS) tagging is an important basic task in information extraction research. This thesis proposes a Lao POS tagging method that combines multiple features to address these difficulties. The work consists of the following three parts.

(1) Because Lao expresses grammatical meaning through word order and is characterized by long sentences, a BiLSTM-Attention-CRF model is established as the basic framework of the POS tagging model, integrating word order features and long-range context features. First, the model uses a BiLSTM network with an attention mechanism to process the vector of each Lao word. Then, a CRF layer takes the correlation between adjacent part-of-speech tags into account when computing the tag sequence. In the experiments, HMM, CRF, CNN-CRF and BiLSTM-CRF models were used as comparison models. The results show that the BiLSTM-Attention-CRF model outperforms them, reaching an accuracy of 92.67%.

(2) To address the main challenge of recognizing low-frequency Lao words, this thesis proposes a "phoneme-level" word vector method that fuses phoneme features into the BiLSTM-Attention-CRF model. Phoneme features help express the morphological and structural information of words. First, the model takes phonemes as atomic units and uses a convolutional neural network with multiple filter widths to extract the relationships between phoneme vectors, forming "phoneme-level" word vectors. Then, the "phoneme-level" word vectors are concatenated with word vectors pre-trained with FastText to build the final word feature vector, further enriching the morphological features of each word. According to the experimental results, the accuracy of the BiLSTM-Attention-CRF model rises to 93.11% after fusing phoneme features. The experiments also measured the absolute F1 improvement that phoneme features bring for each of the main part-of-speech tags; the consistent F1 gains across tags support the soundness of the proposed method.

(3) To further strengthen the model's recognition of low-frequency words, this thesis proposes a multi-task learning method that combines a TF-ISF auxiliary loss and a main consonant auxiliary loss, which helps the model fuse sentence topic features and main consonant distribution features. TF-ISF applies the topic extraction algorithm TF-IDF at the sentence level, and the main consonant is an important component of the Lao syllable. With all features fused, the accuracy of the model reaches 93.41%, and the model shows advantages over the language-assisted model. Moreover, to make the evaluation more rigorous, this thesis also used BiLSTM-CNNs-CRF as a comparison model and tested some of the model's ideas on public Danish and Spanish corpora. The results show that the proposed method is effective at recognizing low-frequency words.
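
The TF-ISF weighting mentioned in part (3) can be illustrated with a minimal sketch. It assumes the common formulation in which each sentence plays the role of a document in TF-IDF: term frequency within the sentence multiplied by the log of the inverse sentence frequency. The abstract does not give the exact formula, smoothing, or how the scores are turned into an auxiliary loss, so the function name `tf_isf` and the placeholder tokens below are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def tf_isf(sentences):
    """Return one dict per sentence mapping each word to a TF-ISF score.

    Assumed formulation (not taken from the thesis):
        TF  = count of the word in the sentence / sentence length
        ISF = log(number of sentences / number of sentences containing the word)
    """
    n_sentences = len(sentences)

    # Sentence frequency: in how many sentences does each word occur?
    sentence_freq = Counter()
    for sentence in sentences:
        sentence_freq.update(set(sentence))

    scores = []
    for sentence in sentences:
        counts = Counter(sentence)
        total = len(sentence)
        scores.append({
            word: (count / total) * math.log(n_sentences / sentence_freq[word])
            for word, count in counts.items()
        })
    return scores


# Placeholder tokens standing in for segmented Lao words.
corpus = [
    ["w1", "w2", "w1"],
    ["w2", "w3"],
    ["w2", "w4", "w4", "w4"],
]
for sentence_scores in tf_isf(corpus):
    print(sentence_scores)
```

Under this formulation, a word that occurs in every sentence (here `w2`) receives a score of zero, while words concentrated in a single sentence score higher, which is the sense in which the scores can act as a sentence-level topic signal.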

Keywords/Search Tags:

Part-of-Speech Tagging, Lao, phoneme-level, Multitask learning

Related items

1	Research On Laodian Participle And Part-of-speech Tagging Method
2	Research On Part-of-Speech Tagging Algorithms Of Mathematical Corpus Based On Deep Learning
3	Study Of Kazak Part-of-Speech Tagging Based Upon HMM
4	Research And Implementation Of Modify Chinese Part-of-Speech Tagging Based On FST Technology
5	Research On The Construction Method Of Burmese Part-of-speech Tagging Corpus
6	A Research On Lao Language Part-of-speech Tagging With Multi-feature Fusion
7	Research On Parallel Corpora-based Unsupervised Part-of-speech Tagging For Chinese
8	Research On The Learning Of Integrating Chinese Word Segmentation With Part-of-Speech Tagging And Domain Adaption Approach
9	Research On Chinese Part-of-speech Tagging Based On Semi Hidden Markov Model
10	Chinese Word Found Its Part Of Speech Tagging