Design And Implementation Of Chinese Semantic Chunk Analysis System Based On Sequence Labeling

Posted on: 2020-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: W J Xia
Full Text: PDF
GTID: 2428330599458567
Subject: Computer technology
Abstract/Summary:
Chinese semantic analysis is an important step in enabling computers to understand Chinese sentences and to support human-machine conversation in Chinese: it transforms a Chinese sentence into a representation that machines can process. Based on the characteristics of Chinese grammar, this thesis designs a sequence labeling scheme for Chinese semantic chunks and uses deep learning to design and implement a Chinese semantic chunk analysis system that divides Chinese sentences into their semantic components.

To obtain a highly accurate Chinese semantic chunk analyzer, the input is first cleaned: non-sentence components are filtered out and the text is split into sentences at punctuation marks. A Chinese word segmentation model and a part-of-speech tagging model are trained to predict the word segmentation and part-of-speech tags of the cleaned sentence. Word vectors pre-trained with word2vec are concatenated with randomly initialized part-of-speech vectors to form the input of the neural network models.

Several Chinese semantic chunk recognition models are designed and implemented. First, a model based on the CRF algorithm is implemented: feature templates are designed to extract word features and part-of-speech features, phrases in the sentence can also contribute word features, and the chunk recognition result is obtained by combining the word features with the state transition matrix. Second, a model that combines a BiLSTM with the state transition matrix of the CRF is implemented. Third, the number of network layers is increased to produce a double-layer BiLSTM+CRF model. Finally, an attention mechanism is introduced to produce a new double-layer BiLSTM+Attention+CRF model. After comparing these four models, the double-layer BiLSTM+Attention+CRF model was selected to recognize Chinese semantic chunks.

After the design and implementation of the system were completed, the improved model was tested on other sequence labeling tasks in a financial corpus, where it outperformed the current natural language processing tools of Baidu and Harbin Institute of Technology. On the Chinese semantic analysis task, the F1 value reached 91.22%. Finally, comprehensive functional and performance tests of the algorithm were carried out, and the results show that each module of the system runs normally and behaves as expected.
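
The decoding step shared by the CRF-based models, in which per-token scores are combined with a state transition matrix to pick the best label sequence, can be illustrated with a minimal Viterbi sketch. This is not the thesis code. The three-label set (`O`, `B`, `I`), the score values, the example tokens, and the function names `viterbi_decode` and `bio_to_chunks` are all assumptions made for illustration; the thesis's actual chunk label scheme is not given in the abstract.

```python
from typing import List, Sequence

def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label index sequence.

    emissions[t][j]   : score of label j at token t (from features or a BiLSTM).
    transitions[i][j] : score of moving from label i to label j.
    """
    if not emissions:
        return []
    n_labels = len(transitions)
    score = list(emissions[0])   # best score of any path ending in each label
    backptrs = []                # backptrs[t-1][j] = best previous label for j at t
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Follow the back-pointers from the best final label.
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptrs):
        best = ptrs[best]
        path.append(best)
    path.reverse()
    return path

def bio_to_chunks(tokens: Sequence[str], labels: Sequence[str]) -> List[List[str]]:
    """Group tokens into chunks using hypothetical B/I/O labels."""
    chunks, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B":                 # a new chunk starts here
            if current:
                chunks.append(current)
            current = [token]
        elif label == "I" and current:   # continue the open chunk
            current.append(token)
        else:                            # "O" (or a stray "I") closes any open chunk
            if current:
                chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks

# Hypothetical example: label order and all scores are made up.
LABELS = ["O", "B", "I"]
emissions = [[2.0, 0.5, 0.0],
             [0.0, 1.0, 1.5],
             [0.0, 0.0, 2.0]]
transitions = [[0.0, 0.0, -100.0],   # O -> I is heavily penalised
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
tokens = ["请", "查询", "余额"]       # an already segmented, made-up sentence

labels = [LABELS[k] for k in viterbi_decode(emissions, transitions)]
print(labels)                         # ['O', 'B', 'I']
print(bio_to_chunks(tokens, labels))  # [['查询', '余额']]
```

Picking the highest-scoring label independently for each token would give `O, I, I` here, which is not a valid chunk sequence. The transition penalty steers the decoder to `O, B, I` instead, which is the role the abstract assigns to the state transition matrix.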
Keywords/Search Tags:Semantic Chunk Analysis, Sequence Labeling, Deep Learning, BiLSTM, CRF, Attention