The Key Technology On Chinese Word Segmentation Based On Bi-LSTM-CRF Model

Posted on:2020-01-16

Degree:Master

Type:Thesis

Country:China

Candidate:Qianli Ma

GTID:2428330578452113

Subject:Engineering

Abstract/Summary:

Natural language processing is one of the core technologies of artificial intelligence, and Chinese word segmentation is the foundation of Chinese natural language processing. At present, word segmentation relies mainly on traditional machine learning models. In recent years, with the resurgence of artificial intelligence, long short-term memory (LSTM) networks have improved on the traditional recurrent neural network, which has difficulty retaining information over long spans. LSTM networks have been widely used in natural language processing with good results. However, logical errors and weak contextual relevance still reduce the accuracy of Chinese word segmentation. This thesis aims to strengthen contextual relevance through training and to prevent logical errors by labeling the logical relations between words.

The main contributions of this thesis are as follows:
(1) The classic LSTM model is improved. A second LSTM layer that reads the sequence in the opposite direction is added to form a Bi-LSTM model, which addresses the traditional LSTM's inability to draw sufficiently on the following text.
(2) A CRF layer is added to strengthen the constraints between words. Resolving logical errors in this way improves the precision of the segmentation results.
(3) The necessity of word vector embedding is verified, along with the influence of parameter settings such as dropout on accuracy.

Four experiments were designed to verify the correctness and superiority of the model. The main work of this thesis is as follows:
(1) A Bi-LSTM-CRF model is proposed. It uses a bidirectional LSTM network for data input and output, and establishes constraints through a CRF layer to increase the correlation between words. Word vectors are obtained with Ngram2Vec, and dropout and a learning rate schedule are introduced to optimize the model throughout training. Finally, the model's precision is measured by comparing the values produced by the custom word vector label layer and the scoring function against a gold-standard reference file.
(2) The performance of the proposed model is compared systematically with that of other well-known models (for example, LSTM, Bi-LSTM, LSTM-CRF and CRF++) on an annotated NLP data set.
(3) The Bi-LSTM-CRF model is applied to a standard NLP sequence labeling data set. Because of its bidirectional LSTM component, the model can use past and future input features at the same time. Because of its CRF layer, the model can also use text-level tag information. Compared with previously reported results, the model is more robust and depends less on word embeddings. It can produce accurate tags without relying on embeddings.
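
To illustrate how constraints between tags can rule out logically invalid segmentations, the following is a minimal sketch and not the thesis's implementation. It assumes the common B/M/E/S character tagging scheme (the abstract does not name its tag set), uses made-up emission scores in place of Bi-LSTM outputs, and replaces a CRF's learned transition scores with hard allow/forbid rules. The names `viterbi_decode`, `tags_to_words`, `SENTENCE` and `EMISSIONS` are hypothetical.

```python
# Illustrative sketch only: BMES tags, hand-made scores, hard transition rules.
NEG_INF = float("-inf")
TAGS = ["B", "M", "E", "S"]  # Begin, Middle, End of a word; Single-character word

# Tag-to-tag transitions allowed under the (assumed) BMES scheme.
ALLOWED = {
    "B": {"M", "E"},
    "M": {"M", "E"},
    "E": {"B", "S"},
    "S": {"B", "S"},
}
START_OK = ("B", "S")  # a sentence cannot start in the middle of a word
END_OK = ("E", "S")    # and cannot end with an unfinished word


def viterbi_decode(emissions):
    """Return the highest-scoring tag sequence that respects the constraints.

    emissions: list of dicts mapping each tag to a score
               (a stand-in for per-character Bi-LSTM outputs).
    """
    if not emissions:
        return []
    # score[t][tag]: best score of any valid path ending in `tag` at position t
    score = [{tag: (emissions[0][tag] if tag in START_OK else NEG_INF)
              for tag in TAGS}]
    back = []  # back[t - 1][tag]: best previous tag for `tag` at position t
    for t in range(1, len(emissions)):
        row, ptr = {}, {}
        for cur in TAGS:
            best_prev, best = None, NEG_INF
            for prev in TAGS:
                if cur in ALLOWED[prev] and score[t - 1][prev] > best:
                    best_prev, best = prev, score[t - 1][prev]
            row[cur] = NEG_INF if best_prev is None else best + emissions[t][cur]
            ptr[cur] = best_prev
        score.append(row)
        back.append(ptr)
    # Pick the best valid final tag, then follow the back-pointers.
    path = [max(END_OK, key=lambda tag: score[-1][tag])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def tags_to_words(chars, tags):
    """Cut the character sequence after every E or S tag."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # only reached if the tag sequence ends mid-word
        words.append(current)
    return words


# Made-up scores for the four characters of a toy sentence.
SENTENCE = "我爱北京"
EMISSIONS = [
    {"B": 0.1, "M": 0.0, "E": 0.0, "S": 0.9},
    {"B": 0.2, "M": 0.1, "E": 0.1, "S": 0.8},
    {"B": 0.9, "M": 0.3, "E": 0.1, "S": 0.2},
    {"B": 0.1, "M": 0.6, "E": 0.5, "S": 0.2},
]

greedy = [max(TAGS, key=e.get) for e in EMISSIONS]  # per-character argmax
constrained = viterbi_decode(EMISSIONS)
print(greedy)                                # ['S', 'S', 'B', 'M']
print(constrained)                           # ['S', 'S', 'B', 'E']
print(tags_to_words(SENTENCE, constrained))  # ['我', '爱', '北京']
```

In this toy input, picking the best tag for each character independently ends the sentence on M, an unfinished word, while decoding under the transition rules yields a well-formed sequence. A trained CRF layer would learn soft transition scores rather than use fixed rules like these.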

Keywords/Search Tags:

Bi-LSTM-CRF Model, Chinese Segmentation, Machine Learning, Neural Network, Conditional Random Fields

Related items

1	Research And Implementation Of Chinese Segmentation System Based On Conditional Random Fields Model
2	Research And Application Of Chinese Word Segmentation Based On Conditional Random Fields
3	Research On Key Techniques For Chinese Word Segmentation With The Combination Of Deep Learning Features And Shallow Machine Learning Features
4	Research Of Chinese Word Segmentation With Conditional Random Fields
5	The Research On Short Text Mining With Conditional Random Fields And Improved LSTM
6	An Self-adaptive BLP Optimal Model Employing Conditional Random Fields
7	Research On Morpheme Analysis Based On Conditional Random Fields In Chinese Natural Language Understanding
8	Conditional Random Fields Based Location Name Recognition In Ancient Chinese
9	Research And System Implementation Of Chinese Word Segmentation In Specialized Fields Based On Conditional Random Fields
10	Research Of Traditional Chinese Medicine Inquiry Modeling Based On Deep Learning And Conditional Random Fields