Chinese Base-Chunk Identification Based On Neural Network Model

Posted on:2017-06-02

Degree:Master

Type:Thesis

Country:China

Candidate:Z P Liu

Full Text:PDF

GTID:2348330512451233

Subject:Computer software and theory

Abstract/Summary:

Chinese basic-chunk identification is one of the fundamental tasks of a Chinese chunk analysis system and a step in shallow parsing. Given a Chinese sentence, this thesis first formalizes basic-chunk identification as a sequence labeling problem that uses the Chinese character as the labeling unit. It then builds on a multilayer neural network model that takes Chinese character embeddings as its initial input and incorporates hidden-layer features from a word segmentation model during training, with the aim of constructing a better-performing neural network model for the task. The model takes the original sentence directly as input and labels it character by character. This differs from traditional methods, whose labeling models depend on hand-crafted features and whose chunking quality depends on the performance of a separate segmentation system.

This thesis makes two contributions.

First, sentence-level likelihood is used as the objective function. Existing neural network models for Chinese basic-chunk identification optimize a single-point (per-character) likelihood, which is not accurate enough for identifying longer chunks. Following Collobert et al. (2011), this thesis therefore adopts a sentence-level likelihood as the objective function and implements its optimization with stochastic gradient descent (SGD). Experimental results show that the label sequence output for the whole sentence becomes more reasonable and that illegal labelings occur less often (for example, punctuation inside a chunk being placed outside it). Performance on Chinese basic-chunk identification improves as a result; for multi-token chunks in particular, recall increases by 3% to 5%.

Second, a Chinese basic-chunk identification model that uses the hidden layer of a segmentation model as a feature is proposed. The model is trained on two tasks, word segmentation and basic-chunk identification. Both take Chinese character embeddings as input and share the same character embedding matrix. During training the two task models are trained alternately: each updates only its own task-specific parameters, while both update the shared embedding matrix. This keeps the parameters of the segmentation component from overfitting to the chunking task, which would harm basic-chunk identification, and it keeps the shared character embeddings from being biased toward a single task objective, which would harm overall performance. Experimental results show that the combined model using the segmentation hidden-layer feature raises the F-value on Chinese basic-chunk identification by 2.1%.

In addition, this thesis uses Word2Vec to pre-train Chinese character embeddings, which serve as the initial input vectors of the neural network model above when the entire model is trained. Experimental results show that, with a large corpus, character embeddings trained by Word2Vec effectively improve the performance of Chinese basic-chunk identification.
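
The difference between a single-point objective and the sentence-level likelihood of Collobert et al. (2011) can be illustrated with a minimal sketch. It is not the thesis implementation. The three-tag B/I/O scheme, the function names, and all scores below are assumptions chosen for illustration; in the thesis the per-character scores come from the neural network and the transition scores are learned with SGD, whereas here they are hard-coded.

```python
import math

B, I, O = 0, 1, 2  # assumed tag set: chunk-begin, chunk-inside, outside


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sentence_log_likelihood(emissions, transitions, start, tags):
    """log p(tags | sentence): score of one path minus log-sum over all paths.

    emissions[t][k]   score for tag k at character t (network output)
    transitions[i][j] score for moving from tag i to tag j
    start[k]          score for beginning the sentence with tag k
    """
    n_tags = len(start)
    path = start[tags[0]] + emissions[0][tags[0]]
    for t in range(1, len(tags)):
        path += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    # Forward recursion accumulates the scores of all tag paths.
    alpha = [start[k] + emissions[0][k] for k in range(n_tags)]
    for t in range(1, len(emissions)):
        alpha = [
            emissions[t][j]
            + logsumexp([alpha[i] + transitions[i][j] for i in range(n_tags)])
            for j in range(n_tags)
        ]
    return path - logsumexp(alpha)


def viterbi(emissions, transitions, start):
    """Highest-scoring tag sequence for the whole sentence."""
    n_tags = len(start)
    score = [start[k] + emissions[0][k] for k in range(n_tags)]
    back = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        back.append(pointers)
    path = [max(range(n_tags), key=lambda k: score[k])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Illustrative scores for a three-character sentence (made up, not from the thesis).
emissions = [[0.2, 0.1, 1.0], [0.3, 1.2, 0.9], [0.1, 0.2, 1.5]]
start = [0.0, -10.0, 0.0]           # a sentence should not start with I
transitions = [[0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0],
               [0.0, -10.0, 0.0]]   # O followed by I is penalized

pointwise = [max(range(3), key=lambda k: row[k]) for row in emissions]
decoded = viterbi(emissions, transitions, start)
print("pointwise:", pointwise)       # picks each tag independently
print("sentence-level:", decoded)    # respects the transition scores
```

With these scores, choosing each tag independently yields O, I, O, which places an I directly after an O. Decoding over the whole sentence avoids that sequence because the transition scores enter the path score, which is the kind of illegal labeling the abstract reports as reduced.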

Keywords/Search Tags:

Chinese basic-chunk identification, Neural network model, Sentence-level likelihood, Hidden-layer feature, Chinese character embeddings

Related items

1	Chinese Chunk Identification Based On Rule Extraction
2	On-Line Handwritten Chinese Character Recognition Approach Based On Sentence Level
3	On-line Handwritten Chinese Character Recognition Approach Based On Sentence Level
4	Sentence-Level Language Analysis With Contextualized Word Embeddings
5	Application Research On BWS-SOM Model In Chinese Recognition In Large Character Set
6	Research On Off-line Handwritten Chinese Character Recognition Algorithm Based On Chinese Character Recognition Instrument
7	Automatic Comparison Test And Identification Of Chinese Handwriting Based On Deep Learning
8	OCR Error Post-correction Based On Chinese Character-level Features And Language Model
9	Study On Chinese Website Ripping And Transcoding
10	A Research On Identification Of Chinese Prosodic Phrase Boundary Based On Chinese Chunk