Research On Chinese Word Segmentation And Part-of-speech Tagging Based On Deep Learning Methods

Posted on: 2021-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z T Liu
GTID: 2428330611468725
Subject: Aeronautical Engineering
Abstract/Summary:
Chinese lexical analysis (Chinese word segmentation and part-of-speech tagging) is the foundation of Chinese natural language processing. Although current baseline models perform well, they still overlook several issues, such as the features of adjacent words in word segmentation and the features of characters in part-of-speech tagging. This thesis proposes a method for adding adjacent word features to the Chinese word segmentation task, a method for fusing character features into the part-of-speech tagging task, and a new encoder-decoder structure for sequence tagging tasks.

First, to capture the features of adjacent words in Chinese word segmentation, feature representations of adjacent characters are obtained by window sampling and fused into the model through attention in the representation learning layer. This approach accounts for both the context information and the coupling between adjacent characters, so that a more complete feature representation of each character is obtained. In addition, for the sequence labeling problem, where the original input and the output are strictly aligned, a new structure inspired by the encoder-decoder architecture is proposed. A dual embedding of words and labels is adopted so that the encoder additionally predicts candidate labels. In the decoder, attention-based hidden-layer features of the original input supervise the prediction of the final labels, relating the original input to the labels and preceding labels to following labels, which improves the generalization ability of the model.

Next, because the tagging object in part-of-speech tagging is a word, three ways of fusing the features of a word's constituent characters into the model are proposed from the perspective of word formation. Experiments show that a weighted feature fusion method similar to the attention mechanism gives the best result. As with word segmentation, experiments are also carried out with the improved encoder-decoder structure, and the results demonstrate its effectiveness for part-of-speech tagging.

In summary, this thesis proposes feature improvements and a model structure improvement tailored to the characteristics of Chinese word segmentation and part-of-speech tagging, and demonstrates their effectiveness through experiments. Because sequence labeling tasks share a common form, the encoder-decoder structure proposed here can be extended to other sequence labeling tasks.
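
The window-sampling and attention-fusion idea described above can be illustrated with a small sketch. This is not the thesis's model: the function names (`window_indices`, `fuse_adjacent`), the scaled dot-product scoring, the window radius, and the choice to concatenate the attended context to each character's own embedding are all assumptions made for illustration, and the embeddings are toy values rather than learned ones.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def window_indices(i, n, radius):
    """Positions within `radius` of position i (excluding i), clipped to [0, n)."""
    return [j for j in range(max(0, i - radius), min(n, i + radius + 1)) if j != i]


def fuse_adjacent(embeddings, radius=1):
    """Hypothetical fusion step: each character attends over its window
    neighbours, and the attention-weighted context vector is concatenated
    to the character's own embedding."""
    n = len(embeddings)
    dim = len(embeddings[0])
    fused = []
    for i, query in enumerate(embeddings):
        neighbours = window_indices(i, n, radius)
        if not neighbours:
            # A one-character sequence has no neighbours: use a zero context.
            context = [0.0] * dim
        else:
            # Scaled dot-product scores (an assumed scoring function).
            scores = [dot(query, embeddings[j]) / math.sqrt(dim) for j in neighbours]
            weights = softmax(scores)
            context = [
                sum(w * embeddings[j][d] for w, j in zip(weights, neighbours))
                for d in range(dim)
            ]
        fused.append(list(query) + context)
    return fused


if __name__ == "__main__":
    # Toy 2-dimensional "character embeddings" for a three-character sentence.
    toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for vector in fuse_adjacent(toy, radius=1):
        print(vector)
```

The same weighted-sum pattern could, under similar assumptions, stand in for the attention-like weighted fusion of a word's character features in part-of-speech tagging, with the word embedding as the query and its characters as the attended items.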
Keywords/Search Tags: Neural network, Chinese lexical analysis, Chinese word segmentation, Part-of-speech tagging