Recognition And Classification Of Fine Grained Chinese Entity

Posted on: 2022-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: P Y Fu
Full Text: PDF
GTID: 2518306572459724
Subject: Computer technology
Abstract/Summary:
Making effective use of the massive amount of text on the web to benefit everyday life has long been an important research topic in natural language processing. Entity recognition is the first step in structuring text information, and its results directly affect the performance of downstream tasks, which also exposes it to a variety of challenges. Fine-grained entity recognition aims to describe entities more accurately and richly in different contexts. It places higher demands on the number and hierarchy of entity categories and has gradually become a research hotspot in entity recognition. Because manual annotation is expensive, most existing datasets are annotated by distant supervision, and the resulting label noise makes fine-grained entity classification harder. The inherent characteristics of the Chinese language further increase the difficulty of Chinese entity recognition.

This thesis divides fine-grained Chinese entity recognition and classification into two subtasks, entity boundary recognition and fine-grained entity classification, and studies three aspects:

(1) Chinese entity boundary recognition based on word features. Boundary recognition is treated as a sequence labeling task. Starting from the context-sensitive word vectors generated by the BERT pre-trained language model, an attention soft lexicon network is proposed as an improvement on the soft lexicon network to extract word features, which are then fed to an LSTM and a CRF for Chinese entity boundary recognition.

(2) Fine-grained entity classification based on two-stage training. The text is first encoded by the BERT model. BiLSTM and CNN then extract context information from different perspectives, a twice-interactive attention network models the interaction between the entity and its context, and multi-label classification is performed by computing similarity with the label vectors. To address label noise and label vector convergence in the dataset, the model is trained in two stages with a multi-level hinge loss function, targeting coarse-grained labels first and fine-grained labels second, and achieves good results.

(3) An entity label vector enhancement method based on context semantics. A label enhancement module is built from a language model and a masked language model, and the language model is pre-trained on the current corpus to learn the contextual semantics of the text in the dataset. Through joint training with the fine-grained entity classification model, a large amount of entity context information is then used to correct the effect of label noise and to strengthen the entity label vectors.
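
The similarity-based multi-label scoring and the two-stage hinge-loss training described in (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the dot-product similarity, the margin of 1.0, the per-level weights, the helper names, and the slash-delimited label names such as `/person/artist` are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def dot(u: List[float], v: List[float]) -> float:
    """Dot product, used here as the (assumed) similarity measure."""
    return sum(a * b for a, b in zip(u, v))


def label_level(label: str) -> int:
    """Hypothetical convention: "/person" is level 1, "/person/artist" is level 2."""
    return label.count("/")


def multilevel_hinge_loss(
    mention_vec: List[float],
    label_vecs: Dict[str, List[float]],
    gold: Set[str],
    level_weights: Dict[int, float],
    margin: float = 1.0,
) -> float:
    """Hinge loss over all labels, weighted by each label's hierarchy level.

    Gold labels are pushed to score above +margin, all other labels
    below -margin. Levels missing from level_weights contribute nothing.
    """
    loss = 0.0
    for label, vec in label_vecs.items():
        weight = level_weights.get(label_level(label), 0.0)
        score = dot(mention_vec, vec)
        if label in gold:
            loss += weight * max(0.0, margin - score)
        else:
            loss += weight * max(0.0, margin + score)
    return loss


def predict(
    mention_vec: List[float],
    label_vecs: Dict[str, List[float]],
    threshold: float = 0.0,
) -> Set[str]:
    """Multi-label prediction: keep every label whose similarity exceeds the threshold."""
    return {lab for lab, vec in label_vecs.items() if dot(mention_vec, vec) > threshold}


# Toy label vectors and one entity-mention representation (made-up numbers).
label_vecs = {
    "/person": [1.0, 0.0],
    "/person/artist": [0.5, 0.5],
    "/location": [-1.0, 0.0],
}
mention = [2.0, -1.0]
gold = {"/person", "/person/artist"}

# Stage one: only coarse-grained (level 1) labels carry loss.
print(multilevel_hinge_loss(mention, label_vecs, gold, {1: 1.0}))  # 0.0
# Stage two: only fine-grained (level 2) labels carry loss.
print(multilevel_hinge_loss(mention, label_vecs, gold, {2: 1.0}))  # 0.5
print(sorted(predict(mention, label_vecs)))
```

In this sketch, switching `level_weights` from `{1: 1.0}` to `{2: 1.0}` is what moves the training signal from coarse-grained to fine-grained labels. A real model would minimize this loss by gradient descent over the encoder and label vectors rather than evaluate it on fixed numbers.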
Keywords/Search Tags: Boundary Recognition, Entity Classification, Twice-interactive Attention Network, Label Enhancement