
Research On Named Entity Recognition For User-generated Contents

Posted on: 2021-04-29 | Degree: Master | Type: Thesis
Country: China | Candidate: P C Yang | Full Text: PDF
GTID: 2428330614959250 | Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, social media now holds a large amount of user data and information. How to effectively mine, use, supervise, and manage these data has attracted growing attention from researchers. Named entity recognition (NER) is a fundamental task in natural language processing and plays a key role in downstream research. Research on named entity recognition for user-generated contents is therefore of great significance. This thesis proposes methods that use external knowledge to recognize named entities in user-generated contents. The main research content consists of the following two parts:

1. User-generated content datasets are noisy and non-standard, and their texts contain relatively few entities. As a result, the neural network learns little entity semantic information during training, which leads to low named entity recognition accuracy. To improve recognition, this thesis designs an improved method based on the Bi-LSTM-CNNs-CRF model: the external Knowledge-enhanced Neural Sequence Labelling Model (KNSLM). By building an external knowledge layer into the neural network model, the thesis introduces external information that helps the model identify more entities, and it designs both a method for acquiring external entity knowledge and a method for fusing that knowledge into the model. In comparative experiments on user-generated content datasets, integrating 20,000 external entity vectors into the KNSLM model improved recognition accuracy, recall, and F1-measure. The experiments also verify that the recognition performance of KNSLM improves as the number of external entities increases.

2. Most named entity recognition methods use LSTMs or other recurrent neural networks, which are time-consuming to train and handle long-distance dependencies poorly. The Transformer is more effective than the LSTM at feature extraction, and pre-trained models based on the Transformer are better suited to downstream tasks, but these pre-trained models rarely take external knowledge into account. To incorporate knowledge graph information into a pre-trained model, this thesis designs an external knowledge fusion method and applies the resulting knowledge-fused pre-trained model to named entity recognition for user-generated contents. The experimental results show that, compared with the pre-trained model without knowledge fusion, the knowledge-fused model improves named entity recognition performance.
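The abstract does not spell out how KNSLM acquires external entity knowledge or how its knowledge layer fuses it. As a minimal sketch under stated assumptions, the snippet below matches token n-grams against an external entity lexicon using forward maximum (longest-first) matching and concatenates the matched entity's vector onto each token's word vector, giving unmatched tokens a zero vector. The function name `knowledge_layer`, the lexicon format (lower-cased token tuples mapped to vectors), and concatenation as the fusion operation are all hypothetical choices, not the thesis's published method.

```python
from typing import Dict, List, Sequence, Tuple


def knowledge_layer(tokens: Sequence[str],
                    word_vecs: Sequence[List[float]],
                    entity_vecs: Dict[Tuple[str, ...], List[float]],
                    entity_dim: int,
                    max_len: int = 4) -> List[List[float]]:
    """Hypothetical knowledge layer: append an external entity vector to each word vector.

    Tokens covered by a lexicon match receive the matched entity's vector;
    all other tokens receive a zero vector of length entity_dim.
    """
    knowledge = [[0.0] * entity_dim for _ in tokens]
    i = 0
    while i < len(tokens):
        # Forward maximum matching: try the longest n-gram starting at i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in entity_vecs:
                for j in range(i, i + n):
                    knowledge[j] = list(entity_vecs[key])
                i += n
                break
        else:
            i += 1  # no lexicon entity starts at i; move to the next token
    # Concatenate [word vector ; knowledge vector] for every token.
    return [list(w) + k for w, k in zip(word_vecs, knowledge)]
```

For example, with `"New York City"` in the lexicon, all three tokens receive that entity's vector even if `"New York"` is also listed, because the longer match is tried first. In a real model the concatenated vectors would feed the Bi-LSTM encoder; how KNSLM actually weights or projects the external vectors is not stated in the abstract.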
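KNSLM builds on Bi-LSTM-CNNs-CRF, whose final CRF layer selects the highest-scoring tag sequence with Viterbi decoding rather than choosing each tag independently. The plain-Python sketch below shows only that decoding step. It assumes the per-token emission scores (normally produced by the Bi-LSTM), a tag-to-tag transition matrix, and start scores have already been computed; the function name `viterbi_decode` and the score layout are illustrative.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   start: Sequence[float]) -> List[int]:
    """Return the tag index sequence with the highest total score.

    emissions[t][k]   score of tag k at position t
    transitions[p][k] score of moving from tag p to tag k
    start[k]          score of starting the sequence with tag k
    """
    n_tags = len(start)
    score = [start[k] + emissions[0][k] for k in range(n_tags)]
    backptrs: List[List[int]] = []
    for obs in emissions[1:]:
        new_score, ptr = [], []
        for cur in range(n_tags):
            # Best previous tag for ending at `cur` in this position.
            best_prev = max(range(n_tags), key=lambda p: score[p] + transitions[p][cur])
            new_score.append(score[best_prev] + transitions[best_prev][cur] + obs[cur])
            ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)
    # Follow back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda k: score[k])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path
```

With tags `O=0, B=1, I=2`, a large negative score on the `O -> I` transition and on starting with `I` lets the CRF reject the ill-formed sequence `O I` even when per-token emissions favor it, which is the usual motivation for putting a CRF on top of the Bi-LSTM.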
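The abstract does not describe how knowledge graph information is fused into the pre-trained Transformer. One common family of approaches, shown here only as an assumed illustration, projects a knowledge-graph entity embedding into the model's hidden size and adds it to the hidden states of the subword tokens that mention the entity. The sketch uses plain lists instead of tensors; `inject_entities`, the projection matrix `proj`, and the entity key `"wuhan"` in the usage example are hypothetical. The `word_ids` argument mirrors the per-subword word index (with `None` for special tokens) that Hugging Face fast tokenizers return from `BatchEncoding.word_ids()`.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def matvec(w: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Multiply matrix w (rows = hidden size) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def inject_entities(hidden: Sequence[Vector],
                    word_ids: Sequence[Optional[int]],
                    entity_spans: Sequence[Tuple[int, int, str]],
                    entity_emb: Dict[str, Vector],
                    proj: Sequence[Sequence[float]]) -> List[Vector]:
    """Hypothetical fusion: add a projected entity embedding to every subword of its mention.

    entity_spans holds (word_start, word_end_exclusive, entity_key) in word positions;
    word_ids maps each subword position to its word index, or None for special tokens.
    """
    fused = [list(h) for h in hidden]  # copy so the input states are not modified
    for w_start, w_end, key in entity_spans:
        if key not in entity_emb:
            continue  # entity not covered by the external knowledge graph
        delta = matvec(proj, entity_emb[key])
        for pos, wid in enumerate(word_ids):
            if wid is not None and w_start <= wid < w_end:
                fused[pos] = [h + d for h, d in zip(fused[pos], delta)]
    return fused
```

For instance, with subwords `[CLS] visit Wu ##han [SEP]` the word indices are `[None, 0, 1, 1, None]`, so an entity spanning word 1 updates both `Wu` and `##han`. The fused states would then go to the token-classification head; the thesis may instead inject knowledge at the input or inside attention layers.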
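Both parts are evaluated with recall and F1-measure. NER results are conventionally scored at the entity level: a predicted entity counts as correct only if its boundaries and type exactly match a gold entity. The sketch below computes entity-level precision, recall, and F1 from BIO tag sequences. It assumes the thesis's "accuracy" corresponds to precision and that tags use the BIO scheme; a stray `I-` tag without a matching `B-` is treated as starting a new entity, a lenient convention that the thesis may not share. The function names are illustrative.

```python
from typing import List, Sequence, Set, Tuple


def bio_spans(tags: Sequence[str]) -> Set[Tuple[int, int, str]]:
    """Extract (start, end_exclusive, type) entity spans from BIO tags."""
    spans: Set[Tuple[int, int, str]] = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        continuing = tag.startswith("I-") and tag[2:] == etype
        if not continuing:
            if etype is not None:
                spans.add((start, i, etype))  # close the open entity
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def entity_prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 over all sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)
        n_gold += len(gs)
        n_pred += len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, if the gold tags mark one person and one location but the model finds only the person, precision is 1.0, recall is 0.5, and F1 is 2/3. This is why adding external entities, which helps the model find more entities, shows up mainly as higher recall.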
Keywords/Search Tags: named entity recognition, user-generated contents, external knowledge, pre-trained model