
Research On Text Data Sequence Labeling

Posted on: 2021-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: G D Li
Full Text: PDF
GTID: 2428330605973025
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of the Internet, the ties between every industry and the network have grown increasingly close. Text carries data in all of these fields, but as information grows exponentially, the flood of junk information and redundant, overlapping content becomes ever more serious. How to effectively extract valuable information from massive amounts of text has become one of the most urgent problems in information processing. Natural language processing (NLP) is the body of computational techniques used to extract, analyze and represent human language, with the ultimate goal of barrier-free human-computer communication. Sequence labeling is a subtask of NLP whose accuracy strongly affects the performance of higher-level tasks such as question answering and machine translation. Scholars at home and abroad have therefore studied sequence labeling extensively, moving from traditional machine learning methods to neural networks based on deep learning; recently, researchers have proposed a variety of pre-training models, and labeling accuracy continues to improve.

In this thesis, we study two sequence labeling tasks: chunk parsing (chunking) and named entity recognition. Building on the traditional deep-learning long short-term memory network and the machine-learning conditional random field, we propose a stacked long short-term memory network and a semi-Markov conditional random field, which we train and test on the official CoNLL-2000 and CoNLL-2003 datasets, respectively. Compared with the officially published results, our models achieve highly competitive scores. We then introduce the BERT model proposed by Google in 2018, simplify it by pruning, and propose the BERT-Stack Bi-LSTM-NSCRF model, which we train and test on the Penn Chinese Treebank (CTB), the CCKS 2019 NER dataset, the ChineseGLUE MSRA NER dataset and the BosonNLP dataset. Compared with the labeling results of a traditional single long short-term memory network and conditional random field, our model is clearly superior to traditional sequence labeling models and shows better robustness across multilingual and multi-domain datasets.
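To make the described architecture concrete, here is a minimal sketch of a stacked Bi-LSTM tagger with a CRF output layer, in PyTorch. It is illustrative only, not the thesis's actual implementation: the layer sizes and class names are assumptions, a plain linear-chain CRF from the third-party pytorch-crf package stands in for the semi-Markov CRF, and in the BERT variant the embedding layer would be replaced by contextual vectors from a pruned BERT encoder.

```python
# Minimal sketch (assumptions, not the thesis's code): a stacked
# Bi-LSTM feeding a linear-chain CRF for sequence labeling.
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf


class StackedBiLSTMCRF(nn.Module):
    def __init__(self, vocab_size, num_tags, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # "Stacked": two bidirectional LSTM layers; the second layer
        # consumes the concatenated forward/backward states of the first.
        self.bilstm = nn.LSTM(embed_dim, hidden_dim // 2, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.emission = nn.Linear(hidden_dim, num_tags)
        # Plain linear-chain CRF; the thesis uses a semi-Markov variant.
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, tokens, tags, mask):
        emissions = self.emission(self.bilstm(self.embed(tokens))[0])
        # The CRF returns a log-likelihood; negate it for minimization.
        return -self.crf(emissions, tags, mask=mask, reduction='mean')

    def predict(self, tokens, mask):
        emissions = self.emission(self.bilstm(self.embed(tokens))[0])
        return self.crf.decode(emissions, mask=mask)


# Toy usage: 2 sentences of 10 tokens, 9 BIO tags (as in CoNLL-2003 NER).
model = StackedBiLSTMCRF(vocab_size=5000, num_tags=9)
tokens = torch.randint(1, 5000, (2, 10))
tags = torch.randint(0, 9, (2, 10))
mask = torch.ones(2, 10, dtype=torch.bool)
print(model.loss(tokens, tags, mask).item(), model.predict(tokens, mask))
```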
Keywords/Search Tags: named entity recognition, chunking, conditional random fields, long short-term memory, BERT model