
Research On Text Data Sequence Labeling

Posted on: 2021-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: G D Li
Full Text: PDF
GTID: 2428330605973025
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of the Internet, the ties between every industry and the network have grown increasingly close. Text carries data in all of these fields, but as information grows exponentially, the flood of junk information and redundant, overlapping content becomes ever more serious. How to effectively extract valuable information from massive amounts of text has become one of the most urgent problems in information processing. Natural language processing (NLP) is the body of computational techniques used to extract, analyze and represent human language, with the ultimate goal of barrier-free human-computer communication. Sequence labeling is a subtask of NLP whose accuracy strongly affects the performance of higher-level tasks such as question answering and machine translation. Scholars at home and abroad have therefore studied sequence labeling extensively, moving from traditional machine learning methods to neural networks based on deep learning; recently, researchers have proposed a variety of pre-training models, and labeling accuracy continues to improve.

In this thesis, we study two sequence labeling tasks: chunk parsing (chunking) and named entity recognition. Building on the traditional deep-learning long short-term memory network and the machine-learning conditional random field, we propose a stacked long short-term memory network and a semi-Markov conditional random field, which we train and test on the official CoNLL-2000 and CoNLL-2003 datasets, respectively. Compared with the officially published results, our models achieve highly competitive scores. We then introduce the BERT model proposed by Google in 2018, simplify it by pruning, and propose the BERT-Stack Bi-LSTM-NSCRF model, which we train and test on the Penn Chinese Treebank (CTB), the CCKS 2019 NER dataset, the ChineseGLUE MSRA NER dataset and the BosonNLP dataset. Compared with the labeling results of a traditional single long short-term memory network and conditional random field, our model is clearly superior to traditional sequence labeling models and shows better robustness across multilingual and multi-domain datasets.
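To make the described architecture concrete, here is a minimal sketch of a stacked Bi-LSTM tagger with a CRF output layer, in PyTorch. It is illustrative only, not the thesis's actual implementation: the layer sizes and class names are assumptions, a plain linear-chain CRF from the third-party pytorch-crf package stands in for the semi-Markov CRF, and in the BERT variant the embedding layer would be replaced by contextual vectors from a pruned BERT encoder.

```python
# Minimal sketch (assumptions, not the thesis's code): a stacked
# Bi-LSTM feeding a linear-chain CRF for sequence labeling.
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf


class StackedBiLSTMCRF(nn.Module):
    def __init__(self, vocab_size, num_tags, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # "Stacked": two bidirectional LSTM layers; the second layer
        # consumes the concatenated forward/backward states of the first.
        self.bilstm = nn.LSTM(embed_dim, hidden_dim // 2, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.emission = nn.Linear(hidden_dim, num_tags)
        # Plain linear-chain CRF; the thesis uses a semi-Markov variant.
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, tokens, tags, mask):
        emissions = self.emission(self.bilstm(self.embed(tokens))[0])
        # The CRF returns a log-likelihood; negate it for minimization.
        return -self.crf(emissions, tags, mask=mask, reduction='mean')

    def predict(self, tokens, mask):
        emissions = self.emission(self.bilstm(self.embed(tokens))[0])
        return self.crf.decode(emissions, mask=mask)


# Toy usage: 2 sentences of 10 tokens, 9 BIO tags (as in CoNLL-2003 NER).
model = StackedBiLSTMCRF(vocab_size=5000, num_tags=9)
tokens = torch.randint(1, 5000, (2, 10))
tags = torch.randint(0, 9, (2, 10))
mask = torch.ones(2, 10, dtype=torch.bool)
print(model.loss(tokens, tags, mask).item(), model.predict(tokens, mask))
```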
Keywords/Search Tags: named entity recognition, chunking, conditional random fields, long short-term memory, BERT model