
The Research Of Weibo Entity Recognition Model Based On Active Learning

Posted on: 2022-03-31  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Chao  Full Text: PDF
GTID: 2518306605490234  Subject: Master of Engineering
Abstract/Summary:
Named entity recognition is a fundamental task in natural language processing. It underpins information retrieval, automatic question answering, knowledge graphs and other downstream tasks, and its performance directly affects theirs. In the information age, social media carries huge amounts of information from many fields, so named entity recognition models are needed to extract the key information from this mass of data. Research on named entity recognition for standard text is relatively mature, but there is comparatively little work on non-standard text such as Weibo, and recognition performance on Weibo text is much worse than on standard text. Research on named entity recognition for Weibo text is therefore of real significance.

Deep learning is now widely used in natural language processing, and neural networks have become a popular way to improve named entity recognition. Designing a named entity recognition model for social media text is the focus of this thesis. Weibo text is strongly colloquial and short, so this thesis proposes a named entity recognition model for Weibo text that takes a bidirectional long short-term memory network (BI-LSTM) combined with a conditional random field (CRF) as its baseline. Compared with the traditional BI-LSTM combined with CRF method, the main contributions are as follows:

1) A character vector model based on radicals and position information. Word segmentation carries information that is useful for entity recognition, and Weibo text contains many new words and distinctive vocabulary. This thesis therefore trains character vectors from both the position of each character within its word and the character's radical. First, the segmentation result is used to mark every Chinese character with its position, so that the character vector carries useful word-level features. Next, each character is split into its smallest unit, the radical, and the radical and the character are used together to train a vector that captures the structural information of Chinese characters. Finally, the two results are concatenated to form the final character vector. Experiments show that this character vector model performs better.

2) An entity recognition model based on joint training. Unlike English text, Chinese text has no explicit word boundaries, so this thesis designs a new task: entity boundary recognition. This task identifies only the position of entity words, not their category. The model first feeds the character vectors into the BI-LSTM to learn global features, then applies an attention mechanism to increase the weight of important information. Finally, the boundary recognition and entity recognition tasks are trained jointly. Experimental results show that, compared with the existing baseline entity recognition model, the proposed model is better at extracting the relevant entities.

3) An entity recognition model based on active learning. Because high-quality training data for Weibo text is scarce, this thesis proposes a training method that incorporates an active learning strategy. A small amount of training data is first used to train the model. The model then predicts labels for unlabeled text, and sentence confidence is used to select the data most worth annotating. The selected data is added to the training set after manual review. The entity recognition model is retrained in this way for several rounds. Experiments show that the active learning strategy greatly reduces the amount of manual annotation and rapidly improves model performance.

Finally, this thesis compares the proposed model with the baseline model in a comparative experiment and an ablation experiment on the Weibo Dataset, the MSRA Microsoft entity corpus and a tagged corpus of 15,000 Weibo posts. The comparison shows that the proposed model outperforms the baseline, and the ablation experiment shows that each component of the proposed model contributes to the improvement.
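The selection step of the active learning strategy described in 3) can be illustrated with a minimal sketch. The abstract does not define how sentence confidence is computed or which sentences are chosen, so everything specific here is an assumption: confidence is taken to be the geometric mean of the per-character probabilities the model assigns to its predicted tags, and the least confident sentences are the ones sent for manual review. The names `sentence_confidence` and `select_for_annotation` are hypothetical, and the probabilities are made-up toy values, not results from the thesis.

```python
import math
from typing import List, Sequence, Tuple


def sentence_confidence(token_probs: Sequence[float]) -> float:
    """Assumed confidence score: geometric mean of per-character
    probabilities of the predicted tags (length-normalised)."""
    if not token_probs:
        return 0.0
    # Clamp to avoid log(0) for degenerate probabilities.
    log_sum = sum(math.log(max(p, 1e-12)) for p in token_probs)
    return math.exp(log_sum / len(token_probs))


def select_for_annotation(
    pool: List[Tuple[str, List[float]]], k: int
) -> List[str]:
    """Return the k unlabeled sentences with the lowest confidence."""
    ranked = sorted(pool, key=lambda item: sentence_confidence(item[1]))
    return [sentence for sentence, _ in ranked[:k]]


# Toy unlabeled pool: (sentence id, per-character tag probabilities).
pool = [
    ("sentence A", [0.99, 0.98, 0.97]),
    ("sentence B", [0.60, 0.55, 0.90]),
    ("sentence C", [0.95, 0.40, 0.92]),
]

# These sentences would go to manual review before joining the training set.
picked = select_for_annotation(pool, k=2)
print(picked)
```

In a full loop, the reviewed sentences would be appended to the training data and the model retrained, repeating for several rounds as the abstract describes.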
Keywords/Search Tags: Named entity recognition, Radical information, Joint training, Attention mechanism, Entity boundary task, Active learning