Research On Named Entity Recognition Method For Weibo Text

Posted on: 2020-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: L X Zhang
GTID: 2428330578954646
Subject: Computer technology
Abstract/Summary:
Named entity recognition (NER) is one of the basic tasks in natural language processing and plays an important role in information retrieval, automatic question answering, knowledge graphs, and related fields. Research on NER for normative text is relatively mature, while research on NER for non-canonical text such as Weibo posts is relatively rare, and NER performance on Weibo text is far below that on normative text. NER for Weibo text has therefore become a research hotspot. With the wide application of deep learning in natural language processing, deep learning methods have become a popular way to improve NER performance. The focus of this thesis is therefore how to make full use of the characteristics of network text, in combination with deep learning methods, to propose an NER framework for network text.

In view of the colloquial nature of text on Weibo media, this thesis combines Weibo text normalization with the named entity recognition task. We propose a joint text normalization and named entity recognition framework in which a dictionary of non-canonical words is used to replace the non-canonical words in network text with their canonical forms, and we propose an entity recognition model that integrates the attention mechanism to further improve entity recognition performance on Weibo text. The main innovations and contributions of this thesis can be summarized as follows:

1. We propose a method for calculating the similarity of Word2vec word vectors based on the features of non-canonical words. A high-dimensional vocabulary of non-canonical words is trained, and similarity is computed between the vector representation of the combined entity and the vectors of that vocabulary. K-means clustering and the Brown clustering algorithm then cluster the Weibo entities to obtain a set of candidate canonical words, from which the best candidate entity is determined, and the non-canonical entity is finally replaced with the canonical entity.
2. We propose a method for determining the number of candidate canonical words, which are filtered by rules; the text is then normalized according to the non-canonical dictionary.
3. We propose a Long Short-Term Memory (LSTM) model incorporating the attention mechanism, which attends to information related to the entity and alleviates the problem of redundant or noisy context. In the design of the encoding layer, a two-layer bidirectional Long Short-Term Memory network (SC-BiLSTM) is used to encode the vectors and extract deep semantic information from the context to assist the entity recognition task.

The experiments above were carried out in this thesis. The results show that accuracy improves by 4% relative to the text normalization model proposed by Hassan, and that the SC-BiLSTM ATT model improves by 10% over the baseline system. The joint normalization and entity recognition framework is thus applicable to named entity recognition for Weibo text, and the proposed SC-BiLSTM ATT model effectively improves entity recognition performance compared with the traditional model.
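
The similarity-based normalization step in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy 3-dimensional vectors, the English example words, the `threshold` value, and the function names `build_dictionary` and `normalize` are all assumptions made for illustration. The K-means and Brown clustering stages are omitted, and a simple similarity threshold stands in for the rule-based candidate filtering.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_dictionary(noisy_vecs, canonical_vecs, threshold=0.8):
    """Map each non-canonical word to its most similar canonical word.

    Words whose best similarity is below `threshold` are left unmapped.
    The threshold is a hypothetical stand-in for rule-based filtering.
    """
    mapping = {}
    for word, vec in noisy_vecs.items():
        best, score = max(
            ((cand, cosine(vec, cvec)) for cand, cvec in canonical_vecs.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            mapping[word] = best
    return mapping


def normalize(tokens, mapping):
    """Replace non-canonical tokens using the dictionary; keep the rest."""
    return [mapping.get(tok, tok) for tok in tokens]


# Toy vectors (assumed). Real vectors would come from a trained Word2vec model.
canonical_vecs = {"tomorrow": [0.9, 0.1, 0.0], "great": [0.0, 0.2, 0.9]}
noisy_vecs = {
    "2moro": [0.8, 0.2, 0.1],
    "gr8": [0.1, 0.1, 0.95],
    "lol": [0.5, 0.5, 0.5],  # no close canonical neighbour, stays as-is
}

mapping = build_dictionary(noisy_vecs, canonical_vecs)
normalized = normalize(["see", "you", "2moro", "lol"], mapping)
print(mapping)
print(normalized)
```

In this sketch the normalized token list would then be passed to the entity recognition model, which is how the joint framework is described as connecting the two tasks.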
Keywords/Search Tags: Named Entity Recognition, Long Short-Term Memory Network, Attention Mechanism, Deep Learning, Normalization