Engineering Construction Of Text Named Entity Recognition And Topic Extraction Based On Information Extraction Technology

Posted on: 2020-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: S J Luo
GTID: 2428330575498328
Subject: Software engineering
Abstract/Summary:
This thesis addresses problems encountered in the field of news retrieval. In news search and recommendation, readers tend to focus on the people who appear in a news item, the place where it happens, and the organizations involved. If the people, places, and organizations in a news article can be identified, the article can be recommended to readers who have recently paid attention to them. Likewise, when a user searches directly for such keywords, the system can respond promptly and reduce the time spent on retrieval.

The thesis is divided into two parts. The first part studies models for extracting information from the news corpus. Three items are extracted from each news article: its entities, its topics, and its summary. First, the thesis analyzes the problems of traditional named entity recognition models, proposes a solution that combines Bi-LSTM with a CRF, and customizes the named entity recognition model so that it can effectively extract the entities a specific application needs. Second, it analyzes the principle and existing problems of LDA-based topic extraction, proposes training word vectors with a recurrent neural network to address the problem of words having similar meanings in different contexts, and verifies the solution experimentally. Finally, it proposes a method that combines recurrent-neural-network word vectors with TextRank, which greatly reduces the time TextRank spends computing similarity between words. Experiments show that extraction speed improves significantly with little loss in summary accuracy.

The second part is the design and implementation of a news content analysis system. The thesis completes the definition, overall design, and detailed design of the system, and applies the news entity, topic, and summary models to the news service system. Finally, the author proposes applying the new entity, topic, and summary extraction models to the news business system of an intelligent chat assistant, where they can provide fast and stable external data output services for large-scale news data services.
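The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of ranking sentences with TextRank over word-vector similarity. It is not the thesis's implementation. The function names (`cosine`, `sentence_vector`, `textrank`, `summarize`) are hypothetical, the two-dimensional vectors in `TOY_VECTORS` are made-up values standing in for vectors trained by a recurrent neural network, and averaging word vectors into a sentence vector is an assumption chosen for brevity.

```python
import math

# Toy word vectors (assumption: in practice these would come from a trained model).
TOY_VECTORS = {
    "mayor": [1.0, 0.0],
    "city": [0.9, 0.1],
    "council": [0.8, 0.2],
    "rain": [0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sentence_vector(tokens, word_vectors, dim):
    """Average the vectors of the known tokens in a sentence."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def textrank(sim, damping=0.85, iterations=50):
    """Weighted PageRank over a similarity matrix with a zero diagonal."""
    n = len(sim)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in sim]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to the weight of the edge j -> i.
            rank = sum(
                sim[j][i] * scores[j] / out_weight[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            updated.append((1 - damping) / n + damping * rank)
        scores = updated
    return scores


def summarize(sentences, word_vectors, dim, top_k=1):
    """Return the top_k highest-ranked sentences in their original order."""
    vecs = [sentence_vector(s, word_vectors, dim) for s in sentences]
    n = len(vecs)
    sim = [
        [max(cosine(vecs[i], vecs[j]), 0.0) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = textrank(sim)
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]


SENTENCES = [["mayor", "city"], ["city", "council"], ["rain"]]
SUMMARY = summarize(SENTENCES, TOY_VECTORS, dim=2, top_k=1)
```

Because similarity here is a single cosine between dense vectors, building the graph avoids the token-overlap comparisons of the classic TextRank formulation. Whether that matches the speed-up reported in the abstract cannot be confirmed from this page.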
Keywords/Search Tags: News content analysis, Named entity recognition, Topic extraction, Long short-term memory (LSTM) networks