Research On Keyword Extraction Based On News Corpus

Posted on: 2022-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: S X You
Full Text: PDF
GTID: 2518306494971089
Subject: Computer technology
Abstract/Summary:
With the development of the Internet, the volume of data such as webpage data and new media texts keeps increasing. Full-text information retrieval is no longer efficient enough to support retrieval over massive data. Keyword extraction technology is therefore widely used in search engines (such as Baidu search), new media services, and other fields (such as news retrieval). Traditional keyword extraction methods judge the importance of words based on the contextual and grammatical information of the words in the document. These algorithms are simple and effective, but they cannot capture the deep information and features in the document, and their extraction results do not reach the accuracy of manual extraction. In response to these problems, this thesis proposes the Fusion Model, which combines multiple kinds of feature information and multiple methods, and improves and optimizes the keyword extraction model in two respects:

1. Propose the Fusion Model, which combines multiple algorithms and neural network models. The two traditional algorithms, TF-IDF and TextRank, are normalized and smoothed so that their results can be compared and mixed. A BiLSTM model labels the keywords in the input documents, and a conditional random field (CRF) optimizes the labels. To address the insufficient generalization of deep learning models, this thesis uses the results of the traditional keyword extraction models for feedback training of the deep learning model, so as to continuously improve the overall performance of the Fusion Model. In experiments, the F1 score of keyword extraction based on the Fusion Model is 21.02% higher than that of the traditional model and 5.05% higher than that of the currently popular BiLSTM-CRF sequence labeling model.

2. Propose an algorithm for fusing a variety of hand-crafted features with the BiLSTM-CRF model, and propose an "LMRSN" sequence labeling method that is better suited to the Fusion Model in this thesis. The Fusion Model uses a variety of algorithms to collect document features such as part of speech, word frequency, word length, and word position, and encodes these hand-crafted features together with the word embedding layer to obtain a word embedding vector that contains the hand-crafted features. This multi-dimensional feature information helps the model extract the deep features of keywords more comprehensively. For the tagging task, this thesis proposes the "LMRSN" tagging method, which effectively solves the problem of not being able to extract key phrases.

After completing the research on keyword extraction technology, this thesis goes on to study how keywords can be applied. It applies keyword extraction based on the Fusion Model to the task of news recommendation, and proposes several effective methods for selecting candidate news documents as well as a method for calculating the recommendation index between news documents. Finally, experiments demonstrate the effectiveness of keyword extraction based on the Fusion Model.
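The abstract states that TF-IDF and TextRank scores are normalized so they can be compared and mixed, but it does not give the formulas. The sketch below illustrates one common way to do this, under stated assumptions: min-max normalization and a fixed mixing weight `alpha` are illustrative choices, not the thesis's method, and the smoothing step is omitted because the abstract does not describe it. All function names are hypothetical.

```python
def min_max_normalize(scores, eps=1e-9):
    """Rescale a {word: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span < eps:
        # All scores are (almost) equal, so no word stands out.
        return {word: 0.0 for word in scores}
    return {word: (s - lo) / span for word, s in scores.items()}


def fuse_scores(tfidf, textrank, alpha=0.5):
    """Mix two score tables after normalization.

    alpha is an assumed weight on the TF-IDF side; a word missing
    from one table contributes 0 from that table.
    """
    a = min_max_normalize(tfidf)
    b = min_max_normalize(textrank)
    words = set(a) | set(b)
    return {w: alpha * a.get(w, 0.0) + (1 - alpha) * b.get(w, 0.0) for w in words}


def top_k(scores, k):
    """Return the k highest-scoring words, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    # Made-up scores on very different scales, for illustration only.
    tfidf = {"election": 0.42, "vote": 0.30, "city": 0.06}
    textrank = {"election": 0.021, "vote": 0.035, "mayor": 0.007}
    print(top_k(fuse_scores(tfidf, textrank), 2))
```

Normalizing first matters because raw TF-IDF and TextRank values live on different scales; without it, one method would dominate the mixed ranking regardless of the weight.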
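The idea of attaching hand-crafted features (part of speech, word frequency, word length, word position) to each word embedding can be sketched as a simple concatenation. This is only an illustration: the abstract does not say how the features are encoded, so the one-hot part-of-speech encoding, the tag set `POS_TAGS`, the feature scaling, and both function names are assumptions.

```python
from collections import Counter

# Hypothetical, deliberately tiny part-of-speech tag set.
POS_TAGS = ["NOUN", "VERB", "ADJ", "OTHER"]


def handcrafted_features(tokens, pos_tags):
    """Build one feature vector per token.

    Layout (assumed): one-hot POS, relative frequency in the document,
    word length in characters, relative position in [0, 1].
    """
    counts = Counter(tokens)
    n = len(tokens)
    features = []
    for i, (tok, pos) in enumerate(zip(tokens, pos_tags)):
        tag = pos if pos in POS_TAGS else "OTHER"
        one_hot = [1.0 if tag == t else 0.0 for t in POS_TAGS]
        rel_freq = counts[tok] / n
        rel_pos = i / max(n - 1, 1)
        features.append(one_hot + [rel_freq, float(len(tok)), rel_pos])
    return features


def augment_embeddings(embeddings, features):
    """Concatenate each word embedding with its hand-crafted features."""
    return [list(e) + f for e, f in zip(embeddings, features)]


if __name__ == "__main__":
    tokens = ["mayor", "wins", "mayor"]
    pos = ["NOUN", "VERB", "NOUN"]
    # Toy 2-dimensional embeddings; a real model would use learned vectors.
    emb = [[0.1, 0.2], [0.3, 0.4], [0.1, 0.2]]
    for row in augment_embeddings(emb, handcrafted_features(tokens, pos)):
        print(row)
```

The augmented vectors would then be fed to the sequence labeling model in place of the plain embeddings.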
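The results above are reported as F1 scores. For reference, the following sketch shows how precision, recall, and F1 are commonly computed for keyword extraction on a single document. It assumes exact matching between predicted and reference keywords treated as sets; the matching protocol actually used in the thesis is not stated in the abstract, and the function name is hypothetical.

```python
def keyword_prf(predicted, gold):
    """Return (precision, recall, F1) for one document.

    Keywords are compared as sets with exact string matching (an assumption).
    """
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f1 = keyword_prf(["election", "vote", "city", "mayor"],
                           ["election", "vote", "ballot"])
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```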
Keywords/Search Tags: Keyword extraction, LSTM, news recommendation, deep learning