
Keyword Extraction Based On LDA And Word2Vec

Posted on: 2017-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: Q S Wei
Full Text: PDF
GTID: 2308330503978550
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet and information technology, big data analytics has become one of the hottest research topics. Natural language processing (NLP) is an important subfield of artificial intelligence, and one of its purposes is to analyze vast amounts of text data. Analyzing the text of a specific domain can reveal important information about that domain, which helps in forecasting future outcomes in the field. In finance, for example, analyzing large volumes of financial and economic text makes it possible to understand and forecast the prospects for economic development. Keywords are important features of text data and an important basis for text analysis. Automatic keyword identification is a fundamental NLP task with significant application prospects.

Keyword extraction is one of the core technologies of Chinese text analysis. Most methods in current use must first segment the Chinese text into words and then compute word statistics over a large text collection in order to extract keywords. The accuracy of the segmented words and the number of texts are two important factors that affect keyword extraction accuracy.

To extract keywords in the financial domain, this thesis proposes a method that combines a topic model with lexical similarity. The main contributions are as follows. This thesis analyzes news text from the financial and economic domain within a certain period and extracts topic-related words based on the Latent Dirichlet Allocation (LDA) model. First, the topic model is used to extract a candidate keyword set, and the accuracy of that set is calculated. Second, the topic model and the word2vec model are used together to obtain a keyword set, and its accuracy is calculated. Third, the TF-IDF method is used to obtain a keyword set, and its accuracy is calculated.

According to the experimental results, for financial-domain text, the keyword extraction method based on the LDA model and the word2vec model gives better results.
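The abstract does not say how the three keyword sets are built or scored, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It shows a plain TF-IDF ranking, one plausible way to widen a candidate keyword set using word-vector cosine similarity, and a precision score standing in for the "accuracy of the set". The function names, the similarity threshold of 0.8, the toy documents, and the hand-written two-dimensional vectors are all hypothetical. In practice the candidates would come from a trained LDA model and the vectors from a trained word2vec model.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k words of each tokenized document ranked by TF-IDF."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:top_k])
    return results


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_candidates(candidates, vectors, threshold=0.8):
    """Add every word whose vector is close to some original candidate.

    Assumption: `candidates` stands in for LDA topic words and `vectors`
    for word2vec embeddings; the threshold is an arbitrary illustration.
    """
    seeds = [c for c in candidates if c in vectors]
    expanded = set(candidates)
    for word, vec in vectors.items():
        if word in expanded:
            continue
        if any(cosine(vec, vectors[s]) >= threshold for s in seeds):
            expanded.add(word)
    return expanded


def precision(extracted, gold):
    """Fraction of extracted keywords that appear in the gold keyword set."""
    extracted = set(extracted)
    return len(extracted & set(gold)) / len(extracted) if extracted else 0.0


if __name__ == "__main__":
    # Toy, already-segmented documents (hypothetical data).
    docs = [
        ["bank", "rate", "bank", "loan"],
        ["stock", "market", "rate"],
        ["weather", "rain", "market"],
    ]
    # Toy 2-d vectors standing in for word2vec embeddings (hypothetical).
    vectors = {"bank": [1.0, 0.0], "loan": [0.9, 0.1], "rain": [0.0, 1.0]}

    print(tfidf_keywords(docs, top_k=2))
    expanded = expand_candidates(["bank"], vectors)
    print(sorted(expanded), precision(expanded, {"bank", "loan", "rate"}))
```

Seeding the expansion only from the original candidates keeps the result independent of iteration order, which makes the precision figures of different runs comparable.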
Keywords/Search Tags: keyword extraction, LDA model, Word2Vec model, domain-specific keyword extraction