
An Efficient Keywords Extraction Algorithm For Text Comprehension

Posted on: 2017-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: J H Han
Full Text: PDF
GTID: 2308330485461824
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet, online information is growing explosively, and obtaining the key information from this massive data quickly and accurately has become an important problem. Keywords reflect the main thrust of an article and are therefore an effective means of filtering and understanding large volumes of data. As a result, keyword extraction has been widely used in natural language processing, information retrieval, and related fields. Traditional keyword extraction algorithms mainly consider statistical information in the text but ignore the theme of the article, so they fail to extract keywords at the semantic level. To address these problems, this thesis proposes an efficient keyword extraction algorithm for text comprehension. The main work is as follows:

1) A method is proposed for increasing the effective information of a document by finding other documents associated with it. The related documents are filtered sentence by sentence, and weakly correlated sentences are removed to enhance the relevance of the external information. The remaining external information is then added to the document.

2) A keyword extraction algorithm is proposed that combines the extended document with document semantics. On the one hand, the algorithm increases the effective information of the document. On the other hand, the weight of each word is examined according to the different topics of the document, so that the extracted keywords cover the document's semantics well. Experiments show that, compared with TextRank, the algorithm effectively improves the accuracy of keyword extraction.

3) A parallel latent Dirichlet allocation (LDA) algorithm is proposed. In the Gibbs sampling phase, the algorithm distributes the global corpus across the nodes. During distributed sampling, after each iteration each node obtains information from its preceding node according to the "circle rule". Experiments show that the method effectively speeds up the convergence of distributed Gibbs sampling.
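
For orientation, the TextRank baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of the generic TextRank idea (PageRank over a word co-occurrence graph), not the thesis's own algorithm. The function name `textrank_keywords` and the parameters `window`, `damping`, `iterations`, and `top_k` are assumptions chosen for this example, and the input is assumed to be an already tokenized and filtered word list.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by PageRank over an undirected co-occurrence graph.

    Illustrative sketch only: names and default values are assumptions,
    and no stop-word or part-of-speech filtering is performed here.
    """
    # Link every pair of distinct words that occur within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        return []

    # Power iteration: each word receives an equal share of every
    # neighbor's current score, mixed with a constant damping term.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            received = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1.0 - damping) + damping * received
        scores = updated

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


if __name__ == "__main__":
    text = "keyword extraction topic model keyword extraction gibbs sampling topic model"
    print(textrank_keywords(text.split(), window=2, top_k=3))
```

A purely graph-based score of this kind uses only co-occurrence statistics within the text, which is the limitation the abstract attributes to traditional methods. The topic-aware weighting described in item 2 would enter as an additional per-word weight, which this sketch does not attempt to reproduce.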
Keywords/Search Tags: Keyword extraction, Topic model, Gibbs sampling, Parallel, Virtual document