Automatic Extraction Of Keywords And Text Summarization In Text Mining

Posted on: 2019-05-09  Degree: Master  Type: Thesis
Country: China  Candidate: L Zhang  Full Text: PDF
GTID: 2348330542960738  Subject: Information and Communication Engineering
Abstract/Summary:
The rapid development of information technology has caused the volume of text data to grow geometrically. How to quickly capture useful information from this mass of text, and then apply and manage it sensibly, is a pressing problem. Text mining is an important technique for extracting useful knowledge from complex text. Within text mining, keyword extraction has attracted many researchers because it underpins other text-processing tasks. Text summarization is likewise an active research topic at home and abroad, because it gives users a concise account of the useful information in a text. This thesis takes a single Chinese document as the research object and studies automatic keyword extraction and automatic text summarization in turn.

First, a method for automatically extracting keywords from Chinese text based on complex networks is proposed. The method builds a word co-occurrence network for the text and combines the degree centrality, betweenness centrality, and eigenvector centrality of each network node into a formula for the node's comprehensive eigenvalue. The nodes are then output in descending order of comprehensive eigenvalue, single-character word nodes are removed, and the top K words are taken as the text keywords. The extracted keywords express the subject of the text, and the method improves keyword-extraction accuracy compared with the traditional TF-IDF algorithm.

Second, text summarization compresses a single document or multiple documents and summarizes their core ideas. Existing methods focus on the amount of information a summary contains but ignore coherence between sentences, which makes the generated summaries less readable. Taking a single text as the research object, this thesis establishes connections between sentences and proposes a method that automatically extracts a text summary based on a graph model and a topic model. The method combines the text graph model, complex network theory, and the LDA topic model to build a comprehensive sentence-scoring function that computes the weight of each sentence, and it outputs the sentences within the threshold range, in descending order, as the text summary. The algorithm supplies sufficient information in the summary while improving its readability.

Finally, building on the keyword-extraction and summarization methods proposed in this thesis, a text analysis platform is designed and implemented. Its core functions include word segmentation and part-of-speech tagging, word-frequency statistics, keyword extraction, syntactic analysis, topic modeling, and summary extraction.
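The keyword-extraction idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the comprehensive eigenvalue formula, so the equal weights, the scaling of each centrality by its maximum, the co-occurrence window of adjacent tokens, and all function names here are assumptions made for illustration. The input is assumed to be a list of already segmented Chinese words.

```python
from collections import defaultdict, deque


def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking words that occur within `window` tokens (assumed window)."""
    adj = defaultdict(set)
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if v != w:
                adj[w].add(v)
                adj[v].add(w)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted graph)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(adj, 0.0)
        sigma[s] = 1.0
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # undirected: each pair was counted twice


def eigenvector(adj, iters=100):
    """Eigenvector centrality by power iteration on A + I (the shift avoids oscillation)."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iters):
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = sum(val * val for val in nxt.values()) ** 0.5
        x = {v: val / norm for v, val in nxt.items()}
    return x


def _scaled(scores):
    """Scale scores so the largest is 1 (assumed normalization)."""
    top = max(scores.values()) or 1.0
    return {v: s / top for v, s in scores.items()}


def extract_keywords(tokens, k=3, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Rank words by a weighted sum of three centralities; the weights are hypothetical."""
    adj = cooccurrence_graph(tokens)
    if not adj:
        return []
    n = len(adj)
    degree = {v: len(adj[v]) / max(n - 1, 1) for v in adj}
    measures = [_scaled(m) for m in (degree, betweenness(adj), eigenvector(adj))]
    combined = {v: sum(w * m[v] for w, m in zip(weights, measures)) for v in adj}
    ranked = sorted(combined, key=combined.get, reverse=True)
    ranked = [w for w in ranked if len(w) > 1]  # drop single-character words
    return ranked[:k]


tokens = ["网络", "文本", "网络", "关键词", "网络", "摘要", "网络", "的", "网络"]
print(extract_keywords(tokens, k=1))
```

In this toy input every other word co-occurs only with "网络", so that word scores highest on all three centralities, and the single-character word "的" is filtered out before the top K words are returned.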
Keywords/Search Tags: complex network, keywords, text abstract, syntactic analysis, topic model