Research on a Term-Network-Based Keyword Extraction Strategy

Posted on: 2009-07-31
Degree: Master
Type: Thesis
Country: China
Candidate: R Y Kan
Full Text: PDF
GTID: 2178360242997290
Subject: Computer software and theory
Abstract/Summary:
Since the advent of the Internet in 1990, the volume of online text documents, such as e-mails, web pages, and digital books, has grown tremendously. To make more effective use of these documents, there is an increasing need for tools that process text. To meet this need, products for analyzing text documents have been developed, and the techniques involved in document analysis have formed an exciting new research area often called text mining.

Keyword extraction plays a very important role in text mining, because keywords are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyword extraction is to select keywords from the text of a given document. Automatic extraction makes it feasible to generate keywords for the huge number of documents that have no manually assigned keywords.

Previous approaches to keyword extraction fall into two groups. (1) Supervised classification: Turney was the first to treat automatic keyword extraction as a supervised learning task. A document is viewed as a set of phrases, and the learning algorithm must learn to classify each phrase as a positive or negative example of a keyword. The performance has been satisfactory for a wide variety of applications. (2) Unsupervised classification: these algorithms extract keywords from a single document without using a corpus. Examples include methods based on term frequency, SWN, the term graph, and the term network.

Based on an analysis of existing keyword extraction methods that use the term network, this thesis proposes an effective algorithm that extracts not only high-frequency terms but also important terms with low frequency. The algorithm is based on the term network and the deleting actor index. The experimental results support this conclusion.
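The term-network idea can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not define the deleting actor index, so plain betweenness centrality (one of the listed search tags) is used here as a stand-in. The function names, whitespace tokenization, and sentence-level co-occurrence window are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_term_network(sentences, stopwords=frozenset()):
    """Undirected co-occurrence graph: two terms are linked if they share a sentence.

    Hypothetical helper; whitespace tokenization is an assumption.
    """
    graph = defaultdict(set)
    for sentence in sentences:
        terms = {w for w in sentence.lower().split() if w not in stopwords}
        for term in terms:
            graph[term]  # touch the key so isolated terms still become nodes
        for a, b in combinations(sorted(terms), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search counting shortest paths from `source`.
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths to v
        dist = {v: -1 for v in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}


# Toy input (invented for illustration): "bridge" joins two term clusters.
sentences = [
    "network term graph",
    "network term corpus",
    "graph term bridge",
    "bridge index actor",
    "index actor score",
]
network = build_term_network(sentences)
centrality = betweenness_centrality(network)
ranked = sorted(centrality, key=centrality.get, reverse=True)
```

In this toy input, `bridge` ranks first even though it occurs in fewer sentences than `term`, which is the kind of important low-frequency term that a pure frequency count would rank lower.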
Keywords/Search Tags:deleting actor, co-occurrence, keyword extraction, term network, betweenness centrality