Keyword Extraction Based On Sequential Pattern Mining

Posted on: 2013-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: J J Feng
Full Text: PDF
GTID: 2248330377960914
Subject: Computer application technology
Abstract/Summary:
With the development of database and Internet technologies, people face more and more data, and it is difficult for researchers to find the information that interests them in it. Data mining has therefore become increasingly important. Text mining is an important part of data mining, and keyword extraction is a fundamental text mining technique. It seeks to automatically extract keywords that reflect the theme of a document, and it has significant theoretical and practical value. Keywords have been used successfully in automatic indexing, document summarization, text categorization, and text clustering. This thesis proposes a keyword extraction algorithm and reports experimental results. The main contributions of the thesis are as follows.
(1) After a study of keyword extraction techniques in China and abroad, a brief review of existing keyword extraction methods is given, and their advantages and disadvantages are analysed. Sequential pattern mining is introduced in detail, and several typical, commonly used sequential pattern mining algorithms are described.
(2) Wildcard constraints are incorporated into the existing SPAM algorithm, which is then applied to sequential pattern mining on a text to mine all the word patterns in the text.
(3) A keyword extraction algorithm is proposed that applies sequential pattern mining to mine word patterns with wildcards from document sequences, in order to obtain semantic features among words. The algorithm is language-independent and does not require a semantic dictionary. Experiments demonstrate that the pattern features obtained by sequential pattern mining improve the quality of the extracted keywords.
(4) A prototype keyword extraction system based on sequential pattern mining is built from the above contributions.
Keywords/Search Tags: Data Mining, Sequential Pattern Mining, Text Mining, Keyword Extraction
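As a rough illustration of the idea in contributions (2) and (3), the sketch below mines word patterns with wildcard gaps from tokenized sentences. It is not the thesis's SPAM-based algorithm: it enumerates patterns by brute force, and the function names, the `max_gap` and `max_len` limits, and the sentence-level support count are all assumptions made for this example.

```python
from collections import Counter


def gapped_patterns(words, max_len, max_gap):
    """Collect the distinct word patterns in one sentence.

    Consecutive pattern items may be separated by at most `max_gap`
    skipped words (the wildcards). Hypothetical helper for illustration.
    """
    found = set()

    def extend(pattern, last):
        found.add(pattern)
        if len(pattern) == max_len:
            return
        # The next item may sit 1 .. max_gap + 1 positions after the last one.
        stop = min(last + max_gap + 2, len(words))
        for nxt in range(last + 1, stop):
            extend(pattern + (words[nxt],), nxt)

    for start in range(len(words)):
        extend((words[start],), start)
    return found


def frequent_patterns(sentences, min_support=2, max_len=2, max_gap=1):
    """Return patterns that occur in at least `min_support` sentences."""
    support = Counter()
    for sentence in sentences:
        # Each sentence contributes at most 1 to a pattern's support.
        support.update(gapped_patterns(sentence, max_len, max_gap))
    return {p: c for p, c in support.items() if c >= min_support}


if __name__ == "__main__":
    sentences = [
        ["data", "mining", "finds", "patterns"],
        ["data", "stream", "mining", "finds", "rules"],
        ["text", "mining"],
    ]
    for pattern, count in sorted(frequent_patterns(sentences).items()):
        print(pattern, count)
```

Words that take part in many frequent patterns could then be ranked as keyword candidates. How the thesis actually turns pattern features into keyword scores is not stated in the abstract, so that step is left out here.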