The Research Of Text Clustering Based On Frequent Selected Word Set

Posted on:2011-07-08

Degree:Master

Type:Thesis

Country:China

Candidate:L F Wang

Full Text:PDF

GTID:2178360305972735

Subject:Computer application technology

Abstract/Summary:


Data mining was proposed about fifteen years ago and has developed rapidly because it answers a real need. Data mining is the use of computers, together with knowledge and techniques from several disciplines, to extract previously undiscovered knowledge from large volumes of real data. It grew out of database and data warehouse technology to meet the demand for analyzing and processing large data sets, and in today's rapidly developing information society it is attracting ever broader and deeper attention and study. Text clustering is one kind of data mining: classified by task, it belongs to clustering; classified by data source, it belongs to text mining.

With the growth of the information society and the Internet, the amount of text document information is increasing ever faster. Text clustering plays an important supporting role in querying, collecting, and browsing documents, and it is becoming increasingly important. This thesis covers data mining, techniques for mining frequent selected word sets, and text clustering. It proposes an improved method for mining frequent selected word sets, uses that method to improve text clustering based on frequent selected word sets, and optimizes the implementation.

The thesis reviews the current state of text clustering and describes and explains the basic concepts, definitions, and fundamental theorems of data mining. In contrast to the traditional Apriori algorithm for mining frequent selected word sets, it proposes an improved mining method based on a linked list and a matrix and gives a qualitative analysis of it.
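For orientation, the baseline that the abstract compares against can be sketched as a textbook Apriori pass over documents treated as sets of words. This is only an illustration of the traditional approach, not the linked-list-and-matrix method of the thesis, whose details the abstract does not give. The function name `frequent_word_sets`, the `min_support` parameter, and the use of an absolute document count as support are assumptions made for the example.

```python
from itertools import combinations


def frequent_word_sets(docs, min_support):
    """Textbook Apriori over documents viewed as sets of words.

    docs: iterable of word lists (hypothetical input format).
    min_support: minimum number of documents that must contain a set.
    Returns a dict mapping frozenset(words) -> supporting document count.
    """
    transactions = [frozenset(d) for d in docs]

    # Level 1: count how many documents contain each single word.
    counts = {}
    for t in transactions:
        for w in t:
            key = frozenset([w])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-sets into k-set candidates.
        # Prune step: keep a candidate only if every (k-1)-subset is frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count step: one full scan of the documents per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result
```

The repeated scan of all documents at every level is the usual cost of Apriori, and it is this candidate-generation step whose efficiency the thesis says it improves.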
In text clustering based on frequent selected word sets, the method based on a linked list and a matrix replaces the traditional Apriori algorithm for generating the frequent selected word sets. In the implementation, when candidate sets have the same information entropy, the frequent selected word set that contains more selected words is chosen as a cluster; when both the information entropy and the number of selected words are the same, the set that comes first is chosen. The experimental process and an analysis of the results are given. Finally, the research is summarized and directions for further work are discussed.

The major improvements are the following:

(1) Compared with the traditional Apriori algorithm for mining frequent term sets, the new method for mining frequent selected word sets, based on a linked list and a matrix, improves the efficiency of generating frequent selected word sets.

(2) In text clustering based on frequent selected word sets, the method based on a linked list and a matrix is used in place of the traditional Apriori algorithm to generate the frequent selected word sets. In the implementation, ties are broken in two steps: among sets with the same information entropy, the one containing more selected words is chosen as a cluster; among sets with the same information entropy and the same number of selected words, the one that comes first is chosen.
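The tie-breaking rule for choosing a cluster can be sketched as a single ordered comparison. This is a minimal illustration under stated assumptions: the abstract does not say whether lower or higher entropy is preferred, so the sketch assumes lower is better; it also reads "comes first" as the order in which the sets were generated. The `Candidate` class, its fields, and `select_cluster` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A frequent selected word set considered as a cluster (hypothetical type)."""
    words: frozenset
    entropy: float   # information entropy of the candidate, computed elsewhere
    position: int    # assumed: index in the order the sets were generated


def select_cluster(candidates):
    """Pick one candidate using the two tie-breaks described in the abstract.

    Assumption: lower entropy is preferred. On equal entropy, the set with
    more words wins; on equal entropy and size, the earlier set wins.
    Entropies are compared exactly; a real implementation might round them
    first to avoid spurious differences from floating-point error.
    """
    return min(
        candidates,
        key=lambda c: (c.entropy, -len(c.words), c.position),
    )


candidates = [
    Candidate(frozenset({"text"}), 0.5, 0),
    Candidate(frozenset({"text", "cluster"}), 0.5, 1),
    Candidate(frozenset({"data", "mining"}), 0.5, 2),
    Candidate(frozenset({"matrix"}), 0.9, 3),
]
chosen = select_cluster(candidates)
```

Here three candidates share the lowest entropy; the two-word sets beat the one-word set, and of those two the one generated earlier is chosen.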

Keywords/Search Tags:

Text clustering, Frequent selected word set, Linked list, Matrix


Related items

1	Research On Text Clustering Algorithm Based On 2 Degree Frequent Word Sequence
2	Text Clustering Method And Application Research Based On NMF Algorithm
3	The Research On Distributed Large Data Full-Text Retrieval Based On Block Linked List Index Algorithm
4	Research On Microblog Hot Topic Discovery Technology Based On Frequent Word Sets
5	Message Text Clustering Based On Frequent Patterns
6	Text Clustering Method Based On Frequent Itemsets
7	Study And Application Of Frequent Pattern And Multi-modalities Data Clustering Algorithm
8	Clustering Algorithm Research Of Short Text Based On Semantic Similarity
9	Research On Distributed Text Clustering Based On Frequent Item Set
10	Study And Implementation Of Frequent Closed Word Sequence Set Based Hierarchical Clustering Algorithm