The Research Of Text Clustering Based On Frequent Selected Word Set

Posted on: 2011-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: L F Wang
Full Text: PDF
GTID: 2178360305972735
Subject: Computer application technology
Abstract/Summary:
Data mining was proposed only about fifteen years ago, yet it has developed rapidly, driven by practical demand. Data mining uses computer technology, together with knowledge and techniques from several disciplines, to extract previously undiscovered knowledge from large volumes of real data. It grew out of database and data warehouse technology to meet the need for analyzing and processing large data sets, and in today's fast-moving information society it is receiving ever wider and deeper attention and study. Text clustering is one kind of data mining: classified by task, it belongs to clustering; classified by data source, it belongs to text mining.

With the development of the information society and the Internet, the volume of text documents is growing rapidly. Text clustering plays an important supporting role in querying, collecting and browsing documents, and it is becoming increasingly important. This thesis studies data mining, techniques for mining frequent selected word sets, and text clustering. It proposes an improved method for mining frequent selected word sets, uses that method to improve text clustering based on frequent selected word sets, and optimizes the implementation.

The thesis reviews the current state of text clustering and describes and explains the basic concepts, definitions and fundamental theorems of data mining. In contrast to the traditional Apriori algorithm for mining frequent selected word sets, it proposes a new, improved mining method based on a linked list and a matrix, and gives a qualitative analysis of that method.
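The abstract does not spell out the linked-list-and-matrix procedure, so the following is not the thesis's algorithm. It is only a minimal Python sketch of the underlying task, mining word sets that occur in at least a minimum number of documents. The function name `frequent_word_sets`, the parameter `min_support` and the sample documents are all assumptions made for illustration. The sketch stores, for each word set, the ids of the documents that contain it, so the support of a larger set can be obtained by intersecting stored id sets rather than rescanning the documents.

```python
from itertools import combinations


def frequent_word_sets(docs, min_support):
    """Map each frequent word set to the ids of the documents containing it.

    docs: list of word lists, one per document (hypothetical input format).
    min_support: minimum number of documents a word set must occur in.
    """
    # Inverted index: word -> ids of the documents that contain the word.
    cover = {}
    for doc_id, words in enumerate(docs):
        for word in set(words):
            cover.setdefault(word, set()).add(doc_id)

    # Level 1: single words that meet the support threshold.
    level = {frozenset([w]): ids for w, ids in cover.items()
             if len(ids) >= min_support}
    result = dict(level)

    # Level k+1: join pairs of frequent k-sets that differ in one word.
    while level:
        next_level = {}
        for (a, ids_a), (b, ids_b) in combinations(level.items(), 2):
            candidate = a | b
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Documents containing the union are those containing both parts.
            ids = ids_a & ids_b
            if len(ids) >= min_support:
                next_level[candidate] = ids
        result.update(next_level)
        level = next_level
    return result


docs = [
    ["data", "mining", "text"],
    ["data", "mining"],
    ["text", "clustering"],
    ["data", "text", "mining"],
]
found = frequent_word_sets(docs, min_support=2)
for word_set, ids in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(word_set), sorted(ids))
```

In this toy input, "clustering" occurs in only one document and is therefore dropped at the first level, so no larger set containing it is ever considered.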
In text clustering based on frequent selected word sets, the linked-list-and-matrix mining method replaces the traditional Apriori algorithm for generating the frequent selected word sets. In the implementation, when several frequent selected word sets have the same information entropy, the one that contains more selected words is chosen as a cluster; when both the information entropy and the number of selected words are the same, the frequent selected word set that appears earlier is chosen as a cluster. The experimental procedure and an analysis of the results are given. Finally, the thesis summarizes its research and discusses directions for further work.

The major improvements are the following:
(1) Compared with the traditional Apriori algorithm for mining frequent term sets, a new, improved method of mining frequent selected word sets based on a linked list and a matrix is presented to make generating frequent selected word sets more efficient.
(2) In text clustering based on frequent selected word sets, the linked-list-and-matrix method replaces the traditional Apriori algorithm for generating frequent selected word sets. In the implementation, among frequent selected word sets with the same information entropy, the one containing more selected words is chosen as a cluster; among those that also have the same number of selected words, the one that appears earlier is chosen as a cluster.
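The tie-breaking rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the thesis's implementation: the function name `pick_cluster`, the candidate format and the entropy values are hypothetical, and the abstract does not say whether a lower or a higher entropy is preferred, so the sketch assumes lower is better. Only the two tie-breaks (more selected words first, then earlier position) come from the text.

```python
def pick_cluster(candidates):
    """Choose one frequent selected word set as the next cluster.

    candidates: list of (word_set, entropy) pairs in generation order
                (hypothetical format).
    Ordering, from most to least significant:
      1. lower entropy            (assumption, not stated in the abstract)
      2. more selected words      (tie-break from the abstract)
      3. earlier position in list (tie-break from the abstract)
    """
    def sort_key(item):
        position, (word_set, entropy) = item
        return (entropy, -len(word_set), position)

    _, (best_set, _) = min(enumerate(candidates), key=sort_key)
    return best_set


# Made-up entropy values, chosen only to exercise both tie-breaks.
candidates = [
    (frozenset({"data"}), 0.5),
    (frozenset({"data", "mining"}), 0.5),
    (frozenset({"text", "mining"}), 0.5),
    (frozenset({"text"}), 0.9),
]
chosen = pick_cluster(candidates)
print(sorted(chosen))
```

Encoding all three criteria in a single sort key keeps the priority order explicit; the comparison on entropy is exact equality, so a real implementation working with floating-point entropies might prefer to round or compare within a tolerance.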
Keywords/Search Tags: Text clustering, Frequent selected word set, Linked list, Matrix