Text Clustering Method Based On Frequent Itemsets

Posted on:2010-10-24

Degree:Master

Type:Thesis

Country:China

Candidate:J Xiao

GTID:2208360278970221

Subject:Computer application technology

Abstract/Summary:

Text is an important information carrier, and its volume keeps growing with the development of the Internet. As an unsupervised machine learning method, text clustering is an important tool for organizing, summarizing and navigating text collections, and it has attracted a growing number of researchers. Text clustering plays an important role in many text mining and information retrieval systems.

This thesis focuses on how to improve the individual steps of Chinese text clustering so as to obtain a good clustering result. These steps mainly include text pre-processing, feature selection, text representation and the clustering itself, and each of them strongly affects clustering quality. Traditional clustering algorithms are based on the vector space model (VSM). VSM is a keyword-based model that ignores the latent semantic relations between words. In addition, its inherent "curse of dimensionality" has become a bottleneck for improving algorithm performance. Both problems seriously harm the efficiency of text clustering algorithms. This thesis introduces HowNet as the ontology for the clustering algorithm. By mapping the keywords of a text to the corresponding concepts in HowNet, the algorithm can operate on a set of concepts, which compensates for the semantic information that VSM lacks. To improve performance, we introduce the notions of frequent item-sets and non-overlapping clusters, and adopt a new partitioning rule to cluster the original texts. Based on these ideas, a clustering algorithm based on frequent item-sets, named CFI, is proposed.

In the final part of the thesis, several experiments are designed to analyze the feasibility of CFI. The experimental results show that, by integrating HowNet with the idea of frequent item-sets, the proposed algorithm effectively reduces the dimensionality of the text features, improves clustering accuracy, and achieves better quality than comparable traditional methods based on frequent item-sets.
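The abstract does not give the details of CFI or of its partitioning rule, so the following is only a minimal Python sketch of the general idea it describes: map keywords to concepts, mine frequent concept sets, and assign each text to exactly one cluster. Everything named here is an assumption made for illustration: the `CONCEPT_OF` table is a hypothetical stand-in for a HowNet lookup, the level-wise mining is a plain Apriori-style enumeration, and the rule "pick the largest frequent item-set a text contains" is a simple placeholder, not the rule proposed in the thesis.

```python
from itertools import combinations

# Hypothetical keyword-to-concept table; a real system would query HowNet.
CONCEPT_OF = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "engine": "machine", "motor": "machine",
    "doctor": "medical", "physician": "medical", "hospital": "medical",
    "nurse": "person", "patient": "person",
}


def to_concepts(tokens):
    """Map tokens to concepts; tokens without a known concept are dropped."""
    return frozenset(CONCEPT_OF[t] for t in tokens if t in CONCEPT_OF)


def frequent_itemsets(transactions, min_support, max_size=3):
    """Apriori-style mining. Returns {itemset: ids of the texts containing it}."""
    min_count = min_support * len(transactions)
    cover = {}
    for doc_id, concepts in enumerate(transactions):
        for concept in concepts:
            cover.setdefault(frozenset([concept]), set()).add(doc_id)
    current = {s: ids for s, ids in cover.items() if len(ids) >= min_count}
    result = dict(current)
    size = 1
    while current and size < max_size:
        size += 1
        nxt = {}
        for a, b in combinations(current, 2):
            candidate = a | b
            if len(candidate) != size or candidate in nxt:
                continue
            # A text contains a|b exactly when it contains both a and b.
            ids = current[a] & current[b]
            if len(ids) >= min_count:
                nxt[candidate] = ids
        result.update(nxt)
        current = nxt
    return result


def cluster(docs, min_support=0.4):
    """Assign each text to one cluster, labelled by a frequent concept set.

    Placeholder rule (an assumption): choose the largest frequent item-set
    the text contains, breaking ties by support and then alphabetically.
    Texts that contain no frequent item-set share the empty label.
    """
    transactions = [to_concepts(d) for d in docs]
    itemsets = frequent_itemsets(transactions, min_support)
    clusters = {}
    for doc_id, concepts in enumerate(transactions):
        candidates = [s for s in itemsets if s <= concepts]
        if candidates:
            label = max(
                candidates,
                key=lambda s: (len(s), len(itemsets[s]), sorted(s)),
            )
        else:
            label = frozenset()
        clusters.setdefault(label, []).append(doc_id)
    return clusters


example_docs = [
    ["car", "engine"],
    ["automobile", "motor"],
    ["truck", "engine", "road"],
    ["doctor", "patient"],
    ["physician", "nurse", "hospital"],
]

for label, doc_ids in cluster(example_docs).items():
    print(sorted(label), doc_ids)
```

In this toy input the ten distinct keywords collapse to four concepts, which illustrates the dimensionality reduction the abstract attributes to concept mapping, and because every text receives exactly one label the resulting clusters do not overlap.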

Keywords/Search Tags:

Text clustering, Concept mapping, Frequent Item-set

Related items

1	Research On Distributed Text Clustering Based On Frequent Item Set
2	Frequent item-based text clustering
3	Search Results Clustering Method Based On Maximal Frequent Itemsets
4	Research On Clustering Process Model About The Text Of The Web Based On Concept Lattices
5	Research On Mining Algorithms Of Maximal Frequent Item Sets
6	Research Of An Improvement Chinese Text Clustering Algorithm Based On Concept
7	Study On Mining Maximal Frequent Itemset Based On Iceberg Concept Lattice
8	Message Text Clustering Based On Frequent Patterns
9	Mining Of Maximal Frequent Item Sets Based On AFOPT
10	Research And Improvement The Algorithm Of Mining Frequent Item Sets In Text Association Analysis