Research And Application Of K-Means Algorithm Based On Concept Lattice

Posted on:2011-04-12

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Li

Full Text:PDF

GTID:2178360302499231

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

Along with the rapid development of Internet, search engine becomes a major channel of accessing to information. However, one search result of search engines will be thousands and all of them are mixed together, it will be needle in a haystack to find the information they want for users. One of the effective ways to improve the quality of search engine is that, to cluster the similar web texts in the web search results to a class using text clustering technique. Clustering of web search results can provide a easy-to-navigate information navigation for user, and assist users to locate the subject category which meet their needs rapidly, thereby the efficiency of search engine retrieval is enhanced.Clustering is the process of grouping objects according to the similarity in the case of that classes are unknown. Vector space model are used by most text representations and similarities are calculated based on it. Vector space model calculates the weight using TFIDF (Term Frequency Inverse-Document-Frequency). It has the advantage of reflecting the importance of keywords to the text, but this representation model has brought two major problems:(1) the dimension of feature vector is too high; (2) the text vector space is regarded as a vector space which consist of orthogonal term vectors, assuming that there is no semantic links between words, but the reality is that the words of texts is often semanticly associated, therefore the reliability of the results have been affected.Concept lattice is a ordered set of concepts, the process of establishing concept lattice is that of clustering concepts. In the concept lattice, the extension of a concept is a set of objects which belong to the concept, and the content is the set of attributes all these objects sharing. For the given formal context, the concept lattice constructed must existed and be unique.K-Means algorithm is the most widely used clustering algorithm based on partition, to deal effectively with large text set. This article proposes K-MeansBCC(K-Means Algorithm Based on Concept Lattice) combining concept lattice and K-Means algorithm. First, Generate concept lattice on the basis of texts as objects and tokens in the texts as attributes; second, extracte the formal concepts in the concept lattice, represent texts using these concepts and definite the similarity function between concepts; finally, cluster the texts using K-Means algorithm. Representing texts using concepts can reduce the dimension of feature words and improve the clustering performance. Aiming at the problem of the existing K-Means algorithm, such as artificial assignation of number of final clustering and random selection of initial centers, we propose a method based on the density. We applied the K-MeansBCC algorithm in the clustering module of "Haisou", and compared it with K-Means algorithm, the experimental results show that, K-MeansBCC algorithm is obviously reasonable and effective.

Keywords/Search Tags:

Concept lattice, Text Cluster, Text Representation, K-Means Algorithm, Concept Similarity

PDF Full Text Request

Related items

1	Study On The Unlabeled Text Mining Methods Based On The Concept Lattice Extension Models
2	Design And Implementation Of Concept Lattice Based Text Filtering System
3	Research On Clustering Process Model About The Text Of The Web Based On Concept Lattices
4	A Study On The Chinese Text Summarization Method Based On Concept Lattice
5	Research On Constructing Method Of Iceberg Concept Lattice And Fuzzy Concept Similarity
6	Research On Key Technologies In Construction And Application Of Concept Lattice
7	Study On Text Representation Model Based On Concept
8	The Application Of Formal Concept Analysis And Ontology In Text Mining
9	Object-oriented Concept Lattice With Attribute-oriented Concept Lattice Compression
10	Research On Concept Lattice Updating Construction Algorithms Based On Concept Hierarchy