Research Of Document Clustering For User Interest

Posted on:2009-01-28

Degree:Master

Type:Thesis

Country:China

Candidate:Y W Wang

GTID:2178360248950004

Subject:Computer application technology

Abstract/Summary:

With the expansion of document resources and web pages on the Internet, it has become difficult for people to find the information they need on the web. How to organize this large volume of document resources effectively, and help users reach the information they really need, is therefore a pressing problem in the field of information retrieval.

Document clustering is a very important technology in text mining. It has been widely used in information management, search engines, recommendation systems, and other fields. The k-means algorithm is the method most commonly used in document clustering because it is simple and converges quickly. This thesis focuses on the study and improvement of the k-means algorithm.

First, to address the drawback that the k-means algorithm requires the final number of clusters to be specified in advance and selects its initial centers at random, a new initialization method based on reference regions is presented. In effect, the improved algorithm combines k-means with density-based clustering. Experiments show that the improved algorithm obtains better results than the traditional k-means algorithm while retaining the efficiency of the density-based algorithm.

Second, to address the drawback that the k-means algorithm tends to get stuck at a local maximum far from the optimal solution, an optimization based on local search is used to improve the algorithm. In keeping with the characteristics of text data, the clusters are repartitioned by moving a large share of the data points. This procedure runs an appropriate number of iterations to enlarge the search space. Theoretical analysis and experimental results show that the optimization improves the traditional k-means algorithm efficiently, and that its computation remains linear in the size of the document collection.

Finally, the technologies of document clustering and user interest modeling are carefully studied and integrated. A clustering system for user interest modeling, called CSUI (Clustering System of Users' Interest), is built. The system uses the improved k-means algorithm to cluster the web pages that users have viewed, and then outputs each user's interests in the form of a corresponding model.
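
For orientation, the traditional k-means baseline that the abstract sets out to improve can be sketched roughly as follows. This is only a minimal illustration, not the thesis's method: the TF-IDF weighting, the cosine similarity measure, and the names `tfidf_vectors`, `kmeans`, and `init_indices` are all assumptions introduced here, and neither the reference-region initialization nor the local-search refinement is reproduced.

```python
import math
import random
from collections import Counter


def normalize(vec):
    """Scale a sparse term->weight mapping to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Build unit-length TF-IDF vectors from whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption) keeps every weight strictly positive.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        vectors.append(normalize(vec))
    return vectors


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, seed=0, max_iter=100, init_indices=None):
    """Plain k-means with cosine similarity; k must be chosen by the caller."""
    if init_indices is None:
        # Random initialization: the sensitivity the abstract describes.
        init_indices = random.Random(seed).sample(range(len(vectors)), k)
    centroids = [dict(vectors[i]) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # no document changed cluster, so the run has converged
        labels = new_labels
        # Update step: each centroid becomes the normalized sum of its members.
        for j in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] += w
            if total:  # an empty cluster keeps its previous centroid
                centroids[j] = normalize(total)
    return labels, centroids
```

Calling `kmeans(tfidf_vectors(pages), k)` on a list of page texts returns one cluster label per page. Both drawbacks named in the abstract are visible in the sketch: `k` has to be supplied up front, and different `seed` values can lead to different final partitions.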

Keywords/Search Tags:

Document Clustering, k-means, Reference Region, Local Search, User Interest Modeling

Related items

1	Research On Local Region Detection Methods And Its Application In The Field Of Image Retrieval
2	Based On K-means The Chinese Text Clustering Algorithm
3	Document Clustering In Search Engine
4	Research On Document Clustering Algorithm Based On K-means
5	Study of document clustering using the k-means algorithm
6	Research On Personalized WEB Search Based On Reference Document Model
7	Multi-document Summarization Based On Improved Fuzzy C-means Clustering Algorithm
8	Research And Application Of K-medoids Clustering Algorithm Based On ε_o-neighborhood Search Strategy
9	Research Of Image Segmentation Algorithms Based On FCM Clustering
10	Local Search Algorithms For Capacitated Facility Location And K-means Problems