
Research On Clustering System For Web Searching Results

Posted on: 2008-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: Q S Meng
Full Text: PDF
GTID: 2178360245993124
Subject: Computer application technology
Abstract/Summary:
In the information age, with the growth of the Internet and the spread of the personal computer, the World Wide Web has become a global resource center with seemingly unlimited capacity for obtaining and exchanging information, and its influence on human activity continues to grow.

A search engine collects and stores web documents dynamically and returns a result set in response to a user's query, which improves people's ability to retrieve information. As the volume of web information grows, however, traditional search technology can only produce ever longer result lists, so users waste time hunting for the information they actually need. An alternative to traditional methods is to cluster the retrieved set into groups of documents that share a common subject, which lets users browse search results quickly.

This thesis first briefly introduces the concept and applications of data mining, with a focus on clustering. It summarizes the implementation of several classical clustering algorithms and analyzes their respective advantages and shortcomings.

Building on related research in the field, the thesis then proposes a new document clustering method that combines feature extraction based on salient score calculation with the K-means clustering algorithm. The salient score takes into account the TFIDF relative term frequency and the degree of context independence. The coefficients in the salient score formula were fixed on the basis of our experiments.

A text clustering system comprising text preprocessing, feature extraction, text vector representation, and clustering was implemented in Java. Experiments were carried out to evaluate the system's performance. The results confirm that the method is feasible and effective, and show that it has some advantages in time complexity.
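The last two stages of the pipeline described above (text vector representation and K-means clustering) can be illustrated with a minimal Java sketch. It is not the thesis's implementation. The abstract does not give the salient score formula or its coefficients, so plain TF-IDF weighting stands in for the salient-score feature extraction. The class name `ClusteringSketch`, the method names, the smoothed IDF variant, and the choice to seed the centroids with the first k documents are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class ClusteringSketch {

    /** Builds one unit-length TF-IDF vector per tokenized document. */
    static double[][] tfidf(String[][] docs) {
        // A sorted vocabulary gives every term a stable column index.
        TreeSet<String> terms = new TreeSet<>();
        for (String[] doc : docs) terms.addAll(Arrays.asList(doc));
        List<String> vocab = new ArrayList<>(terms);
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < vocab.size(); i++) index.put(vocab.get(i), i);

        // Document frequency: number of documents containing each term.
        int[] df = new int[vocab.size()];
        for (String[] doc : docs)
            for (String t : new HashSet<>(Arrays.asList(doc))) df[index.get(t)]++;

        double[][] vectors = new double[docs.length][vocab.size()];
        for (int d = 0; d < docs.length; d++) {
            // Relative term frequency: count divided by document length.
            for (String t : docs[d]) vectors[d][index.get(t)] += 1.0 / docs[d].length;
            double norm = 0;
            for (int j = 0; j < vocab.size(); j++) {
                // Smoothed IDF (an assumed variant, not taken from the thesis).
                if (df[j] > 0) vectors[d][j] *= Math.log(1.0 + (double) docs.length / df[j]);
                norm += vectors[d][j] * vectors[d][j];
            }
            norm = Math.sqrt(norm);
            if (norm > 0) for (int j = 0; j < vocab.size(); j++) vectors[d][j] /= norm;
        }
        return vectors;
    }

    /** Squared Euclidean distance between two vectors of equal length. */
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
        return s;
    }

    /** Standard K-means; returns the cluster index assigned to each vector. */
    static int[] kMeans(double[][] x, int k, int maxIter) {
        if (k < 1 || k > x.length) throw new IllegalArgumentException("need 1 <= k <= n");
        int n = x.length, dim = x[0].length;
        double[][] centroids = new double[k][];
        // Assumption: deterministic seeding with the first k vectors.
        for (int c = 0; c < k; c++) centroids[c] = x[c].clone();
        int[] assign = new int[n];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: attach each vector to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (dist(x[i], centroids[c]) < dist(x[i], centroids[best])) best = c;
                if (best != assign[i]) { assign[i] = best; changed = true; }
            }
            if (!changed) break; // converged
            // Update step: move each centroid to the mean of its members.
            double[][] sum = new double[k][dim];
            int[] count = new int[k];
            for (int i = 0; i < n; i++) {
                count[assign[i]]++;
                for (int j = 0; j < dim; j++) sum[assign[i]][j] += x[i][j];
            }
            for (int c = 0; c < k; c++)
                if (count[c] > 0)
                    for (int j = 0; j < dim; j++) centroids[c][j] = sum[c][j] / count[c];
        }
        return assign;
    }

    public static void main(String[] args) {
        // Hypothetical, already-tokenized search result snippets.
        String[][] docs = {
            {"java", "code", "java"},
            {"apple", "fruit"},
            {"java", "code"},
            {"fruit", "apple", "apple"}
        };
        System.out.println(Arrays.toString(kMeans(tfidf(docs), 2, 20)));
    }
}
```

In this toy input, the two programming snippets end up in one cluster and the two fruit snippets in the other. A system along the lines the abstract describes would replace `tfidf` with its salient-score feature extraction and would likely use a less naive seeding strategy.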
Keywords/Search Tags:Web Information Retrieval, Text Clustering, Salient Score, K-means Algorithm