The Study On Web Search Results' Clustering

Posted on: 2008-06-20 | Degree: Master | Type: Thesis
Country: China | Candidate: D P Zhou
GTID: 2178360212476051 | Subject: Computer application technology
Abstract/Summary:
Search engine technology is currently a very active research area within Internet technology, and a number of excellent search engines have emerged from it. However, most search engines still present their results to the end user as a linear list. Because a single query can return thousands of results, users find it difficult to locate what they actually want. If the search results are clustered into hierarchical classes and each class is given a descriptive label, the time users need to find the results they want can be greatly reduced.

The main work of this thesis is the implementation of the Document Information Retrieval System, a search engine built on the Eclipse plug-in mechanism that indexes and searches HTML documents. The system implements the whole process from indexing and searching to clustering, and provides additional functions such as letting users rate search results, with those ratings influencing future search results. On this basis, we also implement a search result clustering module.

The clustering algorithm used in this module has two significant characteristics: it is semantic and it is hierarchical. The key idea of the method is to first discover descriptive cluster labels and then organize these labels into a hierarchical label tree. Next, related documents are assigned to each label in the tree; finally, the actual content of each cluster is determined and the resulting cluster tree is produced. In this thesis, we show how cluster label discovery can be accomplished with the Latent Semantic Indexing technique and how the discovered labels can be organized into a hierarchical label tree. Finally, we employ the classic Vector Space Model to classify the documents into these clusters.
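The abstract names Latent Semantic Indexing for label discovery and the Vector Space Model for assigning documents, but gives no algorithmic detail. The following is a minimal sketch of how such a "labels first, then documents" pipeline could look, not the thesis's implementation. Several assumptions are made here: candidate labels are single terms (not phrases), documents are weighted by raw term frequency with unit-length columns, the leading singular vectors are approximated by power iteration instead of a full SVD, and the function names and the `threshold` value are hypothetical. The hierarchical label tree step is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphabetic tokens; deliberately naive (no stemming or stop words)."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def term_document_matrix(docs):
    """Terms x documents matrix of raw term frequencies with unit-length
    document columns (the Vector Space Model representation)."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted({t for c in counts for t in c})
    cols = [normalize([c[t] for t in vocab]) for c in counts]
    matrix = [[col[i] for col in cols] for i in range(len(vocab))]
    return vocab, matrix


def top_left_singular_vectors(matrix, k, iters=300):
    """Approximate the k leading left singular vectors of A (the LSI
    "concepts") by power iteration on A * A^T with deflation."""
    m = len(matrix)
    c = [[sum(a * b for a, b in zip(matrix[i], matrix[j])) for j in range(m)]
         for i in range(m)]
    found = []
    for _ in range(k):
        v = normalize([1.0 + 0.01 * i for i in range(m)])
        norm = 0.0
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(m)) for i in range(m)]
            for u in found:  # project out concepts already found (deflation)
                d = sum(a * b for a, b in zip(w, u))
                w = [a - d * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left: the rank of A is below k
                break
            v = [x / norm for x in w]
        if norm < 1e-12:
            break
        found.append(v)
    return found


def discover_labels(vocab, matrix, k):
    """Label discovery (single-term simplification): for each concept,
    take the term with the largest absolute weight as the cluster label."""
    labels = []
    for u in top_left_singular_vectors(matrix, k):
        best = max(range(len(vocab)), key=lambda i: abs(u[i]))
        if vocab[best] not in labels:
            labels.append(vocab[best])
    return labels


def assign_documents(vocab, matrix, labels, threshold=0.2):
    """VSM assignment: a document joins every cluster whose label vector has
    cosine similarity >= threshold with the unit-length document vector."""
    n_docs = len(matrix[0]) if matrix else 0
    clusters = {}
    for label in labels:
        # A one-term label is a unit basis vector, so its cosine with a
        # unit-length document column is that column's weight for the term.
        row = matrix[vocab.index(label)]
        clusters[label] = [j for j in range(n_docs) if row[j] >= threshold]
    return clusters


if __name__ == "__main__":
    docs = ["car engine speed", "car dealer price", "cat jungle prey",
            "cat river habitat", "cat night hunter"]
    vocab, matrix = term_document_matrix(docs)
    labels = discover_labels(vocab, matrix, k=2)
    print(assign_documents(vocab, matrix, labels))
```

On the five toy snippets, the strongest concept is dominated by "cat" and the second by "car", so those terms become labels and documents are then attached to them by cosine similarity. Finding labels before assigning documents is what makes each cluster describable; a real system would draw candidate labels from frequent phrases rather than single words.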
Keywords/Search Tags: Clustering Algorithm, Search Engine, Indexing, Search Results