
The Study On Web Search Results' Clustering

Posted on: 2009-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: H B Liu
Full Text: PDF
GTID: 2178360278471325
Subject: Computer application technology
Abstract/Summary:
With the continuing development of computer and network technology, the Internet has become the largest database in the world. Faced with so much information, users find it increasingly difficult to locate what they need by browsing the Web. Search engines are now the main way people obtain information on the Web. However, search engines such as Google, Baidu, and Yahoo often return a long list in which relevant and irrelevant results are mixed together. Users have to check the results one by one, which makes it hard to reach the information they really need. How to help users find the necessary information through a search engine more accurately and conveniently has therefore become an important research topic.

The emergence of data mining technology provides a new way to address this problem. Data mining aims to extract hidden, previously unknown, useful, and unusual patterns or knowledge from data. As one of the basic techniques of data mining, clustering reveals the inner characteristics and distribution of the data by comparing the similarities and dissimilarities among data items, which leads to a deeper understanding of the data. Using clustering to process search results and present them in a more reasonable way lets users reach the necessary information more conveniently.

In this thesis, we study Web search engine technology and data mining technology and, in response to this problem, propose a search results clustering system model that can cluster search results obtained from Web search engines in a Chinese-language environment. The main idea of the model is to take the search results returned by Web search engines as input data, first generate descriptive, readable cluster labels, and then assign the relevant search results to the corresponding labels. The search results are thus returned to the user in clustered form, allowing the user to find information more conveniently.

To design the model, we study two classic search results clustering algorithms, SHOC and LINGO. Considering the differences between Chinese and English, we modify and adjust the original algorithms, which were designed for English, so that our model is more effective in a Chinese-language environment.
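
The label-first idea described above (generate cluster labels, then assign results to them) can be illustrated with a minimal Python sketch. This is not the thesis's model and it is neither SHOC nor LINGO: those algorithms rely on suffix arrays and latent semantic indexing, whereas this sketch simply uses frequent word n-grams as labels. The function names, the scoring rule (document frequency times phrase length), and the parameters `max_len`, `min_df`, and `top_k` are all assumptions made for illustration. Snippets are assumed to be already tokenized; for Chinese text that would require a separate word segmentation step, which is not shown.

```python
def ngrams(tokens, max_len):
    """Yield every word n-gram of length 1..max_len as a tuple."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def cluster_by_labels(snippets, max_len=3, min_df=2, top_k=5):
    """Label-first clustering of pre-tokenized snippets (illustrative only).

    Returns a dict mapping each label to a sorted list of snippet indices.
    Clusters may overlap; snippets matching no label go to "Other".
    """
    # Map each phrase to the set of snippets that contain it.
    docs = {}
    for idx, tokens in enumerate(snippets):
        for phrase in set(ngrams(tokens, max_len)):
            docs.setdefault(phrase, set()).add(idx)

    # Step 1: candidate labels are phrases found in at least min_df snippets.
    # Assumed scoring: document frequency times phrase length, so longer,
    # more descriptive phrases rank ahead of single words.
    candidates = [p for p, d in docs.items() if len(d) >= min_df]
    candidates.sort(key=lambda p: (-len(docs[p]) * len(p), p))

    # Step 2: assign snippets to the chosen labels, skipping any label
    # that would cover exactly the same snippets as a label already chosen.
    clusters = {}
    seen_doc_sets = []
    for phrase in candidates:
        if len(clusters) == top_k:
            break
        if docs[phrase] in seen_doc_sets:
            continue
        seen_doc_sets.append(docs[phrase])
        clusters[" ".join(phrase)] = sorted(docs[phrase])

    covered = set().union(*seen_doc_sets) if seen_doc_sets else set()
    others = [i for i in range(len(snippets)) if i not in covered]
    if others:
        clusters["Other"] = others
    return clusters
```

Because the labels are chosen before any result is assigned, every cluster is guaranteed to have a readable description, which is the property the abstract emphasizes. A snippet may appear under more than one label, which suits ambiguous queries.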
Keywords/Search Tags: Clustering Algorithm, Web Search Engine, Search Results, Suffix Array, Latent Semantic Indexing