
Research And Improve On Clustering Method Of The Search Engine's Retrieved Results

Posted on: 2008-07-18  Degree: Master  Type: Thesis
Country: China  Candidate: P D Li
GTID: 2178360212995256  Subject: Computer application technology
Abstract/Summary:
Although many search engines have worked to improve retrieval precision, retrieved results still mix many irrelevant documents with the relevant ones, which places a heavy burden on web users. Clustering the retrieved results addresses this: the groups formed should have a high degree of association among members of the same group and a low degree among members of different groups. Users can then browse only the groups that interest them, which saves considerable time.

First, this paper presents a document feature extraction method in which dictionary-based identification works together with key-phrase statistics. It finds not only ordinary key-phrases but also technical terms, abbreviations, temporary words, new words, and other expressions that are not in the dictionary. The index structure is also improved: by using a sequence index together with an inverted file index of the snippets, it is better suited to clustering search engine snippets.

Second, a fast clustering algorithm, HPMC, is presented. The similarities between snippets are calculated, and the initial cluster centers are then created using a hierarchical clustering method. An algorithm based on k-means and single-pass clustering is then applied to obtain functional clusters. Finally, the clustering results are obtained by merging the functional clusters appropriately.

Last, the performance of HPMC is evaluated in terms of time complexity, space complexity, cluster quality, number of clusters, and sensitivity to isolated points, and is compared with earlier algorithms.
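The single-pass assignment step named in the abstract can be illustrated with a minimal sketch. This is not the HPMC algorithm itself, whose details the abstract does not give: the bag-of-words features stand in for the key-phrase features, and the function names (`vectorize`, `cosine`, `single_pass`), the cosine similarity measure, and the `threshold` parameter are all assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(snippet):
    """Term-frequency vector of a snippet (assumed stand-in for key-phrase features)."""
    return Counter(snippet.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(snippets, threshold=0.3):
    """Assign each snippet, in order, to the most similar existing cluster,
    or start a new cluster if no similarity reaches the threshold."""
    centroids = []  # one summed term vector per cluster
    clusters = []   # one list of snippet indices per cluster
    for i, snippet in enumerate(snippets):
        vec = vectorize(snippet)
        best, best_sim = None, 0.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            clusters[best].append(i)
            centroids[best].update(vec)  # Counter.update adds term counts
        else:
            clusters.append([i])
            centroids.append(Counter(vec))
    return clusters
```

In a pipeline like the one the abstract outlines, the initial centroids would presumably come from the hierarchical clustering stage rather than from the first snippets seen, and the resulting clusters would be merged afterwards; both stages are omitted here.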
Keywords/Search Tags: Search Engine, Retrieved Result, Clustering, Key-phrase, Cluster