Research And Implementation On Web Online Clustering

Posted on: 2008-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: J C Li
GTID: 2178360212476066
Subject: Computer application technology
Abstract/Summary:
With the rapid expansion of the Web, search engines have become one of the essential tools of the information age. Today's leading search engines return a long list of records sorted by relevance to the query, which makes it difficult to find the desired information manually, especially when the query covers several subtopics. Online clustering of the results returned by search engines has recently become an active research topic: it groups the search results according to some criterion so that users can quickly locate the information they are interested in. We propose a new data model named DPG (Directed Probability Graph) to represent the documents, which makes it easy to recognize the phrases they share. The clustering algorithm based on this model efficiently clusters the documents according to their shared phrases and then assigns meaningful labels to the resulting clusters. Our clustering algorithm has three advantages: it improves efficiency because it does not need to calculate the similarity between every pair of documents; it improves the readability of the clustering results by using phrase-based labels; and it avoids the word segmentation phase otherwise required when clustering Web pages in East Asian languages, so it can be applied to Web clustering in any language. We implement a prototype system named ETOC (Efficient Token-based Online Clustering) to cluster the search results returned by a search engine. Our experiments show that the algorithm is very efficient and that its clustering results are more readable and more accurate than those of the other algorithms compared. It is therefore suitable for online Web-snippet clustering.
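The idea of clustering by shared phrases without comparing documents pair by pair can be illustrated with a minimal sketch. This is not the DPG model or the ETOC implementation, whose details the abstract does not give; it swaps in a plain inverted index from token n-grams to snippet indices. The function name `shared_phrase_clusters`, the parameters `n` and `min_docs`, and the whitespace tokenizer are all assumptions made for illustration.

```python
from collections import defaultdict


def shared_phrase_clusters(snippets, n=2, min_docs=2):
    """Group snippet indices by the token n-grams they share.

    Hypothetical helper, not the thesis's DPG algorithm: each snippet is
    scanned once and every n-gram is recorded in an inverted index, so no
    pairwise document similarity is ever computed.
    """
    phrase_to_docs = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        # Assumption: whitespace tokenization. For text without word
        # delimiters, list(text) would treat each character as a token.
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            phrase_to_docs[phrase].add(doc_id)
    # A phrase shared by at least min_docs snippets becomes a cluster,
    # and the phrase itself serves as the cluster label.
    return {
        phrase: sorted(docs)
        for phrase, docs in phrase_to_docs.items()
        if len(docs) >= min_docs
    }


snippets = [
    "jaguar car dealer",
    "jaguar car review",
    "jaguar cat habitat",
    "big cat habitat",
]
clusters = shared_phrase_clusters(snippets)
```

In this sketch the work grows with the total number of tokens rather than with the number of document pairs, and each cluster is labeled by the phrase that produced it, which mirrors the two properties the abstract emphasizes.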
Keywords/Search Tags: Web clustering, online, efficient, Directed Probability Graph, algorithm, model