Search Results Clustering Method Based On Maximal Frequent Itemsets

Posted on:2010-02-01

Degree:Master

Type:Thesis

Country:China

Candidate:C Su

Full Text:PDF

GTID:2178360332957853

Subject:Computer Science and Technology

Abstract/Summary:

With the explosive growth of information on the Internet, helping users locate the information they need has become an urgent and important problem. Clustering search results online addresses this problem by presenting results to users in groups. However, because search results must be clustered in real time and the resulting cluster labels must be readable, traditional clustering algorithms cannot meet the need. In addition, most previous research is based on web page snippets, and its accuracy needs to be improved. In this thesis, we research and design a new clustering algorithm based on the full text of web pages, built on the platform of an open-domain search engine. By investigating frequent itemsets and their use in clustering algorithms, we propose a search results clustering algorithm based on maximal frequent itemsets (Maximal Frequent Itemsets Clustering, MFIC). By using a dynamic minimum support and selecting maximal frequent itemsets as the foundation of clustering, MFIC breaks through the bottleneck that frequent itemsets face in online clustering. Cluster labels are also generated from frequent itemsets.

This thesis mainly includes the following contents:

(1) Taking the characteristics of search results into account, preprocess the web pages, apply the dynamic minimum support method in frequent itemset mining, and mine maximal frequent itemsets instead of all frequent itemsets, which improves the usability of frequent itemsets and fulfills the real-time demand.
(2) Design and implement an online web page clustering system, which computes similarity and clusters pages based on the relations between the sets of pages covered by frequent itemsets and their sets of words.
(3) Design the label generation algorithm, which combines frequent itemsets with word order to extract phrase labels, improving label generation for clustering algorithms based on frequent itemsets.
(4) Validate the advantages of our online clustering method by comparing it experimentally with other clustering algorithms.

Finally, the system has been successfully used in an intelligent web information retrieval platform. Experimental results show that the proposed method can meet the requirements of online clustering, especially in time complexity and precision.
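To make the central idea concrete, the following is a minimal Python sketch of mining maximal frequent itemsets under a support threshold that adapts to the result set. It is not the MFIC implementation: the abstract does not say how the dynamic minimum support is computed, so the rule used here (start at half the number of pages and lower the threshold until enough maximal itemsets appear) is an assumption, as are all function names, the `want` and `floor` parameters, and the toy pages.

```python
from itertools import combinations


def frequent_itemsets(docs, min_count):
    """Level-wise (Apriori-style) mining over pages given as sets of terms."""
    counts = {}
    for doc in docs:
        for term in doc:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    current = {s for s, c in counts.items() if c >= min_count}
    frequent = set(current)
    size = 1
    while current:
        size += 1
        # Join pairs of frequent itemsets into candidates one term larger.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == size}
        # Keep only candidates contained in at least min_count pages.
        current = {c for c in candidates
                   if sum(1 for d in docs if c <= d) >= min_count}
        frequent |= current
    return frequent


def maximal_only(itemsets):
    """Drop every itemset that has a frequent proper superset."""
    return {s for s in itemsets if not any(s < t for t in itemsets)}


def mine_with_dynamic_support(docs, want=2, floor=2):
    """Hypothetical rule: lower the threshold until `want` itemsets exist."""
    min_count = max(floor, len(docs) // 2)
    while True:
        maximal = maximal_only(frequent_itemsets(docs, min_count))
        if len(maximal) >= want or min_count <= floor:
            return maximal, min_count
        min_count -= 1


def covered_pages(docs, itemset):
    """Indices of the pages that contain every term of the itemset."""
    return [i for i, d in enumerate(docs) if itemset <= d]


# Toy "search results" for an ambiguous query, already reduced to term sets.
docs = [
    {"jaguar", "car", "price"},
    {"jaguar", "car", "dealer"},
    {"jaguar", "car", "price"},
    {"jaguar", "cat", "habitat"},
    {"jaguar", "cat", "habitat"},
    {"jaguar", "cat", "zoo"},
]

maximal, min_count = mine_with_dynamic_support(docs)
# Each maximal itemset seeds one candidate cluster: the pages it covers.
clusters = {tuple(sorted(s)): covered_pages(docs, s) for s in maximal}
```

On these toy pages, the threshold settles at 3 and two maximal itemsets remain, one covering the three car pages and one covering the three cat pages. Keeping only maximal itemsets leaves far fewer cluster seeds than keeping every frequent itemset, which is the property the abstract relies on for online use. Merging seeds by similarity and extracting phrase labels are left out of this sketch.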

Keywords/Search Tags:

search engine, text clustering, frequent item set

Related items

1	Research On Distributed Text Clustering Based On Frequent Item Set
2	Text Clustering Method Based On Frequent Itemsets
3	Frequent item-based text clustering
4	Text Clustering And Its Application In Web Community Search Engine
5	The Research On Clustering Algorithm For Text Search Engine
6	Research On Text Clustering For Search Engine
7	Research On And Implementation Of Frequent Item Set Mining System In Data Stream
8	An Algorithm Of Search Engine Based On Text Clustering
9	Search Engine Results Ranking Based On Web Page Clustering
10	The Research Of A Multi-language Supporting Description-oriented Clustering Algorithm On Meta-Search Engine Result