Search Results Clustering Based On Web Structure

Posted on:2011-07-09

Degree:Master

Type:Thesis

Country:China

Candidate:S Wen

Full Text:PDF

GTID:2178360308963592

Subject:Computer system architecture

Abstract/Summary:

Today the Internet is one of the most important sources of information, and more and more people use a search engine as the first step in browsing the Web. The traditional one-dimensional list of search results, however, no longer meets the need to obtain information efficiently. Three solutions have been proposed: query recommendation, personalized search, and search results clustering.

Search results clustering is still far from satisfactory, although it has been studied extensively and for a long time. Its main disadvantages are that processing time is too long, cluster labels are not readable enough, and clustering accuracy is too low. To avoid these drawbacks of traditional search results clustering, which is based on the similarity of result summaries, this paper proposes clustering search results according to the web structure of an intranet.

A search results clustering system based on web structure is designed and implemented in this paper. Offline, it crawls web pages, parses the web structure, and determines each page's semantic path; online, it merges the semantic paths once search results are returned. Because every web page is tagged with a semantic path in advance, the only online work is merging these paths, which cuts processing time dramatically. Based on observation, this paper proposes three rules for filtering out non-hierarchical links: a. a topical page has no semantic child page; b. links in the same link cluster point to web pages of the same type; c. a link pointing to a semantic child page always occupies a more prominent position than a link that does not.

In the last section, the proposed method is compared with STC and Lingo, two well-known search results clustering methods proposed by O. Zamir and O. Etzioni and by Stanislaw Osinski and Dawid Weiss, respectively. Because it does not compute the similarity of search result summaries, the method in this paper is much faster than Lingo. Because the cluster labels are extracted from anchor text, cluster label readability is also more satisfactory. Compared with pages on the World Wide Web, pages within an intranet are more homogeneous, and so are the information needs of people who use intranet search, which is why clustering search results according to web structure is better than clustering based on summary similarity.
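
The online step described in the abstract, merging precomputed semantic paths into clusters, can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes each search result already carries a semantic path represented as a list of anchor-text labels, and the function names, node layout, URLs, and labels are hypothetical. Merging is a prefix-tree insertion per result, so no summary similarity is computed.

```python
def build_cluster_tree(results):
    """Merge per-result semantic paths into a nested cluster tree.

    `results` is an iterable of (url, path) pairs, where `path` is a list
    of labels from the site root down to the page (an assumed format).
    """
    root = {"children": {}, "pages": []}
    for url, path in results:
        node = root
        for label in path:
            # Reuse the cluster for this label if it exists, else create it.
            node = node["children"].setdefault(
                label, {"children": {}, "pages": []}
            )
        node["pages"].append(url)
    return root


def count_pages(node):
    """Count the results stored in this cluster and all of its sub-clusters."""
    return len(node["pages"]) + sum(
        count_pages(child) for child in node["children"].values()
    )


def render(node, depth=0):
    """Return indented 'label (count)' lines, one per cluster."""
    lines = []
    for label, child in node["children"].items():
        lines.append("  " * depth + f"{label} ({count_pages(child)})")
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    # Hypothetical intranet search results with precomputed semantic paths.
    sample = [
        ("http://intranet.example/cs/faculty/a",
         ["Departments", "Computer Science", "Faculty"]),
        ("http://intranet.example/cs/courses/b",
         ["Departments", "Computer Science", "Courses"]),
        ("http://intranet.example/news/1", ["News"]),
    ]
    print("\n".join(render(build_cluster_tree(sample))))
```

Each result costs one dictionary lookup per path label, so the online work grows with the number of results times the path depth, and the cluster labels are simply the path labels.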

Keywords/Search Tags:

Search Results Clustering, web structure, data clustering, search engine

Related items

1	Research On Search Results Clustering Technology For Cloud Search Engine
2	The Study On Web Search Results' Clustering
3	Research On Search Results Clustering And Label Extraction
4	Design And Implementation Of Web Search Results Clustering For Distributed Search Engine
5	The Study On Web Search Results' Clustering
6	Research And Implementation On Results Clustering Optimization Of Meta Search Engine
7	Search Engine Results Ranking Based On Web Page Clustering
8	Clustering Web documents: A phrase-based method for grouping search engine results
9	Research On Clustering Systems Of Search Engine Results
10	On The World Wide Web Search Engine Returns The Results Of Fuzzy Clustering Study