
Design And Implementation Of Web Search Results Clustering For Distributed Search Engine

Posted on: 2013-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Z Hu
Full Text: PDF
GTID: 2248330374975905
Subject: Computer system architecture
Abstract/Summary:
The Internet is becoming an important way to obtain information as information technology grows. Although a search engine is an efficient tool for finding information in this large data store, it is still hard for people to locate the information they actually need. Web search result clustering is an effective way to address this problem. Although much work has been done in this field, existing methods are still hard to use in practice: cluster labels are not readable, processing cannot be completed online, and precision is poor. This paper aims to improve the existing method in CNGI: the distributed search engine oriented to the next-generation network (SE6). The work consists of three parts: clustering search results by semantics, clustering search results by publish time, and clustering search results by source site. Of these, semantic clustering is the most important part of this paper.

The paper processes documents with natural language processing methods, using a dictionary called the “synonyms dictionary”. It merges the synonyms that occur in the documents and reduces dimensionality by describing each document with a vector of synonym codes. When generating phrases, it records the words already processed, so that it can stop as early as possible and avoid generating the same phrase more than once. Two phrases can naturally express the same meaning while differing in surface form. To address this, the paper compares two phrases by their synonym codes rather than in the traditional way.

In the last section, this paper's method is compared with existing methods such as LQOM, Lingo, and STC. It produces better labels than Lingo and STC, and it is significantly more efficient than LQOM, meeting the demand for online processing. Apart from semantic clustering, this paper adds clustering based on publish time and clustering based on source site, which improves the user experience.
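
The synonym-code idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the dictionary contents, the code values such as `C01`, and the function names `to_codes`, `code_vector`, and `same_phrase` are all hypothetical, and whitespace tokenization stands in for whatever word segmentation the thesis actually uses.

```python
from collections import Counter

# Hypothetical toy synonym dictionary: word -> synonym-group code.
# The real "synonyms dictionary" and its code format are not given in the abstract.
SYNONYM_CODES = {
    "car": "C01", "automobile": "C01",
    "buy": "C02", "purchase": "C02",
    "cheap": "C03", "inexpensive": "C03",
}

def to_codes(words):
    """Map each word to its synonym code; unknown words are kept as themselves."""
    return [SYNONYM_CODES.get(w.lower(), w.lower()) for w in words]

def code_vector(document):
    """Bag-of-codes vector: all synonyms of a word share one dimension."""
    return Counter(to_codes(document.split()))

def same_phrase(a, b):
    """Treat two phrases as equal when their synonym-code sequences match."""
    return to_codes(a.split()) == to_codes(b.split())
```

Under these assumptions, `same_phrase("buy cheap car", "purchase inexpensive automobile")` holds even though the two phrases share no surface words, and `code_vector("car automobile truck")` has two dimensions instead of three, which is the dimensionality reduction the abstract describes.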
Keywords/Search Tags: search result clustering, synonym, pruning, label readability, on-line processing