Research On Vertical Search Engine And Key Techniques

Posted on:2011-01-20

Degree:Master

Type:Thesis

Country:China

Candidate:H J Xu

Full Text:PDF

GTID:2178360302994648

Subject:Computer system architecture

Abstract/Summary:

With the rapid development of the Internet, the Web has become a service network that provides a massive amount of information, comprising all kinds of information resources and sites distributed around the world. A search engine is a tool that helps Web users find the information they need. General-purpose search engines try to index all pages on the Web in order to provide a variety of services to users. As information grows more diverse, however, general-purpose search engines can no longer meet the needs of professional users, who require search techniques and methods for acquiring resources related to a specific topic. The vertical search engine emerged in response to this requirement.

In this thesis, based on the differences between general-purpose and vertical search engines and on the specialized, refined, and in-depth character of vertical search, a vertical search engine framework is proposed that introduces a topic-judgment module, an information-extraction module, and a clustering module.

The specialized crawling algorithm is the core of a vertical search engine's net spider, so the thesis studies the PageRank-based Best-First algorithm extensively and in depth. First, PageRank computes a page's score from the hyperlinks between pages, and a higher score means a more important page, which works against searching for topic-specific information. An improved PageRank that calculates the hyperlink similarity of a page is therefore proposed. Second, from the standpoint of a single Web page, a content-based page similarity is proposed that uses the URL, title, and text of each page. Finally, the BLCT topic crawling algorithm is proposed, which combines content similarity with the improved PageRank algorithm, and a corresponding experiment is carried out.

Lastly, text clustering is studied in depth. Clustering the search results reduces the number of results that users need to examine and so shortens the time required for a query. k-means, the most popular clustering algorithm, converges to one of numerous local minima, which makes the clustering result sensitive to the initial cluster centers. An improved centroid-based text clustering algorithm is proposed in which the initial centers are selected by a special strategy, and a corresponding experiment is carried out.
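
The crawling idea in the abstract (a best-first frontier ordered by a score that mixes content similarity with a link-based score) can be illustrated with a small sketch. This is not the thesis's BLCT algorithm, whose formulas the abstract does not give. The cosine similarity measure, the field weights, the mixing factor `alpha`, the `link_scores` lookup standing in for the improved PageRank value, and the in-memory `TINY_WEB` are all assumptions made for illustration.

```python
import heapq
import math
import re
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two strings."""
    tf_a = Counter(re.findall(r"[a-z0-9]+", text_a.lower()))
    tf_b = Counter(re.findall(r"[a-z0-9]+", text_b.lower()))
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a)
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def content_score(page, topic, weights=(0.2, 0.4, 0.4)):
    """Weighted similarity over URL, title, and text (the weights are assumed)."""
    w_url, w_title, w_text = weights
    return (w_url * cosine_similarity(page["url"], topic)
            + w_title * cosine_similarity(page["title"], topic)
            + w_text * cosine_similarity(page["text"], topic))


def best_first_crawl(seed, fetch, topic, link_scores, alpha=0.5, max_pages=10):
    """Visit pages in order of priority; links inherit their parent's score.

    link_scores is a hypothetical url -> float lookup standing in for a
    link-analysis score; missing entries count as 0.0.
    """
    frontier = [(-1.0, seed)]          # min-heap, so priorities are negated
    seen = {seed}
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        page = fetch(url)
        if page is None:               # unreachable page: skip it
            continue
        visited.append(url)
        relevance = (alpha * content_score(page, topic)
                     + (1 - alpha) * link_scores.get(url, 0.0))
        for link in page["links"]:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-relevance, link))
    return visited


# A made-up five-page web used only to exercise the sketch.
BASE = "http://example.test/"
TINY_WEB = {
    BASE + "s": {"url": BASE + "s", "title": "portal",
                 "text": "search engine directory",
                 "links": [BASE + "search", BASE + "cooking"]},
    BASE + "search": {"url": BASE + "search", "title": "vertical search engine",
                      "text": "a vertical search engine crawls topic pages",
                      "links": [BASE + "crawler"]},
    BASE + "cooking": {"url": BASE + "cooking", "title": "recipes",
                       "text": "soup and bread", "links": [BASE + "bread"]},
    BASE + "crawler": {"url": BASE + "crawler", "title": "crawler",
                       "text": "topic crawler notes", "links": []},
    BASE + "bread": {"url": BASE + "bread", "title": "bread",
                     "text": "flour and water", "links": []},
}

if __name__ == "__main__":
    order = best_first_crawl(BASE + "s", TINY_WEB.get,
                             "vertical search engine", link_scores={})
    print(order)
```

In this toy run the link found on the on-topic page is fetched before the link found on the off-topic page, which is the behavior a topic crawler relies on. A real crawler would replace `TINY_WEB.get` with an HTTP fetcher and HTML parser and would supply actual link-analysis scores.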

Keywords/Search Tags:

Vertical Search Engine, Framework, Topic crawling, Text clustering, Net Spider

Related items

1	The Design And Implementation Of Vertical Search Engine Framework
2	Research On Topic Web Page Crawling Strategy For Vertical Search Engine
3	Research And Implementation Of Vertical Search Engine
4	The Theme Of The Search Engine Web Spider Search Strategy Study
5	A Vertical Search Engine In The Field Of News
6	The Research And Design On Vertical Search Engine
7	Research And Application Of Vertical Search Engine Key Technologies Based On The Lucene
8	Research On The Search Strategy Of Web Spider Based On Specific Topic
9	The Research Of Vertical Search Engine Based On The Education Information
10	Research On Focused Crawling Technique For Vertical Search Engine