Research On Vertical Search Engine And Key Techniques

Posted on: 2011-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: H J Xu
GTID: 2178360302994648
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of the Internet, the Web has become a service network that provides massive amounts of information, comprising all kinds of information resources and sites distributed around the world. A search engine is a tool that helps Web users find the information they need. General-purpose search engines try to index every page on the Web in order to offer a wide range of services. As information grows more diverse, however, general-purpose search engines can no longer meet the needs of professional users, who need search techniques and methods for acquiring resources related to a specific topic. The vertical search engine arose from this demand.

This thesis first examines the differences between general-purpose and vertical search engines. Drawing on the specialized, refined, and in-depth character of vertical search, it then proposes a vertical search engine framework that introduces a topic judgment module, an information extraction module, and a clustering module.

The topic-specific crawling algorithm is the core of a vertical search engine's net spider, so the thesis studies the PageRank-based Best-First algorithm in depth. First, PageRank scores a page using the hyperlinks between pages, and a higher score means a more important page. Because this score does not reflect topic relevance, it works against topic-specific search, so an improved PageRank that calculates the hyperlink similarity of a page is proposed. Second, considering each Web page individually, a content-based similarity measure is proposed that uses the URL, title, and text of the page.
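The content-based similarity described above could be sketched roughly as follows. This is a minimal illustration, not the thesis's actual formula: the cosine measure over term frequencies, the `FIELD_WEIGHTS` values, and the function names are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(s):
    """Lowercase a string and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", s.lower()) if t]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical field weights, chosen only for illustration;
# the abstract does not state how the three fields are weighted.
FIELD_WEIGHTS = {"url": 0.2, "title": 0.3, "text": 0.5}


def content_similarity(page, topic_terms, weights=FIELD_WEIGHTS):
    """Weighted sum of per-field cosine similarities to the topic terms.

    `page` is a dict with optional "url", "title", and "text" strings.
    """
    topic_vec = Counter(tokenize(" ".join(topic_terms)))
    score = 0.0
    for field, weight in weights.items():
        field_vec = Counter(tokenize(page.get(field, "")))
        score += weight * cosine(field_vec, topic_vec)
    return score
```

A crawler could use such a score to rank candidate pages in its frontier, so that pages whose URL, title, and text share more vocabulary with the topic are fetched first.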
Finally, the BLCT topic crawling algorithm is proposed, which combines content similarity with the improved PageRank algorithm, and a corresponding experiment is carried out.

Text clustering is also studied in depth. Clustering the search results reduces the number of results a user has to look through and thus shortens the time needed for a query. k-means, the most popular clustering algorithm, converges to one of many local minima, which makes its results sensitive to the initial cluster centers. To address this, an improved centroid-based text clustering algorithm is proposed in which the initial centers are selected by a special strategy, and a corresponding experiment is carried out.
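To illustrate why the choice of initial centers matters, here is a small k-means sketch with a deterministic seeding step. The abstract does not say which selection strategy the thesis uses, so farthest-first seeding is swapped in purely as an assumed stand-in. The points stand for document vectors, and all function names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first_centers(points, k):
    """Assumed seeding strategy (not from the thesis): start from the
    first point, then repeatedly add the point farthest from all
    centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        next_center = max(
            points, key=lambda p: min(distance(p, c) for c in centers)
        )
        centers.append(next_center)
    return centers


def mean(cluster):
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def kmeans(points, k, max_iter=100):
    """Standard k-means iterations starting from the seeded centers."""
    centers = farthest_first_centers(points, k)
    labels = []
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        labels = [
            min(range(k), key=lambda j: distance(p, centers[j]))
            for p in points
        ]
        # Recompute each centroid; keep the old center if a cluster is empty.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

Because the seeding is deterministic, repeated runs on the same input give the same clustering, which sidesteps the run-to-run variation that random initialization produces. It does not guarantee a global minimum.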
Keywords/Search Tags: Vertical Search Engine, Framework, Topic crawling, Text clustering, Net Spider