
The Research On The Key Technology Of Vertical Search Engine

Posted on: 2013-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Dong
Full Text: PDF
GTID: 2248330371986079
Subject: Computer application technology
Abstract/Summary:
As the volume of information on the Web grows, retrieving precise, detailed, in-depth information has become a hard challenge for general search engines. Unlike a general search engine, the crawler (web robot) of a vertical search engine must continuously compute the topic relevance of the current webpage while crawling and judge the page by that score. It can therefore skip the large number of pages in unrelated subject areas and retrieve only pages related to a specific topic area. As a result, within its topic area the precision, recall and efficiency of a vertical search engine are far better than those of general search engines. Because far fewer webpages have to be collected, the maintenance cost of a vertical search engine system is also much lower than that of a general search engine.
This thesis first introduces the significance of vertical search engine technology and, based on the Lucene full-text search framework, introduces three core technologies: indexing, searching, and Chinese word segmentation. It then studies the key technologies of vertical search engines in depth, in comparison with general search. The main research of the thesis includes the following points:
(1) It points out the topic drift and tunnel phenomena that the HITS algorithm may cause in the search strategy, and improves the algorithm so that these problems are solved to some degree. A predicted weighting parameter for hyperlinks is introduced, which makes the identification of tunnel links more accurate.
(2) The vector space model algorithm used to judge topic relevance assumes that the terms (entries) in a document are independent of one another. This assumption does not match reality, so the topic relevance of a page cannot be determined accurately. The algorithm is improved by giving the terms in a document different weights and adding this factor to the topic relevance computation, which makes the algorithm more accurate.
(3) A new webpage elimination strategy is proposed, which achieved good results in a large number of experiments. A software prototype of a vertical search engine system that uses the improved algorithms was built on open-source Java and the Lucene framework and can run on a Tomcat server.
Finally, to show that the improved algorithms perform better, the thesis carries out a large number of experimental tests. The experiments demonstrate the rationality and practicability of the improved algorithms.
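The weighted vector space model idea in point (2) can be illustrated with a minimal sketch. This is not the thesis's implementation (the prototype is described as Java and Lucene based, and the abstract does not give its weighting scheme). The sketch is in Python for brevity, and the function names, the `term_weights` mapping and the `threshold` value are all hypothetical. It builds term vectors whose components are term frequency multiplied by a per-term weight and scores a page against a topic with cosine similarity.

```python
import math
from collections import Counter


def weighted_vector(terms, term_weights):
    """Build a term vector: frequency of each term times its weight.

    `term_weights` is a hypothetical mapping term -> weight; terms that
    are not listed default to a weight of 1.0 (plain term frequency).
    """
    counts = Counter(terms)
    return {t: n * term_weights.get(t, 1.0) for t, n in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as unrelated
    return dot / (norm_a * norm_b)


def is_relevant(page_vec, topic_vec, threshold=0.3):
    """Keep a page if its similarity to the topic reaches the threshold.

    The threshold is an assumed tuning parameter, not a value from the thesis.
    """
    return cosine(page_vec, topic_vec) >= threshold


if __name__ == "__main__":
    topic = weighted_vector(["search", "engine", "lucene"], {})
    # Assumed weighting: "search" counts double on this page.
    page = weighted_vector(["search", "engine", "index"], {"search": 2.0})
    print(cosine(page, topic), is_relevant(page, topic))
```

A focused crawler could call something like `is_relevant` on each fetched page and follow links only from pages that pass, which is how the relevance score lets it avoid unrelated subject areas.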
Keywords/Search Tags: Vertical search engine, Search strategy, Topic interrelated, Lucene