
Research on Several Key Technical Topics of Search Engines

Posted on: 2017-12-13 | Degree: Master | Type: Thesis
Country: China | Candidate: D M Dong | Full Text: PDF
GTID: 2428330545462694 | Subject: Computer technology
Abstract/Summary:
As the volume of information on the Web grows, retrieving precise, detailed, in-depth information has become a hard challenge for general search engines. Unlike a general search engine, the crawler of a vertical search engine continually computes the topic relevance of the current webpage while crawling and evaluates the page by that score. It can therefore avoid large numbers of off-topic pages and retrieve only webpages related to a specific subject area. As a result, the accuracy, recall, and efficiency of a vertical search engine are far better than those of general search engines. Because far fewer webpages have to be handled, the maintenance cost of a vertical search engine system is also much lower.

This thesis first introduces the significance of vertical search engine technology and, based on the full-text search framework Lucene, introduces three core technologies: indexing, searching, and Chinese word segmentation. It then studies the key technologies of vertical search engines in depth, by comparison with general search. The main work of the thesis is as follows:

(1) It points out the topic drift and tunnel phenomena that the HITS algorithm may cause in the search strategy, and improves the algorithm so that the problem is solved to some degree. Predicted weighting parameters for hyperlinks are introduced, which allows tunnel links to be identified more accurately.

(2) The vector space model algorithm used to judge topic relevance assumes that the terms of a document are independent of one another. This does not match reality, so the relevance of a document to the topic cannot be determined accurately. The algorithm is improved by giving the terms in a document different weights and adding this factor to the topic relevance computation, which makes the algorithm more accurate.

(3) A new webpage elimination strategy is proposed, which achieved good results in a large number of experiments.

Based on open-source Java and the Lucene framework, a software prototype of a vertical search engine system incorporating the improved algorithms was built; it can run on a Tomcat server. Finally, to show that the improved algorithms perform better, the thesis conducts a large number of experimental tests, and the experiments demonstrate the soundness and practicality of the improved algorithms.
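The weighted vector space model idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual weighting scheme or formula, so everything here is an assumption made for illustration: the function names, the per-term `weights` table, and the use of weighted term frequency with cosine similarity are all hypothetical, and the thesis prototype itself is described as Java with Lucene, not Python.

```python
import math
from collections import Counter


def weighted_vector(terms, weights, default_weight=1.0):
    """Map each term to (frequency * weight).

    `weights` is a hypothetical per-term weight table; terms missing
    from it fall back to `default_weight`.
    """
    counts = Counter(terms)
    return {t: c * weights.get(t, default_weight) for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as unrelated
    return dot / (norm_u * norm_v)


def topic_relevance(page_terms, topic_terms, weights):
    """Score a page against a topic description, both given as term lists."""
    return cosine(
        weighted_vector(page_terms, weights),
        weighted_vector(topic_terms, weights),
    )


# Illustrative data only: a topic description and two candidate pages.
topic = ["search", "engine", "crawler"]
weights = {"crawler": 2.0}  # assumed: "crawler" matters more for this topic
on_topic_page = ["search", "engine", "index", "crawler"]
off_topic_page = ["football", "score"]
```

A focused crawler could compare `topic_relevance(page, topic, weights)` against a threshold to decide whether to keep a page and follow its links; the threshold, like the weights, would have to be tuned experimentally.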
Keywords: Vertical search engine, Topic relevance, Search strategy, Lucene