Research Of Web Spider Search Strategy In The Vertical Search Engine

Posted on: 2018-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Zhang
GTID: 2428330545962759
Subject: Computer technology
Abstract/Summary:
With the rapid growth of information and the widespread use of the network, more and more people use general search engines to find information on the Internet. However, finding detailed and accurate information on a specific topic is becoming increasingly difficult. The arrival of focused search engines has greatly improved this situation. While the web crawler retrieves pages, a focused search engine computes a matching value between each web page and the target information, and uses this value to judge how relevant the page is. A focused search engine can therefore avoid much irrelevant information and show only the matching web pages. For these reasons, a vertical search engine outperforms a general search engine in search speed, accuracy, and feedback. Because the information is optimized, a focused search engine also requires less maintenance than a general search engine system.

This thesis first discusses the historical background, development, and application prospects of focused search engines. Second, it introduces in detail the basic theory and the specific implementation technologies of the Lucene search engine, including indexing, searching, and word segmentation. Finally, it introduces the basic technology of general search engines, and then describes the development process and implementation of the search engine we built. The main work of the thesis is embodied in the following three aspects:

(1) The thesis discusses the HITS algorithm strategy used by general search engines and analyzes in detail the Authority and Hub values of the HITS algorithm, finding that the algorithm easily causes channel inadequacy and topic drift during search engine development. The new search engine addresses these shortcomings so that the drift problems can be avoided. By optimizing the predicted weights of hyperlinks, the thesis improves the precision of channel link recognition.

(2) To solve the problem of accurately matching the subject with the information needed when using a general search engine, the thesis improves and optimizes the matching algorithm. It assigns each related term a different weight according to its correlation with the topic, which improves the matching degree for different themes.

(3) The thesis develops a vertical search engine based on an analysis of the strengths and weaknesses of existing general search engines. This search engine takes scenic spots and city parks as its subject, and the experiment searches for tourist attractions in Northeast China. The experimental results show that our search engine has a clear advantage in precision compared with the general search engine. The vertical search engine is built with the open-source Java + Lucene framework, and the resulting focused search engine system runs on the Tomcat server.

Finally, the thesis lists the test results of our search engine. The test results show that the web crawler we developed has high search efficiency, and that it has practical application value and a wide range of applications.
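The abstract does not give the thesis's improved formulas, so the following is only a minimal sketch of the two ideas it builds on: the standard HITS update of Authority and Hub scores, and a topic match score computed from per-term weights. The class name `HitsSketch`, the method names, the averaging of weights over the token count, and the sample term weights are all assumptions made for illustration. The hyperlink-weight optimization described in aspect (1) is not reproduced here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the thesis's implementation.
class HitsSketch {

    /**
     * Standard (unweighted) HITS. links[i] lists the pages that page i links to.
     * Returns a two-row array: row 0 = authority scores, row 1 = hub scores.
     */
    static double[][] hits(int[][] links, int iterations) {
        int n = links.length;
        double[] auth = new double[n];
        double[] hub = new double[n];
        Arrays.fill(auth, 1.0);
        Arrays.fill(hub, 1.0);

        for (int it = 0; it < iterations; it++) {
            // Authority of j = sum of hub scores of the pages linking to j.
            double[] newAuth = new double[n];
            for (int i = 0; i < n; i++) {
                for (int j : links[i]) {
                    newAuth[j] += hub[i];
                }
            }
            // Hub of i = sum of the updated authority scores of the pages i links to.
            double[] newHub = new double[n];
            for (int i = 0; i < n; i++) {
                for (int j : links[i]) {
                    newHub[i] += newAuth[j];
                }
            }
            normalize(newAuth);
            normalize(newHub);
            auth = newAuth;
            hub = newHub;
        }
        return new double[][] {auth, hub};
    }

    /** Scales a vector to unit Euclidean length; leaves an all-zero vector unchanged. */
    static void normalize(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += x * x;
        }
        double norm = Math.sqrt(sumOfSquares);
        if (norm == 0.0) {
            return;
        }
        for (int i = 0; i < v.length; i++) {
            v[i] /= norm;
        }
    }

    /**
     * Assumed topic match score: the average weight of a page's tokens,
     * where tokens missing from the weight table count as 0.
     */
    static double relevance(List<String> tokens, Map<String, Double> termWeights) {
        if (tokens.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (String token : tokens) {
            total += termWeights.getOrDefault(token.toLowerCase(), 0.0);
        }
        return total / tokens.size();
    }

    public static void main(String[] args) {
        // Pages 0 and 1 both link to page 2; page 2 has no outgoing links.
        int[][] links = { {2}, {2}, {} };
        double[][] scores = hits(links, 20);
        System.out.println("authority = " + Arrays.toString(scores[0]));
        System.out.println("hub       = " + Arrays.toString(scores[1]));

        // Hypothetical term weights for a tourism topic.
        Map<String, Double> weights = Map.of("park", 1.0, "scenic", 0.8);
        System.out.println("relevance = " + relevance(List.of("park", "ticket", "hotel"), weights));
    }
}
```

In a focused crawler, a score like `relevance` could decide whether a fetched page is kept and its links followed, while the HITS scores rank pages within the linked neighborhood. Because plain HITS scores pages by link structure alone, off-topic but heavily linked pages can rise in rank, which is the topic drift the thesis sets out to reduce.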
Keywords/Search Tags:Vertical Search Engine, Topic Interrelated, Search Strategy, Lucene