
Research And Design Of A Blog-Oriented Vertical Search Engine With IT Technology As The Theme, Based On Nutch

Posted on: 2017-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: G Wang
GTID: 2348330518495601
Subject: Cryptography
Abstract/Summary:
With the rapid development of the Internet, resources of many kinds, such as web pages, music, pictures, and videos, have expanded enormously, which makes it difficult to find what you need quickly and accurately. This demand has driven the rapid development of vertical search engines. After reviewing the history and trends of vertical search engines, this paper studies in detail the technologies they involve, then designs and implements a personalized, blog-oriented vertical search engine. The main work of this paper includes the following aspects:

1) This paper introduces the background of the project and surveys the technologies related to search engines and cryptography. It analyzes and contrasts the system architecture and operating principles of the general-purpose web crawler and the vertical web crawler, studies the topic decision algorithm, summarizes the feedback that users' search behavior provides to a vertical search engine, and summarizes the performance and features of encryption algorithms.

2) This paper designs and implements a distributed topic crawler based on Nutch. After analyzing the system architecture and working principle of Nutch in detail, it proposes a scheme for transforming the general web crawler Nutch into a vertical web crawler. Based on the Naive Bayes classification algorithm, it implements a Naive Bayes text classification plugin, which also includes a module for judging the topic relevance of URLs and thereby increases the crawl depth. The plugin is integrated through the Nutch plug-in mechanism, converting the general web crawler Nutch into a vertical web crawler.

3) This paper designs and implements a personalized query system based on Solr. Drawing on the study of users' search-behavior feedback on vertical search engines, the system records each user's search behavior in a database. It then analyzes this data to build a user-interest model, which is used to expand the user's current query, converting Solr into a personalized query system.

4) To protect users' private information, the personalized query system uses SSL to transfer users' search information. The information is then encrypted with AES, and the ciphertext is stored in the database, protecting user privacy.

5) This paper designs and implements a personalized, blog-oriented vertical search engine and runs experiments on it. The experimental results show that the proposed design is effective: although topic filtering reduces the efficiency of the crawler, the system greatly improves search precision compared with the original Nutch system and the general-purpose search engine Baidu.
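
The Naive Bayes topic decision mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's Nutch plugin (which would be written in Java against the Nutch plug-in API); it is a minimal multinomial Naive Bayes classifier with add-one smoothing, written in Python under assumed names. The class `NaiveBayesTopicFilter`, the labels `"it"` and `"other"`, and the toy training sentences are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTopicFilter:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of training docs
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        self.vocab = set()                       # all tokens seen in training

    def train(self, text, label):
        tokens = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def log_scores(self, text):
        """Return log P(label) + sum of log P(token | label) for each label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        # Tokens never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_tokens = sum(counts.values())
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total_tokens + vocab_size))
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)

    def is_on_topic(self, text, topic_label="it"):
        """A crawler could keep a page only when this returns True."""
        return self.classify(text) == topic_label


# Toy training data (assumed, for illustration only).
clf = NaiveBayesTopicFilter()
clf.train("java nutch crawler plugin hadoop index", "it")
clf.train("solr search engine query index lucene", "it")
clf.train("recipe chicken soup kitchen dinner", "other")
clf.train("football match goal season team", "other")

print(clf.is_on_topic("how to write a nutch plugin for solr index"))
print(clf.is_on_topic("chicken soup recipe for dinner"))
```

In a vertical crawler, a filter of this kind would sit between fetching and indexing, so that off-topic pages are discarded before they reach the index. The same scoring could in principle be applied to URL anchor text to decide whether a link is worth following, which is one plausible reading of the URL relevance module the abstract describes.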
Keywords/Search Tags: Personalized vertical search engine, Naive Bayes, Nutch, Solr, Theme Crawler