On Cloud Classification Academic Search Engine

Posted on: 2018-12-25  Degree: Master  Type: Thesis
Country: China  Candidate: P P Zeng  Full Text: PDF
GTID: 2348330536484692  Subject: Traffic Information Engineering & Control
Abstract/Summary:
As information on the Internet has grown, search engine technology has matured. However, the sheer volume of web content and the uneven quality of pages degrade the search experience for professional users who need academic information. Academic search engines have therefore become an active research topic. Existing academic search engines still have shortcomings: for example, users cannot download original journal articles for free, and they cannot recommend journal sources. In addition, some small academic search engines use a centralized architecture, which places relatively high demands on the host; if the host fails, the entire network may stop working. To meet users' actual needs, this thesis designs a cloud-based academic search engine that offers free access to original articles, accommodates users' personal preferences, supports classified search, and provides stable service. The thesis first introduces the technologies related to cloud search engines and examines the Hadoop distributed computing platform and the open-source search engine Nutch. Second, it analyzes and collects the URLs of free journal websites and builds the search engine's database, while allowing users to recommend journal sources according to personal preference and to download original journal articles for free. It then designs and implements distributed web information acquisition and uses IK-Analyzer to process the crawled web pages. Next, the vector space model (VSM) is used to judge whether a page is academic. Finally, the Chinese Library Classification and the Naive Bayesian classification algorithm are used to classify academic web pages. Experimental tests show that the search engine has high accuracy and fast retrieval, and that it meets the needs of users who want to obtain professional academic journals for free and customize journal sources, which indicates that the search engine has significant practical value.
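The abstract does not say how the VSM judgment is computed. A minimal sketch of one common approach follows, assuming term-frequency vectors and cosine similarity against a reference vocabulary of academic terms. The function names, the reference tokens, and the 0.3 threshold are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Build a sparse term-frequency vector from a list of tokens."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def is_academic(page_tokens, reference_tokens, threshold=0.3):
    """Hypothetical decision rule: a page counts as academic when its
    vector is close enough to the reference vector. The threshold is an
    illustrative assumption, not a value from the thesis."""
    similarity = cosine_similarity(tf_vector(page_tokens),
                                   tf_vector(reference_tokens))
    return similarity >= threshold
```

In the pipeline described above, the tokens would come from the word segmentation step (IK-Analyzer in the thesis); here they are plain Python lists.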
Keywords/Search Tags:Search engine, Web crawler, Vector Space Model, Naive Bayesian algorithm
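The Naive Bayesian classification step named in the abstract could be sketched as below, assuming a multinomial model with Laplace (add-one) smoothing. The class name, the toy training data, and the use of the Chinese Library Classification codes "TP" and "U" as labels are illustrative assumptions. The thesis's actual features and training corpus are not given in the abstract.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative sketch)."""

    def __init__(self):
        self.class_doc_counts = Counter()        # documents seen per class
        self.term_counts = defaultdict(Counter)  # term frequencies per class
        self.vocabulary = set()

    def train(self, labeled_docs):
        """labeled_docs: iterable of (tokens, label) pairs."""
        for tokens, label in labeled_docs:
            self.class_doc_counts[label] += 1
            self.term_counts[label].update(tokens)
            self.vocabulary.update(tokens)

    def predict(self, tokens):
        """Return the label with the highest log-posterior score."""
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            total_terms = sum(self.term_counts[label].values())
            for term in tokens:
                # Add-one smoothing keeps unseen terms from zeroing the product.
                count = self.term_counts[label][term]
                score += math.log((count + 1) / (total_terms + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set; the labels stand in for classification codes.
classifier = NaiveBayesClassifier()
classifier.train([
    (["search", "engine", "crawler"], "TP"),
    (["index", "search", "query"], "TP"),
    (["traffic", "signal", "road"], "U"),
    (["railway", "traffic", "control"], "U"),
])
```

Calling `classifier.predict(tokens)` on a segmented page then returns the most probable label under these assumptions.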