Design And Implementation Of A Recruitment Information Vertical Search Engine System

Posted on: 2015-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Gui
GTID: 2308330452956886
Subject: Software engineering
Abstract/Summary:
Information technology has developed rapidly and the number of web sites has exploded; the internet can fairly be said to have entered the era of big data. As general-purpose search engines have matured, vertical search engines have emerged to serve specialized fields and meet ever-increasing demand. Although some vertical search engines have been successful, their topic crawlers are far from perfect, and better algorithms are needed to improve the accuracy of search results.
This paper analyzes the technologies used in each module of a vertical search engine and, building on an in-depth analysis of existing vertical search algorithms, proposes an improved topic-relevance algorithm based on a hybrid model. In addition to predictive factors of topic relevance, the algorithm combines latent topic-relevance factors and page quality analysis factors, which play an important role in predicting and sorting topic URLs. As a result, the topic crawler crawls topic pages first, which improves its efficiency.
The ranking algorithms of commercial search engines are not open, and their page ordering is influenced by many factors, including commercial bidding such as pay-per-click (PPC). Nutch is an open-source, general-purpose web crawler framework that provides basic crawling functionality and can be extended and customized through its plug-in mechanism. Solr is an open-source index server based on Lucene that provides good index-building functionality.
Both Nutch and Solr keep their internal algorithms open and transparent. They aim to break the current situation in which search engines are dominated by a few large companies, and to provide high-quality search results. This paper designs and implements a recruitment information vertical search engine system based on the Nutch and Solr frameworks, aiming to provide search results specialized for the recruitment field. The system modifies the Nutch crawler's scoring of topic pages by adding a new page-scoring plug-in, sorts the URLs to be crawled using the topic-relevance prediction algorithm based on the hybrid model, and preprocesses web documents by configuring IKAnalyzer for Solr. The front end interacts with users through the Struts2 framework.
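The hybrid-model URL sorting described in the abstract can be illustrated with a minimal sketch. The abstract does not state the formula, so the linear weighted combination, the weights, the example URLs, and all names used here (`hybrid_score`, `Frontier`, and the three factor arguments) are assumptions made for illustration, not the author's actual algorithm.

```python
import heapq
from dataclasses import dataclass, field


def hybrid_score(predicted, latent, quality, weights=(0.5, 0.3, 0.2)):
    """Combine three factors in [0, 1] into a single crawl priority.

    The linear form and the default weights are assumptions; the abstract
    only says that the three kinds of factors are combined.
    """
    w_pred, w_latent, w_quality = weights
    return w_pred * predicted + w_latent * latent + w_quality * quality


@dataclass
class Frontier:
    """Crawl frontier that always yields the highest-scoring URL first."""
    _heap: list = field(default_factory=list)
    _seen: set = field(default_factory=set)

    def push(self, url, predicted, latent, quality):
        if url in self._seen:  # skip URLs that were already queued
            return
        self._seen.add(url)
        score = hybrid_score(predicted, latent, quality)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


# Hypothetical factor values for three candidate URLs.
frontier = Frontier()
frontier.push("http://example.com/jobs/java-developer", 0.9, 0.6, 0.7)
frontier.push("http://example.com/news/sports", 0.1, 0.2, 0.8)
frontier.push("http://example.com/careers", 0.7, 0.8, 0.5)

crawl_order = [frontier.pop()[0] for _ in range(len(frontier))]
print(crawl_order)
```

With these example values the job listing is popped first and the sports page last, which is the "topic pages first" behavior the abstract describes. In the system itself this ordering is attributed to a page-scoring plug-in added to Nutch rather than to a standalone queue like this one.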
Keywords/Search Tags: Recruitment Information, Focused Crawler, Nutch Framework, Solr Framework