Design And Implementation Of Vertical Search Engine System For Recruitment

Posted on: 2020-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: C Fang
Full Text: PDF
GTID: 2518306104498604
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, a vast amount of information from all walks of life is now available online. In the field of recruitment, the many recruitment websites provide candidates with a large volume of job postings, but they also create a problem: because the data of the various recruitment websites are closed to each other, candidates must browse dozens of sites to collect relevant postings comprehensively, which is a great inconvenience. It is therefore necessary to build a vertical search engine for the recruitment field. The main work of this thesis is the implementation of such a system, which consists of a crawler module, an index module, and a retrieval module. The crawler module uses an improved Shark-Search algorithm as its crawling strategy, so that the web crawler can filter out links unrelated to the topic. After the topic-related web pages have been collected, an HTML parser extracts structured data from them and stores it in the database. Because the system must store massive amounts of recruitment data, HBase is used as the database; as a distributed store, HBase can increase its storage capacity through horizontal scaling. The crawler is built on JLitespider, a lightweight, distributed crawler framework written in Java. In the index module, a word segmenter tokenizes the recruitment data stored in HBase, and Lucene is then used to build the inverted index. In the retrieval module, based on a study of Lucene's full-text scoring mechanism and the Rocchio algorithm, the default Lucene ranking is re-sorted: Rocchio is applied as a second ranking pass over the original result set, so that the recall of the final result set is higher than that of the original one, which is in line with the goal of obtaining recruitment information that is as complete as possible. The vertical search engine integrates recruitment information from across the web, uses Lucene to build the index library, and optimizes Lucene's original retrieval results. This makes it much more convenient for candidates to obtain recruitment information, so that they can focus on reviewing professional content and preparing for interviews. Finally, the system is tested systematically, and each functional module meets the design requirements.
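The Rocchio step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes term-weight vectors represented as plain dictionaries, the textbook Rocchio update with commonly used default weights (`alpha`, `beta`, `gamma`), and that some of the first-pass results are treated as relevant and others as non-relevant. The function name `rocchio` and the sample data are hypothetical.

```python
from collections import Counter


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a modified query vector (term -> weight) using the Rocchio update.

    query        -- dict of term weights for the original query
    relevant     -- list of dicts, term weights of documents assumed relevant
    non_relevant -- list of dicts, term weights of documents assumed non-relevant
    """
    new_query = Counter()

    # Keep the original query, scaled by alpha.
    for term, weight in query.items():
        new_query[term] += alpha * weight

    # Move towards the centroid of the relevant documents.
    for doc in relevant:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant)

    # Move away from the centroid of the non-relevant documents.
    for doc in non_relevant:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(non_relevant)

    # Terms with non-positive weight are dropped, as is conventional.
    return {term: w for term, w in new_query.items() if w > 0}


# Hypothetical example: a query for "java" with two relevant postings
# and one off-topic page from the first-pass result set.
expanded = rocchio(
    query={"java": 1.0},
    relevant=[{"java": 1.0, "backend": 1.0}, {"java": 1.0, "spring": 1.0}],
    non_relevant=[{"java": 1.0, "coffee": 1.0}],
)
print(expanded)
```

The expanded query gains terms such as "backend" and "spring" that co-occur with the query in the postings assumed relevant. Running the search again with this expanded query can match postings that the original query missed, which is one way a second pass may raise recall.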
Keywords/Search Tags: Vertical search engine, Web crawler, Full text search, Sorting algorithm, Recruitment