An Acquisition and Cleaning System for Unstructured Recruitment Information

Posted on: 2018-11-03  Degree: Master  Type: Thesis
Country: China  Candidate: Y Zhang
GTID: 2348330515965426  Subject: Computer technology
Abstract/Summary:
The rapid development of the Internet has broken down traditional geographical restrictions. Corporate recruitment has moved from offline to online, and a number of large online job-search platforms have emerged, such as Zhaopin, Lagou, and 51job. However, unstructured recruitment information does not follow a uniform format, and there is no one-stop solution for collecting, extracting, and cleaning it, so job seekers have difficulty finding suitable postings among disorganized recruitment information. Collecting, extracting, and cleaning unstructured recruitment information to form a uniformly structured data set has therefore become an urgent research topic. This thesis focuses on the collection, extraction, and cleaning of unstructured recruitment information in the IT industry. It is organized as follows:
(1) Unstructured data collection: The thesis briefly introduces the development of web crawler technology and the basic principles by which the Scrapy framework crawls data. It then uses the Scrapy framework to collect unstructured recruitment information and stores the data in a MongoDB database.
(2) Data extraction: The Aho-Corasick algorithm is used to extract keywords from the collected unstructured data, converting it into structured data, which is then stored back in the MongoDB database.
(3) Data cleaning: The structured data is first preprocessed with SQL statements and then cleaned. Because the existing basic Skyline algorithm cleans data inefficiently, this thesis improves it and uses the improved Skyline algorithm to clean the data. Finally, various two-dimensional charts are generated from the cleaned recruitment information.
The system integrates three functional modules: data acquisition, keyword extraction, and data cleaning. The front end displays various two-dimensional statistical charts of job information, while the back end performs real-time data analysis and collection, keyword extraction, data cleaning, and the extension of data interfaces.
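
As an illustration of the keyword-extraction step, the following is a minimal sketch of Aho-Corasick multi-pattern matching in plain Python. It is not the thesis's implementation: the function names `build_automaton` and `find_keywords`, the keyword list, and the sample posting text are all assumptions made for this example.

```python
from collections import deque


def build_automaton(keywords):
    """Build an Aho-Corasick automaton (trie, failure links, output lists)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # keywords recognised when a state is reached
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first traversal: depth-1 states keep failure link 0,
    # deeper states derive theirs from the parent's failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the failure target (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_keywords(text, automaton):
    """Return (start_index, keyword) for every keyword occurrence in text."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits


# Hypothetical skill keywords and posting text, for illustration only.
automaton = build_automaton(["java", "javascript", "sql", "mysql"])
hits = find_keywords("skills: java, javascript, mysql", automaton)
found_skills = sorted({word for _, word in hits})
```

The text is scanned once regardless of how many keywords are loaded, which is the usual reason to prefer this algorithm over matching each keyword separately. The set of matched keywords could then be written to a structured field of the stored record.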
Keywords/Search Tags: Scrapy framework, MongoDB database, Aho-Corasick algorithm, Skyline algorithm
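
For reference, the basic Skyline computation named above can be sketched as follows. This is the plain nested-loop form only, not the improved variant proposed in the thesis, and the abstract does not say which attributes are compared or how the result is used for cleaning. The two-dimensional tuples below and the convention that larger values are better are assumptions for illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b in every dimension and
    strictly better in at least one (larger values assumed better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def skyline(points):
    """Return the points that are not dominated by any other point."""
    window = []
    for p in points:
        # Skip p if a point already in the window dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # Otherwise drop window points that p dominates, then keep p.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


# Hypothetical records as (attribute_1, attribute_2) scores.
records = [(10, 3), (8, 5), (7, 4), (10, 5), (12, 1)]
result = skyline(records)
```

Each point is compared against the current window, so the worst case is quadratic in the number of records, which is consistent with the abstract's remark that the basic algorithm is inefficient.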