
Design And Implementation Of Resume Information Extraction System Based On Domain Knowledge Base

Posted on: 2019-01-02 | Degree: Master | Type: Thesis
Country: China | Candidate: B Zhang | Full Text: PDF
GTID: 2348330545958407 | Subject: Computer technology
Abstract/Summary:
A resume is a job seeker's written description of his or her own background. Although resumes share certain structural characteristics and some of their content is standardized, they come in a wide variety of forms. For recruiters, reading, recording and filtering resumes by hand therefore costs a tremendous amount of work. It is thus necessary to use information extraction technology to extract structured, valuable information from free-form resume text. This greatly simplifies resume analysis and makes it possible to build an effective talent pool around the entity and event information in resumes, facilitating talent matching as well as resume search and filtering. After briefly introducing the related information extraction technologies, this paper clarifies the requirements and functional design of resume extraction according to actual needs, studies in depth the core technical solutions for resume information extraction, and implements a complete resume information extraction system. The main work is as follows:
(1) Information was collected from Internet resources such as Wikipedia and recruitment websites and collated to build an enterprise name knowledge base, an equivalent name knowledge base, and others.
(2) A trigger word matching algorithm, combined with Word2vec word vectors used to expand the trigger thesaurus, segments each resume into information blocks according to its structural characteristics. For resumes that contain no trigger words, the text is represented as feature vectors and an SVM classification algorithm performs segmentation based on content features.
(3) The principles and application effects of the Hidden Markov Model (HMM), the Maximum Entropy Model (ME) and the Conditional Random Field Model (CRF), each incorporating domain knowledge, are compared for named entity recognition in resumes, and the optimal statistical model is selected to extract entity information from each category of resume block.
(4) A backtracking strategy for resume information extraction is proposed: rule matching based on the knowledge base complements the results of statistical entity recognition, and some event information is identified from the sequence of entities.
(5) The Elasticsearch distributed search engine is used to filter and search the extraction results. In addition, the Zend framework, ECharts and other web technologies are used to implement visualization of the extracted data and other business-layer functions, giving the system practical value and enabling recruiters to process resumes efficiently.
Based on the above work, a series of functional and performance tests were carried out on the resume information extraction system. The results show that the system can automatically extract structured information from resume texts and build a job seeker database, and that it achieves the expected results for most entities, which illustrates the effectiveness of the block segmentation scheme and the entity extraction scheme proposed in this paper. The system also provides users with resume management, filtering and retrieval functions that improve the efficiency of resume processing.
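The abstract describes block segmentation and the backtracking step only in outline. The following is a minimal Python sketch, not the thesis's implementation, of two of those ideas: trigger-word block segmentation with Word2vec-style expansion of the trigger set (point 2), and knowledge-base backfilling of entities the statistical tagger missed (point 4). The seed trigger words, the toy `vectors` dictionary standing in for a trained Word2vec model, the 0.8 similarity threshold, and all function names are hypothetical assumptions for illustration; the SVM fallback and the HMM/ME/CRF taggers are not shown.

```python
import math

# Hypothetical seed trigger words per resume block (illustrative, not from the thesis).
SEED_TRIGGERS = {
    "education": ["education"],
    "work": ["work experience"],
    "skills": ["skills"],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_triggers(seeds, vectors, threshold=0.8):
    """Expand each block's trigger set with words whose vectors lie close to a seed.

    `vectors` (word -> vector) stands in for a trained Word2vec model; the
    threshold of 0.8 is an assumed value, not one reported in the thesis.
    """
    expanded = {}
    for block, words in seeds.items():
        found = set(words)
        for seed in words:
            if seed not in vectors:
                continue  # seed missing from the embedding vocabulary
            for word, vec in vectors.items():
                if cosine(vectors[seed], vec) >= threshold:
                    found.add(word)
        expanded[block] = found
    return expanded


def segment_resume(lines, triggers):
    """Split resume lines into blocks; a line matching a trigger opens a new block."""
    lookup = {w.lower(): block for block, words in triggers.items() for w in words}
    blocks = {}
    current = "unlabeled"  # text before any trigger; an SVM could classify it instead
    for line in lines:
        text = line.strip()
        key = text.rstrip(":").lower()
        if key in lookup:
            current = lookup[key]
        elif text:
            blocks.setdefault(current, []).append(text)
    return blocks


def backfill_entities(text, tagged, knowledge_base):
    """Backtracking step: add knowledge-base names that occur in the text but were
    missed by the statistical tagger (plain substring matching, for illustration)."""
    found = set(tagged)
    found.update(name for name in knowledge_base if name in text)
    return sorted(found)


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real Word2vec vectors are learned from a corpus.
    vectors = {
        "education": [1.0, 0.1, 0.0],
        "academic background": [0.95, 0.15, 0.0],
        "work experience": [0.0, 1.0, 0.1],
        "employment history": [0.05, 0.98, 0.1],
        "skills": [0.0, 0.1, 1.0],
    }
    triggers = expand_triggers(SEED_TRIGGERS, vectors)
    resume = ["Jane Doe", "Academic Background:", "B.Sc., 2015",
              "Employment History:", "Engineer at Acme Corp", "Skills:", "Python"]
    print(segment_resume(resume, triggers))
    print(backfill_entities("Engineer at Acme Corp", [], {"Acme Corp", "Initech"}))
```

In this sketch a trigger must make up the whole line (after dropping a trailing colon), so a sentence that merely mentions "skills" inside a block does not start a new block. Because "academic background" and "employment history" were added by the similarity expansion rather than listed as seeds, the sample resume is still split into education, work and skills blocks.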
Keywords/Search Tags: resume information extraction, domain knowledge base, text categorization, named entity recognition, backtracking strategies