
Design And Implementation Of An Automatic Resume Acquisition And Information Extraction System

Posted on: 2021-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: Q M Song
Full Text: PDF
GTID: 2518306107450124
Subject: Computer technology
Abstract/Summary:
With the continued development of computer technology, companies publish job openings on the major recruitment websites, and job seekers apply for the positions that suit their circumstances. Enterprises therefore face a large and growing volume of electronic resumes in many different formats, and selecting resumes and entering their contents by hand is inefficient and consumes considerable time and effort. It is thus of practical significance to study how existing computer technology can be used to download the resumes an enterprise needs in batches from various recruitment websites, and to quickly and accurately extract the information the enterprise cares about for structured storage.

To handle the massive amount of resume information stored on major recruitment websites, we use web crawlers and related technologies to download the resume documents required by the enterprise in batches, reducing the tedious manual screening work of recruitment specialists. We analyze the structural, content, and hierarchical characteristics of semi-structured Chinese resume text and, drawing on text classification theory, propose a method for dividing resume text by content category and by hierarchical structure. The resume content is divided into six predefined general modules: basic personal information, job search intention, educational background, work experience, project experience, and awards. Based on the regular patterns found in resumes, a dictionary of the items to be extracted is built from human prior knowledge.

For extracting information from each module, we propose a scheme that treats basic information and complex information separately. For the basic information class, regular expressions are built from its simple lexical and grammatical features and used to extract the target fields. For complex information classes such as education experience and work experience, we analyze their main features, extract them with three recognition methods (dictionary matching, a statistical hidden Markov model, and a deep learning BLSTM-CRF model), and compare the results.

The system is implemented in a combination of Python and Java, which together perform batch download and content recognition of resume texts, and extraction was tested on a batch of resume texts. The results can meet enterprises' needs for resume screening, information extraction, and structured storage.
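
As a rough illustration of the two simplest steps described above (splitting a resume into predefined modules with a heading dictionary, then pulling basic-information fields out with regular expressions), here is a minimal Python sketch. It is not the thesis's implementation. The heading keywords, the field patterns, the function names, and the sample resume are all assumptions made up for this example, and only three of the six modules appear in the sample.

```python
import re

# Hypothetical heading dictionary (module name -> heading keywords).
# The thesis builds its dictionary from prior knowledge; these entries
# are illustrative guesses, not the author's actual dictionary.
SECTION_HEADINGS = {
    "basic_info": ["基本信息", "个人信息", "Personal Information"],
    "job_intention": ["求职意向", "Job Objective"],
    "education": ["教育背景", "教育经历", "Education"],
    "work_experience": ["工作经历", "工作经验", "Work Experience"],
    "project_experience": ["项目经历", "项目经验", "Project Experience"],
    "awards": ["获奖经历", "Awards"],
}

# Illustrative patterns for two basic-information fields.
FIELD_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 11-digit mainland China mobile number, not embedded in a longer digit run.
    "phone": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
}


def match_heading(line):
    """Return the module name if the line is exactly a known heading, else None."""
    normalized = line.strip(" :：").lower()
    for module, keywords in SECTION_HEADINGS.items():
        if any(normalized == keyword.lower() for keyword in keywords):
            return module
    return None


def split_into_modules(text):
    """Assign each non-empty line to the module whose heading appeared most recently."""
    modules = {}
    current = "basic_info"  # assumption: text before any heading is basic info
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        module = match_heading(stripped)
        if module is not None:
            current = module
            modules.setdefault(current, [])
            continue
        modules.setdefault(current, []).append(stripped)
    return modules


def extract_basic_fields(lines):
    """Apply each field pattern to the module text and keep the first match."""
    text = "\n".join(lines)
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        found = pattern.search(text)
        if found:
            fields[name] = found.group(0)
    return fields


# Made-up sample resume text, for demonstration only.
SAMPLE = """基本信息
姓名：张三
电话：13812345678
邮箱：zhangsan@example.com
教育背景
2015.09-2019.06 某某大学 计算机科学与技术 本科
工作经历
2019.07-2021.03 某某公司 软件工程师
"""

modules = split_into_modules(SAMPLE)
basic = extract_basic_fields(modules["basic_info"])
print(sorted(modules))
print(basic)
```

Exact-match headings keep the splitter from mistaking body text that merely mentions a keyword for a new module. The complex modules (education, work experience) would still need the dictionary-matching, hidden Markov model, or BLSTM-CRF recognizers the abstract compares, which this sketch does not attempt.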
Keywords/Search Tags: Semi-structured resume, web crawler, dictionary matching, statistical model, BLSTM-CRF