Design And Implementation Of An Automatic Resume Acquisition And Information Extraction System

Posted on:2021-04-25

Degree:Master

Type:Thesis

Country:China

Candidate:Q M Song

GTID:2518306107450124

Subject:Computer technology

Abstract/Summary:

With the continued development of computer technology, companies publish job postings on major recruitment websites, and job seekers apply to the positions that suit their circumstances. Enterprises therefore face a large and varied volume of electronic resume documents, and selecting and entering resume information by hand is inefficient and consumes considerable time and effort. It is thus of practical significance to study how existing computer technology can download the resume documents an enterprise needs in batches from recruitment websites, and then quickly and accurately extract the information the enterprise cares about for structured storage.

To handle the massive amount of resume information stored on major recruitment websites, this thesis uses web crawlers and related technologies to download the required resume documents in batches, reducing the tedious manual screening done by recruitment specialists. It analyzes the structure, content, and hierarchy of semi-structured Chinese resume text and, drawing on text classification theory, proposes a method for dividing resume text by content category and by hierarchical structure. Resume content is divided into six predefined general modules: basic personal information, job search intention, educational background, work experience, project experience, and awards. Based on the regular patterns found in resumes, dictionaries of the items to be extracted are built from manual prior knowledge.

For extracting information from each module, the thesis proposes a scheme that handles basic information and complex information separately. For the basic information class, regular expressions are built from its simple lexical and grammatical features and used to extract the fields. For complex information classes such as education experience and work experience, the main features are analyzed, and recognition methods based on dictionary matching, a statistical hidden Markov model, and a deep learning BLSTM-CRF model are applied and compared. Batch download and content recognition of resume texts are implemented in a combination of Python and Java, and extraction is tested on a batch of resume texts. The results can meet enterprise needs for resume screening, information extraction, and structured storage.
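
As a rough illustration of the two simplest steps described above (dividing a resume into modules with a header dictionary, then extracting basic fields with regular expressions), a minimal Python sketch might look like the following. It is not the thesis's implementation: the header dictionary uses English stand-ins for the Chinese section headers, the function names are hypothetical, and both patterns are simplified assumptions (the phone pattern only covers 11-digit mainland China mobile numbers).

```python
import re

# Hypothetical header dictionary (an assumption, not taken from the thesis):
# lowercase section header -> module name. Real resumes would use Chinese headers.
SECTION_HEADERS = {
    "personal information": "basic_info",
    "job intention": "job_intention",
    "education": "education",
    "work experience": "work_experience",
    "project experience": "project_experience",
    "awards": "awards",
}

# Simplified, illustrative patterns for two basic-information fields.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")  # 11-digit mobile number


def split_into_modules(text):
    """Group resume lines under the most recent recognized section header."""
    modules = {}
    current = "basic_info"  # lines before any header are treated as basic info
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key = line.rstrip(":").lower()
        if key in SECTION_HEADERS:
            current = SECTION_HEADERS[key]
            modules.setdefault(current, [])
            continue
        modules.setdefault(current, []).append(line)
    return modules


def extract_basic_info(lines):
    """Pull the first e-mail address and phone number out of the given lines."""
    joined = "\n".join(lines)
    email = EMAIL_RE.search(joined)
    phone = PHONE_RE.search(joined)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }


# Made-up sample resume, for illustration only.
sample = """Personal Information
Name: Zhang San
Phone: 13812345678
Email: zhangsan@example.com
Education
2015-2019 Example University, B.Eng. in Computer Science
Work Experience
2019-2021 Example Co., Software Engineer
"""

modules = split_into_modules(sample)
info = extract_basic_info(modules["basic_info"])
```

Exact header lookup keeps the sketch short. The complex modules (education, work experience) are where the abstract's dictionary matching, hidden Markov model, and BLSTM-CRF methods would apply, and they are not covered here.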

Keywords/Search Tags:

Semi-structured resume, web crawler, dictionary matching, statistical model, BLSTM-CRF

Related items

1	Information Extraction For Semi-structured Chinese Resume
2	Research And System Design Of Structured Data Extraction Method Of Resume
3	Research On Large-scale Structured And Semi-structured Biodata Query Method
4	Research Of Schema Extraction Algorithm Of Semi-structured Data Based On OEM Model
5	Research On Semantic Information Extraction For Semi-structured Documents
6	Research On The Storing And Querying Of Semi-Structured Data On The Web
7	Semi-structured Data Query Based Model-Checking And Its Application
8	Research On Structured Information Extraction Based On Pattern Matching
9	Semantic Retrieval Of Semi-Structured Text Based On Ontology Concept
10	Research Of Semi-structured Data Storage Technology On Xml