
Research And System Design Of Structured Data Extraction Method Of Resume

Posted on: 2022-08-17
Degree: Master
Type: Thesis
Country: China
Candidate: W Zhang
GTID: 2518306563961929
Subject: Electronics and Communications Engineering
Abstract/Summary:
Most resumes exist as unstructured text, and there are a great many of them. Accurately extracting structured information from such text has a wide range of uses: it provides the basis for information retrieval, association analysis, data matching, and many other downstream applications. Existing resume information extraction methods are mostly based on rules and templates, which extract specific information through manually written rules. When the volume of data is large, these methods suffer from high cost, low efficiency, and poor flexibility. Traditional machine learning methods can reduce labor cost to some extent, but they rely heavily on feature engineering. To address these problems, this thesis uses deep learning to build a resume information extraction model and designs a corresponding system. The work is supported by the National Key Research and Development Program project "Research on collaborative support technology of trial execution and litigation service through internal and external links" (2018YFC0831300). The main contributions are as follows:

(1) A resume entity annotation model based on dynamic word vectors. Because traditional static word embeddings cannot represent the polysemy of a word, dynamic word vectors are used as word representations. Building on BERT-wwm, an improved version of BERT that uses whole word masking, a BERT(wwm)-BiLSTM-CRF model is proposed to recognize entity information in Chinese resume text. The method takes full advantage of BERT pre-training: a published BERT model can be loaded directly to obtain word vectors with deep semantic features, which lets the model converge faster without a large amount of training data. The implementation uses the base version of BERT. A bidirectional long short-term memory (BiLSTM) network captures the semantic features of each sentence, and a conditional random field (CRF) layer then extracts the entities in the text under its tag-transition constraints.

(2) A resume relation extraction model based on sentence-level features and entity features. After BERT generates the word vectors, a convolutional neural network (CNN) extracts sentence-level features, entity position information is used to obtain entity features, and an attention mechanism is introduced to emphasize the entity information. GloVe-CNN was set as the baseline, and the proposed BERT-ACNN model was compared with three other models: GloVe-CNN, GloVe-ACNN, and BERT-CNN. The experimental results show that the F1 score of the proposed model is 2.8% higher than that of the baseline model.

(3) A Chinese resume information extraction system. Based on the proposed methods, the framework of a resume information extraction system is designed, including input processing, data analysis, and other functions. The extracted results are stored in a graph database, and the data analysis results can be visualized with d3.js.
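The CRF layer in item (1) turns per-token scores into a tag sequence that respects label-transition constraints. The following is a minimal sketch of standard Viterbi decoding for a linear-chain CRF, plus a helper that groups BIO tags into entity spans. It assumes the BiLSTM has already produced emission scores and the CRF has learned a transition matrix. The tag set (O, B-NAME, I-NAME), the helper names `viterbi_decode` and `bio_to_spans`, and every score are hypothetical. The thesis's actual label scheme is not given in this abstract.

```python
from typing import List, Sequence, Tuple

def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag-index path of a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (e.g. a BiLSTM output)
    transitions[i][j]: learned score for moving from tag i to tag j
    """
    num_tags = len(transitions)
    best = list(emissions[0])          # best path score ending in each tag
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_tags):
            # Best previous tag i for landing on tag j at step t.
            i_star = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_star] + transitions[i_star][j] + emissions[t][j])
            pointers.append(i_star)
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(range(num_tags), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

def bio_to_spans(tags: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (label, start, end) spans; end is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    start, label = None, None
    for idx, tag in enumerate(list(tags) + ["O"]):   # sentinel closes the last span
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.append((label, start, idx))
            start, label = (idx, tag[2:]) if tag != "O" else (None, None)
    return spans

# Hypothetical tag set and scores, chosen only for illustration.
TAGS = ["O", "B-NAME", "I-NAME"]
transitions = [[0.0, 0.0, -10.0],   # O -> I-NAME is heavily penalised
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[2.0, 0.0, 0.0],
             [0.0, 1.0, 1.5],        # greedy argmax would pick I-NAME here
             [0.0, 0.0, 2.0],
             [2.0, 0.0, 0.0]]
path = viterbi_decode(emissions, transitions)
tags = [TAGS[i] for i in path]
print(tags)                # ['O', 'B-NAME', 'I-NAME', 'O']
print(bio_to_spans(tags))  # [('NAME', 1, 3)]
```

A greedy per-token argmax would emit I-NAME directly after O at the second token. The transition penalty makes Viterbi choose B-NAME instead, and this is the kind of structural constraint the CRF layer adds on top of the BiLSTM.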
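Item (2) lists three ingredients of BERT-ACNN: CNN sentence features, entity features taken from entity positions, and attention that emphasizes the entities. It does not say how they are wired together. The sketch below is one plausible reading under explicit assumptions:

* a single same-padded convolution with ReLU;
* entity features computed as the mean of the token features inside each entity span;
* dot-product attention that uses the averaged head and tail entity vector as the query.

In the real model the inputs would be BERT vectors and all weights would be learned. Here the values are toy numbers, and `conv1d_relu`, `span_mean`, and `entity_attention` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def conv1d_relu(tokens: Sequence[Sequence[float]], kernels: Sequence[Sequence[float]],
                width: int = 3) -> Matrix:
    """Same-padded 1-D convolution over the token axis, then ReLU.

    kernels[k] holds width * dim weights for output channel k; width should be odd.
    """
    dim = len(tokens[0])
    pad = width // 2
    zeros = [0.0] * dim
    padded = [zeros] * pad + [list(row) for row in tokens] + [zeros] * pad
    out = []
    for t in range(len(tokens)):
        window = [x for row in padded[t:t + width] for x in row]  # flatten the window
        out.append([max(0.0, sum(w * x for w, x in zip(k, window))) for k in kernels])
    return out

def span_mean(features: Matrix, span: Tuple[int, int]) -> List[float]:
    """Average the feature rows of an entity span [start, end)."""
    rows = features[span[0]:span[1]]
    return [sum(col) / len(rows) for col in zip(*rows)]

def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def entity_attention(features: Matrix, head: Tuple[int, int],
                     tail: Tuple[int, int]) -> Tuple[List[float], List[float]]:
    """Hypothetical entity-aware attention pooling.

    The mean of the head and tail entity vectors is the query; each token's
    dot product with it is softmax-normalised into an attention weight.
    Returns (weighted sentence vector + head vector + tail vector, weights).
    """
    h, t = span_mean(features, head), span_mean(features, tail)
    query = [(a + b) / 2 for a, b in zip(h, t)]
    weights = softmax([sum(q * x for q, x in zip(query, row)) for row in features])
    pooled = [sum(w * row[d] for w, row in zip(weights, features)) for d in range(len(query))]
    return pooled + h + t, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]  # toy 2-d word vectors
centre_taps = [[0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]]      # copy each input channel
features = conv1d_relu(tokens, centre_taps)
vec, weights = entity_attention(features, head=(0, 1), tail=(2, 3))
print(weights)
```

Concatenating the attention-pooled sentence vector with the two entity vectors is also an assumption. A relation classifier, for example a linear layer with softmax over relation types, would consume this vector.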
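Item (3) says the extracted results go into a graph database and are visualized with d3.js, but the abstract names neither the database nor the data format. The sketch below assumes three things: the extraction output is a list of (head, relation, tail) triples, the front end uses a d3-force layout, and the graph-database read/write step is left out. It produces the `{"nodes": [...], "links": [...]}` shape that d3-force examples commonly use, where `d3.forceLink(links).id(d => d.id)` resolves the string ids in `source` and `target` to node objects. The function name `triples_to_d3`, the triples, and the relation names are hypothetical.

```python
import json
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

def triples_to_d3(triples: Iterable[Triple]) -> Dict[str, List[Dict[str, str]]]:
    """Build a {"nodes": [...], "links": [...]} graph for a d3-force layout.

    Each distinct entity becomes one node keyed by "id"; each triple becomes
    a link whose "source"/"target" hold node ids.
    """
    nodes: Dict[str, Dict[str, str]] = {}   # insertion-ordered, de-duplicated
    links: List[Dict[str, str]] = []
    for head, relation, tail in triples:
        nodes.setdefault(head, {"id": head})
        nodes.setdefault(tail, {"id": tail})
        links.append({"source": head, "target": tail, "relation": relation})
    return {"nodes": list(nodes.values()), "links": links}

# Hypothetical extraction output; relation names are illustrative only.
extracted = [
    ("Candidate A", "graduated_from", "University X"),
    ("Candidate A", "worked_at", "Company Y"),
]
graph = triples_to_d3(extracted)
print(json.dumps(graph, ensure_ascii=False, indent=2))
```

Passing `ensure_ascii=False` keeps Chinese entity names readable in the emitted JSON instead of escaping them as `\uXXXX` sequences.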
Keywords: Resume Extraction, Pre-trained Model, Sequence Labeling, Attention Mechanism