
Research on Skill Word Extraction in Chinese Online Recruitment Text

Posted on: 2021-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: P Yang
GTID: 2518306554966159
Subject: Master of Engineering
Abstract/Summary:
In recent years, with the rapid development of higher education in China, the number of college graduates has kept growing. Although the number of jobs is also increasing, the structural mismatch between supply and demand in China's labor market remains serious. With the spread of the Internet, online recruitment has become the main channel through which enterprises recruit talent, and the skill words listed in job postings make it possible to track enterprises' demand for talent accurately and in real time. In this thesis, skill word extraction is formulated as a sequence tagging problem, drawing on methods from named entity recognition and term extraction. However, because of the complexity of Chinese semantics and context and the high cost of manual annotation, automatically extracting skill words from recruitment texts is not easy. Deep neural networks are currently the mainstream approach to sequence tagging, but these methods rely on in-domain supervised learning and need large amounts of labeled data. On the one hand, manually annotating online recruitment data is time-consuming and laborious, so domain experts can annotate only a small number of sentences. On the other hand, these methods rely entirely on the neural network to extract features, neglect the corpus features of the domain, and cannot make full use of domain knowledge. In addition, to cope with the lack of annotated data, a better approach is to use transfer learning, exploiting annotated data from other domains to improve skill word recognition. However, existing deep-learning-based transfer methods require the source domain and the target domain to share the same label set or the same label meanings, and transferring the knowledge learned in the source domain to the target domain is itself a challenge. To address these shortcomings and difficulties, this thesis carries out two studies:

(1) The first study builds on Bi-LSTM-CRF, a classical sequence labeling model. To make full use of domain knowledge, corpus features are added to its input layer, and the output of the input layer is concatenated with the output of the Bi-LSTM layer to form the input of the CRF layer. Extensive experiments show that the proposed skill word extraction method is sound and that the added corpus features improve the accuracy of skill word extraction.

(2) To address the lack of sufficient annotated data, the second study proposes a cross-domain transfer learning method for skill word extraction. First, the source domain corpus is decomposed into three sub-source domains. Second, a domain adaptation layer is inserted between the Bi-LSTM layer and the CRF layer to help transfer the knowledge learned from each source domain to the target domain. Then, each sub-model is trained with parameter transfer. Finally, the predicted tag sequence is obtained by majority vote. Extensive experiments demonstrate the soundness of this method, which alleviates the scarcity of manually annotated data.

The innovations of this thesis are as follows:
1) A skill word extraction algorithm that combines deep learning with corpus features.
2) An algorithm for recognizing skill words in online recruitment text based on cross-domain transfer learning.
3) A corpus of recruitment texts for the IT industry.
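Study (1) concatenates the input-layer output (which carries the added corpus features) with the Bi-LSTM output and feeds the result to the CRF layer. The sketch below illustrates only that data flow at inference time, under stated assumptions: the Bi-LSTM itself is not implemented and its per-token outputs are passed in as plain lists; a linear projection with hypothetical weights maps each concatenated vector to one score per tag; and CRF decoding is shown as standard Viterbi over emission and transition scores, without start/end transition terms for brevity.

```python
from typing import List, Sequence

Vector = List[float]


def concat_features(input_layer: Sequence[Vector], bilstm_out: Sequence[Vector]) -> List[Vector]:
    """Splice each token's input-layer vector with its Bi-LSTM output vector."""
    if len(input_layer) != len(bilstm_out):
        raise ValueError("both layers must cover the same tokens")
    return [list(x) + list(h) for x, h in zip(input_layer, bilstm_out)]


def emissions(features: Sequence[Vector], weights: Sequence[Vector], bias: Vector) -> List[Vector]:
    """Linear projection of each concatenated vector to one score per tag.

    `weights` has one row per tag; each row has the concatenated feature size.
    """
    return [
        [sum(w_j * f_j for w_j, f_j in zip(w_row, f)) + b for w_row, b in zip(weights, bias)]
        for f in features
    ]


def viterbi(emit: Sequence[Vector], trans: Sequence[Vector]) -> List[int]:
    """Highest-scoring tag path; trans[i][j] scores moving from tag i to tag j."""
    if not emit:
        return []
    num_tags = len(emit[0])
    score = list(emit[0])
    backptrs: List[List[int]] = []
    for t in range(1, len(emit)):
        new_score, ptrs = [], []
        for j in range(num_tags):
            # best previous tag for landing on tag j at step t
            best_i = max(range(num_tags), key=lambda i: score[i] + trans[i][j])
            ptrs.append(best_i)
            new_score.append(score[best_i] + trans[best_i][j] + emit[t][j])
        score = new_score
        backptrs.append(ptrs)
    best = max(range(num_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptrs):  # follow back-pointers to the first token
        best = ptrs[best]
        path.append(best)
    return path[::-1]


# Tags: 0 = O, 1 = B-SKILL, 2 = I-SKILL. Forbidding O -> I shows what the
# CRF transition scores add on top of per-token emission scores.
trans = [[0.0, 0.0, -100.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(viterbi([[2.0, 1.0, 0.0], [0.0, 0.0, 1.5]], trans))  # [1, 2]
```

Without the O to I penalty the same emissions decode to `[0, 2]`, an invalid BIO sequence; the transition matrix is what lets the CRF layer enforce consistent tag sequences.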
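Study (2) trains each sub-model with parameter transfer, but the abstract does not say which layers are transferred. A minimal sketch of the general idea, assuming parameters are stored as a name-to-list mapping with hypothetical layer prefixes `embedding.`, `bilstm.`, `adapt.` and `crf.`: the layers treated as shared are copied from a source-trained model, while the domain adaptation and CRF layers keep their freshly initialized values, one plausible split given that source and target label sets may differ.

```python
import copy
from typing import Dict, Iterable, List

# Hypothetical flat parameter store: layer-qualified name -> flattened values.
Params = Dict[str, List[float]]


def transfer_parameters(source: Params, target: Params, shared_prefixes: Iterable[str]) -> Params:
    """Return a copy of `target` whose shared layers are initialized from `source`.

    Parameters outside the shared prefixes (e.g. the adaptation and CRF
    layers) are left untouched so they can be trained on the target domain.
    """
    prefixes = tuple(shared_prefixes)
    result = copy.deepcopy(target)  # never mutate the caller's target model
    for name, value in source.items():
        if name.startswith(prefixes) and name in result:
            if len(value) != len(result[name]):
                raise ValueError(f"shape mismatch for {name}")
            result[name] = list(value)
    return result


source = {"embedding.weight": [0.1, 0.2], "bilstm.weight": [0.3], "crf.transitions": [9.0]}
target = {"embedding.weight": [0.0, 0.0], "bilstm.weight": [0.0],
          "adapt.weight": [0.0], "crf.transitions": [0.0, 0.0]}
print(transfer_parameters(source, target, ["embedding.", "bilstm."]))
```

In a deep learning framework the copied values would then be fine-tuned, together with the new layers, on the small annotated target-domain set.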
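The final step of study (2) combines the three sub-models' predictions by majority vote. The abstract does not specify whether voting happens per token or per whole sequence; the sketch below assumes per-token voting over equal-length tag sequences, with ties broken in favor of the earliest model (also an assumption):

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine equal-length tag sequences position by position.

    Ties go to the tag of the earliest model in `predictions`
    (an assumption; the thesis does not state a tie-breaking rule).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all tag sequences must have the same length")
    voted: List[str] = []
    for position in range(length):
        column = [p[position] for p in predictions]
        counts = Counter(column)
        top = max(counts.values())
        # first tag, in model order, that reaches the highest count
        voted.append(next(tag for tag in column if counts[tag] == top))
    return voted


sub_model_outputs = [
    ["O", "B-SKILL", "I-SKILL", "O"],
    ["O", "B-SKILL", "O", "O"],
    ["B-SKILL", "B-SKILL", "I-SKILL", "O"],
]
print(majority_vote(sub_model_outputs))  # ['O', 'B-SKILL', 'I-SKILL', 'O']
```

Per-token voting can occasionally produce an I- tag directly after O; voting over whole sequences, or a small BIO repair pass after voting, would avoid that at the cost of a coarser combination.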
Keywords: Online Recruitment, Transfer Learning, Deep Learning, Skill Words, Parameter Transfer