
Recognition and Analysis of Skill Words in Chinese Online Recruitment Corpus

Posted on: 2022-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: Q Mao
GTID: 2518306554470314
Subject: Master of Applied Statistics
Abstract/Summary:
In recent years, China's talent market has shown a structural mismatch between supply and demand. Better aligning higher-education talent training with labor-market demand has become an important means of resolving this mismatch. Online recruitment is now the main channel through which companies hire, so extracting, identifying, and analyzing the skill words in online recruitment corpora gives a direct and effective view of the skills employers require. Universities can use this information to make their training more targeted and thereby help ease the imbalance between supply and demand, which gives the identification and analysis of skill words in online job advertisements considerable practical significance. This thesis first uses a long short-term memory (LSTM) deep neural network to recognize skill words, and then applies the LDA (Latent Dirichlet Allocation) topic model to the recognized skill words to characterize the skill demands of different enterprises and regions, as well as the relationship between skill mastery and salary.

At present, the mainstream approach to named entity recognition and term extraction is to use deep neural networks. These methods rely mainly on supervised learning within a specialized domain and require large amounts of labeled data. The Chinese recruitment corpus, however, has variable semantics, non-standard sentences, and complex context, and sufficient annotated data is lacking. Building a semi-supervised model that relies on a small amount of labeled data and a large amount of unlabeled data to recognize skill words effectively is therefore a major challenge. Using the LDA topic model to uncover the hidden topic information in skill words, so as to support in-depth analysis and visual presentation of the online recruitment corpus, is a further challenge. To address these challenges, this thesis carries out two studies:

(1) To cope with the scarcity of annotated data, this thesis proposes a skill word recognition method based on a semi-supervised learning model. Starting from Bi-LSTM (Bidirectional Long Short-Term Memory), a classic sequence labeling model, it introduces an MMNN (Max-Margin Neural Network) model and combines the Bi-LSTM predictions with the dependencies learned by the MMNN to build a semi-supervised learning model based on sentence confidence. This model is trained jointly on a small number of labeled samples and a large number of unlabeled samples. The experimental results demonstrate that the approach is sound and that introducing the semi-supervised model effectively alleviates the scarcity of manually labeled data.

(2) To uncover the latent topic information in skill words, this thesis builds an IT skill word dictionary from the recognized skill words by combining machine learning methods with expert judgment. It then uses the LDA topic model to extract topic information from the skill words, partitions the skill word set, and constructs a relational topic matrix that turns the abstract topic information into concrete form. The extracted topic information is analyzed statistically and compared across employers, work locations, salary levels, and other dimensions, and the results are presented visually as word clouds and Sankey diagrams.
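Study (1) builds its semi-supervised model on sentence confidence but the abstract does not define the score. As a minimal sketch of generic confidence-based self-training, assume (hypothetically) that each unlabeled sentence has per-token tag probabilities from the Bi-LSTM and a tag path from a structured decoder standing in for the MMNN, that sentence confidence is the mean probability the Bi-LSTM assigns to that path, and that only sentences at or above a threshold are added to the training set as pseudo-labeled data for the next round. `sentence_confidence`, `select_pseudo_labeled`, and the 0.9 threshold are illustrative, not the thesis's formulation.

```python
from typing import Dict, List, Sequence, Tuple

TokenProbs = Dict[str, float]  # tag -> probability for a single token


def sentence_confidence(token_probs: Sequence[TokenProbs], path: Sequence[str]) -> float:
    """Mean probability the tagger assigns to the decoded tag at each token."""
    if not path or len(path) != len(token_probs):
        raise ValueError("path must be non-empty and aligned with token_probs")
    return sum(p.get(tag, 0.0) for p, tag in zip(token_probs, path)) / len(path)


def select_pseudo_labeled(
    candidates: Sequence[Tuple[List[str], List[TokenProbs], List[str]]],
    threshold: float = 0.9,
) -> List[Tuple[List[str], List[str]]]:
    """Return (tokens, path) pairs confident enough to add to the training set."""
    return [
        (tokens, path)
        for tokens, token_probs, path in candidates
        if sentence_confidence(token_probs, path) >= threshold
    ]


candidates = [
    # "精通 Java" (proficient in Java): the tagger strongly supports the path.
    (["精通", "Java"],
     [{"O": 0.95, "B-SKILL": 0.05}, {"O": 0.02, "B-SKILL": 0.98}],
     ["O", "B-SKILL"]),
    # "了解 Linux" (familiar with Linux): the tagger is unsure about "Linux".
    (["了解", "Linux"],
     [{"O": 0.9, "B-SKILL": 0.1}, {"O": 0.4, "B-SKILL": 0.6}],
     ["O", "B-SKILL"]),
]
print(select_pseudo_labeled(candidates))  # [(['精通', 'Java'], ['O', 'B-SKILL'])]
```

The first sentence scores 0.965 and is kept; the second scores 0.75 and stays unlabeled. Raising the threshold trades pseudo-label volume for precision.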
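Study (2) applies LDA to the recognized skill words, but the abstract does not say how the model is fitted. The sketch below uses collapsed Gibbs sampling, one standard inference method for LDA, in plain Python, treating each job posting as a document whose tokens are its skill words. The function names, the hyperparameters `alpha=0.1` and `beta=0.01`, and the toy postings are assumptions; in practice a library implementation such as gensim's `LdaModel` would be the usual choice.

```python
import random
from typing import Dict, List, Tuple


def lda_gibbs(docs: List[List[str]], num_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iterations: int = 200, seed: int = 0
              ) -> Tuple[List[List[float]], Dict[str, List[float]]]:
    """Fit LDA by collapsed Gibbs sampling.

    Returns (theta, phi): theta[d][k] estimates P(topic k | posting d),
    phi[w][k] estimates P(skill word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]       # n[d][k]: tokens in d assigned to k
    word_topic = {w: [0] * K for w in vocab}  # n[w][k]: times w assigned to k
    topic_total = [0] * K                     # n[k]: tokens assigned to k
    z: List[List[int]] = []                   # current topic of every token

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            doc_topic[d][k] += 1
            word_topic[w][k] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token from all counts before resampling it.
                doc_topic[d][k] -= 1
                word_topic[w][k] -= 1
                topic_total[k] -= 1
                # p(z = j | rest) is proportional to
                # (n[d][j] + alpha) * (n[w][j] + beta) / (n[j] + V * beta).
                weights = [(doc_topic[d][j] + alpha) * (word_topic[w][j] + beta)
                           / (topic_total[j] + V * beta) for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                word_topic[w][k] += 1
                topic_total[k] += 1

    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = {w: [(word_topic[w][k] + beta) / (topic_total[k] + V * beta) for k in range(K)]
           for w in vocab}
    return theta, phi


def top_words(phi: Dict[str, List[float]], topic: int, n: int = 3) -> List[str]:
    """Skill words with the highest probability under one topic."""
    return sorted(phi, key=lambda w: phi[w][topic], reverse=True)[:n]


# Hypothetical postings, each reduced to its recognized skill words.
postings = [
    ["Python", "SQL", "Spark"],
    ["Java", "Spring", "MySQL"],
    ["Python", "Spark", "Hadoop"],
    ["Java", "MySQL", "Redis"],
]
theta, phi = lda_gibbs(postings, num_topics=2, iterations=100, seed=1)
for k in range(2):
    print(k, top_words(phi, k))
```

Each row of `theta` sums to 1 over topics, and each topic column of `phi` sums to 1 over the vocabulary; `top_words` gives the word lists a word cloud per topic would be drawn from.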
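Study (2) also constructs a relational topic matrix, compares topics by employer, work area, and salary, and draws Sankey diagrams, without defining the matrix in the abstract. One plausible reading, assumed here, is a matrix whose rows are attribute values (for example work regions), whose columns are LDA topics, and whose entries are the average topic proportion of postings with that value. Its cells map directly onto the (source, target, weight) links a Sankey diagram needs. `topic_matrix_by_attribute`, `sankey_links`, the topic names, and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def topic_matrix_by_attribute(
    theta: Sequence[Sequence[float]], attributes: Sequence[str]
) -> Dict[str, List[float]]:
    """Average posting-level topic proportions over postings sharing an attribute value."""
    if len(theta) != len(attributes):
        raise ValueError("theta and attributes must have the same length")
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for row, attr in zip(theta, attributes):
        acc = sums.setdefault(attr, [0.0] * len(row))
        for k, p in enumerate(row):
            acc[k] += p
        counts[attr] += 1
    return {attr: [v / counts[attr] for v in acc] for attr, acc in sums.items()}


def sankey_links(
    matrix: Dict[str, List[float]], topic_names: Sequence[str], min_weight: float = 0.0
) -> List[Tuple[str, str, float]]:
    """Flatten the matrix into (source, target, weight) links, dropping weak flows."""
    return [
        (attr, topic_names[k], weight)
        for attr, row in matrix.items()
        for k, weight in enumerate(row)
        if weight > min_weight
    ]


# Hypothetical topic proportions for three postings and their work regions.
theta = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
regions = ["Beijing", "Beijing", "Shanghai"]
matrix = topic_matrix_by_attribute(theta, regions)
print(sankey_links(matrix, ["data analysis", "backend development"], min_weight=0.2))
```

Swapping `regions` for employer names or salary bands gives the other comparisons; `min_weight` keeps the diagram readable by hiding minor flows.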
Keywords/Search Tags: online recruitment, semi-supervised learning model, deep learning, LDA topic model, visual analysis