Portrait Analysis Of Data Practitioners Based On Machine Learning Model

Posted on: 2022-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: C M Li
GTID: 2517306527452304
Subject: Applied Statistics
Abstract/Summary:
In recent years, with the rapid development of the Internet industry and information technology, the era of massive data has arrived, and massive data has in turn driven the rapid development of artificial intelligence. Advances in computer technology have helped popularize artificial intelligence algorithms; machine learning and deep learning are now widely used in the Internet, finance, communications, and real estate industries, which has fueled the rise of related majors such as mathematics, statistics, and computing. A large number of people trained in massive data processing technologies have flooded into the recruitment market. At the same time, there are large gaps in that market, which not only put pressure on the labor market but also reduce the productivity of data mining and artificial intelligence talent. This concerns not only data practitioners but also universities, society, and the country. An analysis of labor market demand in artificial intelligence and of the corresponding data practitioners is therefore particularly important.

This thesis analyzes 23,859 electronic questionnaire responses from practitioners in artificial intelligence-related fields, collected on Kaggle in November 2018. The questionnaire contains 395 questions, covering not only basic information about the respondent and the respondent's company but also information related to big data processing, such as machine learning and deep learning.

The main content is as follows. First, because the data set is imbalanced and contains missing values, a random forest combined with the SMOTE algorithm is used to fill in the missing values. Then, based on the LightGBM model, an ability attribute model and a potential attribute model of data practitioners are established, and the results of the two models are used to divide all data practitioners into four groups: high-ability high-potential, low-ability high-potential, high-ability low-potential, and low-ability low-potential. Finally, SHAP is used to determine the ability and potential attributes of each group, from which a labeling system for the corresponding population is derived and user portraits are built. These portraits can help data practitioners choose appropriate programming languages and algorithms, and can also give recruiters for data jobs a reference for their assessment standards.
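
The four-way grouping described in the abstract can be sketched as follows. This is only an illustration, not the thesis's implementation: the scores are made-up stand-ins for the outputs of the two LightGBM models, and the 0.5 cut-off and the function name `segment` are assumptions, since the abstract does not say how "high" and "low" are separated.

```python
from collections import Counter

# Hypothetical cut-off; the abstract does not state how "high" and "low" are split.
THRESHOLD = 0.5


def segment(ability_score: float, potential_score: float,
            threshold: float = THRESHOLD) -> str:
    """Map an ability score and a potential score to one of four groups."""
    ability = "high-ability" if ability_score >= threshold else "low-ability"
    potential = "high-potential" if potential_score >= threshold else "low-potential"
    return f"{ability}, {potential}"


# Made-up (ability, potential) scores standing in for the two models' outputs.
scores = {
    "respondent_1": (0.91, 0.78),
    "respondent_2": (0.22, 0.85),
    "respondent_3": (0.67, 0.30),
    "respondent_4": (0.10, 0.15),
}

# Assign every respondent to a group and count the size of each group.
groups = {name: segment(a, p) for name, (a, p) in scores.items()}
group_sizes = Counter(groups.values())

for name, group in groups.items():
    print(name, "->", group)
```

In a fuller pipeline, each group would then be examined separately (for example with SHAP) to find which questionnaire answers contribute most to its ability and potential scores.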
Keywords/Search Tags:User portrait, SMOTE, Random Forest, LightGBM, SHAP