
Mining The Needs Of Data Science Talents

Posted on: 2021-03-29  Degree: Master  Type: Thesis
Country: China  Candidate: M Zhang  Full Text: PDF
GTID: 2438330623972309  Subject: Mathematical Statistics
Abstract/Summary:
Following the country's implementation of its big data strategy and the accelerating construction of a Digital China, data science has entered a new and vibrant era, and demand for data science talent is rising across all sectors. To meet the needs of social and economic development, many training institutions have added undergraduate majors or degree programs in data science. Further improving the quality and adaptability of these graduates requires an in-depth analysis of the demand for data science talent. The main idea of this thesis is to take data science job postings crawled from recruitment websites, apply several text clustering methods, build topic-word extraction models, and present and analyze the core content of the demand through network visualization. This makes it possible to mine the demand for data science talent thoroughly and provides a reference for training institutions when designing their training models. The specific work and conclusions of the thesis are summarized as follows:
1. Data collection and organization. Python is used to crawl data science job postings from the Zhilian Zhaopin (Zhaopin.com) and 51job (Qiancheng Wuyou) recruitment websites. The structured part of the raw data is preprocessed to handle missing values and outliers, and the unstructured text is preprocessed by sentence splitting, Chinese word segmentation, and removal of stop words and special characters, so that the data finally take the form required by the models and analyses chosen in this thesis.
2. Descriptive statistical analysis. The structured data on data science talent demand are described from six aspects: company industry, city, company type, company size, required work experience and required education. These results give a preliminary picture of the current social demand for data science talent.
3. Mining talent demand by clustering. The three clustering methods K-means, GMM and NMF are first applied to the unstructured text of the job postings, and the professional skills required of candidates are then quantified and analyzed on the basis of the text clusters. The clustering results and methods are finally evaluated by comparing them from three angles: clustering quality, running efficiency and the Rand index.
4. Analyzing talent demand with topic-word extraction models. First, the number of topics for the LDA topic model is selected and an initial set of topic words describing data science talent demand is extracted. The word2vec word vector model is then introduced to refine and expand these topic words. Finally, the co-occurrence relationships among the expanded topic words are converted into a topic-word co-occurrence matrix, and Gephi is used to visualize the topic-word network from four aspects (educational background, work experience, professional knowledge and skills, and personal qualities), giving a concrete analysis of the demand for data science talent.
5. Suggestions. Based on a summary of the preceding work and guided by social needs, specific suggestions on forming and cultivating data science talent are offered from the perspectives of individual students and of colleges, universities and other training institutions. For students, the main recommendation is to study the curriculum carefully to build a more complete theoretical foundation in data science, and to take an active part in internships and other practical activities to master practical data science skills and application techniques. For training institutions such as colleges and universities, specific suggestions for improving the data science talent training system are made in three areas: training model, curriculum design, and the cultivation of comprehensive qualities and abilities.
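The text preprocessing in step 1 (sentence splitting, Chinese word segmentation, and removal of stop words and special characters) can be sketched in plain Python. The abstract does not name the segmenter used, so this sketch assumes a simple forward maximum matching segmenter over a tiny hypothetical dictionary; `DICTIONARY`, `STOP_WORDS` and the sample posting are illustrative only, and a real pipeline would load a full lexicon or use a dedicated segmenter.

```python
import re

# Hypothetical mini-lexicon and stop-word list, for illustration only.
DICTIONARY = {"熟悉", "数据", "分析", "数据分析", "机器学习", "统计学", "本科", "及以上", "学历", "优先"}
STOP_WORDS = {"的", "和", "及以上"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def split_sentences(text):
    """Split a posting into sentences on Chinese/ASCII end punctuation, semicolons and newlines."""
    parts = re.split(r"[。！？；;!?\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def fmm_segment(sentence):
    """Forward maximum matching: at each position take the longest dictionary word,
    falling back to a single character when nothing matches."""
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(MAX_WORD_LEN, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in DICTIONARY:
                words.append(candidate)
                i += size
                break
    return words


def preprocess(text):
    """Sentence split -> strip special characters -> segment -> drop stop words."""
    tokens = []
    for sentence in split_sentences(text):
        # Keep only CJK ideographs, Latin letters and digits.
        cleaned = re.sub(r"[^\u4e00-\u9fffA-Za-z0-9]", "", sentence)
        tokens.extend(w for w in fmm_segment(cleaned) if w not in STOP_WORDS)
    return tokens
```

Because forward maximum matching prefers the longest dictionary entry, `preprocess("熟悉数据分析和机器学习；本科及以上学历，统计学优先。")` keeps 数据分析 as a single token instead of splitting it into 数据 and 分析.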
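The descriptive analysis in step 2 amounts to a frequency table per categorical field. A minimal sketch, assuming each posting is a dict whose keys are hypothetical names for the six aspects; a posting with a missing or empty value is left out of that aspect's table, in line with the missing-value handling described in step 1.

```python
from collections import Counter

# The six aspects from step 2; these field names are hypothetical.
ASPECTS = ["industry", "city", "company_type", "company_size", "experience", "education"]


def describe(postings, aspect):
    """Frequency table for one aspect as (category, count, share), most frequent first.
    Ties keep the order in which categories were first encountered."""
    values = [p[aspect] for p in postings if p.get(aspect)]
    total = len(values)
    return [(cat, n, n / total) for cat, n in Counter(values).most_common()]


def describe_all(postings):
    """One frequency table per aspect."""
    return {aspect: describe(postings, aspect) for aspect in ASPECTS}
```

Shares are computed over the postings that actually report the field, so an aspect that is often blank is not silently diluted.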
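Step 3 names K-means, GMM and NMF; only K-means is sketched here, run on TF-IDF vectors built from segmented postings. The TF-IDF weighting (smoothed IDF, L2 normalization) and the deterministic farthest-first initialization are assumptions chosen to keep the example reproducible, not details taken from the thesis.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists -> (vocabulary, L2-normalized TF-IDF vectors)."""
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        v = [tf[w] * (math.log((1 + n) / (1 + df[w])) + 1) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


def farthest_first(vectors, k):
    """Deterministic seeding: start at the first vector, then repeatedly add
    the vector farthest from all centroids chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until labels stop changing."""
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

K-means with squared Euclidean distance on L2-normalized vectors behaves much like clustering by cosine similarity, which is why the normalization step is included.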
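The Rand index used in step 3 to compare clusterings is the fraction of object pairs on which two partitions agree: either both put the pair in the same cluster, or both separate it. A direct pair-counting sketch follows. It is quadratic in the number of postings, which is fine for illustration. It can compare two methods' labels (say, K-means against GMM) or a clustering against a reference labelling.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Rand index = (pairs grouped together in both + pairs separated in both) / all pairs."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if len(labels_a) < 2:
        raise ValueError("need at least two items to form a pair")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

The index ignores cluster names, so `[0, 0, 1, 1]` and `[1, 1, 0, 0]` score 1.0.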
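For the LDA stage of step 4, the sketch below fits LDA by collapsed Gibbs sampling and reports perplexity, one common aid for choosing the number of topics K. The abstract does not say which selection criterion the thesis used, so perplexity, the hyperparameters `alpha`/`beta`, the iteration count and the helper names are assumptions. Perplexity measured on the training postings usually keeps falling as K grows, so in practice it is computed on held-out postings or read for an elbow.

```python
import math
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of token lists.
    Returns (vocab, phi, theta): topic-word and document-topic distributions."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)
    ndk = [[0] * k for _ in docs]       # tokens in document d assigned to topic t
    nkw = [[0] * v for _ in range(k)]   # occurrences of word w assigned to topic t
    nk = [0] * k                        # total tokens assigned to topic t

    def bump(d, t, wi, delta):
        ndk[d][t] += delta
        nkw[t][wi] += delta
        nk[t] += delta

    # Random initial topic for every token.
    z = [[rng.randrange(k) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            bump(d, z[d][n], index[w], 1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                wi = index[w]
                bump(d, z[d][n], wi, -1)  # remove the token's current assignment
                # Full conditional p(z = j | all other assignments), up to a constant.
                weights = [(ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                z[d][n] = rng.choices(range(k), weights=weights)[0]
                bump(d, z[d][n], wi, 1)

    phi = [[(nkw[t][i] + beta) / (nk[t] + v * beta) for i in range(v)] for t in range(k)]
    theta = [[(ndk[d][t] + alpha) / (len(doc) + k * alpha) for t in range(k)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


def perplexity(docs, vocab, phi, theta):
    """exp(-average log-likelihood per token) under the fitted model."""
    index = {w: i for i, w in enumerate(vocab)}
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta[d][t] * phi[t][index[w]] for t in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def top_words(vocab, phi, n=5):
    """The n most probable words per topic: an initial topic-word set."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in phi]


def perplexity_by_k(docs, candidates, **kwargs):
    """Fit one model per candidate K and report its perplexity on the same documents."""
    return {k: perplexity(docs, *lda_gibbs(docs, k, **kwargs)) for k in candidates}
```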
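The word2vec expansion in step 4 can be read as: for each LDA topic word, add its nearest neighbors in embedding space. A minimal sketch, assuming the word vectors are already trained (for example with gensim's `Word2Vec`, whose `model.wv.most_similar(word, topn=n)` returns a comparable neighbor list). The toy 3-dimensional `VECTORS` and the `topn` and `threshold` values are hypothetical.

```python
import math

# Hypothetical toy embeddings standing in for trained word2vec vectors.
VECTORS = {
    "python": [0.9, 0.1, 0.0],
    "r": [0.8, 0.2, 0.0],
    "sql": [0.7, 0.3, 0.1],
    "communication": [0.0, 0.1, 0.9],
    "teamwork": [0.1, 0.0, 0.95],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def expand_topic_words(seeds, vectors, topn=2, threshold=0.7):
    """Append each seed's top-n neighbors by cosine similarity, keeping only those
    at or above the threshold; seeds without a vector are kept but not expanded."""
    expanded = list(seeds)
    for seed in seeds:
        if seed not in vectors:
            continue
        scored = sorted(
            ((cosine(vectors[seed], vec), word) for word, vec in vectors.items() if word != seed),
            reverse=True,
        )
        for sim, word in scored[:topn]:
            if sim >= threshold and word not in expanded:
                expanded.append(word)
    return expanded
```

The threshold guards against padding a topic with weakly related words when `topn` is generous: with these toy vectors, raising it from 0.9 to 0.98 keeps "r" but drops "sql" as a neighbor of "python".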
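The last part of step 4 turns the expanded topic words into a co-occurrence matrix and a network for Gephi. This sketch assumes two topic words co-occur when they appear in the same posting, and stores the symmetric matrix sparsely as pair counts. The CSV uses the Source, Target, Weight and Type column names that Gephi's spreadsheet importer recognizes for edge tables; check these against the Gephi version in use.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def cooccurrence(docs, topic_words):
    """For every pair of topic words, count the postings that contain both.
    Keys are alphabetically ordered pairs, i.e. the upper triangle of the matrix."""
    vocab = set(topic_words)
    counts = Counter()
    for doc in docs:
        present = sorted(vocab.intersection(doc))
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts


def to_gephi_edges_csv(counts):
    """Serialize pair counts as an undirected, weighted edge table."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight", "Type"])
    for (a, b), weight in sorted(counts.items()):
        writer.writerow([a, b, weight, "Undirected"])
    return buf.getvalue()
```

Counting each pair at most once per posting keeps a long posting that repeats a skill from dominating the edge weights.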
Keywords/Search Tags: Data science, talent demand, text clustering, LDA topic model, Gephi visualization