Model Construction And Visual Analysis For Big Data Industry Classification And Industry Chain

Posted on: 2022-11-16    Degree: Master    Type: Thesis
Country: China    Candidate: Z Sun    Full Text: PDF
GTID: 2518306773497764    Subject: Information and Post Economy
Abstract/Summary:
With the rapid development of information technology, the big data industry has seen explosive growth. Using modeling techniques to analyze the state of industrial development has become an important research topic at the intersection of industrial economics and computing. However, modeling analysis of the big data industry still faces several unsolved problems. First, there are many competing industrial classification standards, and the classification system itself is vague. Second, real production and operation data sets of enterprises are scarce, so previous analysis models could only be trained on surface-level data sets of small-scale enterprises and usually show large errors when applied in real scenarios. Third, when the data has many features and few samples in the minority categories, the traditional decision tree and random forest algorithms suffer considerable interference, and their prediction accuracy on the test set is usually unsatisfactory. Fourth, the output of model analysis is often a large number of bare values that cannot intuitively represent the deeper associations in the data. Therefore, this paper draws on industrial economics theory and big data modeling technology to propose a new industry classification system, establish a big data industry classification model and an industry chain model, and provide an intuitive visual analysis system for the big data industry. The specific contents are as follows:

(1) This paper surveys academic literature and research reports on the big data industry and, drawing on industrial economics theory, proposes a three-level classification system for the big data industry. It fills in information missing from the original data through web searches, adds four fields ("industry category", "position in the industry chain", "big data talent recruitment scale" and "AI talent recruitment scale"), and maintains a data set of enterprise information.

(2) This paper proposes the Improved Decision Tree (IDT) algorithm. Before tree construction, a feature pre-filtering strategy eliminates features with low relevance to the category. During construction, a relevance weight is calculated for each feature as its preference coefficient and is used as the criterion for partitioning the feature set. Experiments show that IDT is reliable, and the prediction accuracy of the industry classification model reaches 94.733%.

(3) This paper proposes the Improved Random Forest (IRF) algorithm. In the sampling stage, the algorithm sets a lower bound on the number of minority-class samples to ensure that each base classifier learns from a certain number of positive samples. The classification performance of each base classifier is evaluated on out-of-bag samples and validation sets, and voting weights are assigned to the base classifiers according to their evaluation metric scores. Experiments show that the IRF algorithm classifies well, and the prediction accuracy of the industry chain classification model reaches 91.247%.

(4) This paper designs and implements a big data visualization and analysis system. The system is built on a mainstream framework, integrates the two industrial analysis models described above, and uses ECharts components to visualize the model prediction results. It also provides services such as data set management, giving the system good functional completeness and scalability.
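The abstract does not say how IDT measures feature relevance, so the following is only a minimal sketch of the pre-filtering and weighting idea in item (2), not the thesis's algorithm. It assumes information gain as the relevance measure, a hypothetical `min_relevance` threshold, and made-up enterprise fields (`sector`, `region`, `size`); all function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by partitioning on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def prefilter_features(rows, labels, features, min_relevance=0.05):
    """Drop weakly relevant features; return normalised weights for the rest.

    ASSUMPTION: relevance = information gain, threshold chosen arbitrarily.
    """
    relevance = {f: information_gain(rows, labels, f) for f in features}
    kept = {f: r for f, r in relevance.items() if r >= min_relevance}
    total = sum(kept.values())
    return {f: r / total for f, r in kept.items()} if total else {}

def choose_split(rows, labels, weights):
    """Pick the feature with the highest weight-scaled gain at the current node."""
    return max(weights, key=lambda f: weights[f] * information_gain(rows, labels, f))

# Hypothetical toy data: "sector" decides the class, "region" is pure noise.
rows = [
    {"sector": "storage", "region": "east", "size": "large"},
    {"sector": "storage", "region": "west", "size": "large"},
    {"sector": "storage", "region": "east", "size": "large"},
    {"sector": "storage", "region": "west", "size": "small"},
    {"sector": "analytics", "region": "east", "size": "large"},
    {"sector": "analytics", "region": "west", "size": "small"},
    {"sector": "analytics", "region": "east", "size": "small"},
    {"sector": "analytics", "region": "west", "size": "small"},
]
labels = ["infrastructure"] * 4 + ["application"] * 4

weights = prefilter_features(rows, labels, ["sector", "region", "size"])
root_feature = choose_split(rows, labels, weights)
```

On this toy data the noise feature `region` has zero gain and is removed before tree construction, while the global weights bias every later split toward features that were relevant on the full data set.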
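The two IRF ideas in item (3), a minority-sample lower bound during sampling and performance-based voting weights, can be sketched as follows. This is an illustration under stated assumptions rather than the thesis's implementation: the base learner is a toy one-feature threshold stump instead of a decision tree, the voting weight is plain out-of-bag accuracy (the abstract does not name the metric and also mentions validation sets), and every name and the data are hypothetical.

```python
import random

def bootstrap_with_minority_floor(labels, minority, min_minority, rng):
    """Bootstrap indices, topped up so at least `min_minority` minority samples appear."""
    n = len(labels)
    sample = [rng.randrange(n) for _ in range(n)]
    minority_pool = [i for i, y in enumerate(labels) if y == minority]
    majority_slots = [k for k, i in enumerate(sample) if labels[i] != minority]
    shortfall = min_minority - (n - len(majority_slots))
    # Overwrite randomly chosen majority draws with minority draws until the floor is met.
    for k in rng.sample(majority_slots, max(0, min(shortfall, len(majority_slots)))):
        sample[k] = rng.choice(minority_pool)
    return sample

def fit_stump(xs, ys, positive):
    """Toy base learner: the 1-D threshold rule with the best training accuracy."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            hits = sum((sign * (x - t) >= 0) == (y == positive) for x, y in zip(xs, ys))
            if best is None or hits > best[0]:
                best = (hits, t, sign)
    _, t, sign = best
    return lambda x: sign * (x - t) >= 0  # True means "positive class"

def fit_weighted_forest(xs, ys, positive, n_trees=15, min_minority=3, seed=0):
    """Train base learners on floored bootstraps and weight each by out-of-bag accuracy."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = bootstrap_with_minority_floor(ys, positive, min_minority, rng)
        model = fit_stump([xs[i] for i in idx], [ys[i] for i in idx], positive)
        oob = set(range(len(ys))) - set(idx)
        # ASSUMPTION: weight = out-of-bag accuracy; neutral 0.5 if nothing is out of bag.
        correct = sum(model(xs[i]) == (ys[i] == positive) for i in oob)
        forest.append((model, correct / len(oob) if oob else 0.5))
    return forest

def predict(forest, x):
    """Weighted vote: compare the total weight voting positive against the rest."""
    pos = sum(w for model, w in forest if model(x))
    neg = sum(w for model, w in forest if not model(x))
    return pos > neg

# Hypothetical imbalanced data: 17 majority samples and 3 minority ("upstream") samples.
xs = list(range(20))
ys = ["other"] * 17 + ["upstream"] * 3
forest = fit_weighted_forest(xs, ys, positive="upstream")
```

Without the floor, a plain bootstrap of this data set would sometimes contain no minority samples at all, and the corresponding base classifier would never learn the positive class.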
Keywords/Search Tags: Big Data Industry, Random Forest Algorithm, Feature Engineering, Industry Chain Analysis, Visualization Analysis System