
Research And Implement Of Hyperparameters Optimization For High Dimensional Sparse Data

Posted on: 2020-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Li
Full Text: PDF
GTID: 2428330590973245
Subject: Computer technology
Abstract/Summary:
With the advent of the data explosion, the distribution of data in manufacturing, finance, education, healthcare, and other industries differs greatly from before. In many scenarios, data is becoming sparse and scattered. For data mining on high-dimensional sparse data, machine learning algorithms can be faster and better than manual analysis. In real business settings, every industry applies academic results according to its own needs, and must also adapt the algorithm model to obtain better business indicators or production results. When analyzing high-dimensional sparse data, a reasonable application of machine learning models makes data analysis more accurate and data mining more effective. In practice, given the specific distribution characteristics of the data, the key steps are selecting the machine learning model and setting its hyperparameters. Working with high-dimensional sparse data from a specified scenario, this thesis combines the results of data analysis and processing with a Bayesian optimization algorithm built on multiple algorithms to implement and improve the automatic construction and tuning of machine learning models.

The main research content has three parts. The first is the processing of high-dimensional sparse data and the selection of the target model. Using a variety of data analysis indicators, the thesis replaces qualitative analysis with quantitative analysis: it analyzes the distribution characteristics of the data, fills in missing data with an SVD-based collaborative filtering method with bias terms, reduces dimensionality with an ensemble tree model, and selects an appropriate target model for subsequent modeling and tuning. The second is the construction of a surrogate function that approximates the true regression relationship between the target model's hyperparameters and its performance. After comparing the prediction results of several parallel tree models, the surrogate function is built on the random forest algorithm, and its model structure is adjusted using the Akaike information criterion (AIC) applied to the regression trees. Finally, the thesis proposes SMAC-T, a hyperparameter tuning framework based on the Bayesian optimization algorithm. For the target model, building on the improved accuracy of the surrogate function, a simulated annealing factor is added to the traditional Bayesian optimization algorithm. This combines the advantages of Bayesian probability and heuristic algorithms to accelerate the search for optimized solutions and to improve solution quality.

Experimental comparison shows that the quantitative data analysis results describe the data distribution more accurately. The data processing operations for high-dimensional sparse data effectively reduce the noise in the data features and improve the prediction performance of the classifier by no less than 10%. Adjusting the structure of the surrogate function model with the Akaike information criterion applied to the regression trees effectively balances the surrogate function's prediction performance and generalization ability. With the improved hyperparameter tuning algorithm, optimized configurations can be found faster within a given time limit. Compared with the SMAC optimization framework, the average performance of the resulting configurations is greatly improved. Compared with the target model's default configuration, the average performance of the tuned configuration increases by more than 10%.
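The abstract does not say how the simulated annealing factor enters the Bayesian optimization loop. The sketch below is therefore only an illustration of the general idea, not the SMAC-T algorithm: a candidate configuration is scored by a surrogate, and the standard Metropolis rule decides whether to move to it, so worse candidates are occasionally accepted while the temperature is high. All names (`accept`, `anneal_search`, `predict_cost`, `neighbor`) and the default parameters are hypothetical, and the sketch assumes a cost where lower is better.

```python
import math
import random


def accept(current_cost, candidate_cost, temperature, rng):
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if candidate_cost <= current_cost:
        return True
    if temperature <= 0:
        return False
    # A worse candidate is accepted with probability exp(-delta / T).
    delta = candidate_cost - current_cost
    return rng.random() < math.exp(-delta / temperature)


def anneal_search(predict_cost, start, neighbor, steps=200, t0=1.0,
                  cooling=0.97, seed=0):
    """Search configurations using surrogate predictions and annealed acceptance.

    predict_cost: hypothetical surrogate, maps a configuration to a predicted cost.
    neighbor:     hypothetical proposal function, maps (config, rng) to a new config.
    """
    rng = random.Random(seed)
    current = best = start
    current_cost = best_cost = predict_cost(start)
    temperature = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        cost = predict_cost(candidate)
        if accept(current_cost, cost, temperature, rng):
            current, current_cost = candidate, cost
            if cost < best_cost:
                # Track the best configuration seen so far separately.
                best, best_cost = candidate, cost
        temperature *= cooling  # geometric cooling schedule (an assumption)
    return best, best_cost
```

In a real tuning loop, `predict_cost` would be replaced by the prediction of a fitted surrogate such as a random forest regressor, and `neighbor` by a perturbation of one or more hyperparameters. Here both are left as plain callables so the acceptance logic can be read in isolation.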
Keywords/Search Tags: High-Dimensional Sparse Data, Data Mining, Hyperparameters Optimization, Bayesian Optimization Algorithm