Research On Regression Acceleration Algorithm Based On Partition And Sampling

Posted on: 2019-10-21
Degree: Master
Type: Thesis
Country: China
Candidate: E J Liu
GTID: 2428330551458743
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, data in every industry are growing at an unprecedented rate, ushering in an era of massive data. At this scale, traditional data mining techniques face severe challenges in execution efficiency, so fast and effective accelerated learning algorithms are extremely important. Existing research on accelerated learning focuses mainly on large-scale classification; regression is just as important, yet it has received comparatively little attention. This thesis therefore adopts partition and sampling as its strategy for addressing the low efficiency of traditional regression analysis algorithms. The main results are summarized as follows:

(1) Following the divide-and-conquer idea, this thesis proposes an accelerated kernel ridge regression algorithm based on data partition. First, a family of parallel hyperplanes divides the data space into several disjoint regions. A kernel ridge regression model is then trained on each region. Finally, each model predicts the test instances that fall in its own region. The experimental results show that the algorithm greatly improves efficiency and offers a feasible scheme for the study of regression acceleration algorithms.

(2) To address the low efficiency of kernel matrix computation in regression algorithms, this thesis proposes a kernel matrix approximation algorithm based on two-phase sampling. First, a clustering algorithm divides the data into blocks, and a sampling strategy is used to compute a low-rank approximation of the kernel matrix obtained from each block. Next, a measure of the mutual contribution between blocks is constructed from label information and used to subsample part of the off-diagonal block kernel matrices. Finally, the low-rank matrices of the diagonal blocks are used to approximate the off-diagonal block kernel matrices. The experimental results show that the algorithm greatly reduces the computation required for kernel matrix approximation, improves the efficiency of the regression algorithm, and offers a new research direction for efficient regression acceleration based on low-rank matrix approximation on large-scale data.

To address the low efficiency of regression analysis on massive data, this thesis proposes two regression acceleration algorithms, which can improve both the efficiency and the predictive performance of the algorithm. The results provide a new strategy for regression analysis in large-scale data environments and further enrich research on regression acceleration algorithms.
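The three steps of contribution (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say how the hyperplanes are chosen, so this sketch assumes a fixed direction vector with cut points at the quantiles of the projected training data. The RBF kernel, the parameters `gamma` and `lam`, and every function name are likewise assumptions made for illustration.

```python
import math
import random


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def solve_spd(A, b):
    """Solve A x = b for a symmetric positive-definite A via Cholesky."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n  # forward substitution: L z = b
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n  # back substitution: L^T x = z
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_krr(X, y, gamma, lam):
    """Kernel ridge regression: solve (K + lam * I) alpha = y."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return X, solve_spd(K, y)


def predict_krr(model, x, gamma):
    X, alpha = model
    return sum(a * rbf(xi, x, gamma) for xi, a in zip(X, alpha))


def region_index(x, direction, cuts):
    """Index of the slab between parallel hyperplanes direction . x = c."""
    proj = sum(d * v for d, v in zip(direction, x))
    return sum(proj > c for c in cuts)


def fit_partitioned(X, y, direction, n_regions, gamma, lam):
    """Split the data into slabs (assumed: quantile cuts) and fit one model each."""
    projs = sorted(sum(d * v for d, v in zip(direction, x)) for x in X)
    cuts = [projs[len(projs) * k // n_regions] for k in range(1, n_regions)]
    buckets = [([], []) for _ in range(n_regions)]
    for x, t in zip(X, y):
        r = region_index(x, direction, cuts)
        buckets[r][0].append(x)
        buckets[r][1].append(t)
    models = [fit_krr(bx, by, gamma, lam) for bx, by in buckets]
    return cuts, models


def predict_partitioned(cuts, models, direction, x, gamma):
    """Route x to its region and predict with that region's model only."""
    return predict_krr(models[region_index(x, direction, cuts)], x, gamma)


def demo(n=120, n_regions=4, gamma=20.0, lam=1e-3, seed=0):
    """Mean absolute error on a noise-free toy target, y = sin(2 * pi * x)."""
    rng = random.Random(seed)
    X = [[rng.random()] for _ in range(n)]
    y = [math.sin(2 * math.pi * x[0]) for x in X]
    direction = [1.0]
    cuts, models = fit_partitioned(X, y, direction, n_regions, gamma, lam)
    grid = [[0.05 + 0.9 * i / 49] for i in range(50)]
    errs = [abs(predict_partitioned(cuts, models, direction, g, gamma)
                - math.sin(2 * math.pi * g[0])) for g in grid]
    return sum(errs) / len(errs)


if __name__ == "__main__":
    print("mean absolute error:", demo())
```

The speed-up in this sketch comes from the cubic cost of a direct solve: one system of size n costs on the order of n^3 operations, while R systems of size n/R cost about n^3/R^2 in total. Predictions near a region boundary use only one local model, so accuracy there depends on how the partition is chosen.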
Keywords/Search Tags: Regression analysis, Data partition, Sampling, Matrix approximation