Research On Regression Acceleration Algorithm Based On Partition And Sampling

Posted on:2019-10-21

Degree:Master

Type:Thesis

Country:China

Candidate:E J Liu

Full Text:PDF

GTID:2428330551458743

Subject:Computer application technology

Abstract/Summary:

With the rapid development of Internet technology, data from various industries are growing at an unprecedented speed, ushering in an era of massive data. Faced with such data, traditional data mining techniques run into severe challenges in execution efficiency, so exploring fast and effective accelerated learning algorithms is extremely important. Existing research on accelerated learning focuses mainly on large-scale classification. Regression is as important as classification, but research on accelerating it is relatively scarce. This thesis therefore adopts partition and sampling as its research strategy and targets the low efficiency of traditional regression analysis algorithms. The main results are summarized as follows:

(1) Using the idea of divide-and-conquer, this thesis proposes an accelerated kernel ridge regression algorithm based on data partition. First, a family of parallel hyperplanes divides the data space into several disjoint regions. A kernel ridge regression model is then trained on each region. Finally, each model predicts the test instances that fall in its own region. The experimental results show that the algorithm greatly improves efficiency and provides a feasible scheme for the study of regression acceleration algorithms.

(2) To address the low efficiency of kernel matrix computation in regression algorithms, this thesis proposes a kernel matrix approximation algorithm based on two-phase sampling. First, a clustering algorithm divides the data into blocks, and a sampling strategy is used to compute a low-rank approximation of the kernel matrix of each block. Next, a measure of the mutual contribution between blocks is constructed from label information, and a subset of the off-diagonal blocks of the kernel matrix is subsampled. Finally, the low-rank matrices of the diagonal blocks are used to approximate the off-diagonal blocks of the kernel matrix. The experimental results show that the algorithm greatly reduces the computation required for kernel matrix approximation, improves the efficiency of the regression algorithm, and offers a new research direction for efficient regression acceleration based on low-rank matrix approximation on large-scale data.

In summary, to address the low efficiency of regression analysis on massive data, this thesis proposes two regression acceleration algorithms, which can improve both the efficiency and the predictive performance of the algorithm. The results provide a new strategy for regression analysis in large-scale data environments and further enrich research on regression acceleration algorithms.

Keywords/Search Tags:

Regression analysis, Data partition, Sample, Matrix approximation

Related items

1	Study On Matrix Regression Based Image Classification
2	A Distributed Data Management System For Data Analysis
3	Indirect Light Field Adaptive Partition And Light Field Matrix Completion Algorithm
4	Research On LED Color Gamut Transformation Based On Polynomial Regression And Approximation
5	Research On Virtual Sample Generation Technology Based On Quadrat And Quantile Regression
6	Study On The Improvement And Application Of Low Rank Matrix Approximation Model
7	The Research Of Accelerated Learning Algorithms Based On Partition And Condensation
8	Regularized Regression Learning Algorithms With Unbounded Sampling
9	Research On Kernel Matrix Learning And Approximation Algorithms In Kernel Methods
10	The Study Of Complex Data Processing Method Based On Classification