| In the era of big data, advances in data generation, collection, and storage technology are producing large-scale data sets with both large samples and high-dimensional features. This creates opportunities to explore objective laws and poses challenges for statistical analysis. Among statistical methods, quantile regression is often used to capture the heterogeneous influence of explanatory variables on the entire conditional distribution of a response variable, which makes it an important tool for exploring such laws. Commonly used statistical software can perform quantile regression, but because of limits on memory and running time, quantile regression on large-scale data is often infeasible. Studying quantile regression methods for large-scale data, and solving the technical problems that arise in modeling, is therefore of theoretical and practical value. This paper proposes a new communication-efficient proxy quantile regression method for distributed systems in which large-scale data are non-randomly distributed across machines, and extends the method with sparse penalized learning. Specifically, small samples are first drawn at random from the different worker machines and transmitted to the host. The full-sample quantile regression loss function is then approximated by a proxy loss function on the host. The worker machines and the host communicate through gradient vectors, and the final quantile regression estimate is obtained on the host. The new method overcomes the non-random distribution of the data, and because communication involves only the sampled observations and local gradients, it greatly reduces computation and communication costs. On this basis, the communication-efficient proxy quantile regression is combined with sparse penalized learning to construct a penalized quantile regression method for non-randomly distributed systems. The resulting method retains the ability to handle non-randomly distributed data and the low communication cost, and can also handle high-dimensional data. Finally, numerical simulations and an empirical analysis show that both the proxy quantile regression method and the penalized proxy quantile regression method perform well on large-sample and high-dimensional data, and outperform other known methods when the data are non-randomly distributed. |
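
The host-side computation outlined in the abstract can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not the paper's exact construction: it assumes the proxy loss takes the gradient-corrected form familiar from communication-efficient surrogate loss methods, that is, the pilot-sample check loss plus a linear term that replaces the pilot subgradient at an initial estimate `beta0` with the aggregated worker subgradient. All function names (`check_loss`, `draw_pilot`, `global_subgradient`, `proxy_loss`) are hypothetical, and the penalized variant and the minimization step are omitted.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def avg_loss(data, beta, tau):
    """Average check loss over a list of (x, y) pairs."""
    return sum(check_loss(y - dot(x, beta), tau) for x, y in data) / len(data)


def avg_subgradient(data, beta, tau):
    """A subgradient of the average check loss with respect to beta."""
    g = [0.0] * len(beta)
    for x, y in data:
        w = (1.0 if y - dot(x, beta) < 0 else 0.0) - tau
        for j, xj in enumerate(x):
            g[j] += w * xj
    return [v / len(data) for v in g]


def draw_pilot(workers, m, rng):
    """Each worker sends m randomly drawn observations to the host."""
    pilot = []
    for data in workers:
        pilot.extend(rng.sample(data, m))
    return pilot


def global_subgradient(workers, beta, tau):
    """Sample-size-weighted average of the workers' local subgradients."""
    n = sum(len(data) for data in workers)
    g = [0.0] * len(beta)
    for data in workers:
        local = avg_subgradient(data, beta, tau)  # computed on the worker
        for j, v in enumerate(local):
            g[j] += len(data) / n * v
    return g


def proxy_loss(pilot, beta, beta0, g_global, tau):
    """Assumed proxy loss: pilot loss with a linear gradient correction at beta0."""
    g_pilot = avg_subgradient(pilot, beta0, tau)
    shift = [gp - gg for gp, gg in zip(g_pilot, g_global)]
    return avg_loss(pilot, beta, tau) - dot(shift, beta)


if __name__ == "__main__":
    rng = random.Random(0)
    # Non-random allocation: observations are sorted by x before being split,
    # so each worker only sees one range of the covariate.
    xs = sorted(rng.uniform(0.0, 3.0) for _ in range(300))
    rows = [([1.0, x], 1.0 + 2.0 * x + rng.gauss(0.0, 1.0 + x)) for x in xs]
    workers = [rows[0:100], rows[100:200], rows[200:300]]

    tau, beta0 = 0.5, [0.0, 0.0]
    pilot = draw_pilot(workers, 20, rng)
    g = global_subgradient(workers, beta0, tau)
    print(proxy_loss(pilot, [1.0, 2.0], beta0, g, tau))
```

In this sketch only the pilot observations and one gradient vector per worker cross the network, which is where the communication saving described in the abstract would come from. Pooling random draws from every worker is also what lets the pilot sample resemble the full data even when each worker's share is not a random subset.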