| With advances in science and technology and the development of society, data are taking increasingly complex forms. Functional data are now an important and widely used data type, and functional regression models are applied in medicine, finance, meteorology, biology, and other fields. Such models can analyze functional data effectively and provide decision makers with accurate and reliable results, improving the efficiency and accuracy of decision-making. In practice, however, a functional linear regression model may be misspecified; when its assumptions do not hold, statistical inference based on those assumptions may perform poorly and can even lead to wrong conclusions. This paper therefore considers a functional partial linear model, which makes the model more flexible, and also studies its application to massive data. The main contents are as follows: (1) A functional partial linear model is developed. The slope function is represented by B-splines, and a least-squares loss function is minimized, with a roughness penalty and a Smoothly Clipped Absolute Deviation (SCAD) penalty applied to the slope function and a SCAD penalty applied to the scalar coefficients, yielding estimates of the slope function and the scalar coefficients. In the simulations, the variance of the error term is set so that the signal-to-noise ratio of the model is 4. Three different slope functions are considered in each simulation, and the results demonstrate the finite-sample performance of the locally sparse estimation method. Finally, the empirical study analyzes the air quality of 31 major Chinese cities in 2018 to verify the effectiveness and practicality of the algorithm. (2) A functional partial linear model is established. Under the Communication-efficient Surrogate Likelihood (CSL) framework, a
surrogate loss function is constructed, and a roughness or SCAD penalty is applied to the parameters. The parameter estimates are obtained by an iterative local estimation algorithm. In the massive-data simulations, three basis coefficients are considered, together with a fixed total data size N and sample size n. As the number of machines K increases, three approaches (no penalty, roughness penalty, and SCAD penalty) are compared in terms of their effect on coefficient estimation. The results show the superiority and accuracy of the CSL estimation method. In the final empirical analysis, global climate data based on RCP4.5 are analyzed. The results show that, with a large amount of data, the number of machines can be increased appropriately and the data divided evenly across machines, which not only reduces the computational burden but also stabilizes the values of ISE and Sd as the number of machines grows, ensuring good estimation performance. The proposed method enriches the research on functional partial linear models and provides a reference for their application to complex data, which will help solve more practical problems. |
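
To make the iterative local estimation under CSL in item (2) more concrete, the following is a minimal Python sketch for a least-squares loss. It is an illustration under stated assumptions, not the paper's implementation. The design matrix is a generic stand-in for the B-spline basis scores and scalar covariates, the optional quadratic penalty matrix `R` stands in for a roughness penalty, and the SCAD penalty is not included. The names `local_gradient` and `csl_estimate`, and all simulation settings, are hypothetical. The surrogate loss used here is the local loss on machine 1 corrected by the difference between the local and global gradients at the current estimate.

```python
import numpy as np

def local_gradient(X, y, theta):
    """Gradient of the local least-squares loss (1/(2n)) * ||y - X theta||^2."""
    return -X.T @ (y - X @ theta) / len(y)

def csl_estimate(X_parts, y_parts, penalty=None, lam=0.0, n_iter=5):
    """Iterative CSL-style estimate for (quadratically penalized) least squares.

    X_parts, y_parts: lists holding the data block stored on each machine.
    penalty: optional symmetric matrix R for the penalty (lam/2) * theta' R theta
             (an assumed stand-in for a roughness penalty).
    """
    X1, y1 = X_parts[0], y_parts[0]
    p = X1.shape[1]
    R = np.zeros((p, p)) if penalty is None else penalty
    H1 = X1.T @ X1 / len(y1)   # Hessian of the local loss on machine 1
    A = H1 + lam * R
    # Initial value: penalized estimate computed from machine 1 only
    theta = np.linalg.solve(A, X1.T @ y1 / len(y1))
    for _ in range(n_iter):
        # One communication round: each machine reports its local gradient
        grads = [local_gradient(X, y, theta) for X, y in zip(X_parts, y_parts)]
        global_grad = np.mean(grads, axis=0)   # equal block sizes assumed
        # Closed-form minimizer of the penalized surrogate loss on machine 1
        theta = np.linalg.solve(A, H1 @ theta - global_grad)
    return theta

# Hypothetical simulation: K machines, n observations each, p coefficients
rng = np.random.default_rng(0)
K, n, p = 10, 500, 5
theta_true = np.array([1.0, -0.5, 0.0, 2.0, 0.3])
X_parts = [rng.normal(size=(n, p)) for _ in range(K)]
y_parts = [X @ theta_true + rng.normal(scale=0.5, size=n) for X in X_parts]

theta_csl = csl_estimate(X_parts, y_parts)

# Reference: ordinary least squares on the pooled data
X_all, y_all = np.vstack(X_parts), np.concatenate(y_parts)
theta_full = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
print(np.max(np.abs(theta_csl - theta_full)))
```

In this sketch only gradient vectors of length p are exchanged in each round, while the Hessian is computed on a single machine. With the penalty switched off, the iterates approach the pooled least-squares estimate when the local Hessian is close to the global one, which is the setting the even data split in the abstract aims for.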