| With the widespread adoption of the Internet and the rapid development of information technology, high-dimensional data have become common in the natural and social sciences. Variables in high-dimensional data are often correlated, and characterizing this high-dimensional correlation has become an important research topic in recent years. The high-dimensional factor model is an effective statistical tool for modeling such data: it describes the relationship among many variables while reducing the dimension of the data, and it is widely applied in statistics, econometrics, sociology, and many other fields. Within the high-dimensional factor model, structural breakpoint detection and factor clustering have become important research topics in recent years. This paper conducts theoretical research in these two areas: it proposes the corresponding estimation algorithms, proves the theoretical properties of the estimators, and carries out empirical analyses. The main research content, conclusions, and innovations of this paper are summarized as follows. First, to detect structural breakpoints in the high-dimensional factor model, Chapter 3 uses the quasi-maximum likelihood method to estimate the breakpoints in the model. To the best of our knowledge, this is the first work to link the consistency of the break point estimator to the relationship between the number of pseudo-factors and the number of original pre- and post-break factors. The method effectively resolves the non-identifiability problem raised in Bai et al. (2020) [12], and the corresponding statistical properties of the estimators are proved theoretically. Under some appropriate conditions, (1) if the break leads to more pseudo-factors than the original pre- or post-break factors, then the quasi-maximum likelihood estimator is consistent; (2) if the number of pseudo-factors in the entire data equals the number of the original pre- and 
post-break factors, that is, there is only a rotational change between the pre- and post-break factor loadings, then the difference between the estimated and the true change point is stochastically bounded, and the limiting distribution of the quasi-maximum likelihood estimator is derived for this case. These theoretical results are verified by Monte Carlo simulation, and a comparison with several existing estimation methods shows that the quasi-maximum likelihood estimation method has some advantages. Finally, we apply the quasi-maximum likelihood estimation method to a macroeconomic data set of the United States covering December 2001 to January 2013. The estimated structural breakpoint is July 2007, which indicates that the model structure changed in the early stages of the subprime mortgage crisis. The fall in U.S. housing prices in July 2007 led to a sell-off of securities and further reduced their value. For this reason, economist Mark Zandi wrote that the events of July 2007 “may be the most direct catalyst for the subsequent financial market turmoil”. Second, compared with existing algorithms, the quasi-maximum likelihood estimation method is significantly faster to compute, because PCA estimation is performed only once, on the full sample. Bai et al. (2020) [12] must run PCA at every possible split point, which increases the computational complexity and slows the calculation. Ma and Su (2018) [55] and Cheng et al. (2016) [34] rely on the Lasso method in the calculation process, which also makes them slower. At the end of this chapter, a one-at-a-time estimation algorithm for multiple breakpoints is provided. Monte Carlo simulation shows that, in the case of multiple breakpoints, the quasi-maximum likelihood estimator proposed in this chapter still performs well compared with the method of Baltagi et al. (2020) [22]. Therefore, for large-scale data sets with large 
time dimensions and multiple breakpoints, the quasi-maximum likelihood estimation method has advantages in both calculation speed and accuracy, which gives it practical application value. Third, Chapter 4 extends the high-dimensional factor model to a panel data model with interactive effects and assumes that the factor loadings have a subspace structure. In panel data, individual heterogeneity makes cluster heterogeneity an important issue. This chapter generalizes K-means clustering to a broader clustering method, subspace clustering, and proposes a least-squares subspace clustering algorithm that iteratively solves for the unknown parameters, the subspaces of the factor loadings, and the group function. We also prove the statistical properties of these estimators. The theoretical results are verified by Monte Carlo simulation, and a comparison with several existing estimation methods shows that least-squares subspace clustering has some advantages. This chapter also uses the proposed method to study the relationship between democracy and income in 90 countries from 1970 to 2000. The results show a positive correlation between democracy and income. Finally, this chapter gives a feasible model selection criterion for the number of factors, the number of subspaces, and the dimensions of the subspaces. |
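
To illustrate why running PCA only once keeps the breakpoint search cheap, the following is a minimal Python sketch, not the estimator as specified in Chapter 3. It assumes the quasi-likelihood objective takes the Gaussian form k·log|Σ̂₁(k)| + (T−k)·log|Σ̂₂(k)|, where Σ̂₁ and Σ̂₂ are the pre- and post-split second-moment matrices of the estimated pseudo-factors. It also assumes the number of pseudo-factors is known. The names `pca_factors`, `qml_break_point`, `simulate_panel`, and the trimming parameter `min_seg` are hypothetical and introduced only for this example.

```python
import numpy as np


def pca_factors(X, r):
    """Estimate r factors from a T x N panel with a single eigen-decomposition."""
    T, N = X.shape
    eigvals, eigvecs = np.linalg.eigh(X @ X.T / (N * T))
    top = eigvecs[:, np.argsort(eigvals)[::-1][:r]]  # eigenvectors of the r largest eigenvalues
    return np.sqrt(T) * top  # normalised so that F'F / T = I


def qml_break_point(X, r, min_seg=None):
    """Return the split k (number of pre-break observations) minimising the
    assumed objective k*log|S1(k)| + (T-k)*log|S2(k)|."""
    T = X.shape[0]
    min_seg = 2 * r if min_seg is None else min_seg  # keep both segments full rank
    F = pca_factors(X, r)  # PCA is run once, on the full sample
    best_k, best_obj = None, np.inf
    for k in range(min_seg, T - min_seg + 1):
        S1 = F[:k].T @ F[:k] / k              # pre-split second-moment matrix
        S2 = F[k:].T @ F[k:] / (T - k)        # post-split second-moment matrix
        obj = k * np.linalg.slogdet(S1)[1] + (T - k) * np.linalg.slogdet(S2)[1]
        if obj < best_obj:
            best_k, best_obj = k, obj
    return best_k


def simulate_panel(T, N, r, k0, seed=0):
    """Toy panel whose loadings are redrawn after observation k0 (illustrative only)."""
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((T, r))
    L_pre = rng.standard_normal((N, r))
    L_post = rng.standard_normal((N, r))
    X = np.empty((T, N))
    X[:k0] = F[:k0] @ L_pre.T
    X[k0:] = F[k0:] @ L_post.T
    return X + rng.standard_normal((T, N))  # add idiosyncratic noise


if __name__ == "__main__":
    X = simulate_panel(T=200, N=50, r=2, k0=100)
    # Redrawn loadings give 2 + 2 = 4 pseudo-factors over the full sample.
    print(qml_break_point(X, r=4))
```

In this toy design the loadings are redrawn at the break, so the full sample has more pseudo-factors than either regime, which corresponds to case (1) above. Each candidate split only requires two small r × r determinants, rather than a fresh PCA of the panel.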