With the rapid development of information technology, data have not only accumulated in large quantities but have also grown rapidly in recent years, which marks the arrival of the era of big data. Big data is now pervasive across all fields and has become an important economic asset for human development. Effective data analysis and mining will promote the efficient and sustainable development of countries, enterprises and society as a whole, and many countries have launched research programs on big data applications. As observation perspectives keep broadening and understanding keeps deepening, real-world environments continue to produce high-dimensional big data with tens of thousands of dimensions or more. Faced with such data, classification, clustering and other analysis methods often perform poorly, run inefficiently or fail altogether. This is caused by the curse of dimensionality that comes with high dimension and by the processing load imposed by large data volumes.

This article analyzes the existing problems in high-dimensional big data analysis and reviews domestic and international research on dimensionality reduction, clustering and classification of high-dimensional data, as well as on big data processing techniques. It points out that feature extraction is an effective way to reduce the dimension of high-dimensional data and to lessen the workload of manual feature selection. It also identifies the disadvantages of using deep neural networks as the learning model for feature extraction from high-dimensional data.

For the classification of high-dimensional data, a different deep neural network, the multilayer extreme learning machine, is used as the base model to construct a multilabel classifier, and classification experiments on multiple power quality disturbances are carried out. In comparison, this classifier not only obtains better classification results but also achieves high efficiency. In addition, although the k-means clustering algorithm is easy to use and has many other advantages, it applies poorly to high-dimensional data. To address this, an unsupervised extreme learning machine is used to reduce the dimension of the data before clustering. Compared with experiments without dimensionality reduction or with other dimensionality reduction algorithms, the clustering results of this method agree better with the actual patterns in the data, and its clustering efficiency is higher.

Based on random matrix theory, a feature extraction method for high-dimensional data called FEMPL is proposed; it is suitable for the analysis of ultra-high-dimensional data. The article briefly describes random matrices and the Marčenko-Pastur law. From the discrepancy between the limiting spectral distributions of eigenvalues of random and non-random matrices, it derives the idea that this discrepancy can be used for feature extraction. It then gives the method for representing the data as a matrix and the specific composition of FEMPL features, and describes the steps of FEMPL feature extraction. The validity of FEMPL is verified in two cases: the classification of multiple power quality disturbance signals and the embedded analysis of users' electric load data. These cases also show that FEMPL has very flexible requirements on how the data are organized. Because there is no coupling among data samples during feature extraction, FEMPL is easy to parallelize. To alleviate the computational load of high-dimensional big data, a basic model for data analysis using parallel FEMPL in a distributed environment is given. Taking k-means clustering analysis as an example, a distributed parallel clustering process that combines FEMPL with k-means under the MapReduce computation model is presented.
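The idea behind FEMPL is summarized above only in outline: the eigenvalues of a purely random matrix follow the Marčenko-Pastur law, while data with real structure deviate from it. The Python/NumPy sketch below illustrates that discrepancy. It is not FEMPL itself, whose matrix construction and feature definitions are not given here. It assumes an N x T real matrix with N <= T whose rows are standardized to zero mean and unit variance. For pure noise, as N and T grow with c = N/T fixed, the eigenvalues of the sample covariance (1/T)ZZ^T should concentrate on [(1 - sqrt(c))^2, (1 + sqrt(c))^2]. The function names (mp_edges, sample_cov_eigs, spectral_features), the two illustrative features (the count of eigenvalues beyond the upper edge and the ratio of the largest eigenvalue to that edge) and the 10% tolerance are hypothetical choices made only for this illustration.

```python
import numpy as np


def mp_edges(c: float, sigma2: float = 1.0) -> tuple[float, float]:
    """Lower and upper edges of the Marchenko-Pastur support for ratio c = N/T (0 < c <= 1)."""
    root = np.sqrt(c)
    return sigma2 * (1.0 - root) ** 2, sigma2 * (1.0 + root) ** 2


def sample_cov_eigs(X: np.ndarray) -> np.ndarray:
    """Ascending eigenvalues of S = Z Z^T / T, where Z is X with each row standardized."""
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    return np.linalg.eigvalsh(Z @ Z.T / Z.shape[1])


def spectral_features(X: np.ndarray, tol: float = 0.1) -> tuple[int, float]:
    """Hypothetical illustrative features (not FEMPL's definition):
    - number of eigenvalues above the MP upper edge, inflated by a relative
      tolerance `tol` to absorb finite-size fluctuations of the noise edge;
    - largest eigenvalue divided by the MP upper edge.
    """
    n_rows, n_cols = X.shape
    _, upper = mp_edges(n_rows / n_cols)
    eigs = sample_cov_eigs(X)
    n_outliers = int(np.sum(eigs > upper * (1.0 + tol)))
    return n_outliers, float(eigs[-1] / upper)


rng = np.random.default_rng(0)
N, T = 200, 1000  # c = N/T = 0.2

# Pure noise: i.i.d. Gaussian entries, so the spectrum should stay inside the MP bulk.
noise = rng.standard_normal((N, T))

# Noise plus one common factor added to every row (broadcast over rows):
# a rank-one structure that should push one eigenvalue far beyond the MP edge.
factor = rng.standard_normal(T)
structured = noise + factor

print("noise:     ", spectral_features(noise))
print("structured:", spectral_features(structured))
```

With these settings the pure-noise matrix should report no eigenvalues beyond the tolerance band, while the matrix with a shared factor should report a single eigenvalue far outside the bulk. The tolerance is a pragmatic guard, since for finite N and T the largest noise eigenvalue fluctuates slightly around the theoretical edge.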