Cluster Analysis Based On Panel Data

Posted on:2021-01-07

Degree:Master

Type:Thesis

Country:China

Candidate:Y Li

Full Text:PDF

GTID:2518306248955869

Subject:Applied Statistics

Abstract/Summary:

Cluster analysis is one of the most important methods in statistical learning and has attracted sustained study and attention since it was first developed. It can be used both as an independent analysis method and as a preprocessing step for other methods, so it is applied frequently to practical problems. In recent years, attempts to cluster multi-dimensional data have gradually aroused extensive discussion in academia. As the most typical form of three-dimensional data, panel data has become an entry point for multi-dimensional data clustering, and this paper discusses clustering methods for panel data. First, from the perspective of traditional cluster analysis, the paper introduces the definition of cluster analysis and introduces and compares three common families of clustering methods: partition clustering, hierarchical clustering, and density-based clustering. It also briefly explains the distance definitions involved in cluster analysis, including 3 sample distances (especially Euclidean distance) and 6 common inter-cluster distances used in hierarchical clustering. Second, according to the three-dimensional characteristics of panel data, the time dimension of the data is weighted. Then two clustering methods designed specifically for panel data are given: the Ward method for panel data and the grey whitening weight function method. In grey whitening weight function clustering, the time dimension of the data is weighted, and a weight determination method based on maximizing the sum of squared deviations is given. After that, a method that uses the dependent variable as the clustering label is proposed, and the Rand index is used to judge the validity of the clustering results. Finally, following the selection criteria of the "Environmental Air Quality Standard", 7 air-quality-related indexes for 40 cities over 50 days in 2019 are randomly taken as the basic panel data. The air quality index (AQI) is taken as the dependent variable, the other 6 variables are clustered with the panel data Ward method and the grey whitening weight function method respectively, and the effectiveness of the two sets of results is compared using the Rand index. The results show that the grey whitening weight function clustering method is more effective for clustering panel data, and that its results better explain the data, which is more conducive to understanding and analyzing the data and to drawing conclusions for the model. The main contributions of this paper are that the panel data Ward method and the grey whitening weight function clustering method are compared on an example for the first time, and that the AQI is proposed as a sample label; more generally, a dependent variable related to the clustering indexes can serve as an artificial label for judging the validity of clustering results with the Rand index.
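The validity check described in the abstract can be illustrated with a minimal sketch of the Rand index, which is the fraction of sample pairs on which two partitions agree (both place the pair in the same group, or both place it in different groups). This is not the thesis's own code. The function name `rand_index`, the six-city example, and the idea of turning AQI into reference labels by binning are assumptions made for illustration only.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Return the fraction of sample pairs on which two partitions agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two samples")
    agree = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        # A pair counts as an agreement when both partitions group it
        # together or both partitions keep it apart.
        if same_true == same_pred:
            agree += 1
    return agree / len(pairs)


# Hypothetical data: six cities, reference labels from binned AQI values
# and labels produced by some clustering method.
aqi_labels = [0, 0, 1, 1, 2, 2]
cluster_labels = [1, 1, 0, 0, 0, 2]
print(rand_index(aqi_labels, cluster_labels))  # 12 of 15 pairs agree: 0.8
```

Because the index only compares pairs, it does not depend on how the cluster labels are numbered, so the labels from two clustering methods can each be scored against the same reference labels and compared directly.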

Keywords/Search Tags:

Panel Data, Ward Clustering Of Panel Data, Grey Whitening Weight Function Clustering, Cluster Validity Analysis, Air Quality

Related items

1	Research On Clustering Method Of Grey Panel Data And Its Applications
2	Study On The Panel Data Clustering Analysis Method And Empirical Analysis
3	The Clustering Method Research Of Multivariable Panel Data And Empirical Analysis
4	Research On Panel Data Clustering Method Based On Dimension Reduction
5	Research And Application Of Panel Data Clustering Based On Feature Extraction
6	Study On The Grey Panel Data Cluster Method And Its Application
7	Research On Clustering Algorithm And Application Of Panel Data
8	Research On Panel Data Clustering Based On Gaussian Mixture Model
9	Research On Weighted Cluster Ensemble Algorithm Based On Validity Evaluation
10	Improvement And Empirical Study Of K-means Clustering Algorithm On Panel Data Analysis