
The Clustering Method Research Of Multivariable Panel Data And Empirical Analysis

Posted on: 2020-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z D Wang
GTID: 2428330623959564
Subject: Statistics
Abstract/Summary:
We live in an era of big data. With the unprecedented development of information technology, massive data has become one of the most valuable assets. As a common data mining technique, clustering is used throughout the data analysis process. Panel data, also known as time series cross-section data, is widely used because it combines several data characteristics. Compared with data of a single structure, the special structure of panel data not only provides more information but also reflects the dynamics of the data. The two key steps of clustering are characterizing the similarity between samples and choosing a clustering algorithm. The dynamics and complexity of the three-dimensional structure of panel data make it difficult to measure the similarity between samples and between clusters during clustering. Against this background, this thesis starts from the design of similarity statistics and proposes two clustering methods suitable for multivariable panel data. The empirical results show that both methods cluster the research objects effectively and that each emphasizes a different aspect of feature extraction from the panel data. The research content of this thesis can be summarized as follows:
(1) The research background, development history, and current research status of panel data clustering methods are reviewed, and shortcomings of the existing methods and areas for optimization are summarized. On this basis, the research content, innovations, and technical route of this thesis are presented.
(2) The structure of panel data is analyzed, and the principles and calculation steps of the data processing methods used to build the panel data clustering models are described. These methods are principal component analysis, the entropy weight method, the symbolization method for time series data, the trend distance principle, and the system (hierarchical) clustering method.
(3) Inspired by research on the trend symbolization of time series, this approach is improved, extended, and applied to panel data in order to extract the dynamic trend characteristics of the data. The similarity of the research objects is then characterized by the trend distance, and a panel data clustering model based on the comprehensive trend distance is constructed.
(4) To address some shortcomings of previous panel data clustering methods, and drawing on the idea of the angle cosine distance, the thesis proposes measuring the degree of deviation between samples by the angle between their space vectors and combines this measure with three other commonly used similarity metrics. Features of the panel data are thus extracted from four different angles, so that the dynamic and static characteristics, the degree of deviation, and the similarity of the degree of fluctuation between cases are all taken into account, and a panel data clustering model based on multi-angle feature extraction is constructed.
(5) The two clustering models proposed in this thesis are analyzed empirically using inter-provincial household consumption panel data and real estate panel data for major cities. The results verify that both models cluster well and that each has its own emphasis in feature extraction. The panel data clustering model based on the comprehensive trend distance accurately identifies and extracts similarity in the dynamic development trends of the cases, so it is suited to data with frequent dynamic fluctuations; the panel data clustering model with multi-angle feature extraction extracts features of the cases from multiple perspectives.
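To make the building blocks named in the abstract more concrete, the following is a minimal sketch, not the thesis's actual models. It assumes strictly positive indicator values and combines three standard pieces: entropy weights for the indicators, an angle-cosine-based distance between individuals, and a naive average-linkage hierarchical clustering. The function names, the toy panel, and the choice of averaging `1 - cos(angle)` over periods are all illustrative assumptions. The abstract does not give the thesis's own distance formulas.

```python
import math

def entropy_weights(rows):
    """Entropy weight method; rows are observations, columns are positive indicators."""
    n, m = len(rows), len(rows[0])
    divergences = []
    for j in range(m):
        col_sum = sum(r[j] for r in rows)
        entropy = 0.0
        for r in rows:
            p = r[j] / col_sum
            if p > 0:
                entropy -= p * math.log(p)
        entropy /= math.log(n)              # scale entropy to [0, 1]
        divergences.append(1.0 - entropy)   # more dispersion -> larger weight
    total = sum(divergences)
    if total == 0:                          # every indicator is uniform
        return [1.0 / m] * m
    return [d / total for d in divergences]

def angle_distance(a, b, weights):
    """Assumed form: mean over periods of 1 - cos(angle) between weighted vectors."""
    total = 0.0
    for xa, xb in zip(a, b):
        dot = sum(w * u * v for w, u, v in zip(weights, xa, xb))
        na = math.sqrt(sum(w * u * u for w, u in zip(weights, xa)))
        nb = math.sqrt(sum(w * v * v for w, v in zip(weights, xb)))
        total += 1.0 - dot / (na * nb)
    return total / len(a)

def average_linkage(dist, k):
    """Naive agglomerative clustering: merge the closest pair until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]

    def link(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters)))
        a, b = min(pairs, key=lambda pq: link(clusters[pq[0]], clusters[pq[1]]))
        merged = clusters.pop(b)            # b > a, so index a stays valid
        clusters[a].extend(merged)
    return [sorted(c) for c in clusters]

# Hypothetical toy panel: panel[i][t][j] = individual i, period t, indicator j.
panel = [
    [[10, 5], [12, 6], [14, 7]],
    [[20, 10], [22, 11.5], [26, 13]],
    [[5, 10], [6, 12], [7, 15]],
    [[4, 9], [5, 10], [6, 12]],
]

# Pool all individual-period observations to weight the indicators.
weights = entropy_weights([obs for individual in panel for obs in individual])
dist = [[angle_distance(a, b, weights) for b in panel] for a in panel]
clusters = average_linkage(dist, k=2)
print(weights, clusters)
```

On this toy panel the first two individuals share one indicator ratio and the last two share another, so the angle-based distance separates them into two groups regardless of scale. A Euclidean distance would instead be dominated by the level differences, which is one reason to consider several similarity measures side by side.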
Keywords/Search Tags:panel data, feature extraction, trend distance, entropy weight method, cluster analysis