
Data-driven Clustering Mining Methods And Applications

Posted on: 2018-03-27  Degree: Doctor  Type: Dissertation
Country: China  Candidate: X M Wang  Full Text: PDF
GTID: 1360330515497617  Subject: Cartography and Geographic Information System
Abstract/Summary:
Clustering analysis is an important direction in data mining that aims to discover the clustering patterns of geographical phenomena. It can serve as an independent data mining tool, and it can also be integrated with other data mining methods to mine intrinsic knowledge. As one of the hot research topics, clustering analysis has been widely used in various domains, such as cartographic generalization, remote sensing image classification, image segmentation and crime hotspot analysis. Recently, many researchers have explored data mining theories and methods. Measured against the demands of users in real applications, however, existing studies still share several shortcomings. First, existing methods merely determine whether objects form a cluster; the significance levels of the clusters are not evaluated. Second, existing methods generally depend heavily on prior knowledge to set a series of parameters, yet during the clustering mining procedure prior knowledge is often lacking and proper parameters are difficult to set. Third, clusters in real applications generally have arbitrary shapes and densities and are subject to interference from noise and barriers, and existing algorithms generally cannot consider all of these complex features simultaneously. To overcome these shortcomings, this study proposes a scale-driven clustering mining strategy and develops a series of new clustering algorithms and applications. First, a clustering mining schema is proposed to provide an analytical framework for mining clustering patterns. Second, a series of scale-driven clustering mining algorithms is explored, including a dual clustering algorithm, a time series clustering algorithm and a relationship clustering algorithm. Third, a series of real applications is conducted using the proposed algorithms.

The multi-scale driven clustering schema is based on the following clustering mining procedure. First, based on the multi-source and multi-type dataset, a series of related multi-scale driven clustering mining variables and units is designed. Then, based on these variables and units, the dataset is reselected and preprocessed to provide the essential data for calculating the clustering mining variables, and it is stored in a database named "a clustering mining feature database". The data in the multi-scale clustering mining feature database are further processed to obtain the clustering mining variables, which are stored in a database named "a clustering mining information database". According to the characteristics of the mining purpose and the mining variables, a proper multi-scale driven clustering mining algorithm is selected, and the clustering result is finally evaluated and visualized. Based on this multi-scale clustering mining procedure, a generalized multi-scale driven clustering mining schema is proposed.

A scale-driven dual clustering algorithm is explored. This algorithm detects clusters with similar spatial and non-spatial attributes while taking the following matters into account: the reliability of the clustering knowledge at different scales, the lack of prior information, and the interference of noise and barriers. In addition, the proposed scale-driven dual clustering algorithm can detect clusters with arbitrary shapes and densities, and it is also suitable for datasets with unevenly distributed attributes. To verify the feasibility of the proposed algorithm, it is tested on both simulated datasets and a real application. In the real application, the clustering aim is to obtain the spatial patterns of house prices. The results indicate that house prices in Wuhan exhibit a radial distribution, with radial centers that include the Yangtze River and East Lake.

Currently, time series clustering algorithms fail to effectively mine the clustering distribution characteristics of time series data without sufficient prior knowledge. Furthermore, these algorithms fail to simultaneously consider the spatial attributes, the non-spatial time series attribute values, and the non-spatial time series attribute trends. Hence, to meet the demand for adaptive and efficient clustering of time series data, scale-driven time series clustering mining algorithms are developed. These include a raster-based scale-driven time series clustering algorithm and a vector-based scale-driven time series clustering algorithm. The clustering strategies of the two algorithms are almost the same: the multi-scale clustering results are obtained first, and the clustering result at the optimal scale is then selected using the result evaluation method. The proposed scale-driven time series clustering algorithms can adaptively mine clusters with similar spatial attributes, time series attributes and time series attribute trends. The feasibility of the proposed algorithms is verified using simulated datasets and real applications. In one real application, rainfall data are used, and the results show that rainfall in China exhibits significant spatial heterogeneity. Rainfall in the northwest is low and its interannual variability is small, whereas rainfall in the east and south is high and its interannual variability is large. In addition, the semi-humid/semi-arid boundary line can be extracted from the clustering results. In another application, surface deformation detection data are used as the data source. The results show several interesting patterns: (1) the surface rises slightly in the old town area; (2) surface deformation in key development zones and reclamation areas changes rapidly and unevenly; (3) most areas constructed within the past 20 years continue to subside. The results can provide a reference for mining the mechanism of surface deformation.

A scale-driven relationship clustering algorithm is explored. The proposed algorithm uses a hierarchical strategy. First, the spatial proximity relationship is obtained using the particle swarm optimization method and Delaunay triangulation. Second, the significant multiple variable relationship clustering zones are obtained by integrating the Apriori algorithm with an improved density-based clustering mining method. Simulated datasets are used to verify the performance of the algorithm, and the results show that the proposed scale-driven relationship clustering algorithm can adaptively detect the predefined clusters. The algorithm is then applied to a real application, where it is used to mine the clustering zones in which the soil element is significantly affected by the surrounding environment. The result indicates that the influence of the environment on the soil element is remarkable. Based on the result, a newly developed calibration set selection method is proposed to select representative samples, and prediction models are constructed to estimate the relationship between the visible near-infrared spectra and the soil element.

In summary, the aim of this study is to adaptively mine clustering patterns by considering the reliability of clustering knowledge at the mining scales. A scale-driven clustering mining schema is proposed to guide the operation of multi-scale driven clustering mining. Based on this schema, a series of clustering mining algorithms and applications is explored. The algorithms include a multi-scale driven dual clustering mining algorithm, a multi-scale driven time series clustering mining algorithm and a multi-scale driven multiple variable relationship clustering mining algorithm.
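
The two-step strategy described above for the scale-driven time series algorithms (obtain clustering results at several scales, then select the optimal scale with an evaluation method) can be illustrated with a small sketch. This is not the dissertation's algorithm, which the abstract does not specify in detail. It assumes, purely for illustration, that a "scale" is a distance threshold, that clusters are groups of points chained together within that threshold, and that the evaluation method is the mean silhouette coefficient. The function names `cluster_at_scale`, `silhouette` and `select_scale` are hypothetical.

```python
from math import dist


def cluster_at_scale(points, radius):
    """Assumed clustering rule: points chained by links no longer than `radius` share a label."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first expansion of one connected group
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and dist(points[i], points[j]) <= radius:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def silhouette(points, labels):
    """Assumed evaluation method: mean silhouette coefficient.

    Returns -1.0 for degenerate results (a single cluster, or all singletons).
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2 or len(clusters) == len(points):
        return -1.0
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def select_scale(points, radii):
    """Cluster at every candidate scale, then keep the best-scoring result."""
    results = {r: cluster_at_scale(points, r) for r in radii}
    scores = {r: silhouette(points, labels) for r, labels in results.items()}
    best = max(scores, key=scores.get)
    return best, results[best], scores


# Two well-separated toy groups; the middle scale should win.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
best, labels, scores = select_scale(points, [0.5, 1.5, 20.0])
print(best, labels)  # 1.5 [0, 0, 0, 0, 1, 1, 1, 1]
```

In this toy run the smallest scale leaves every point isolated and the largest merges everything into one cluster, so both are scored as degenerate and the intermediate scale is selected. A real implementation would replace both the clustering rule and the evaluation measure with the ones defined in the dissertation.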
Keywords/Search Tags: multi-scale driven, data mining, clustering