Data Scaling Theory And Method For Multi-scale Data Mining

Posted on: 2020-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: F Zhang
GTID: 2428330575466738
Subject: Computer Science and Technology
Abstract/Summary:
Multi-scale analysis has been applied in graphics, geographic information, signal analysis, data mining, and other fields. Multi-scale data mining has also been studied and applied in association rule analysis, clustering analysis, and classification analysis. However, there is no research on how to perform a universal multi-scale division of data sets, nor on how to construct multi-scale data sets. Most existing studies focus on data sets with obvious scales, and no general method for constructing multi-scale data sets has yet been established.

This thesis combines multi-scale science and data mining theory to study data scale division for general data sets. Starting from multi-scale data mining tasks, it builds a multi-scale dataset model and a benchmark scale scoring model. Based on discretization by probability density estimation, it proposes a multi-scale partitioning algorithm that extends the data types that can be scaled, brings the partitioning result closer to the multi-scale characteristics of the data, and has lower time complexity. The thesis also proposes a method for constructing multi-scale datasets, comprising a multi-scale dataset construction algorithm and a benchmark scale selection algorithm. Scale partitions of a dataset are evaluated using multi-scale entropy and information entropy. The method not only extends the range of multi-scale datasets but also effectively reduces the scale effect that arises from scale-based derivation in multi-scale data mining.

The main work of this thesis includes the following aspects:

(1) Studied the theoretical basis of data scale division. Based on the granulation method of granular computing, the range of data types that can be scaled is extended, and the definitions of scale and multi-scale data are refined. The nonparametric probability density function is improved to serve as the objective function for data scale division, and a multi-scale dataset model is established and its rationality verified. This part is the theoretical basis for the subsequent framework for constructing multi-scale datasets.

(2) Explored the theory and methods of constructing multi-scale data sets. Following the multi-scale dataset model, the thesis establishes a scale division evaluation method based on information entropy and multi-scale entropy, and proposes a multi-scale dataset construction method. The rationality of the algorithm is verified on real population data, and the algorithm has lower time complexity.

(3) Proposed a benchmark scale selection method for multi-scale datasets. This part considers multi-scale data mining tasks and explores benchmark scale selection for both labeled and unlabeled data sets. Based on granular computing and three-way decision theory, it proposes a benchmark scale selection method for labeled datasets. Based on information gain, it proposes a benchmark scale selection method for unlabeled datasets. Multi-scale datasets are used to verify the rationality of the algorithms, and their time complexity is also analyzed.

(4) Verified the models and algorithms presented in this thesis through experiments. The experimental data are a real full-population dataset for H province together with several city-level datasets, UCI datasets published by the University of California, Irvine, and an IBM synthetic dataset. The results show that applying the multi-scale partitioning method, the multi-scale dataset construction method, and the benchmark scale selection method raises coverage rate, F1-measure, and accuracy to varying degrees in multi-scale mining, and lowers the average support error. The experiments indicate that the multi-scale partitioning algorithm and the multi-scale dataset construction method are feasible, and that the proposed multi-scale dataset model and benchmark scale scoring model are effective.
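
The abstract does not give the partitioning algorithm itself, so the following is only a minimal sketch of the general idea of density-based discretization scored by information entropy, not the thesis's method. It assumes a Gaussian kernel density estimate, takes local minima of the estimated density as cut points, and uses the kernel bandwidth as a stand-in for scale. The function names `gaussian_kde_on_grid`, `density_cut_points`, and `partition_entropy`, and the toy data, are hypothetical.

```python
import numpy as np


def gaussian_kde_on_grid(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` at each grid point."""
    diffs = (grid[:, None] - values[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (len(values) * bandwidth)


def density_cut_points(values, bandwidth, grid_size=512):
    """Return grid positions where the estimated density has a local minimum.

    Assumption for this sketch: a larger bandwidth smooths the density more,
    leaves fewer minima, and so yields a coarser partition.
    """
    grid = np.linspace(values.min(), values.max(), grid_size)
    density = gaussian_kde_on_grid(values, grid, bandwidth)
    interior = density[1:-1]
    # Strict on the left, non-strict on the right, so a run of equal
    # values is flagged once, at its left end.
    is_minimum = (interior < density[:-2]) & (interior <= density[2:])
    return grid[1:-1][is_minimum]


def partition_entropy(values, cut_points):
    """Shannon entropy (in bits) of the interval labels induced by the cut points."""
    labels = np.searchsorted(cut_points, values)
    counts = np.bincount(labels, minlength=len(cut_points) + 1)
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * np.log2(probs)).sum())


# Toy one-dimensional attribute with two well-separated groups (made-up values).
toy = np.array([0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2])
for bandwidth in (1.0, 10.0):
    cuts = density_cut_points(toy, bandwidth)
    print(bandwidth, len(cuts), partition_entropy(toy, cuts))
```

In this sketch a partition with no cut points has entropy zero, and entropy grows as the data are split into more, and more evenly filled, intervals. Comparing the values across bandwidths is one simple way to see how a candidate scale changes the information carried by the partition.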
Keywords/Search Tags: Multi-scale data mining, Scale division, Granular computing, Multi-scale entropy, Information entropy