Associations Mining Research Based On Granular Computing

Posted on: 2021-05-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H H Cheng
Full Text: PDF
GTID: 1368330620463174
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, a deluge of data has accumulated in fields such as science and industry. These data contain abundant association structures, and identifying and filtering the valuable associations among them is one of the important tasks of complex association mining in big data. Complex association mining is widely used in machine learning and data mining, and its development has a profound impact on progress in related fields. However, several characteristics of such data pose challenges for statistical association measures: complex and diverse data types, uncertain data distributions, complex and diverse associations, the co-existence of different associations, and the existence of spurious correlations. Association measures based on the principle of proportional reduction of error tend to identify linear correlations, and their ability to recognize more complex associations depends on the choice of transform functions. Association measures based on statistical independence tests depend heavily on how the joint and marginal distributions are estimated: different estimation methods, or the same method with different parameters, lead to different results, and the measure cannot be calculated at all when the joint distribution does not exist. When these methods are used for complex association mining, their defects and complexity affect the accuracy of the estimated results and make associations harder to recognize. It is therefore urgent to develop a new, simple yet effective paradigm for association measures that is data-driven and independent of data distribution, parameter selection and association type.

Granular computing is increasingly becoming an effective paradigm for dealing with complex problems in artificial intelligence, information processing, data mining and knowledge discovery. New theory and methods for mining complex associations can be expected to emerge from the theory and methods of granular computing. We focus closely on the challenges facing a new paradigm of complex association measures and study complex association mining methods based on granular computing from three aspects: theoretical basis, method design and empirical application. The research contents and results of this thesis are as follows:

1. Theoretical basis: (1) We summarize two structural mechanisms of statistical association measures, and analyze the characteristics and challenges of the representative methods of each mechanism. Combining the advantages of statistical methods with the requirements of complex association mining in big data, we propose a set of properties that a new paradigm of association measures may need to satisfy. This provides theoretical guidance for developing association measures for different tasks. (2) We analyze how validly a granular structure represents the information in its samples, unify the knowledge representation of granular structures induced by different binary relations, and propose a distance measure between different granular structures. We also propose a grouping granular algorithm, which shows that the granule of a sample can effectively describe and represent that sample's information. This establishes the theoretical foundation for designing uncertainty measures and association measures based on granular computing.

2. Method design: (1) To design an association measure for the multivariate inter-variable association mining task, we propose several uncertainty measures based on the k-NN granule (neighborhood entropy, neighborhood joint entropy, neighborhood conditional entropy and neighborhood mutual information), fuse the normalized neighborhood mutual information obtained under different neighborhood granular structures, and design the maximal neighborhood coefficient (MNC), which satisfies comparability, generality, equitability, monotonicity and scalability. Theoretical and experimental results show that MNC can be used to identify and filter potentially complex associations in big data. (2) To provide more auxiliary information for identifying an association between two variables, we analyze in depth the working mechanism of MNC in the case of binary variables, and design three statistics from the granular computing perspective: the degree of monotonicity, the degree of closeness to a function, and the complexity of the association. (3) To design an association measure for the multivariate intra-variable association mining task, we propose the multivariate joint neighborhood uncertainty and the neighborhood total association coefficient based on the k-NN granule, fuse the normalized neighborhood total association coefficient obtained under different neighborhood granular structures, and design the maximal neighborhood total association coefficient (MNA). Experimental results show that MNA satisfies dimensionality unbiasedness and noise robustness.

3. Empirical application: The classical fuzzy C-means clustering algorithm ignores the difference information between clusters. To overcome this shortcoming, we introduce a multivariate association measure to identify the diversity information between clusters, and design a diversity-induced fuzzy clustering algorithm, which improves clustering performance, especially on datasets with unbalanced clusters and large overlapping regions.

In summary, a preliminary framework based on granular computing has been formed, comprising the multivariate inter-variable association measure MNC, the multivariate intra-variable association measure MNA, and the auxiliary information statistics MNNE for binary variables. The results, from theory to empirical application, show that association measures based on granular computing can effectively address the challenges faced by statistical association measures. Association mining based on granular computing is expected to promote the development of theoretical analysis and technical methods for big data.
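The abstract does not give formulas for the k-NN granule measures, so the following is only a minimal sketch of how a neighborhood mutual information of this kind might be computed. It assumes the commonly used neighborhood-entropy form, in which the joint granule of a sample is the intersection of its granules under each variable, and it takes the maximum over several values of k as a stand-in for "fusing" the normalized values. All function names are hypothetical, variables are treated as one-dimensional, and none of this should be read as the thesis's exact definition of MNC.

```python
import math


def knn_granules(values, k):
    """For each sample, return the index set of its k nearest samples.

    The sample itself is always included (it wins distance ties).
    Assumption: one-dimensional values with absolute-difference distance.
    """
    n = len(values)
    granules = []
    for i in range(n):
        # Sort by distance to sample i; prefer i itself, then lower indices.
        order = sorted(range(n),
                       key=lambda j: (abs(values[i] - values[j]), j != i, j))
        granules.append(set(order[:k]))
    return granules


def neighborhood_mutual_information(x, y, k):
    """Assumed form: -(1/n) * sum_i log(|Gx_i| * |Gy_i| / (n * |Gx_i & Gy_i|)).

    With k-NN granules every single-variable granule has size k, so only
    the size of the joint (intersected) granule varies between samples.
    """
    n = len(x)
    gx = knn_granules(x, k)
    gy = knn_granules(y, k)
    total = 0.0
    for i in range(n):
        joint = len(gx[i] & gy[i])  # at least 1, since i is in both granules
        total += math.log((k * k) / (n * joint))
    return -total / n


def normalized_nmi(x, y, k):
    """Divide by log(n / k), the value reached when both granules coincide."""
    return neighborhood_mutual_information(x, y, k) / math.log(len(x) / k)


def max_normalized_nmi(x, y, ks):
    """Hypothetical fusion step: keep the largest normalized value over ks."""
    return max(normalized_nmi(x, y, k) for k in ks)


if __name__ == "__main__":
    x = [float(i) for i in range(50)]
    linear = [2.0 * v + 1.0 for v in x]
    parabola = [(v - 25.0) ** 2 for v in x]
    print(max_normalized_nmi(x, linear, ks=[3, 5, 8]))
    print(max_normalized_nmi(x, parabola, ks=[3, 5, 8]))
```

In this sketch a strictly monotone relation such as the linear example produces identical granules under both variables, so the normalized value is close to 1. The parabola scores lower because samples on opposite branches share a granule in y but not in x. The brute-force neighbor search is quadratic in the number of samples and is meant only for illustration.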
Keywords/Search Tags: Big data, Data mining, Complex associations, Association measure, Data-driven, Granular computing, Uncertainty, Fuzzy clustering