
Improvement And Application Of Density Peak Clustering Algorithm

Posted on: 2018-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: T Yu
GTID: 2359330536959146
Subject: Management Science and Engineering
Abstract/Summary:
In the era of big data, data are growing at a surprising speed and accumulating steadily, while their internal structure becomes increasingly unclear, which makes it particularly difficult to understand the relationships within the data. Against this background, clustering techniques have emerged: using unsupervised learning to uncover the actual internal relationships in massive data has become a hot topic in machine learning research. "Clustering by fast search and find of density peaks" (abbreviated as DP in this paper) was published in Science in 2014. This density-based clustering algorithm has shown great competitiveness in the field of data mining, with the advantages of high efficiency, ease of operation, few parameters and a simple principle. As a result, the DP algorithm has attracted broad attention and research, and has been widely recognized and applied by scholars in academia, business and other fields.

However, the DP algorithm still has some defects: (1) when the DP algorithm is applied to high-dimensional data, the large number of dimensions and the presence of redundant information severely degrade clustering quality, making it difficult for the algorithm to find the true cluster structure of the data; (2) the parameters of the DP algorithm require human intervention, and researchers usually adjust them according to their own experience, without a principled basis for the choice; (3) the clustering results of the DP algorithm cannot be given automatically and must be determined manually.

This paper studies these issues and proposes a separate improvement for each. (1) To address the difficulty the density peak clustering algorithm has with high-dimensional data, a density peak clustering algorithm based on entropy weight and kernel principal component analysis is proposed. The algorithm first uses the entropy weight method to weight the sample data and eliminate the influence of irrelevant attributes, and then uses kernel principal component analysis (KPCA) to reduce the dimensionality of the high-dimensional data. Finally, the density peak clustering algorithm clusters the high-dimensional data in the resulting low-dimensional space. (2) Most current clustering research does not consider the influence of different attributes on the clustering results and assumes that all attributes contribute equally. In fact, the influence of different attributes on the clustering result differs greatly. To address this, an improved clustering method based on attribute importance is put forward. It first uses the coefficient of variation method to assign different weights to the attributes, then uses kernel principal component analysis to reduce the dimensionality nonlinearly, and finally applies the density peak clustering algorithm to obtain the final clustering results. (3) Because the density peak clustering algorithm requires its parameters to be set manually, a density peak clustering algorithm based on the fruit fly optimization algorithm is proposed. It exploits the global optimization ability of the fruit fly optimization algorithm, with information entropy as the evaluation function, to find the best distance parameter of the density peak algorithm. This removes the need to set the parameter manually, and the silhouette validity index is used to determine the best number of clusters.

In addition, building on the advantages of the improved density peak clustering algorithm, we apply it to the analysis of the stocks of listed companies in the home appliance industry, providing a theoretical basis for correctly analyzing the market for listed companies. On this basis, an objective and accurate investment scheme for listed companies is put forward.
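The attribute-weighting and density-peak steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the function names `entropy_weights` and `density_peaks`, the symbol `d_c` for the distance parameter, the cut-off kernel for the local density, and the assumption that the input data are non-negative with positive column sums are all choices made here for the example. The KPCA step and the fruit fly optimization of the distance parameter are omitted.

```python
import numpy as np


def entropy_weights(X):
    """Entropy weight method (illustrative form).

    Assumes X is non-negative with a positive sum in every column.
    Attributes whose values are spread unevenly over the samples
    (low entropy) receive larger weights; a constant attribute gets
    a weight of about zero.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Share of each sample in every attribute's column total.
    P = X / X.sum(axis=0)
    # Shannon entropy per attribute, scaled to [0, 1]; 0 * log(0) is treated as 0.
    logP = np.log(P, out=np.zeros_like(P), where=P > 0)
    e = -(P * logP).sum(axis=0) / np.log(n)
    # Degree of diversification, clipped to guard against rounding below zero.
    d = np.clip(1.0 - e, 0.0, None)
    return d / d.sum()


def density_peaks(X, d_c):
    """Core quantities of density peak clustering (cut-off kernel).

    rho[i]   = number of other points closer than d_c to point i
    delta[i] = distance from point i to the nearest point of higher
               density (for a point with the highest density, the
               largest distance from it to any other point)
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distance matrix.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract 1 so that a point does not count itself.
    rho = (D < d_c).sum(axis=1) - 1
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()
    return rho, delta
```

Points with both a large `rho` and a large `delta` are the candidate cluster centers; ranking by the product `rho * delta` is one common way to pick them. In this sketch `d_c` still has to be supplied by hand, which is the manual step that the thesis proposes to replace with a search by the fruit fly optimization algorithm.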
Keywords/Search Tags: Density peak clustering algorithm, Entropy weight method, Kernel principal component analysis, Coefficient of variation method, Fruit fly optimization algorithm