Research And Application Of Density Clustering Algorithm Based On Kernel Principal Component And High Dimensional Distance

Posted on: 2020-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: L J Huang
Full Text: PDF
GTID: 2428330623452527
Subject: Applied statistics
Abstract/Summary:
Cluster analysis aims to group unordered, mixed data into clusters according to a similarity measure. It is an indispensable part of intelligent analysis in the era of big data. However, the particular properties of high-dimensional data and the curse of dimensionality mean that traditional clustering algorithms can no longer process such data efficiently. This thesis therefore studies high-dimensional clustering.

First, the characteristics of high-dimensional data are described, and their impact on traditional similarity measures is discussed. To address this problem, several proximity measure functions for high-dimensional data are analyzed, and the roles and characteristics of the different measure functions are discussed. Data sets of different dimensions are used for a k-means clustering comparison, and the clustering results are combined to obtain the optimal distance metric function. Second, existing high-dimensional clustering techniques based on dimensionality reduction are described, and the advantages and applicable data types of the different dimensionality reduction techniques are compared. Finally, based on this research, the thesis proposes a density-based clustering algorithm, KGDBSCAN, built on kernel principal component analysis (KPCA) dimensionality reduction and an improved high-dimensional distance (Gsimi), together with its application.

Data sets of different dimensions from the UCI database are used to verify the practical effect of the KGDBSCAN clustering algorithm and to compare it with the traditional DBSCAN clustering algorithm. The experimental results show that the improved clustering algorithm achieves the highest accuracy at three dimensionalities in high-dimensional space, which effectively improves the quality of the clustering results.

The improved clustering algorithm is also applied to a practical problem: the customer viewing information and TV product data collected by a broadcasting and television network operating company are used for cluster analysis. First, the raw data is pre-processed into two data tables, one of user viewing frequency and one of user on-demand frequency. The processed data set is reduced with KPCA, similarity is calculated with the Gsimi function, and DBSCAN clustering is performed, yielding four different types of users and two different types of programs. Then, the characteristics of the different types of users and programs are analyzed, and the viewing behaviors and viewing preferences of the different types of users are compared and summarized. Finally, example TV product recommendation schemes are given for users from the perspectives of historical behavior, similar program recommendation, similar user viewing, and comprehensive recommendation. The experimental results verify the effectiveness and feasibility of the improved high-dimensional clustering algorithm.
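The clustering step described in the abstract, DBSCAN run over a custom proximity measure, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's KGDBSCAN: the abstract does not define the Gsimi function, so the Manhattan distance below is only a stand-in, the KPCA reduction step is omitted, and the names `dbscan`, `manhattan`, `eps`, and `min_pts` are chosen here for illustration.

```python
from typing import Callable, List, Optional, Sequence

Point = Sequence[float]
NOISE = -1  # label for points that belong to no cluster


def manhattan(a: Point, b: Point) -> float:
    """Stand-in distance (assumption); the thesis uses its own Gsimi measure."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dbscan(points: List[Point], eps: float, min_pts: int,
           dist: Callable[[Point, Point], float] = manhattan) -> List[int]:
    """Plain DBSCAN with a pluggable distance function.

    Returns one label per point: 0, 1, ... for clusters, NOISE for outliers.
    A point counts itself when its neighbourhood size is compared to min_pts.
    """
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None means "not yet visited"

    def neighbors(i: int) -> List[int]:
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return [label if label is not None else NOISE for label in labels]


if __name__ == "__main__":
    sample = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0),
              (5, 5, 5), (5.1, 5, 5), (5, 5.1, 5),
              (20, 20, 20)]
    print(dbscan(sample, eps=0.5, min_pts=3))
```

Passing the distance as a parameter keeps the density logic separate from the proximity measure, so a different high-dimensional measure could be swapped in through `dist` without touching the clustering loop. In a pipeline like the one the abstract describes, the points would first be projected to a lower-dimensional space before this step.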
Keywords/Search Tags: Clustering, High dimensional data, Proximity metric, KPCA dimensionality reduction, DBSCAN algorithm