Research And Application Of Density Clustering Algorithm Based On Kernel Principal Component And High Dimensional Distance

Posted on:2020-08-30

Degree:Master

Type:Thesis

Country:China

Candidate:L J Huang

Full Text:PDF

GTID:2428330623452527

Subject:Applied statistics

Abstract/Summary:

Cluster analysis aims to group unordered, mixed data into distinct clusters according to a similarity measure. It is an indispensable part of intelligent analysis in the era of big data. However, the particular nature of high-dimensional data and the curse of dimensionality mean that traditional clustering algorithms can no longer process such data efficiently. This thesis therefore studies high-dimensional clustering. First, the characteristics of high-dimensional data are described, and their impact on traditional similarity measures is discussed. To address this problem, the proximity measure functions available for high-dimensional data are analyzed, and the roles and characteristics of the different measure functions are discussed. Data sets of different dimensions are then compared under k-means clustering, and the clustering results are used to identify the optimal distance metric function. Second, the existing high-dimensional clustering techniques based on dimensionality reduction are described, and the advantages and applicable data types of the different dimensionality reduction techniques are compared. Finally, based on this research, the thesis proposes a density-based clustering algorithm, KGDBSCAN, which combines kernel principal component analysis (KPCA) dimensionality reduction with an improved high-dimensional distance (Gsimi), and presents its application.

Data sets of different dimensions from the UCI database are used to verify the practical effect of the KGDBSCAN clustering algorithm and to compare it with the traditional DBSCAN clustering algorithm. The experimental results show that the improved clustering algorithm achieves the highest accuracy at three dimensionalities in high-dimensional space, which effectively improves the quality of the clustering results.

The improved clustering algorithm is also applied to a practical problem: customer viewing information and TV product data collected by a broadcasting and television network operating company are used for cluster analysis. First, the raw data are pre-processed into two data tables, one of user viewing frequency and one of user on-demand frequency. The processed data sets are reduced with KPCA, the similarity is calculated with the Gsimi function, and DBSCAN clustering is performed, yielding four different types of users and two different types of programs. The characteristics of the different types of users and programs are then analyzed, and the viewing behaviors and viewing preferences of the different types of users are compared and summarized. Finally, example TV product recommendation schemes are given for users from the perspectives of historical behavior, similar-program recommendation, similar-user viewing, and comprehensive recommendation. The experimental results verify the effectiveness and feasibility of the improved high-dimensional clustering algorithm.
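The final clustering stage described above (compute a pairwise dissimilarity, then run DBSCAN on it) can be illustrated with a minimal sketch. This is not the thesis's KGDBSCAN: the abstract does not give the Gsimi formula or the KPCA settings, so the sketch below is plain DBSCAN with a pluggable distance function, and the `euclidean` stand-in, the parameter names `eps` and `min_pts`, and the toy points are all assumptions made for illustration. In the pipeline the abstract describes, the input points would be the KPCA-reduced data and `dist` would be the Gsimi-based measure.

```python
import math
from collections import deque
from typing import Callable, List, Optional, Sequence

Point = Sequence[float]
NOISE = -1  # label for points that belong to no cluster


def euclidean(a: Point, b: Point) -> float:
    """Stand-in distance (assumption); the abstract does not define Gsimi."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(
    points: Sequence[Point],
    eps: float,
    min_pts: int,
    dist: Callable[[Point, Point], float] = euclidean,
) -> List[int]:
    """Plain DBSCAN; returns one cluster label per point, NOISE for outliers."""
    n = len(points)
    # eps-neighbourhood of every point (a point is its own neighbour).
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a core point: start a new cluster and grow it outwards.
        labels[i] = cluster
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is also core: keep expanding
        cluster += 1
    return [label if label is not None else NOISE for label in labels]


# Toy data (hypothetical): two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Because the distance is passed in as a function, swapping in a different high-dimensional proximity measure only requires supplying another callable with the same two-point signature; the neighbourhood and expansion logic stay unchanged.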

Keywords/Search Tags:

Clustering, High dimensional data, Proximity metric, KPCA dimensionality reduction, DBSCAN algorithm

Related items

1	A Research On Dimensionality Reduction Optimization For High-Dimensional Dataset
2	Dimensionality Reduction And Classification Of High-dimensional Data Using Cosine Metric
3	A Perception-Driven Approach To Supervised Dimensionality Reduction For Visualization
4	Research Of Method And Application On Dimensionality Reduction Of High Dimensional Data Based On Multivariate Chart
5	Neural Network Based Dimensionality Reduction And Its Application In High-dimensional Data Clustering
6	A Research Of Key Technology Of Dimensionality Reduction Of High Dimensional Data
7	Nonlinear Dimensionality Reduction Based On Stochastic Initialization
8	Research And Design Of Clustering Method Based On Large Data And High Dimensional Data
9	Research On And Design Of Dimensionality Reduction Algorithm For The High Dimensional Data
10	Research On Dimensionality Reduction And Indexing Algorithm Of Multimedia Database And System Implementation