Research On Personal Recommendation Algorithm Based On Clustering Ensemble Of High-dimensional Data

Posted on:2015-07-08

Degree:Master

Type:Thesis

Country:China

Candidate:J W Liu

Full Text:PDF

GTID:2298330422982687

Subject:Computational Mathematics

Abstract/Summary:

Data mining refers to extracting useful information from data, and it is used in decision support, data analysis, and other fields. With the development of information technology and the continuous growth in the scale of data, cluster analysis of large-scale, high-dimensional data has become both a hot topic and a difficult problem in research. Traditional algorithms often use Euclidean distance or similar functions to define the similarity between data points, and clustering is then based on that similarity. Because high-dimensional data is sparse, the differences in distance between data points become less pronounced as the dimension increases, so a distance metric carried over from a low-dimensional space to a high-dimensional space loses much of its effectiveness.

General data mining algorithms, including clustering algorithms, run into the "curse of dimensionality" when handling high-dimensional data. Clustering ensemble techniques combine multiple generic algorithms to produce base clusterings, which greatly improves the accuracy and stability of the clustering results. A clustering ensemble can generally be divided into three stages: generating the base clusterings, obtaining the integration relationship, and determining the final clustering scheme. In this thesis, base clustering members are generated either by varying the parameter settings of one clustering algorithm or by using different clustering algorithms; the members are then processed with a relation matrix to produce the integrated relationship; and the final clustering result is determined by hypergraph partitioning.

Given the broad spectrum of data mining problems, suppliers must work out how to provide consumers and users with better services, such as product recommendation and screening. Many scholars have studied the benefits of sample weighting in ensembles of iterative clustering methods. Building on the existing research, this thesis discusses how to weight samples, how to determine the weights, how to improve the clustering quality of the k-means algorithm, and how to apply the element clustering algorithm based on sample weighting to high-dimensional data, personalized recommendation, and other practical problems.

The main research and contributions are as follows:

1. This thesis proposes a sample-weighted k-means algorithm (W-O-k-means). Its advantage is the efficiency of a one-time weighting step, as demonstrated with k-means.
2. By introducing an updated clustering ensemble algorithm and an optimization algorithm based on sample weighting, and through comparative experiments, this thesis applies them to the practical problem of personalized recommendation.
3. Through a large number of experiments, this thesis compares the performance of the clustering ensemble algorithm based on sample weighting with that of other existing clustering ensemble algorithms.

Finally, the proposed algorithm is applied experimentally to personalized recommendation, and the results are compared.
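
The three-stage clustering ensemble outlined in the abstract (base clusterings, relation matrix, final partition) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The toy data, the function names (`kmeans`, `co_association`, `consensus`), the choice of initial centers, and the 0.5 threshold are all assumptions made for illustration. In particular, the last stage links pairs whose co-association exceeds the threshold and takes connected components, which is a simple stand-in for the hypergraph partitioning named in the abstract. Sample weighting is not modeled here.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_idx, iters=10):
    """Plain Lloyd's k-means started from the points at init_idx."""
    centers = [points[i] for i in init_idx]
    k = len(centers)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old center
                centers[c] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
    return labels


def co_association(labelings, n):
    """Relation matrix: share of base clusterings that group i with j."""
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]


def consensus(matrix, threshold=0.5):
    """Assumed final stage: connected components of the thresholded matrix."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data (hypothetical): two well-separated groups of three points.
points = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4),
          (5.0, 5.0), (5.3, 5.1), (5.1, 5.4)]

# Stage 1: base clusterings from different parameter settings
# (different k and different initial centers).
inits = [(0, 3), (1, 4), (2, 5), (0, 1, 3), (0, 3, 4)]
base = [kmeans(points, idx) for idx in inits]

# Stage 2: integrate the base clusterings into a relation matrix.
matrix = co_association(base, len(points))

# Stage 3: derive the final clustering from the relation matrix.
final = consensus(matrix)
print(final)
```

The runs with three centers split one of the groups, yet the consensus step still recovers the two groups, because the points in each group are placed together in most base clusterings.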

Keywords/Search Tags:

Clustering Ensemble, High-dimensional Data, Personal Recommendation, Weighted-object

Related items

1	Study On Ensemble Clustering For High-dimensional Data
2	Research And Application Of Rough Clustering Algorithm For High Dimensional Data Sets
3	Research On Clustering Algorithms For High-Dimensional Data
4	Using A Weighted Network Graph Clustering And Subspace Ensemble Approach For High-dimension Data Classification
5	Research Of Clustering Methods On High Dimensional Data
6	Clustering Ensemble Algorithm Based On Mixed Data Representation
7	Research On Clustering Ensemble Methods And Their Applications
8	Adaptive Semi-supervised Clustering Ensemble For High Dimensional Data
9	Research On Key Technologies Of Clustering Ensemble
10	Research On Ensemble Clustering Algorithm