| Clustering is an important unsupervised learning method in machine learning. With the development of big data technology, clustering has attracted increasing attention and found wide application in many fields. Density peaks clustering (DPC), proposed by Rodriguez and Laio in 2014, has been one of the hot topics of cluster analysis in recent years. This dissertation studies the DPC algorithm from three aspects: its improvement based on multiple kernel learning, its extension to semi-supervised clustering, and its application in combination forecasting. It is devoted to the following topics: 1) a multiple kernel learning-based density peaks clustering algorithm; 2) a semi-supervised density peaks clustering algorithm; and 3) a DPC-based combined model. Firstly, the performance of DPC depends on a reliable estimate of local density, and a single kernel function, such as the Gaussian kernel, is the method commonly used for local density estimation in current research. Because data are complex, however, a single kernel function may not reflect the underlying patterns and the distribution of interest in a specific dataset, and it is often difficult to determine the most suitable kernel and its associated parameters. Based on kernel learning theory, this thesis proposes a multiple kernel learning-based density peaks clustering algorithm by introducing a multiple kernel function. By mapping the samples into a reproducing kernel Hilbert space via the kernel trick, the proposed algorithm estimates, in kernel space, the local density of each sample and the distance between any two samples based on the multiple kernel function. To determine the weights of the base kernels, the thesis takes the Beta CV index as the objective function and employs particle swarm optimization (PSO) to search for optimal solutions. The performance is illustrated on the simulated dataset Jain and the image recognition datasets MNIST and Olivetti. Secondly, as an unsupervised learning method with fast clustering, DPC learns only from unlabeled samples and has no way to use prior label information to aid clustering. When a small amount of label information is available, DPC leaves the information in the labeled samples unused. In this thesis, a novel semi-supervised density peaks clustering algorithm is proposed to extend classical DPC to semi-supervised clustering. The algorithm integrates the labeled samples with the identified centers by introducing virtual labels, which mark a center that has the same prior label as other centers; deliberately expanding the number of clusters in this way avoids missing underlying centers. To preserve the correctness of the prior label information, the merging strategy merges clusters marked by virtual labels only with their nearest clusters and never merges two clusters with different prior labels. Clustering results on eight UCI datasets demonstrate that the proposed algorithm can effectively incorporate the partial label information to improve performance. Finally, given DPC's advantages of fast clustering of arbitrarily shaped data and easy implementation without iteration or optimization, this thesis introduces DPC to combination forecasting for the selection of individual models and proposes a combination forecasting model named the DPC-based combined model. The proposed model generates candidate individual models through a decomposition-simulation-reconstruction procedure that applies CEEMD, SVM, GRNN, and the optimization algorithms PSO, GWO, and PSO-GSA. Five evaluation indexes are employed to describe the prediction performance of each candidate individual model, and these indexes form the sample set for cluster analysis. By applying DPC, the candidate individual models are clustered into different categories, and the linear combination forecast is built by selecting the individual model with the minimum MAPE from each category. The PM2.5 concentration time series of four Chinese cities, Dalian, Chongqing, Xiamen, and Wuhan, are used to evaluate the prediction performance, and the data analysis results demonstrate that the proposed model can effectively select individual models for high-precision forecasting. |
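
For readers unfamiliar with the baseline method, the following is a minimal sketch of classical DPC with a single Gaussian kernel, not the multiple-kernel or semi-supervised variants proposed in the thesis. The function name `density_peaks`, the cutoff parameter `d_c`, and the choice of centers as the points with the largest product of density and distance are illustrative assumptions; the original method selects centers from a decision graph.

```python
import numpy as np


def density_peaks(X, d_c, n_clusters):
    """Basic density peaks clustering (illustrative sketch, hypothetical API)."""
    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density with a Gaussian kernel; subtract 1 to drop the self term.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0

    # Rank samples from densest to sparsest.
    order = np.argsort(-rho, kind="stable")
    n = len(X)
    delta = np.zeros(n)            # distance to nearest denser sample
    parent = np.full(n, -1)        # index of that nearest denser sample

    # The densest sample has no denser neighbor: use its largest distance.
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        parent[i] = j

    # Assumption: take the n_clusters samples with the largest rho * delta
    # as centers (ties resolved in favor of the denser sample).
    gamma = rho * delta
    centers = order[np.argsort(-gamma[order], kind="stable")[:n_clusters]]

    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)

    # Each remaining sample inherits the label of its nearest denser sample.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


# Usage on two small, well-separated groups of points.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1], [5.05, 5.05],
])
labels, centers = density_peaks(X, d_c=0.5, n_clusters=2)
```

The single pass in density order reflects the property the abstract highlights: assignment needs no iteration or optimization once the densities and distances are known.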