Research On Transfer Learning Based Gaussian Mixture Model Clustering Algorithms

Posted on: 2022-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: R R Wang
Full Text: PDF
GTID: 2518306347473084
Subject: Computer Science and Technology
Abstract/Summary:
Clustering, as a powerful tool for data mining, has been widely applied in scenarios such as personalized recommendation, anomaly detection, feature learning, and image segmentation. Gaussian mixture model (GMM)-based clustering is favored by researchers in computer science and statistics for its rigorous mathematical foundation and strong fitting ability. However, with the rapid change of information environments and the development of advanced technologies, GMM-based clustering faces new challenges. On the one hand, it is difficult to collect large amounts of reliable data in many emerging fields, and when data are insufficient, traditional centralized GMM-based clustering methods struggle to produce satisfactory results. On the other hand, with the continuous development of high-performance computing and distributed networks, data are scattered across different nodes, which is challenging for centralized clustering methods, so distributed clustering algorithms are required. However, existing GMM-based distributed clustering methods do not obtain a closed-form solution for the covariance matrix and therefore need more iterations to reach global consistency of the shared knowledge. Moreover, the information exchange is hidden in the iterative clustering process and cannot be clearly explained or expressed. Transfer learning, which uses knowledge learned from related or similar domains to guide the target task, provides a new way to tackle both challenges. Motivated by this idea, this thesis proposes novel centralized and distributed transfer clustering models to meet the demands of new application scenarios. The main contributions are as follows.

First, for the case of insufficient data, a general centralized transfer GMM-based clustering framework is designed. Important knowledge, such as the cluster means and covariance matrices, is extracted by conventional GMM-based clustering methods on the source domain and then transferred to guide and improve clustering on the target domain. Based on this framework, three classical GMM-based methods, namely expectation maximization (EM), entropy-type classification maximum likelihood (ECML), and entropy penalized maximum likelihood estimation (EPMLE), are extended to corresponding transfer clustering versions. In addition, to avoid negative transfer, the maximum mean discrepancy (MMD) metric is introduced to measure the similarity between the source domain and the target domain, so that the best-matched source domain can be selected to provide more positive guidance for clustering on the target domain. Experiments on synthetic and real-world datasets show that the proposed transfer clustering approaches greatly improve clustering performance over the corresponding traditional clustering algorithms and achieve better clustering accuracy than existing transfer clustering methods.

Second, for distributed data in P2P networks, a general distributed transfer GMM-based clustering framework is constructed. Each node is treated as both a source domain and a target domain, so the nodes can learn from one another to improve distributed clustering performance. Based on this framework, the distributed EM (DEM) algorithm is redesigned, and a transfer learning term is added to the objective function to accelerate the global convergence of clustering. Additionally, the intermediate variables are simplified, and a consistency constraint term on the inverse covariance matrix is defined to obtain closed-form expressions for the Gaussian parameters. The new transfer DEM algorithm is further improved with an adaptive-learning-rate strategy, in which adaptive learning rates replace fixed values to achieve stable clustering accuracy. Finally, to demonstrate the generality of the proposed framework, the classical ECML clustering algorithm is also extended to a transfer distributed version. Experiments on both a synthetic dataset and real-world datasets demonstrate the efficiency of the presented algorithms compared with existing GMM-based distributed clustering methods.
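
The abstract states that the maximum mean discrepancy is used to find the best-matched source domain, but it does not give the kernel, the estimator, or the selection rule. The following is only a minimal sketch of how such a selection could look, assuming a Gaussian (RBF) kernel, the biased squared-MMD estimator, and a "smallest discrepancy wins" rule. The function names, the `gamma` bandwidth, and the toy data are all hypothetical and are not taken from the thesis.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, gamma=0.5):
    """Biased estimate of the squared MMD between two samples.

    The RBF kernel and gamma=0.5 are illustrative assumptions.
    """
    return (mean_kernel(source, source, gamma)
            - 2.0 * mean_kernel(source, target, gamma)
            + mean_kernel(target, target, gamma))


def pick_source_domain(candidates, target, gamma=0.5):
    """Return the candidate name with the smallest MMD to the target, plus all scores."""
    scores = {name: mmd_squared(data, target, gamma)
              for name, data in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


def sample_gaussian(rng, center, n, std=1.0):
    """Draw n points from an isotropic Gaussian around `center` (toy data only)."""
    return [[rng.gauss(c, std) for c in center] for _ in range(n)]


# Toy example: one candidate source close to the target, one far away.
rng = random.Random(0)
target = sample_gaussian(rng, [0.0, 0.0], 60)
candidates = {
    "near": sample_gaussian(rng, [0.3, 0.0], 60),
    "far": sample_gaussian(rng, [4.0, 4.0], 60),
}
best, scores = pick_source_domain(candidates, target)
print(best, scores)
```

In this toy setup the candidate whose distribution overlaps the target gets the smaller score, so its cluster means and covariance matrices would be the ones passed on as transferred knowledge. In practice the kernel bandwidth affects the ranking and would need to be tuned.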
Keywords/Search Tags: clustering, transfer learning, Gaussian mixture model, distributed P2P networks