Research On And Application Of Clustering Algorithms Based On Deep Learning

Posted on: 2022-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: H Fei
Full Text: PDF
GTID: 2518306608959389
Subject: Master of Engineering
Abstract/Summary:
Clustering is an unsupervised learning technique commonly used to process massive data and extract valuable cluster information from it. With the explosive growth of data, data-driven research fields such as computer vision, natural language processing and computational bioinformatics have accumulated large amounts of data, and the corresponding cluster analysis tasks are increasing accordingly. Deep learning-based clustering algorithms (deep clustering algorithms) exploit the highly non-linear transformations of deep neural networks to map the original data into a new feature space in which cluster analysis can be performed more effectively. Deep clustering algorithms alleviate, to a certain extent, the insufficient capacity of traditional clustering algorithms to handle massive high-dimensional data, but they still suffer from high model complexity and unstable training, which greatly limits their range of application. Continuous deep clustering algorithms are designed for the multi-task settings that arise in practical applications: they process a continuous stream of information while resisting catastrophic forgetting, in which learning new tasks overwrites what was learned on old tasks. The study of deep clustering algorithms and continuous deep clustering algorithms therefore has important theoretical significance and practical application value. The topic of the thesis comes from the National Natural Science Foundation of China. The thesis proposes a deep clustering algorithm and a continuous deep clustering algorithm, implements both as clustering tools, applies them to image clustering, face clustering, multi-task news text clustering and multi-task web page clustering, and verifies their practical effect. The main work and innovations of this thesis are as follows:

(1) To improve the accuracy and efficiency of deep clustering, the thesis proposes a deep clustering algorithm, Enhanced Cluster GAN (ECluster GAN). The GAN's adversarial loss and backprop decoding are trained jointly, forming a network structure with a discrete-continuous clustering loss that achieves latent space clustering. To address insufficient training stability, a generative adversarial network with a Dynamic Gradient Penalty (DGP), WGAN-DGP, is proposed, which further improves the model's ability to fit the original data distribution. In addition, an L2 loss is used in the backprop decoding algorithm, which improves the model's ability to reconstruct the latent space in high-dimensional settings, so that cluster analysis is completed with higher accuracy at a smaller time cost.

(2) To give the deep clustering algorithm lifelong learning ability in multi-task continuous cluster analysis, the thesis proposes a continuous deep clustering algorithm based on model expansion, Related Task Model Selection Continuous Clustering (RTMSCC). It uses the latent space clustering algorithm ECluster GAN as the problem-solving model and uses a gating autoencoder to recognize related tasks. This helps the current task identify and activate the problem-solving models of related previous tasks, and then complete its own clustering based on the knowledge retained by the matched models. The algorithm thereby resists catastrophic forgetting while ensuring high-precision clustering.

(3) To verify the practical effect of the two algorithms in single-task and multi-task cluster analysis, the thesis first applies the ECluster GAN single-task clustering tool to the practical tasks of image clustering and face clustering and evaluates its clustering effect. It then applies the RTMSCC multi-task continuous clustering tool to the practical tasks of multi-task news text clustering and multi-task web page clustering and evaluates both its clustering effect and its ability to resist catastrophic forgetting.

Although the two proposed algorithms can efficiently perform single-task cluster analysis and multi-task continuous cluster analysis, there is still room for improvement. ECluster GAN requires the number of clusters for a task to be specified in advance; future work will improve its adaptability while maintaining clustering performance, so that the number of clusters no longer needs to be specified. The RTMSCC model is relatively complex and must store a solution model for each task; future work will simplify the model while maintaining performance, reducing the algorithm's requirements for computing resources and storage space.
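
As an illustration of the discrete-continuous latent space mentioned in contribution (1), the following minimal Python sketch shows how a ClusterGAN-style latent code could be sampled and how a cluster label could be read back from it. It assumes that ECluster GAN keeps the latent layout of the original ClusterGAN (Gaussian noise concatenated with a one-hot cluster code); the abstract does not confirm this. The function names, `noise_dim` and `sigma=0.1` are hypothetical choices for the example, not values taken from the thesis.

```python
import random


def sample_latent(n_clusters, noise_dim, sigma=0.1, rng=random):
    """Draw one latent code z = (z_n, z_c).

    z_n: continuous part, i.i.d. Gaussian noise with std `sigma` (assumed value).
    z_c: discrete part, a one-hot vector selecting one of `n_clusters`.
    Returns the concatenated code and the index of the chosen cluster.
    """
    z_n = [rng.gauss(0.0, sigma) for _ in range(noise_dim)]
    k = rng.randrange(n_clusters)
    z_c = [1.0 if i == k else 0.0 for i in range(n_clusters)]
    return z_n + z_c, k


def assign_cluster(z_hat, noise_dim):
    """Read a cluster label from a recovered latent code.

    The label is the argmax over the discrete part, i.e. everything
    after the first `noise_dim` continuous entries.
    """
    z_c_hat = z_hat[noise_dim:]
    return max(range(len(z_c_hat)), key=lambda i: z_c_hat[i])


if __name__ == "__main__":
    z, k = sample_latent(n_clusters=10, noise_dim=30)
    print("sampled cluster:", k, "recovered cluster:", assign_cluster(z, 30))
```

In a full model the recovered code would come from an encoder or from backprop decoding rather than from the sampler itself; the sampled code is passed back directly here only to show the label-reading step. Note that `n_clusters` must be fixed up front, which matches the limitation the abstract states for ECluster GAN.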
Keywords/Search Tags: Deep Clustering, Generative Adversarial Networks, Lifelong Machine Learning, Continuous Deep Clustering, Model Expansion