Research On And Application Of Clustering Algorithms Based On Deep Learning

Posted on:2022-10-27

Degree:Master

Type:Thesis

Country:China

Candidate:H Fei

Full Text:PDF

GTID:2518306608959389

Subject:Master of Engineering

Abstract/Summary:

Clustering is an unsupervised learning technique commonly used to process massive data sets and extract valuable cluster information from them. With the explosive growth of data, data-driven research fields such as computer vision, natural language processing and computational bioinformatics have produced large amounts of data, and the corresponding cluster analysis tasks are increasing as well. Deep learning-based clustering algorithms (deep clustering algorithms) exploit the highly non-linear transformations of deep neural networks to map the original data into a new feature space in which cluster analysis can be performed more effectively. Deep clustering algorithms alleviate, to a certain extent, the limited capacity of traditional clustering algorithms to handle massive high-dimensional data, but they still suffer from high model complexity and unstable training, which greatly limits their range of application. Continuous deep clustering algorithms are designed for the multi-task settings found in practical applications: they process a continuous stream of information while resisting catastrophic forgetting, in which learning new tasks overwrites what was learned from old tasks. In summary, the study of deep clustering algorithms and continuous deep clustering algorithms has important theoretical significance and practical application value. The thesis topic comes from the National Natural Science Foundation of China. The author proposes a deep clustering algorithm and a continuous deep clustering algorithm, implements both as clustering tools, applies them to image clustering, face clustering, multi-task news text clustering and multi-task web page clustering, and verifies their effect in practical applications. The main work and innovations of this thesis are as follows:

(1) To improve the accuracy and efficiency of deep clustering, the author proposes a deep clustering algorithm, Enhanced Cluster GAN (ECluster GAN). The model is trained as a whole, combining the GAN's adversarial loss with backprop decoding in a network structure with a discrete-continuous clustering loss, so that clustering is carried out in the latent space. To address insufficient training stability, a generative adversarial network with a Dynamic Gradient Penalty (DGP), WGAN-DGP, is proposed, which further improves the model's ability to fit the original data distribution. In addition, an L2 loss is used in the backprop decoding algorithm, which improves the model's ability to reconstruct the latent space in high-dimensional settings, so that cluster analysis is completed with higher accuracy at a lower time cost.

(2) To give deep clustering a lifelong learning ability in multi-task continuous cluster analysis, the author proposes a continuous deep clustering algorithm based on model expansion, Related Task Model Selection Continuous Clustering (RTMSCC). RTMSCC uses the latent-space clustering algorithm ECluster GAN as its problem-solving model and uses a gating autoencoder for related-task recognition, which helps the current task identify and activate the problem-solving model of a related previous task. The current task is then clustered using the knowledge retained by the matched model, which resists catastrophic forgetting while preserving high clustering accuracy.

(3) To verify the practical effect of the two algorithms in single-task and multi-task cluster analysis, the author first applies the ECluster GAN single-task clustering tool to image clustering and face clustering and evaluates its clustering performance on these tasks. The author then applies the RTMSCC multi-task continuous clustering tool to multi-task news text clustering and multi-task web page clustering and evaluates both its resistance to catastrophic forgetting and its clustering performance.

Although the two proposed algorithms can efficiently perform single-task cluster analysis and multi-task continuous cluster analysis, there is still room for improvement. The ECluster GAN algorithm requires the number of clusters for a task to be specified in advance; future work will improve its adaptability while maintaining clustering performance, so that the number of clusters no longer needs to be specified. The RTMSCC model is relatively complex and must store a solution model for each task; future work will simplify the model while maintaining performance, reducing the algorithm's requirements for computing resources and storage space.
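The latent-space clustering idea in contribution (1) can be illustrated with a small sketch. It assumes a ClusterGAN-style latent code, that is, continuous Gaussian noise concatenated with a one-hot cluster indicator, which the abstract implies ("discrete-continuous") but does not spell out. The function names `sample_latent`, `cluster_of` and `l2_loss`, the noise scale `sigma=0.1`, and the simulated recovery step are all illustrative assumptions, not the thesis's implementation; no generator, discriminator, or gradient penalty is modelled here.

```python
import random


def sample_latent(n_clusters, noise_dim, sigma=0.1, rng=None):
    """Draw one latent code z = (z_n, z_c).

    z_n is continuous Gaussian noise and z_c is a one-hot cluster indicator
    (assumed ClusterGAN-style layout, not confirmed by the abstract).
    Returns the concatenated code and the index of the chosen cluster.
    """
    rng = rng or random.Random()
    k = rng.randrange(n_clusters)
    z_n = [rng.gauss(0.0, sigma) for _ in range(noise_dim)]
    z_c = [1.0 if i == k else 0.0 for i in range(n_clusters)]
    return z_n + z_c, k


def cluster_of(z, noise_dim):
    """Read a cluster label off a latent code: argmax over the discrete part."""
    z_c = z[noise_dim:]
    return max(range(len(z_c)), key=z_c.__getitem__)


def l2_loss(a, b):
    """Squared L2 distance between two latent codes of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    z, k = sample_latent(n_clusters=10, noise_dim=30, rng=rng)
    # Stand-in for backprop decoding: pretend the code was recovered
    # from a data point with a small error.
    z_hat = [v + rng.gauss(0.0, 0.05) for v in z]
    print("sampled cluster:", k)
    print("cluster read from recovered code:", cluster_of(z_hat, 30))
    print("L2 loss between codes:", l2_loss(z, z_hat))
```

In this layout, the cluster assignment of a data point is simply the position of the largest entry in the discrete part of its recovered latent code, which is why the number of clusters has to be fixed in advance, as the abstract notes in its discussion of limitations.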

Keywords/Search Tags:

Deep Clustering, Generative Adversarial Networks, Lifelong Machine Learning, Continuous Deep Clustering, Model Expansion

Related items

1	Research On Clustering Methods Based On Deep Learning
2	Clustering Algorithm Analysis Based On GAN And VAE
3	Gan-based Enhanced Deep Subspace Clustering Networks
4	Research On Small Sample Radar Signal Recognition Based On Unsupervised Deep Clustering
5	Feature Learning Methods Based On Deep Generative Networks
6	Research And Application Of Image Recognition Method Based On Deep Generative Adversarial Networks
7	Research Of Adaptive Facial Image Beautification Based On Generative Adversarial Networks
8	Design Of Optimization Scheme For Deep Learning Model In Image Field Under Visual Guidance
9	Research On Practical Adversarial Examples Generation Based On Deep Learning
10	Research On Adversarial Generation Network Based On Similarity Evaluation