Research On Incomplete Data Clustering Algorithm Based On Improved GAN And Fireworks Algorithm

Posted on: 2022-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
GTID: 2518306773981309
Subject: Automation Technology
Abstract/Summary:
In data clustering, factors such as data collection equipment failure and imperfect data storage can produce incomplete data, i.e., samples that are missing several attributes. If the incomplete samples are simply discarded, the information they carry is not fully used, potentially important patterns in them cannot be mined, and clustering accuracy suffers. Studying incomplete data clustering is therefore of considerable significance and value. Two problems are addressed in this thesis: incomplete data cannot be used directly for fuzzy clustering, and the initial cluster centers of fuzzy C-means (FCM) are selected at random. To solve them, a fuzzy clustering algorithm for incomplete data is proposed that fills the missing values with an improved Generative Adversarial Network (GAN) and optimizes the initial FCM cluster centers with an improved fireworks algorithm.

First, because an incomplete dataset cannot be clustered directly by FCM, this thesis proposes an incomplete data filling algorithm based on an improved generative adversarial network (IGAN-FCM). The improvement has three parts. First, so that the model captures more latent information about the missing data and predicts the missing values more accurately, the attribute means of the nearest-neighbour samples of each incomplete sample are added to the generator of the GAN model. Second, to strengthen the similarity measure between the real data and the generated data and push the generator to produce data as close to the real data as possible, the loss function of the generator is reconstructed. Third, to speed up training, an adaptive weighting strategy is added to the generator loss function, yielding the IGAN model. The improved GAN is trained on the complete attributes in the incomplete dataset so that it generates data realistic enough to fill the missing attributes.

Second, cluster analysis is performed on the complete dataset filled by the IGAN model. FCM is sensitive to its initial cluster centers, and inappropriate initial centers can easily trap it in a local optimum. To this end, a fuzzy clustering algorithm (IFWA-FCM) is proposed in which an improved fireworks algorithm optimizes the initial FCM cluster centers. The fireworks algorithm is chosen because it achieves a good balance between local and global search. However, its fixed explosion radius coefficient and its single firework mutation method may still cause it to fall into a local optimum. Therefore, a dynamic explosion radius coefficient is proposed for the explosion stage, Cauchy mutation is introduced in the mutation stage, and the improved fireworks algorithm is used to optimize FCM and complete the fuzzy clustering analysis.

Finally, comparative experiments are carried out on the Blood, Breast, and Bupa datasets from the UCI repository and on the KDD public air quality and MIMIC-derived datasets. The experimental results show that the proposed IGAN-FCM achieves higher clustering accuracy than four classical incomplete data clustering algorithms under different missing rates. Furthermore, the proposed IFWA-FCM and IGAN-FCM show higher clustering accuracy and better generalization performance across the different datasets.
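
The clustering stage described above can be illustrated with a minimal sketch in standard-library Python. It is not the thesis implementation: `fcm` is the textbook fuzzy C-means update started from caller-supplied initial centers, and `cauchy_mutate` is a hypothetical helper showing one common way to apply Cauchy noise to a candidate center. The function names, the fuzzifier `m=2.0`, the `scale` parameter, and the toy data are all assumptions made for illustration; the dynamic explosion radius and the IGAN filling step are not shown.

```python
import math
import random


def memberships(points, centers, m=2.0):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        # Clamp distances so a point lying exactly on a center does not divide by zero.
        inv = [max(math.dist(x, c), 1e-12) ** -power for c in centers]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows


def fcm(points, init_centers, m=2.0, iters=100, tol=1e-6):
    """Textbook FCM started from init_centers; returns (centers, memberships, objective)."""
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for k in range(len(centers)):
            # Each center is the mean of all points weighted by membership^m.
            w = [row[k] ** m for row in u]
            wsum = sum(w)
            new_centers.append(
                [sum(wi * x[j] for wi, x in zip(w, points)) / wsum for j in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    u = memberships(points, centers, m)
    objective = sum(
        u[i][k] ** m * math.dist(points[i], centers[k]) ** 2
        for i in range(len(points))
        for k in range(len(centers))
    )
    return centers, u, objective


def cauchy_mutate(center, scale, rng):
    """Hypothetical helper: add heavy-tailed Cauchy noise via inverse-CDF sampling."""
    return [c + scale * math.tan(math.pi * (rng.random() - 0.5)) for c in center]


# Toy data (assumed): two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u, objective = fcm(points, init_centers=[(0, 0), (10, 10)])

# A mutated candidate set of initial centers, as a search step might propose.
rng = random.Random(0)
candidate = [cauchy_mutate(c, scale=0.5, rng=rng) for c in centers]
```

Because the Cauchy distribution has heavy tails, most mutated centers stay close to the original while a few jump far away, which is the usual motivation for using it to escape local optima. A search procedure would compare the `objective` values that `fcm` returns for different candidate initial centers and keep the lowest.
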
Keywords/Search Tags: Missing attribute, Generative Adversarial Network, Fireworks Algorithm, Fuzzy C-means Clustering