Research Of Fuzzy Clustering Algorithm For Incomplete Data Based On Improved VAEGAN

Posted on: 2021-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Wang
Full Text: PDF
GTID: 2428330611453107
Subject: Computer software and theory
Abstract/Summary:
The amount of data in the information age is increasing rapidly, and huge volumes of it need to be analyzed and used. Data clustering is widely used as an efficient data analysis method. However, factors such as abnormal sensors, unstable data transmission, and incomplete data storage can produce incomplete data, that is, data sets in which some attribute values are missing. Traditional methods for handling incomplete data, such as mean filling, expected value filling, and difference filling, still cannot meet the requirements of accurate clustering. Cluster analysis for incomplete data therefore has high practical significance and application demand, and has become a research focus of scholars at home and abroad.

Firstly, because incomplete data have missing attributes and fuzzy C-means (FCM) clustering cannot be applied to them directly, this paper proposes an improved fuzzy clustering algorithm for incomplete data (IVAEGAN-FCM). So that the generative model extracts more effective information and generates more accurate data, the VAE and GAN network structures are merged: the VAE serves as the GAN generator, and the GAN discriminator compares the generated data with the real data and feeds the difference back to the generator, yielding the IVAEGAN model. Following the nearest neighbor rule, a nearest neighbor sample set is constructed for each incomplete sample, and the median of the nearest neighbor samples is used as a feature marker. The feature marker is introduced into the generator as a condition variable, constructing a conditional generator that improves the accuracy of the model's estimates. The IVAEGAN loss function is reconstructed in weighted form and combined with the Wasserstein distance to improve the convergence speed and stability of the model. The IVAEGAN model is trained on the complete samples in the incomplete data set to learn the attribute distribution of the whole data set. The trained generator then estimates and fills in the missing attributes of the incomplete samples to obtain a complete data set, on which fuzzy cluster analysis is performed.

Secondly, filling the incomplete data set with IVAEGAN estimates yields a complete numerical data set, but the estimated values still contain some error, and in fuzzy theory numerical data cannot accurately describe the uncertainty of incomplete data. To solve this problem, this paper proposes an interval-based fuzzy clustering algorithm for incomplete data (IVAEGAN-IFCM). During estimation and filling, the absolute value of the average error between the real and estimated values of the complete data attributes is taken as the interval size of the estimate. The estimation interval is also constrained by the attribute range of the nearest neighbor samples, so that each numerical estimate is transformed into an interval estimate. To improve the accuracy of each sample's estimation interval, the density of the nearest neighbor samples in the local area of each sample is calculated and used as an interval factor that dynamically controls the interval size. The complete data are also transformed into interval data, giving a complete interval data set, on which interval fuzzy clustering analysis is then carried out.

Finally, the proposed algorithms are simulated, and their validity is verified on UCI data sets and artificial data sets. The experimental results show that estimating and filling the incomplete data with the IVAEGAN model yields a complete numerical data set whose clustering accuracy is higher than that of the comparison methods. Clustering with interval estimates is in turn more accurate than clustering with numerical estimates, and shows better robustness and generalization.
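
The nearest neighbor feature marker and the interval estimate described above can be illustrated with a minimal sketch. It is not the thesis implementation, and several details are assumptions: missing values are marked with NaN, neighbors are ranked by Euclidean distance over the observed attributes only, the "interval size" is treated as a half-width around the estimate, and the density-based interval factor is omitted. All function names are hypothetical.

```python
import numpy as np

def nearest_neighbors(sample, complete, k):
    """Indices of the k complete rows closest to `sample`.

    Assumption: distance is Euclidean over the attributes that are
    observed (non-NaN) in `sample`.
    """
    observed = ~np.isnan(sample)
    diffs = complete[:, observed] - sample[observed]
    dists = np.sqrt((diffs ** 2).sum(axis=1))
    return np.argsort(dists, kind="stable")[:k]

def feature_marker(neighbors):
    """Per-attribute median of the nearest neighbor samples."""
    return np.median(neighbors, axis=0)

def interval_half_width(true_values, estimates):
    """Absolute value of the average estimation error per attribute,
    measured on complete data where the true values are known."""
    return np.abs((true_values - estimates).mean(axis=0))

def estimation_interval(estimate, half_width, neighbors):
    """Interval around `estimate`, constrained to the attribute range
    of the nearest neighbors (assumption: half_width on each side)."""
    low, high = neighbors.min(axis=0), neighbors.max(axis=0)
    lower = np.clip(estimate - half_width, low, high)
    upper = np.clip(estimate + half_width, low, high)
    return lower, upper

# Toy data: four complete samples and one sample missing its third attribute.
complete = np.array([
    [1.0, 2.0, 3.0],
    [1.2, 2.1, 3.5],
    [0.9, 1.8, 2.9],
    [8.0, 9.0, 10.0],
])
sample = np.array([1.0, 2.0, np.nan])

neighbors = complete[nearest_neighbors(sample, complete, k=3)]
marker = feature_marker(neighbors)

# Hypothetical generator output and average error for the missing attribute.
lower, upper = estimation_interval(
    np.array([3.4]), np.array([0.5]), neighbors[:, [2]]
)
```

In this sketch the marker would be passed to the conditional generator, and the resulting interval replaces the single filled-in value before interval fuzzy clustering.
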
Keywords/Search Tags: Incomplete Data, Fuzzy C-means, Variational Autoencoder, Generative Adversarial Network, Estimation Interval