A Three-way Clustering Filling Method For Incomplete Data

Posted on: 2021-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: F C Zhu
GTID: 2428330611456086
Subject: Computer technology
Abstract/Summary:
With the rapid development of science and technology, China has entered the information age. The volume of data has exploded in recent years, and data storage and acquisition capabilities have improved greatly. How to extract valuable information from these data has become an important research question. Cluster analysis has a long research history and has been applied in many fields. In practice, however, data are often missing because of human factors, historical reasons, and problems in data acquisition, storage, and transmission, which makes clustering difficult. Traditional clustering methods cannot be applied directly to incomplete data, and missing values cannot be avoided, so handling them properly is an unavoidable technical problem that directly affects clustering quality.

To address this problem, this thesis proposes a three-way clustering method for incomplete data. First, the missing values are initialized by mean filling. The data are then clustered with the k-means algorithm, using an optimized selection of the k initial cluster centers. This avoids the tendency of traditional k-means to fall into a local optimum and speeds up the algorithm. Finally, the complete data within the same cluster of the clustering result are used to refill the previously missing values, minimizing the error introduced by mean filling.

Traditional data filling often fills the missing values only once and does not make full use of the known data, and the initial selection of the k points in the process above still affects the k-means result. To address this, the thesis repeats the single clustering step many times, aligns the clusters across the runs by their members, and obtains a more accurate partition based on three-way decision. Compared with a three-way decision, a traditional two-way decision often carries more risk from wrong decisions. The proposed method can greatly reduce the error caused by the choice of initial points. Under the three-way clustering algorithm, different filling strategies are adopted for the different regions in which the missing data fall. Finally, experimental analysis shows that the proposed method outperforms the compared methods.
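The pipeline described in the abstract (mean filling, k-means clustering, refilling from the same cluster, then repeated runs combined into three-way regions) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the thesis's implementation: the function names, the toy data, plain random initialization in place of the optimized center selection, the greedy overlap-based alignment, and the rule that an object belongs to the core region only when all runs agree on its cluster.

```python
import random

def mean_fill(rows):
    """Replace None entries with the column mean of the observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed=0, iters=50):
    """Plain Lloyd's k-means with random initial centers (an assumption)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def refill_by_cluster(rows, filled, labels):
    """Refill each missing entry with the mean of the values observed
    for that attribute in the same cluster."""
    result = [list(r) for r in filled]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is None:
                observed = [rows[m][j] for m in range(len(rows))
                            if labels[m] == labels[i] and rows[m][j] is not None]
                if observed:
                    result[i][j] = sum(observed) / len(observed)
    return result

def align(reference, labels, k):
    """Greedy alignment: map each cluster to the reference label it
    overlaps most (not guaranteed to be one-to-one)."""
    mapping = {}
    for c in range(k):
        overlap = [sum(1 for r, l in zip(reference, labels)
                       if l == c and r == ref) for ref in range(k)]
        mapping[c] = overlap.index(max(overlap))
    return [mapping[l] for l in labels]

def three_way_regions(runs, k):
    """Core: objects given the same aligned label in every run; fringe: the rest."""
    aligned = [runs[0]] + [align(runs[0], r, k) for r in runs[1:]]
    core, fringe = [], []
    for i in range(len(runs[0])):
        votes = {a[i] for a in aligned}
        (core if len(votes) == 1 else fringe).append(i)
    return core, fringe

# Toy data (hypothetical): two well-separated groups, None marks a missing value.
rows = [[1.0, 1.0], [1.2, None], [0.8, 1.1],
        [9.0, 9.0], [9.2, None], [8.8, 9.1]]
filled = mean_fill(rows)
runs = [kmeans(filled, k=2, seed=s) for s in range(5)]
refilled = refill_by_cluster(rows, filled, runs[0])
core, fringe = three_way_regions(runs, k=2)
```

In this sketch, objects in the core region would be refilled from their own cluster as shown, while objects in the fringe region could be handled differently, for example by keeping the mean-filled value. The abstract does not specify the strategy used for each region, so that choice is left open here.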
Keywords: incomplete data, three-way decision, clustering