Research on an Improved K-means Algorithm Based on Anomaly Detection

Posted on: 2020-07-26  Degree: Master  Type: Thesis
Country: China  Candidate: C J Xue  Full Text: PDF
GTID: 2428330572495840  Subject: Agricultural Extension
Abstract/Summary:
In recent years, data mining technology has attracted great attention from the information industry, because we are in an era of explosive data growth and "big data" is one of the most popular terms today. How to extract valuable information from massive data more quickly has become a growing concern. K-means is a clustering algorithm widely used in data mining. It repeatedly partitions the data set according to the similarity between its elements, so that similarity within each cluster is maximized and the difference between clusters is maximized. However, the K-means clustering algorithm has two obvious defects: the clustering result is highly sensitive to abnormal data, and the performance of the algorithm depends heavily on the selection of the initial cluster centers.

To address these two shortcomings, this thesis makes the following improvements:

(1) Highly abnormal data are isolated before the initial cluster centers are selected. After the data are input, the degree of abnormality of each data element is calculated first, a formula for the anomaly-coefficient threshold is defined, and the corresponding proportion of outlier data points is filtered out according to the selected outlier filtering ratio. This addresses the sensitivity of the K-means algorithm to outliers.

(2) An average difference algorithm computes the initial centroids. For the normal data set that remains once the outliers have been removed, the initial cluster centers are calculated by the difference method, so that each initial cluster center lies as close as possible to the center of its cluster. This addresses the strong dependence of the K-means clustering result on the selection of the initial cluster centers.

The improved algorithm is evaluated by experimental simulation on several real data sets taken from UCI. Comparing the clustering results under multiple criterion functions shows that the accuracy of the improved K-means algorithm based on anomaly detection increases by about 12% and the clustering time decreases by about 8%, which demonstrates the feasibility and effectiveness of the improved algorithm. Finally, the experimental results under a variety of outlier filtering ratios are compared; the best clustering effect is obtained when the filtering ratio is close to 10%.
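
The two-stage idea in the abstract (filter outliers first, then choose initial centers from the remaining data) can be illustrated with a minimal sketch. The abstract does not give the thesis's anomaly-coefficient formula or its average difference method, so both are replaced here by labeled stand-ins: the anomaly score is assumed to be the mean distance to a point's nearest neighbours, and the initial centers are chosen by farthest-first traversal. All function and parameter names (`anomaly_scores`, `filter_outliers`, `farthest_first_centers`, `ratio`, `n_neighbors`) are hypothetical and not taken from the thesis.

```python
import math


def anomaly_scores(points, n_neighbors=3):
    """ASSUMED score: mean distance from each point to its nearest neighbours.
    The thesis defines its own anomaly coefficient, which the abstract omits."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:n_neighbors]
        scores.append(sum(nearest) / len(nearest))
    return scores


def filter_outliers(points, ratio=0.10, n_neighbors=3):
    """Drop the `ratio` fraction of points with the highest anomaly score."""
    scores = anomaly_scores(points, n_neighbors)
    n_drop = int(len(points) * ratio)
    by_score = sorted(range(len(points)), key=lambda i: scores[i])
    keep = sorted(by_score[:len(points) - n_drop])  # preserve input order
    return [points[i] for i in keep]


def farthest_first_centers(points, k):
    """STAND-IN for the thesis's average difference initialisation:
    start at the point nearest the global mean, then repeatedly add the
    point farthest from all centers chosen so far."""
    n = len(points)
    mean = tuple(sum(col) / n for col in zip(*points))
    centers = [min(points, key=lambda p: math.dist(p, mean))]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


if __name__ == "__main__":
    # Toy data: two tight groups plus one far-away outlier.
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (0.5, 0.5), (100, 100)]
    clean = filter_outliers(data, ratio=0.10)
    centers, clusters = kmeans(clean, farthest_first_centers(clean, k=2))
    print(centers)
```

On this toy data the 10% filter removes the single far-away point, and the two centers then settle on the two groups. Running the same initialisation on the unfiltered data places one center on the outlier itself, which is the sensitivity the abstract describes. A distance-based initialisation is used here precisely because it is only safe once extreme points are gone; whether the thesis's own method shares that property is not stated in the abstract.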
Keywords/Search Tags: cluster analysis, anomaly detection, K-means algorithm