Research On Improved K-means Algorithm Based On Anomaly Detection

Posted on:2020-07-26

Degree:Master

Type:Thesis

Country:China

Candidate:C J Xue

Full Text:PDF

GTID:2428330572495840

Subject:Agricultural Extension

Abstract/Summary:

In recent years, data mining technology has attracted great attention from the information industry, because we are in an era of explosive data growth. "Big data" is one of the most popular terms today, and how to extract valuable information from massive data more quickly has become a growing concern. As a clustering algorithm widely used in data mining, the K-means algorithm repeatedly partitions the data set according to the similarity between its elements, so that similarity within each cluster is maximized and the difference between clusters is maximized. However, the K-means clustering algorithm has two obvious defects: the clustering result is highly sensitive to abnormal data, and the performance of the algorithm depends heavily on the selection of the initial cluster centers. To address these two shortcomings, this thesis makes the following improvements: (1) The initial cluster centers are selected only after highly abnormal data has been isolated. After the data is input, the degree of abnormality of each data element is calculated first, a formula for the abnormality coefficient threshold is established, and the corresponding proportion of outlier data points is filtered out according to the selected outlier filtering ratio. This addresses the sensitivity of the K-means algorithm to outliers. (2) The initial centroids are calculated with an average difference algorithm. For the set of normal data that remains after outliers are removed, the initial cluster centers are computed by the difference method, which keeps each initial cluster center as close as possible to the true center of its cluster. This addresses the strong dependence of the K-means clustering result on the choice of initial cluster centers. The improved algorithm was evaluated in simulation experiments on multiple real data sets taken from UCI. Comparing the clustering results under several criterion functions shows that the improved K-means algorithm based on anomaly detection raises clustering accuracy by about 12% and reduces clustering time by about 8%, which demonstrates the feasibility and effectiveness of the improved algorithm. Finally, the experimental results under a variety of outlier filtering ratios are compared, and the best clustering effect is obtained when the filtering ratio is close to 10%.
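
The two-stage idea in the abstract (filter outliers first, then choose initial centers on the remaining data and run K-means) can be illustrated with a minimal sketch. The abstract does not give the abnormality coefficient formula or the average difference method, so both are replaced here by assumptions: the anomaly score is taken to be the mean Euclidean distance to all other points, and the initial centers come from a simple farthest-point rule. All function names are hypothetical, and this is not the thesis's implementation.

```python
import math


def anomaly_scores(points):
    """Assumed score: mean Euclidean distance from each point to all others."""
    n = len(points)
    return [sum(math.dist(p, q) for q in points) / (n - 1) for p in points]


def filter_outliers(points, ratio=0.10):
    """Drop the `ratio` fraction of points with the highest assumed score."""
    scores = anomaly_scores(points)
    n_drop = round(ratio * len(points))
    ranked = sorted(range(len(points)), key=lambda i: scores[i])
    keep = sorted(ranked[: len(points) - n_drop])  # restore input order
    return [points[i] for i in keep], keep


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def farthest_point_init(points, k):
    """Stand-in initialization (NOT the thesis's difference method):
    start from the point nearest the overall mean, then repeatedly add
    the point farthest from the centers chosen so far."""
    overall = mean_point(points)
    centers = [min(points, key=lambda p: math.dist(p, overall))]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given initial centers."""
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean_point(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

A typical call would be `clean, kept = filter_outliers(data, ratio=0.10)` followed by `kmeans(clean, farthest_point_init(clean, k))`. The default ratio of 0.10 mirrors the filtering ratio the abstract reports as giving the best clustering effect. Because the filtering step runs before initialization, extreme points can neither be picked as initial centers nor pull the cluster means during the iterations.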

Keywords/Search Tags:

cluster analysis, anomaly detection, K-means algorithm

Related items

1	Research Of Anomaly Detection Based On Flower Pollination And Cluster Analysis Algorithm
2	Research On Data Flow Anomaly Detection Algorithm Cluster-based
3	Research On Intrusion Detection Technology Based On Improved Fuzzy C-means Clustering Algorithm
4	Research And Application Of K-means Clustering Algorithm
5	IoT Platform Anomaly Detection System And Method Based On Log Analysis
6	Research And Application Of Improved K-means Algorithm In Multivariate Analysis System
7	Research On Log-based Anomaly Detection
8	The Research On Fuzzy C-Means Cluster Analysis And Its Applications
9	Research On Intrusion Detection Model And Method Of Unsupervised Learning Based On K-means Algorithm
10	Research Of Intrusion Detection Method Based On Improved K-means Algorithm