| With the rapid development of informatization and the emergence of massive data, extracting important information from production processes has become a primary task of data mining. In machine learning, neural networks and deep learning techniques have made significant progress in anomaly detection for high-dimensional sparse data. This dissertation combines the advantages of deep learning and traditional machine learning to study anomaly detection methods for high-dimensional sparse data, addressing data imbalance, complex parameter tuning, and low accuracy, and avoiding the impact of the curse of dimensionality on traditional machine learning anomaly detection. It proposes an anomaly detection method based on improved K-means clustering and an anomaly detection model based on autoencoders and data augmentation. The main research content is as follows. To address the low accuracy of anomaly detection in high-dimensional sparse data, this dissertation proposes an anomaly detection method based on improved K-means clustering (IK-means) with a robust principal component analysis (RPCA) data reconstruction strategy. The IK-means method first uses the RPCA algorithm to extract features from high-dimensional sparse data. Then, the initial centroids are selected based on both distance and density, and the K-means algorithm is improved by iteratively updating the centroids to overcome the influence of edge points on the clustering result. Finally, a value is calculated for each cluster to perform anomaly detection and identify outliers in the sample. The results show that the method achieves an average accuracy score of 0.9361 on the UCI datasets. Compared with other methods, it not only improves accuracy but also distinguishes normal and abnormal data more precisely. This
dissertation also proposes an anomaly detection model based on autoencoders and data augmentation (Smote Attention Autoencoder Outlier Detection, SEAOD) to address sample imbalance and complex parameter tuning in anomaly detection on high-dimensional sparse data that contains missing values and noise. The SEAOD model consists of three modules: data augmentation, attention mechanism, and encoding-reconstruction detection. The data augmentation module generates high-quality training data by supplementing minority-class samples using the weighted SMOTE algorithm and the ENN algorithm. The attention mechanism module determines the structure of the neural network by calculating the feature weights of the data, allowing the model to better learn feature information during training and further easing the problem of complex parameter tuning. Lastly, the encoding-reconstruction module reduces the dimensionality of the data with an autoencoder and performs anomaly detection on high-dimensional sparse data using a weighted KNN algorithm, mitigating the impact of high dimensionality on detection accuracy. The experimental results show that the model outperforms the other compared algorithms on 15 publicly available datasets and has been validated on datasets from different fields. The model achieves the expected results and is practical. |
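As an illustration of the minority-class oversampling idea behind the data augmentation module, the following is a minimal sketch of plain SMOTE interpolation in pure Python. It is not the weighted SMOTE variant or the ENN cleaning step used in SEAOD, whose details are not given here. The function name `smote_oversample`, its parameters, and the toy data are assumptions made for this example only.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by plain SMOTE interpolation.

    Hypothetical helper for illustration; the weighted SMOTE used in SEAOD
    may choose base points and neighbours differently.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point lies on the segment between base and its neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (assumed data, two features per sample)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_oversample(minority, n_new=5)
```

Because each synthetic point is a convex combination of two existing minority samples, it stays inside the region already occupied by the minority class, which is why SMOTE-style methods are commonly paired with a cleaning step such as ENN to remove samples that end up too close to the majority class.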