Research On Disk Failure Prediction Method Based On Unsupervised Learning

Posted on: 2022-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: S Zha
Full Text: PDF
GTID: 2568306914460074
Subject: Control Science and Engineering
Abstract/Summary:
Disks are the main storage devices in data centers in the era of big data. A disk failure can seriously affect the whole system and may even cause data loss, which threatens the stable operation of the data center. At present, the accuracy of disks' built-in self-warning mechanisms is low and cannot meet the operational requirements of data centers. Because a disk is a relatively stable storage medium with a low probability of failure in any given period, few failed-disk samples are available, and the data are high-dimensional with a complex distribution. Existing disk anomaly detection methods have the following problems: detection accuracy is low for local anomalies, wrapped anomalies and other special anomalies; under a high-dimensional sparse data distribution, anomalies cannot be located effectively and the interaction between normal and abnormal samples is ignored; and the local structure of the data distribution is easily overlooked, which makes abnormal clusters difficult to identify. To address these problems, this thesis studies disk failure prediction based on unsupervised learning. The results are significant for predicting disk failures and ensuring the stable operation of disk storage systems. The main research work is as follows:

1) An unsupervised disk anomaly detection method based on local density measurement is studied. To address the difficulty existing methods have in detecting special anomalies, an incremental disk anomaly detection method based on edge sample density measurement is proposed. First, an isolation region is established for each sample by finding its nearest neighbor. For each test point that is not a global anomaly, its nearest training point and that training point's own nearest training point are found using Euclidean distance, and the ratio of the radii of these two points' regions is taken as the global anomaly measure of the test point. Second, the ratio of the shortest distance from the test point to the edge of its nearest training point's region to the radius of that region is taken as the local anomaly measure, and the anomaly score of the test point is obtained by combining the two measures. Finally, change-point detection is performed on the samples judged to be abnormal in order to screen out the SMART attributes that are clearly related to disk failure and to increase their weight in the next iteration. The method is tested on several artificial and public data sets and compared with a number of typical existing methods, and its advantages in anomaly detection performance are analyzed.

2) An unsupervised disk anomaly detection method based on a partition reconstruction measure is studied. To address the difficulty existing methods have in accurately locating sample points in a high-dimensional sparse environment, a disk anomaly detection algorithm based on neighborhood partitioning and isolation reconstruction is proposed. First, disk SMART information is collected and effective disk attribute features are selected to form the data set. A stable disk training set is then obtained by exponential smoothing. Second, using Euclidean distance, disk isolation regions are constructed and global anomalies are identified. Finally, for test points that are not global anomalies, the anomaly measures before and after reconstruction of the isolation region are calculated from the relative region locations of the samples before and after reconstruction, and the anomaly score is then obtained. Comparison with existing classical anomaly detection methods on artificial and public data sets verifies that the method can effectively locate samples in a high-dimensional sparse environment and demonstrates its effectiveness.

3) An unsupervised disk anomaly detection method based on complex space partitioning is studied. To address the tendency of existing methods to ignore the local structure of the data distribution and their difficulty in identifying clusters of anomalies, a disk anomaly detection method based on the potential degree of spread is proposed. First, the concept of potential degree is introduced, a directed potential degree chain is established based on the nearest neighbor principle, and the sample space is partitioned by the chain structure. Second, the potential peak point (the point with the highest potential degree) and the potential isolation point (the point with the lowest potential degree) are determined on the potential degree chain. The potential peak points are extracted iteratively to generate potential degree circles at different levels. The degree of abnormality of each sample point is measured according to the difficulty of spreading to its position and the difference in potential degree. Finally, anomaly detection experiments are designed on multiple artificial and public data sets with complex data distributions, and the results verify the effectiveness of the proposed method.
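
The preprocessing step and the global measure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it covers only exponential smoothing and the nearest-neighbor radius ratio, and it omits the local edge-distance measure, the incremental reweighting of SMART attributes, and the reconstruction and potential degree steps. The function names (`exponential_smooth`, `build_regions`, `global_score`), the smoothing factor `alpha`, and the convention of reporting the score as one minus the radius ratio are all assumptions made for illustration.

```python
import math


def exponential_smooth(series, alpha=0.3):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    `alpha` is a hypothetical smoothing factor; the abstract does not give one.
    """
    smoothed = []
    for x in series:
        if not smoothed:
            smoothed.append(float(x))  # the first value is kept as-is
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def build_regions(train):
    """For each training point, return (index of nearest other point, radius).

    The radius of a point's isolation region is assumed to be the Euclidean
    distance to its nearest neighbor. Requires at least two training points.
    """
    regions = []
    for i, p in enumerate(train):
        others = ((math.dist(p, q), k) for k, q in enumerate(train) if k != i)
        radius, nearest = min(others)
        regions.append((nearest, radius))
    return regions


def global_score(x, train, regions):
    """Global anomaly measure of test point x (higher means more anomalous)."""
    dists = [math.dist(x, p) for p in train]
    # Assumed rule: a point outside every isolation region is a global anomaly.
    if all(d > regions[k][1] for k, d in enumerate(dists)):
        return 1.0
    c = min(range(len(train)), key=dists.__getitem__)  # nearest training point
    n, r_c = regions[c]          # c's own nearest training point and c's radius
    r_n = regions[n][1]          # radius of that neighbor's region
    if r_c == 0.0:               # duplicate training points: treat as normal
        return 0.0
    # Assumed convention: 1 - (radius ratio), so sparse regions score high.
    return 1.0 - r_n / r_c


# Toy data: a dense cluster of four points and one isolated point.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
regions = build_regions(train)
for point in [(0.5, 0.5), (4.0, 4.0), (20.0, 20.0)]:
    print(point, round(global_score(point, train, regions), 3))
```

In this toy example a point inside the dense cluster receives a low score, a point whose nearest training point sits in a sparse region receives a high score, and a point outside every region is flagged as a global anomaly. In practice each coordinate would be a smoothed SMART attribute rather than a two-dimensional toy value.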
Keywords/Search Tags:disk failure prediction, unsupervised anomaly detection, the density metric of edge samples, partition reconstruction, the potential degree of spread