Research And Application Of Anomaly Detection Technology For Disk Failure Prediction

Posted on:2021-07-22

Degree:Master

Type:Thesis

Country:China

Candidate:X Jing

Full Text:PDF

GTID:2518306308963739

Subject:Mechanical engineering

Abstract/Summary:

Disks are the main data storage devices in cloud services, big data platforms, and similar environments, and are used extremely widely. A disk failure degrades system services and can even cause data loss, which seriously undermines data security and stable business operation. Although protection and early-warning mechanisms such as RAID and SMART are already used with disks, their accuracy is low, and system reliability is still seriously affected. In real operating environments such as data centers and dispatch centers, disks are a relatively stable storage medium, so historical failure data are scarce, while disk models are numerous and the number of disks per model varies.

Existing anomaly detection methods for building disk failure prediction models have the following problems: the high dimensionality of disk SMART attributes and the presence of weakly correlated attributes reduce the predictive ability of the anomaly detection model; existing algorithms have low detection accuracy for local anomalies, enclosed ("parcel") anomalies, or densely distributed data; most disk transfer learning algorithms use only single-source domain data for offline transfer learning, so the transfer effect depends heavily on the correlation between the source domain and the target domain; and existing multi-source domain transfer learning algorithms often require anomaly samples in the target domain. To address these problems, this thesis studies anomaly detection technology for disk failure prediction. The results are significant for improving the reliability of disk storage systems and ensuring their safe and stable operation. The main work of the thesis is as follows:

(1) Methods for disk data preprocessing and attribute filtering are studied. The disk SMART attribute data are statistically analyzed and preprocessed according to their characteristics. Because the large number of disk attributes exposes the model's predictive ability to the curse of dimensionality, and because the attributes must be ranked by relevance beforehand, an algorithm for calculating the attribute isolation degree is proposed. The algorithm constructs an optimal isolation tree on each attribute dimension of the data and calculates the isolation degree of the attribute from the depths of the leaf nodes in which the fault data fall. The greater the depth, the harder it is to isolate the fault data from the normal data; the lower the isolation degree, the harder it is to distinguish normal from anomalous data in that dimension. The attributes are ranked on this basis to reduce the impact of weakly correlated attributes on the model's predictions.

(2) An unsupervised method for disk failure prediction is studied. To address the low detection accuracy of existing methods on local anomalies, enclosed ("parcel") anomalies, or densely distributed data, this thesis proposes an anomaly detection method based on neighbor data extraction by isolation-tree backtracking and a similarity measure based on distribution probability. First, a forest is built following the idea of an isolation tree ensemble. Then, in each tree, the leaf node containing the test point is traced back to its ancestor node at a depth threshold, and all normal training data under that ancestor are collected into a data set used to measure how anomalous the test point is. Next, taking the test point and a point in this data set as endpoints, the probability that other data points fall between the two is calculated in each attribute dimension, and the anomaly value of the test point is obtained by combining these probabilities with the Minkowski distance into a dissimilarity between the test point and every point in the data set. Comparisons with typical existing anomaly detection algorithms on UCI public datasets and three synthetic datasets verify the effectiveness and advantages of the proposed method.

(3) A disk anomaly detection method based on multi-source domain transfer learning and incremental learning is studied. Because the SMART attribute data distributions of different disk models from the same manufacturer overlap to some extent, a disk anomaly detection algorithm based on dynamically re-weighted multi-source domain transfer learning and incremental learning of sub-models is proposed. The algorithm consists of two core components: a multi-source domain transfer learning component and an incremental learning component. In the transfer learning component, to address the problem that a small number of positive labeled samples in the target domain cannot effectively evaluate the detection performance of the model, a transfer learning algorithm based on data distribution similarity and a dynamic exponentially weighted ensemble is proposed. First, the nearest-neighbor data of each negative sample in the target domain are extracted in hyperspace. Then, depending on the number of positive samples in the target domain, a different test criterion is applied to each sub-model. Finally, the sub-models are re-weighted exponentially according to their detection ability. In the incremental learning component, an incremental learning algorithm based on online automatic labeling is proposed, which automatically labels the data popped from the current sliding window and automatically updates the model. Comparisons of performance before and after transfer learning, of the robustness of multi-source and single-source models, and of the proposed algorithm against existing advanced non-incremental learning methods demonstrate the effectiveness and advantages of the proposed method.
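The dissimilarity step in part (2) can be illustrated with a small Python sketch. It is not the thesis's implementation: the abstract gives no formula, so the sketch assumes the "probability of other data points appearing between the two points" is the fraction of neighbor points whose value lies in the closed interval spanned by the two endpoints, that the per-attribute values are combined with a Minkowski (power) mean of order p, and that the anomaly value is the mean dissimilarity to all neighbors. The function names `interval_mass`, `mass_dissimilarity`, and `anomaly_score` are hypothetical, and the neighbor set is simply given here rather than extracted by isolation-tree backtracking.

```python
from typing import List, Sequence

Point = Sequence[float]


def interval_mass(values: Sequence[float], a: float, b: float) -> float:
    """Fraction of `values` lying in the closed interval spanned by a and b."""
    lo, hi = min(a, b), max(a, b)
    inside = sum(1 for v in values if lo <= v <= hi)
    return inside / len(values)


def mass_dissimilarity(x: Point, y: Point, neighbors: List[Point], p: float = 2.0) -> float:
    """Combine the per-attribute interval masses with a Minkowski mean of order p.

    Assumption: equal weight for every attribute dimension.
    """
    dims = len(x)
    total = 0.0
    for d in range(dims):
        column = [row[d] for row in neighbors]
        total += interval_mass(column, x[d], y[d]) ** p
    return (total / dims) ** (1.0 / p)


def anomaly_score(x: Point, neighbors: List[Point], p: float = 2.0) -> float:
    """Mean dissimilarity between the test point and every neighbor (assumed aggregation)."""
    return sum(mass_dissimilarity(x, n, neighbors, p) for n in neighbors) / len(neighbors)


# Hypothetical neighbor set standing in for the normal training data
# collected under the backtracked ancestor node.
neighbors = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
inlier = (2.5, 2.5)
outlier = (10.0, 10.0)

print(anomaly_score(inlier, neighbors))
print(anomaly_score(outlier, neighbors))
```

In this toy setting the point far from the neighbor set receives the higher score, because many neighbor values fall between it and each neighbor in every dimension. Because the measure counts data rather than measuring geometric distance, the same gap counts as larger in a dense region than in a sparse one, which is the property the abstract relies on for dense distributions.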

Keywords/Search Tags:

disk failure prediction, SMART attribute, similarity measure, multi-source transfer learning, incremental learning

Related items

1	Research On Disk Failure Prediction Method Based On Multi-dimensional Features
2	Disk Failure Prediction In Data Centers Via Online Learning
3	Research On Hard Disk Failure Prediction Method Based On Improved Random Forest Algorithm
4	Research On Cross-Project Software Defect Prediction Based On Multi-Source Transfer Learning
5	Analysis And Research Of Disk Failures In Data Centers
6	Design And Implementation Of Disk Failure Prediction System Based On Machine Learning
7	Predicting Disk Failures For Large-scale Datacenter By Machine-learning Method
8	Research On Disk Failure Prediction Based On Cost-sensitive Learning
9	Transfer Learning From Multiple Source Domains
10	Study On Source Domain Selection Strategy Of Transfer Learning Based On Similarity