A Disk Failure Prediction Method Based on XGBoost Optimization

Posted on: 2022-02-05  Degree: Master  Type: Thesis
Country: China  Candidate: Y J Xu  Full Text: PDF
GTID: 2518306725984749  Subject: Master of Engineering
Abstract/Summary:
With the development and adoption of cloud computing, a large amount of user data has moved from local storage to the cloud, where it is held in the data centers of Internet enterprises. Most data centers rely on disks for storage, so they contain a very large number of disks, and disk failures are inevitable. A disk failure can cause data loss and heavy losses for both enterprises and users, so enterprises attach great importance to research on disk failure. Predicting failures in time and replacing disks in advance reduces the risk of data loss and keeps services stable. At present, the prevailing approach to disk failure prediction feeds S.M.A.R.T. data into a classification model, and it suffers from a high false alarm rate. Failed disks generally contribute far less data than healthy disks, and this imbalance between positive and negative samples is one of the main causes of the high false alarm rate. The hyperparameter optimization method used for the classification model is another important factor affecting the false alarm rate. To solve these problems, this thesis proposes and implements a disk anomaly prediction method.

(1) To address the imbalance between positive and negative samples in the disk data set, a PKM algorithm based on the K-means algorithm is proposed. It reduces the imbalance ratio of the disk data set by undersampling healthy disks, and the K-means algorithm is adapted to the time-series characteristics of the disk data. Experiments show that the algorithm balances positive and negative samples effectively.

(2) To address the shortcomings of the grid search and random search algorithms, an SGD algorithm built on grid search is proposed to find the optimal hyperparameters. The experimental results show that the SGD algorithm significantly reduces the time cost while maintaining the prediction rate.

(3) This thesis proposes a disk anomaly prediction method based on the improved XGBoost model. The method obtains features through feature engineering, balances the positive and negative samples with the PKM algorithm, finds the optimal hyperparameters with the SGD algorithm, and finally trains the model. The experimental results show that the model achieves a high prediction rate and a low false alarm rate.
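The abstract does not describe the internals of PKM, so the following is only a minimal sketch of generic cluster-based undersampling, the family of techniques that contribution (1) belongs to, and not the thesis's algorithm. It clusters the healthy-disk samples with plain K-means and keeps a proportional share from each cluster. The function names `kmeans` and `undersample_healthy`, the parameters, and the synthetic two-feature data are all hypothetical; the time-series adaptation mentioned in the abstract is not modeled.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def undersample_healthy(healthy, n_keep, k=3, seed=0):
    """Keep roughly n_keep healthy samples, drawn proportionally from each cluster."""
    labels = kmeans(healthy, k, seed=seed)
    rng = random.Random(seed)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(healthy, labels) if lab == c]
        if not members:
            continue
        # Proportional quota, with at least one sample per non-empty cluster.
        quota = max(1, round(n_keep * len(members) / len(healthy)))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept


# Synthetic stand-in for S.M.A.R.T. feature vectors (assumption: two features).
data_rng = random.Random(1)
healthy = [[data_rng.gauss(mu, 1.0), data_rng.gauss(mu, 1.0)]
           for mu in (0.0, 5.0, 10.0) for _ in range(30)]
failed = [[data_rng.gauss(20.0, 1.0), data_rng.gauss(20.0, 1.0)] for _ in range(9)]

balanced_healthy = undersample_healthy(healthy, n_keep=len(failed))
print(len(healthy), "healthy samples reduced to", len(balanced_healthy))
```

Sampling per cluster rather than uniformly is meant to preserve the variety of healthy-disk behavior after undersampling; because of rounding, the number of kept samples is close to, but not always exactly, `n_keep`.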
Keywords/Search Tags: disk anomaly prediction, S.M.A.R.T, K-means algorithm, grid search algorithm, XGBoost model