Research On Disk Failure Prediction Method Based On Multi-dimensional Features

Posted on:2022-10-20

Degree:Master

Type:Thesis

Country:China

Candidate:D Y Yang

GTID:2518306572990889

Subject:Computer system architecture

Abstract/Summary:

In the era of big data, data centers keep growing in scale, and the reliability requirements placed on storage media keep rising. Disk failures occur more and more frequently in large data centers, which reduces the reliability of the storage system. To limit the impact of disk failures, researchers in China and abroad have used machine learning and statistical methods to build disk failure prediction models based on SMART (Self-Monitoring, Analysis and Reporting Technology) attributes. By predicting failures before they happen, these models avoid much of the damage a failure would cause, and they have achieved good results. However, existing research still has problems. Previous work builds its models on the SMART attributes available at training time, but as disks keep running, later data no longer follows the earlier change pattern. In actual operation the accuracy and recall of the model therefore decline steadily while the false alarm rate rises, and the model eventually fails.

To address this model failure problem, this thesis designs a disk failure prediction method based on multi-dimensional features. The method consists of two parts: a correlation analysis algorithm between disk I/O and disk failure, and a sample-weighted random forest algorithm based on time weights. First, the thesis analyzes the changes and limitations of disk SMART attributes in a real data center and introduces disk I/O attributes as an auxiliary prediction data set, which improves adaptability to the changing disk data of a real data center and compensates for the limitations of the SMART attributes. Second, a sample-weighted random forest algorithm based on time weights is proposed. It assigns higher weights to newer samples so that the model adapts in time to the changing patterns of new data. Finally, to overcome the problem of model timeliness, a model iteration scheme is designed that selects the best iteration point and the amount of data to use each time the model is retrained, so that each iteration is trained on the most reasonable data set.

Experiments show that, with the false alarm rate (FAR) kept below 1%, the disk failure prediction method based on multi-dimensional features achieves an accuracy 4.1% to 6.8% higher than other machine learning algorithms and a recall 3.9% to 5.7% higher. After the introduction of disk I/O attributes, the accuracy of the model increases by 4% to 6% and the recall by 4.1% to 5.2%. The proposed model iteration scheme keeps the false alarm rate of the disk failure prediction model below 1% and increases the accuracy of the model by 3% to 18%. Compared with other iteration schemes, the false alarm rate is reduced by 3.2% to 5%.
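
The abstract does not give the actual time-weight function, so the following Python sketch only illustrates the general idea of weighting newer samples more heavily. It assumes an exponential decay with a hypothetical `half_life_days` parameter; the function names `time_decay_weights` and `normalize` are likewise illustrative and do not come from the thesis.

```python
import math


def time_decay_weights(ages_in_days, half_life_days=30.0):
    """Return one weight per sample; a smaller age (newer sample) gives a larger weight.

    Assumption: exponential decay, so the weight halves every `half_life_days`.
    The thesis may use a different weighting function.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    decay = math.log(2) / half_life_days
    return [math.exp(-decay * age) for age in ages_in_days]


def normalize(weights):
    """Scale the weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical ages (in days) of four training samples, newest first.
ages = [0, 30, 60, 90]
weights = time_decay_weights(ages, half_life_days=30.0)
normalized = normalize(weights)
```

Weights of this kind could then be supplied to a random forest implementation that supports per-sample weights, for example through the `sample_weight` argument of scikit-learn's `RandomForestClassifier.fit`. Whether the thesis uses that library is not stated.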

Keywords/Search Tags:

Disk failure prediction, Machine learning, SMART attributes, Feature selection, Random forest

Related items

1	Research On Hard Disk Failure Prediction Method Based On Improved Random Forest Algorithm
2	Research On Disk Failure Prediction Based On Cost-sensitive Learning
3	Research And Application Of Anomaly Detection Technology For Disk Failure Prediction
4	Design And Implementation Of Disk Failure Prediction System Based On Machine Learning
5	Predicting Disk Failures For Large-scale Datacenter By Machine-learning Method
6	Research On Feature Selection Method Based On Random Forest
7	Research On The Application Of Weighted Random Forest In Employee Turnover Prediction
8	Random Forest Feature Selection
9	Disk Failure Prediction In Data Centers Via Online Learning
10	Analysis And Research Of Disk Failures In Data Centers