
Predicting Disk Failures For Large-scale Datacenter By Machine-learning Method

Posted on: 2016-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Liu
GTID: 2348330479954728
Subject: Computer technology
Abstract/Summary:
In recent years, data has become increasingly valuable, and preventing data loss is more important than ever. More than 90% of the world's data is stored on disks. Because disk failure prediction is of great value in providing high reliability for datacenter storage systems, much previous work has focused on improving prediction performance.

Researchers have explored various methods to improve prediction performance, achieving Failure Detection Rates (FDRs) of 90% to 95% with False Alarm Rates (FARs) as low as 0.5%. However, these works concentrate on classifying disks as healthy or soon-to-fail and ignore lead time, which indicates how much time is left before a failure occurs. Without accounting for lead time, these methods are hard to adopt in the disk management of large-scale datacenters, because their lead times are too long. When applying previous prediction methods to real-world datacenter management, we find that a precise disk failure time benefits datacenter management. On the one hand, a long lead time indirectly increases the cost of disk replacement; on the other hand, a precise lead time makes detailed failure recovery possible. Our prediction system therefore needs to predict not only disk failure but also the failure time.

By considering more attributes, our failure prediction achieves higher and more stable performance: the FDR reaches 96% with a FAR of 0.02%. We then present a method that predicts failure time with a logistic regression model and raises a failure alarm only when a disk failure is expected to occur within a given time. In the evaluation, we test whether a real disk failure occurs within the seven days following the failure alarm. The resulting FDR is 71.32% with a FAR of 6.32%, and the Wasted Disk Number (WDN) of disk failure prediction decreases by 83.7%.
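To make the seven-day evaluation concrete, the following is a minimal Python sketch of how FDR and FAR could be computed when an alarm counts as a detection only if the failure follows within a fixed window. It is not the thesis's code. The `DiskRecord` type, the `windowed_rates` function, and the sample data are hypothetical. The sketch assumes the common definitions FDR = detected failed disks / all failed disks and FAR = falsely alarmed healthy disks / all healthy disks, and it treats an alarm raised too early on a failed disk as a missed detection, which may differ from the thesis's accounting.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class DiskRecord:
    """One disk in a hypothetical evaluation set; days are counted from a common start."""
    alarm_day: Optional[int]    # day the predictor raised an alarm, None if it never did
    failure_day: Optional[int]  # day the disk actually failed, None if it stayed healthy


def windowed_rates(disks: Sequence[DiskRecord], window_days: int = 7) -> Tuple[float, float]:
    """Return (FDR, FAR); an alarm on a failed disk counts only if the
    failure occurs within window_days after the alarm (assumed rule)."""
    failed = [d for d in disks if d.failure_day is not None]
    healthy = [d for d in disks if d.failure_day is None]

    # A failed disk is "detected" when its alarm precedes the failure by at most window_days.
    detected = sum(
        1 for d in failed
        if d.alarm_day is not None and 0 <= d.failure_day - d.alarm_day <= window_days
    )
    # Any alarm on a disk that never failed is a false alarm.
    false_alarms = sum(1 for d in healthy if d.alarm_day is not None)

    fdr = detected / len(failed) if failed else 0.0
    far = false_alarms / len(healthy) if healthy else 0.0
    return fdr, far


# Made-up sample: three failed disks and four healthy disks.
sample = [
    DiskRecord(alarm_day=10, failure_day=14),      # alarm 4 days before failure: detected
    DiskRecord(alarm_day=3, failure_day=30),       # alarm 27 days early: not counted
    DiskRecord(alarm_day=None, failure_day=20),    # failure with no alarm: missed
    DiskRecord(alarm_day=5, failure_day=None),     # healthy disk with an alarm: false alarm
    DiskRecord(alarm_day=None, failure_day=None),
    DiskRecord(alarm_day=None, failure_day=None),
    DiskRecord(alarm_day=None, failure_day=None),
]
fdr, far = windowed_rates(sample, window_days=7)
print(f"FDR={fdr:.2%} FAR={far:.2%}")
```

Tightening the window makes detection harder, since an alarm must be both correct and timely. This is consistent with the abstract reporting a lower FDR for the seven-day evaluation than for plain healthy-or-failing classification.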
Keywords/Search Tags: Disk Failure Prediction, Failure Time Prediction, Machine Learning