Disk Failure Prediction In Data Centers Via Online Learning

Posted on: 2019-10-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z Xiong
Full Text: PDF
GTID: 2428330563492492
Subject: Computer software and theory
Abstract/Summary:
The rapid expansion of storage systems in data centers has turned component failure from an occasional accident into the norm. Disks are among the most frequently failing components. Based on SMART (Self-Monitoring, Analysis and Reporting Technology) attributes, many researchers have derived disk failure prediction models using machine learning techniques and achieved noticeable improvements. However, most of these models rely on offline training and fail to adapt to future data patterns, since the underlying distribution of SMART attributes changes over time. Consequently, existing offline models suffer from the model aging problem, and their performance declines in practice. To address this problem, a novel approach to disk failure prediction based on online learning is presented. Two main challenges arise when training disk failure prediction models in online mode: 1) How to label the sequentially gathered samples on the fly? 2) How to overcome the impact of data imbalance on prediction performance?
An automatic online labeling method and an improved online bagging method are proposed to address these two challenges, respectively. Based on Online Random Forests (ORFs), an adaptive learning model is built to predict disk failure. The ORF-based model evolves automatically as data arrive sequentially and is thus highly adaptive to the changing distribution of SMART data. Moreover, it has advantages over its offline counterparts in terms of parallelizability, low memory requirements and superior prediction performance. Experiments on real-world datasets show that the ORF model converges rapidly to the performance of offline random forests and achieves stable failure detection rates (FDRs) of 93-99% with low false alarm rates (FARs). Compared with offline RF models that are periodically retrained using offline updating strategies, the ORF-based models maintain reasonably lower FARs while achieving comparable FDRs, without the need for model retraining after the initial deployment. Furthermore, we demonstrate the ability of our approach to maintain stable prediction performance over long-term use in data centers.
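The abstract does not give the details of the labeling or bagging methods, so the following is only a minimal illustrative sketch of the two ideas, not the thesis's actual algorithm. It assumes that a sample can be labeled healthy once its disk has survived a fixed number of further samples (the hypothetical `horizon` parameter), and that all samples still buffered when a disk fails are labeled as failing. For imbalance, it takes the standard online bagging scheme, in which each base learner sees each sample a Poisson(1)-distributed number of times, and assumes a class-dependent Poisson rate (hypothetical `lam_pos` and `lam_neg`) so that the abundant healthy samples are used less often than the rare failure samples. All class, function and parameter names are made up for illustration.

```python
import math
import random
from collections import defaultdict, deque


class OnlineLabeler:
    """Buffers each disk's recent samples until a label can be assigned.

    Assumption (not from the abstract): a sample is labeled healthy (0)
    once the disk has produced `horizon` newer samples without failing;
    samples still buffered when the disk fails are labeled failing (1).
    """

    def __init__(self, horizon=7):
        self.horizon = horizon
        self.buffers = defaultdict(deque)

    def observe(self, disk_id, features):
        """Store a new sample; return samples that are now known healthy."""
        buf = self.buffers[disk_id]
        buf.append(features)
        labeled = []
        while len(buf) > self.horizon:
            labeled.append((buf.popleft(), 0))
        return labeled

    def fail(self, disk_id):
        """Disk failed: label every sample still in its buffer as failing."""
        buf = self.buffers.pop(disk_id, deque())
        return [(features, 1) for features in buf]


def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def bagging_counts(label, n_learners, rng, lam_pos=1.0, lam_neg=0.1):
    """How many times each base learner should train on this sample.

    Assumption: the rate depends on the class, so healthy samples
    (label 0) are down-weighted relative to failure samples (label 1).
    """
    lam = lam_pos if label == 1 else lam_neg
    return [poisson(lam, rng) for _ in range(n_learners)]


if __name__ == "__main__":
    rng = random.Random(0)
    labeler = OnlineLabeler(horizon=2)
    for day in range(4):
        for sample, label in labeler.observe("disk-A", {"day": day}):
            print(sample, label, bagging_counts(label, 3, rng))
    for sample, label in labeler.fail("disk-A"):
        print(sample, label, bagging_counts(label, 3, rng))
```

In a full system, each count would tell the corresponding tree of an online random forest how many times to update on the sample; that update step is omitted here because the abstract does not describe it.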
Keywords/Search Tags:Disk failure prediction, Online learning, SMART, Storage system reliability