
Research On Method For Hard Drive Failure Prediction In Massive Storage System

Posted on: 2015-04-17, Degree: Master, Type: Thesis
Country: China, Candidate: B P Zhu, Full Text: PDF
GTID: 2298330467979726, Subject: Computer application technology
Abstract/Summary:
Large-scale storage systems currently rely on multiple replicas or erasure codes to provide high reliability. As storage systems grow in scale and complexity, these traditional redundancy mechanisms struggle to provide sufficient reliability, and building highly reliable storage systems has become a major challenge. Almost all hard drives today support SMART (Self-Monitoring, Analysis and Reporting Technology), which monitors internal health attributes of the drive and raises a failure alarm if any attribute exceeds a certain threshold. However, the prediction accuracy of this threshold-based approach is very limited: it can predict only 3-10% of drive failures at a false alarm rate of 0.1%. Researchers have explored various statistical and machine learning methods for building drive failure prediction models based on SMART attributes, but while maintaining a low false alarm rate, these models can predict only about 60% of drive failures.
This paper attempts to improve and optimize a drive failure prediction model based on the Support Vector Machine (SVM). It also proposes an Artificial Neural Network (ANN) failure prediction model trained with the Backpropagation (BP) algorithm, and further optimizes the BP-ANN model using the AdaBoost algorithm. The experimental dataset was collected from a real-world datacenter and includes SMART records from up to 23,395 hard drives. New sample preprocessing, sample selection, and feature construction methods are used to improve the accuracy of the prediction models. The paper also proposes a voting-based failure detection algorithm that effectively reduces the false alarm rate of the prediction models. To describe a drive's health status more precisely (i.e., as a failure probability), the paper also explores using the BP algorithm to build a health degree model for the drive.
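The voting-based detection idea can be illustrated with a minimal sketch. The abstract does not give the exact voting rule, so the sliding window, the `window` and `min_votes` parameters, and the function name `first_alarm_index` are all assumptions made for illustration: a drive is flagged only when enough of its most recent per-sample classifier outputs vote "failing", so that isolated misclassifications do not raise an alarm.

```python
from collections import deque
from typing import Iterable, Optional


def first_alarm_index(predictions: Iterable[int],
                      window: int = 5,
                      min_votes: int = 3) -> Optional[int]:
    """Return the index of the first sample at which an alarm fires.

    predictions: per-sample classifier outputs for ONE drive, in time
                 order (1 = "failing", 0 = "healthy").
    window:      number of most recent samples that vote (assumed value).
    min_votes:   "failing" votes needed to raise an alarm (assumed value).

    Returns None if no alarm is ever raised.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` votes
    for i, p in enumerate(predictions):
        recent.append(1 if p else 0)
        # Vote only once a full window of samples is available.
        if len(recent) == window and sum(recent) >= min_votes:
            return i
    return None


# Isolated false positives on a healthy drive never reach 3 of 5 votes.
healthy = [0, 1, 0, 0, 0, 1, 0, 0]
# A deteriorating drive accumulates "failing" votes and triggers an alarm.
failing = [0, 0, 1, 1, 0, 1, 1, 1]

print(first_alarm_index(healthy))
print(first_alarm_index(failing))
```

Raising `min_votes` relative to `window` trades detection rate (and warning lead time) for a lower false alarm rate, which is the trade-off the voting step is meant to control.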
Experimental results show that the proposed models achieve very high performance. The optimized SVM model achieves the lowest false alarm rate (0.03%), while the BP-ANN model achieves a failure detection rate of more than 95% at a low false alarm rate. A Markov model is used to calculate the reliability of different storage systems, and the results indicate that a drive failure prediction model can significantly improve storage system reliability. In addition, the paper makes an initial exploration of applying proactive fault-tolerance mechanisms in real-world massive storage systems.
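To make the Markov-model argument concrete, the following is a simplified sketch and not the model used in the thesis, whose states and parameters the abstract does not specify. It assumes a two-drive mirror with per-drive failure rate `lam`, repair rate `mu`, and a hypothetical detection rate `k`: the fraction of failures that are predicted and handled proactively before the drive fails, so they never put the mirror into the degraded state. Solving the resulting two-state absorbing chain gives a mean time to data loss of (lam + mu + a) / (a * lam), where a = 2 * lam * (1 - k).

```python
def mttdl_mirror(lam: float, mu: float, k: float = 0.0) -> float:
    """Mean time to data loss for a two-drive mirror (illustrative model).

    lam: failure rate of a single drive (per hour)           [assumed]
    mu:  repair/rebuild rate of a failed drive (per hour)    [assumed]
    k:   fraction of failures predicted and handled in time  [assumed]

    States: 0 = both drives healthy, 1 = one drive failed (rebuilding),
    absorbing state = data loss.
      0 -> 1    at rate a = 2 * lam * (1 - k)  (unpredicted failures only)
      1 -> 0    at rate mu                     (rebuild completes)
      1 -> loss at rate lam                    (second drive fails)
    """
    a = 2.0 * lam * (1.0 - k)
    if a == 0.0:
        # Every failure is predicted in time: this model never loses data.
        return float("inf")
    return (lam + mu + a) / (a * lam)


lam = 1e-5   # assumed: about one failure per 100,000 drive-hours
mu = 0.1     # assumed: average rebuild time of 10 hours

baseline = mttdl_mirror(lam, mu, k=0.0)
with_prediction = mttdl_mirror(lam, mu, k=0.95)
print(f"improvement factor: {with_prediction / baseline:.1f}")
```

With k = 0 the expression reduces to the familiar (3 * lam + mu) / (2 * lam ** 2) for a mirrored pair. Under these assumptions the mean time to data loss grows roughly in proportion to 1 / (1 - k), which is why a high detection rate translates into a large reliability gain.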
Keywords/Search Tags: Drive Failure Prediction, Support Vector Machine, Artificial Neural Network, Reliability Analysis, Proactive Fault Tolerance