Hard Disk Failure Prognosis Based On Ensemble Learning

Posted on:2023-06-14
Degree:Master
Type:Thesis
Country:China
Candidate:Q Luo
GTID:2558307046465434
Subject:Computer technology
Abstract/Summary:
Hard disks are the basic storage equipment of modern data centers and play an important role in storing and managing massive amounts of data. Their reliability largely determines the reliability of the storage system as a whole, so prognosis methods that alert system administrators before a hard disk fails have become a research hotspot. Hard disks generally record daily operating data through the Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T.) system, and failure prognosis can be achieved by monitoring changes in S.M.A.R.T. attribute values. However, this approach predicts hard disk failures with very low accuracy. Most existing methods instead formulate hard disk failure prognosis as a time-series classification or prediction problem.

Ensemble learning offers strong generalization ability and high stability in time-series data analysis, and it outperforms single classification models. An integrated prognosis model is therefore proposed that uses the Stacking method to combine a Multivariate Fully Convolutional-Long Short-Term Memory Network (MFCN-LSTM) with a random forest. Hard disks are divided into two classes, normal and imminent failure, and an alert is triggered when a hard disk is classified as imminent failure.

S.M.A.R.T. data contain a very large number of attributes. For dimensionality reduction, the mean decrease accuracy algorithm is adopted to select the subset of S.M.A.R.T. attributes with the greatest impact on performance; these attributes serve as the features for subsequent training and classification. In addition, normal hard disks usually far outnumber failed ones, which makes the data imbalanced. A balancing method based on ensemble learning is proposed to bring the numbers of normal and failed samples into relative balance.

S.M.A.R.T. data sets collected from different types of hard disks have different characteristics, so a prognosis model trained on data from one type of hard disk performs poorly when predicting failures of another type. To solve this problem, a domain-adaptive model is proposed that transfers a model trained on one type of hard disk to another type. A hybrid spectral kernel network extracts the distribution difference between the source and target hard disk data, and adversarial training reduces that difference.

Experimental results show that the F1 score of the integrated prognosis model is on average 0.0461 and 0.1109 higher than those of MFCN-LSTM and the random forest, respectively. Compared with the non-adaptive model, the domain-adaptive model based on the hybrid spectral kernel network improves the F1 score by 0.6911 on average, and compared with an adaptive model that measures the distribution difference with multi-kernel maximum mean discrepancy (MK-MMD), it improves the F1 score by 0.0609 on average.
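The Stacking step can be illustrated with a minimal pure-Python sketch. The two base learners here are hypothetical toy scorers standing in for MFCN-LSTM and the random forest (a real system would train those models); each one scores a disk by comparing a single feature with the class means. What the sketch does show is the core Stacking idea: the meta-learner is fitted on out-of-fold scores, so it never sees a base learner's prediction on a sample that learner was trained on. The meta-learner itself is an assumed, deliberately simple one (a grid search over one blend weight), not necessarily the meta-classifier used in the thesis. Label 1 means imminent failure.

```python
def fit_centroid_scorer(rows, labels, feature):
    """Toy base learner: scores a row in [0, 1] by how much closer one feature
    is to the failed-class mean than to the normal-class mean."""
    pos = [r[feature] for r, y in zip(rows, labels) if y == 1]
    neg = [r[feature] for r, y in zip(rows, labels) if y == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)

    def score(row):
        d_pos = abs(row[feature] - mean_pos)
        d_neg = abs(row[feature] - mean_neg)
        return 0.5 if d_pos + d_neg == 0 else d_neg / (d_pos + d_neg)

    return score


def make_learner(feature):
    """Hypothetical stand-in for a real base model (e.g. MFCN-LSTM or a random forest)."""
    return lambda rows, labels: fit_centroid_scorer(rows, labels, feature)


def out_of_fold_scores(learners, rows, labels, k=3):
    """Score every row with base learners trained on the other k-1 folds.
    Assumes every training split contains both classes."""
    oof = [[0.0] * len(learners) for _ in rows]
    for fold in range(k):
        train = [i for i in range(len(rows)) if i % k != fold]
        test = [i for i in range(len(rows)) if i % k == fold]
        for j, learner in enumerate(learners):
            model = learner([rows[i] for i in train], [labels[i] for i in train])
            for i in test:
                oof[i][j] = model(rows[i])
    return oof


def fit_meta_weight(oof, labels, steps=10):
    """Toy meta-learner for two base learners: choose the blend weight w that
    maximizes accuracy of w * score_a + (1 - w) * score_b >= 0.5."""
    best_w, best_acc = 0.0, -1.0
    for s in range(steps + 1):
        w = s / steps
        preds = [int(w * a + (1 - w) * b >= 0.5) for a, b in oof]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w


def fit_stacking(learners, rows, labels, k=3):
    """Fit the meta-learner on out-of-fold scores, then refit base learners on all data."""
    w = fit_meta_weight(out_of_fold_scores(learners, rows, labels, k), labels)
    models = [learner(rows, labels) for learner in learners]

    def predict(row):
        a, b = (m(row) for m in models)
        return int(w * a + (1 - w) * b >= 0.5)  # 1 = imminent failure

    return predict


# Hypothetical data: feature 0 mimics a failure-related attribute, feature 1 is mostly noise.
rows = [[1, 5], [9, 2], [2, 8], [8, 6], [1, 3], [10, 7],
        [2, 4], [9, 5], [0, 6], [8, 4], [1, 7], [10, 3]]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
predict = fit_stacking([make_learner(0), make_learner(1)], rows, labels)
print([predict(r) for r in rows])
```

The interleaved `i % k` folds keep the example short; a real pipeline would use shuffled, stratified folds, and for time-series S.M.A.R.T. data the split should also respect time order to avoid leakage.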
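The attribute-selection step can be sketched as a permutation-based mean decrease accuracy measure: shuffle one S.M.A.R.T. attribute column at a time, re-score an already-trained classifier, and rank attributes by the average accuracy drop. This assumes that is what the abstract's selection algorithm computes. The classifier `toy_model`, its threshold, and the data are hypothetical. Breiman's random-forest version measures the drop on out-of-bag samples; this simplified version uses a single evaluation set.

```python
import random


def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def mean_decrease_accuracy(model, rows, labels, n_repeats=10, seed=0):
    """For each feature column, average the accuracy drop over n_repeats random
    permutations of that column (other columns are left untouched)."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    scores = []
    for j in range(len(rows[0])):
        total_drop = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [list(r[:j]) + [v] + list(r[j + 1:]) for r, v in zip(rows, column)]
            total_drop += baseline - accuracy(model, permuted, labels)
        scores.append(total_drop / n_repeats)
    return scores


def select_top_k(scores, k):
    """Indices of the k features with the largest mean decrease in accuracy."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def toy_model(row):
    """Hypothetical trained classifier: flags a disk when attribute 0 exceeds 5."""
    return 1 if row[0] > 5 else 0


# Hypothetical data: column 0 mimics a failure-related attribute, columns 1-2 are noise.
rows = [[0, 3, 7], [1, 8, 2], [2, 5, 5], [10, 1, 9], [12, 4, 4], [20, 6, 1]]
labels = [0, 0, 0, 1, 1, 1]
scores = mean_decrease_accuracy(toy_model, rows, labels)
print(scores, select_top_k(scores, 1))
```

Because `toy_model` ignores columns 1 and 2, permuting them cannot change its accuracy, so their importance is exactly zero and only column 0 would be kept.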
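The abstract does not say how the ensemble-based balancing method works. One well-known scheme in this family is EasyEnsemble-style undersampling, where several subsets of the majority (normal) class, each about the size of the minority (failed) class, are paired with all failed samples and one classifier is trained per subset. The sketch below builds such subsets using disjoint chunks, a variant chosen so that every normal sample is used exactly once. It is an assumption for illustration, not the thesis's exact method, and `balanced_subsets` is a hypothetical name.

```python
import random


def balanced_subsets(normal, failed, seed=0):
    """Split the normal samples into disjoint chunks about the size of the failed
    set and pair each chunk with every failed sample (label 1 = failed)."""
    if not failed:
        raise ValueError("need at least one failed sample")
    rng = random.Random(seed)
    shuffled = list(normal)
    rng.shuffle(shuffled)
    n_subsets = max(1, len(shuffled) // len(failed))
    subsets = []
    for i in range(n_subsets):
        chunk = shuffled[i::n_subsets]  # disjoint slices that together cover every normal sample
        subsets.append([(x, 0) for x in chunk] + [(x, 1) for x in failed])
    return subsets


# Hypothetical sample IDs: 100 normal disks, 10 failed disks.
subsets = balanced_subsets(list(range(100)), list(range(100, 110)))
print(len(subsets), [sum(y for _, y in s) for s in subsets])
```

Each balanced subset would then train its own base classifier, and a majority vote over those classifiers would decide whether a disk is flagged as imminent failure.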
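The baseline that the domain-adaptive model is compared against measures the source/target distribution difference with multi-kernel maximum mean discrepancy. The sketch below computes a biased estimate of squared MMD using an equal-weight average of Gaussian kernels. The bandwidths are hypothetical, and MK-MMD as usually defined optimizes the kernel weights rather than fixing them. This does not implement the thesis's hybrid spectral kernel network or its adversarial training; it only shows the kind of distance being compared against.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def multi_kernel(x, y, bandwidths):
    # Equal-weight average of Gaussian kernels (a simplification of MK-MMD,
    # which normally learns the kernel weights).
    return sum(gaussian_kernel(x, y, s) for s in bandwidths) / len(bandwidths)


def mmd_squared(source, target, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    def mean_k(xs, ys):
        total = sum(multi_kernel(x, y, bandwidths) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return mean_k(source, source) + mean_k(target, target) - 2 * mean_k(source, target)


# Hypothetical 2-D feature vectors for a source disk model and two target models.
source = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1]]
near = [[x + 0.1, y + 0.1] for x, y in source]
far = [[x + 3.0, y + 3.0] for x, y in source]
print(mmd_squared(source, near), mmd_squared(source, far))
```

A larger value means the two disk models' feature distributions differ more; domain adaptation tries to learn features for which this kind of discrepancy is small.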
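Because failed disks are rare, overall accuracy would look high even for a model that never raises an alert, which is why the results are reported as F1 scores. A minimal sketch of the alert rule and an F1 computation on the failure class follows; label 1 stands for imminent failure, and the function names and disk serials are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (imminent-failure) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correctly flagged failures: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def disks_to_alert(serials, y_pred, positive=1):
    """Serial numbers of the disks classified as imminent failure."""
    return [s for s, p in zip(serials, y_pred) if p == positive]


y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1, 0, 0]
print(f1_score(y_true, y_pred))  # 2 TP, 1 FP, 1 FN, so precision = recall = 2/3
print(disks_to_alert(["D0", "D1", "D2", "D3", "D4", "D5", "D6", "D7"], y_pred))
```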
Keywords/Search Tags:Hard Disk Failure Prognosis, Ensemble Learning, Feature Selection, Data Balance, Domain Adaptation