
Research on Large-Scale Disk Failure Prediction Method

Posted on: 2020-11-03  Degree: Master  Type: Thesis
Country: China  Candidate: W R Xie
GTID: 2428330590458323  Subject: Computer system architecture
Abstract/Summary:
Disk failure prediction is critical in large-scale storage systems: once data is lost to disk corruption, the damage to the enterprise can be irreparable. Machine learning methods applied to disk S.M.A.R.T. data can predict disk failures with fairly good results. However, because of the limitations of S.M.A.R.T., models that use only the information from a single point in time predict poorly and do not reach the level required for industrial use. This thesis introduces temporal information into the disk failure prediction model, covering both time-series feature processing and time-series model optimization. First, a temporal data allocation strategy is proposed to allocate time-series data dynamically, together with a time-series feature processing algorithm that extends the S.M.A.R.T. features. For prediction, a time-weighted random forest is proposed: based on the time-series characteristics of the sample data, the decision trees in the forest are initialized with different weights, and the mean and variance are used to fit the time-series data. A negative-feedback update model is then proposed: a posterior decision tree is introduced into the random forest, negative-feedback information derived from historical prediction data is added, and a weight update algorithm is given. On this basis, a prototype disk failure prediction system was designed and developed, and it is now deployed in large-scale data centers. Analysis and experiments show that the proposed time-series prediction model outperforms the traditional model: at the same false alarm rate, recall increases by 11.13%, and at the same recall, the false alarm rate falls by 52.0%. When the predictions are applied to disk scrubbing, the mean time to detection decreases by 152.6% when the accelerated-state time is 5.0%, and by 217.3% when the disk scrub load increases by 4.8%. Prediction-guided scrubbing can therefore greatly reduce scrub overhead and cost.
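The abstract does not give the exact formulas for the time-series features, the tree weights, or the negative-feedback update. The following is only a minimal Python sketch of the general idea under stated assumptions. The helper names `extend_features`, `weighted_vote`, and `feedback_update` are hypothetical. Features are assumed to be the latest S.M.A.R.T. snapshot extended with per-attribute mean and variance over a trailing window. The forest output is assumed to be a weight-normalized average of per-tree probabilities. A simple multiplicative down-weighting of trees that misjudged a past prediction stands in for the thesis's negative-feedback algorithm. The posterior decision tree and the data allocation strategy are not modeled.

```python
from statistics import mean, pvariance
from typing import Callable, List, Sequence

# Hypothetical: a "tree" is any callable that maps a feature vector to a
# failure probability in [0, 1]; real trees would come from a trained forest.
Tree = Callable[[Sequence[float]], float]


def extend_features(history: Sequence[Sequence[float]], window: int) -> List[float]:
    """Append per-attribute mean and population variance over the last
    `window` S.M.A.R.T. snapshots to the latest snapshot (assumed scheme)."""
    recent = history[-window:]
    latest = [float(v) for v in recent[-1]]
    columns = list(zip(*recent))  # one tuple of values per S.M.A.R.T. attribute
    means = [float(mean(col)) for col in columns]
    variances = [float(pvariance(col)) for col in columns]
    return latest + means + variances


def weighted_vote(trees: Sequence[Tree], weights: Sequence[float],
                  x: Sequence[float]) -> float:
    """Forest output as a weight-normalized average of per-tree probabilities."""
    total = sum(weights)
    return sum(w * tree(x) for w, tree in zip(weights, trees)) / total


def feedback_update(trees: Sequence[Tree], weights: Sequence[float],
                    x: Sequence[float], label: int, lr: float = 0.5) -> List[float]:
    """Hypothetical negative-feedback rule: once the true outcome of a past
    prediction is known (label 1 = failed, 0 = healthy), shrink the weight of
    every tree that got it wrong, then renormalize the weights to sum to 1."""
    updated = []
    for w, tree in zip(weights, trees):
        wrong = (tree(x) >= 0.5) != bool(label)
        updated.append(w * (1.0 - lr) if wrong else w)
    total = sum(updated)
    return [w / total for w in updated]


if __name__ == "__main__":
    # Four daily snapshots of two illustrative attributes for one disk
    # (e.g. reallocated and pending sector counts); values are made up.
    history = [[0, 0], [2, 1], [4, 1], [8, 3]]
    x = extend_features(history, window=3)
    trees = [
        lambda f: 1.0 if f[0] > 5 else 0.0,  # looks at the latest raw value
        lambda f: 1.0 if f[4] > 5 else 0.0,  # looks at the windowed variance
        lambda f: 0.0,                       # always predicts "healthy"
    ]
    weights = [0.5, 0.3, 0.2]
    print(weighted_vote(trees, weights, x))
    weights = feedback_update(trees, weights, x, label=1)
    print(weights, weighted_vote(trees, weights, x))
```

With the made-up history above, the third tree is the only one that misses the (assumed) failure. After the update its weight drops, so the forest's failure score rises. The sketch only illustrates the abstract's point that the trees need not count equally. How the thesis actually derives the initial weights from time-series characteristics is not specified here.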
Keywords/Search Tags: Disk failure prediction, S.M.A.R.T. technology, disk scrub, decision tree