
Machine Learning Based Log Analytics And Bug Forecast For EMC Storage Systems

Posted on: 2015-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
GTID: 2308330461957929
Subject: Software engineering
Abstract/Summary:
EMC is an information storage technology consulting company whose main business is selling storage equipment, services, and solutions. As the business has grown rapidly, more and more enterprises and individuals have chosen EMC storage devices for data storage and backup. The devices generate logs daily and upload them to the EMC data center. When a device hits an exception, the logs are the only means support engineers have to diagnose the device and locate the bug. As the number of log files grows, manual maintenance and bug location become much harder for support engineers, and a larger number of bugs also lowers user satisfaction with EMC products.
Because the logs report general information and runtime conditions every day, the occurrence of a bug can be indicated by abnormal values or by fluctuations in the data. This thesis describes a project on log analytics and bug forecasting based on machine learning, which aims to discover device bugs proactively and to prevent them from occurring. The study takes disk bugs as its subject and splits the disk bug forecast into two parts. The first part is log data extraction: regular expressions match and extract data from the disk-related section of each log, and the extracted data is stored in a Greenplum database. The second part applies machine learning to the extracted data using Weka and builds a model for bug forecasting. The study trained models on the training data with Weka's decision tree and Bayes algorithms, compared the strengths and weaknesses of the resulting models, and then used the best-performing algorithm to build the final model and forecast bugs on real data.
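The extraction step described above (matching the disk section of a log with regular expressions and turning it into rows for storage) can be sketched roughly as follows. The abstract does not give the actual EMC log layout, so the section headers, field names, and the `extract_disk_rows` helper here are all hypothetical; this is only an illustration of the general approach, not the thesis's implementation. The resulting tuples are the kind of rows that could then be loaded into a database table.

```python
import re

# Hypothetical log layout: the real EMC log format is not given in the abstract.
SAMPLE_LOG = """\
==== SYSTEM ====
uptime: 12 days
==== DISKS ====
disk=0 state=OK temp=35 read_errors=0
disk=1 state=FAILED temp=61 read_errors=17
==== NETWORK ====
eth0: up
"""

# Capture everything between the (assumed) DISKS header and the next
# section header, or the end of the text if DISKS is the last section.
SECTION_RE = re.compile(
    r"^==== DISKS ====\n(.*?)(?=^==== |\Z)",
    re.MULTILINE | re.DOTALL,
)

# One disk record per line, with assumed key=value fields.
ROW_RE = re.compile(r"disk=(\d+) state=(\w+) temp=(\d+) read_errors=(\d+)")


def extract_disk_rows(log_text):
    """Return a list of (disk_id, state, temp, read_errors) tuples."""
    section = SECTION_RE.search(log_text)
    if section is None:
        return []  # the log has no disk section
    rows = []
    for m in ROW_RE.finditer(section.group(1)):
        disk_id, state, temp, errors = m.groups()
        rows.append((int(disk_id), state, int(temp), int(errors)))
    return rows


rows = extract_disk_rows(SAMPLE_LOG)
for row in rows:
    print(row)
```

Keeping the section pattern separate from the row pattern means a new log section could be supported by adding another pair of patterns, without touching the existing ones.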
The main contributions of the thesis are as follows:
(1) Propose a solution for log extraction and log-based forecasting, with a precise description and definition of the extraction and forecasting requirements for disk bugs.
(2) Design and implement a framework for log extraction and storage which can be extended on demand to cover additional log content.
(3) Use Weka to train machine learning models, and choose the Bayes net algorithm to build the final model, which reaches an accuracy of about 88% and a false negative rate below 13%.
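For readers unfamiliar with the two figures quoted in contribution (3), the sketch below shows how accuracy and false negative rate are conventionally computed from a binary confusion matrix, assuming "positive" means a disk that actually has a bug. The counts are invented purely for illustration and are not the thesis's data; the function names are likewise hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def false_negative_rate(tp, fn):
    """Fraction of truly faulty disks that the model failed to flag."""
    positives = tp + fn
    if positives == 0:
        raise ValueError("no positive (faulty) examples")
    return fn / positives


# Invented counts for illustration only, NOT results from the thesis.
tp, fn, fp, tn = 30, 6, 10, 54

acc = accuracy(tp, tn, fp, fn)
fnr = false_negative_rate(tp, fn)
print(f"accuracy={acc:.3f} false_negative_rate={fnr:.3f}")
```

In a bug-forecasting setting the false negative rate matters alongside accuracy, because a missed faulty disk is usually more costly than a false alarm.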
Keywords/Search Tags:Machine Learning, Log Analytics, Weka, Decision Tree, Bayes, Greenplum