
Research And Application Of Log Classification Based On Machine Learning

Posted on: 2020-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: J X Pei
GTID: 2428330626950744
Subject: Software engineering
Abstract/Summary:
As the scale of modern production grows, the production process generates more and more log text, which makes log analysis essential. These logs are large in volume, complex in content and costly to analyze. Introducing machine learning into log analysis can greatly reduce the workload and difficulty faced by analysts. The main research results are the following four points:
(1) Based on practical application scenarios and the characteristics of log text, the thesis summarizes three problems: log text is unstructured, its classes are imbalanced, and a single classification algorithm tends to over-fit. To address these problems, a log analysis model based on ensemble learning is proposed.
(2) To handle the unstructured and imbalanced nature of log text, the thesis proposes KS-SMOTE, an improved algorithm for imbalanced data. It uses Word2vec for the initial vector representation and a Bi-LSTM to obtain feature vectors. An SVM then classifies the data set to identify and eliminate noise samples, the SMOTE algorithm is applied to the classified samples, and a new sample set is assembled using a clustering algorithm. The experimental results show that KS-SMOTE classifies better than SMOTE.
(3) To address the over-fitting of traditional single classification algorithms, the thesis proposes an improved three-level Stacking algorithm. Changing the representation of the input attributes enlarges the number of samples and reduces the feature dimension. Experiments comparing it with a single classification algorithm and with the original Stacking algorithm show that the three-level Stacking algorithm is superior to both in accuracy, precision and F1 value.
(4) The log analysis method is applied to practical engineering. To improve accuracy and make the text analysis model better meet project requirements, the thesis proposes a log analysis system based on ensemble learning that addresses class imbalance, demonstrating the engineering effectiveness of the log analysis method.
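For readers unfamiliar with the SMOTE step named in point (2), the following is a minimal sketch of the standard SMOTE idea only: a synthetic minority sample is created by interpolating between a minority sample and one of its nearest minority neighbours. It is not the thesis's KS-SMOTE; the function name `smote`, its parameters and the toy data are assumptions made purely for illustration, and the noise-removal and clustering stages described in the abstract are not reproduced.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper illustrating plain SMOTE, not KS-SMOTE.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point lies on the segment between base and neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: four 2-D feature vectors (illustrative values only)
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies, which is why noisy minority samples are worth removing before this step.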
Keywords/Search Tags:Machine Learning, Text Classification, Imbalance, Integrated Algorithms, Stacking