Design And Implementation Of Log Classification Algorithm Based On Template Extraction

Posted on: 2021-05-14; Degree: Master; Type: Thesis
Country: China; Candidate: Y W Li
GTID: 2518306308472974; Subject: Computer Science and Technology
Abstract/Summary:
Logs record system events and are widely used for anomaly detection in modern systems. Traditional anomaly detection relies mainly on checking logs manually, which is time-consuming, labor-intensive, and extremely difficult. Template extraction, log event generation, and log classification have been widely studied in recent years with substantial results, and clustering-based template extraction has become a mainstream approach. However, most clustering methods require the number of clusters to be set manually in advance, which is difficult and inconvenient in real production environments. Methods that do not need a preset number of clusters usually rely on simple clustering rules and therefore produce poor clustering results.

To address these problems, this thesis designs and implements a log classification method based on template extraction. The method consists of two processes.

The first is a clustering-based log template extraction process. Because clustering a large number of logs is slow, a word-frequency-based log compression method is used to reduce the number of logs entering the clustering process, which improves clustering efficiency. A clustering method based on a log potential function is then designed. The thesis finds that the number of clusters is positively related to the number of successfully extracted logs, so a normalized feature can be used to decide whether the current number of clusters meets the requirements: if it does, clustering is complete; if not, the number of clusters is adjusted by binary search and clustering is rerun. In this way, clustering completes automatically. Finally, log templates are extracted from the clustering results.

The second is a log classification process based on the extracted log templates. A log classification template, composed of a log template and a keyword dictionary, is used to classify logs by hashing. An online classification mechanism is also designed: if a log matches an existing log template, classification succeeds; if not, template mining is performed, the new log template is added to the existing template collection, and the log is reclassified.

Experiments on real datasets show that the template extraction algorithm achieves higher extraction accuracy than the baseline method and generalizes well, and that the template-based log classification algorithm achieves good classification accuracy and efficiency. The thesis also integrates the template extraction and log classification algorithms into a prototype system consisting of a log access module, a log template mining module, a log classification module, and a web service module. The system supports reading and writing log data, log template mining, log classification, data visualization, and user interaction.
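The word-frequency log compression step can be sketched as follows. The abstract does not state the exact rule, so this minimal Python sketch assumes that whitespace-separated tokens occurring fewer than `min_freq` times in the whole batch are variable fields (IDs, addresses, counters); they are masked with a wildcard, and logs that become identical are merged with a count. `compress_logs`, `min_freq`, and the `<*>` wildcard are hypothetical names, not taken from the thesis.

```python
from collections import Counter
from typing import Dict, List, Tuple

WILDCARD = "<*>"


def compress_logs(logs: List[str], min_freq: int = 2) -> Dict[Tuple[str, ...], int]:
    """Mask rare tokens and merge logs that become identical (illustrative sketch).

    Assumption: a token seen fewer than min_freq times across the batch is a
    variable part and is replaced by WILDCARD. Returns each distinct masked log
    with the number of raw logs it stands for.
    """
    tokenized = [log.split() for log in logs]
    # Corpus-wide token frequencies.
    freq = Counter(tok for toks in tokenized for tok in toks)
    compressed = Counter()
    for toks in tokenized:
        masked = tuple(t if freq[t] >= min_freq else WILDCARD for t in toks)
        compressed[masked] += 1
    return dict(compressed)
```

On a toy batch of three `Connection from <ip> closed` lines and two `Disk <dev> full` lines, only two masked entries (with counts 3 and 2) would enter clustering instead of five raw logs.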
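The automatic choice of the number of clusters can be illustrated with a plain binary search. This sketch assumes the normalized feature is non-decreasing in the cluster count k, as the reported positive relation suggests, and that the goal is the smallest k whose feature reaches a target threshold; each call to `normalized_feature(k)` stands for one full clustering run. The potential-function clustering itself is not reproduced here, and `choose_num_clusters`, `target`, and the fall-back to `k_max` are hypothetical choices.

```python
from typing import Callable


def choose_num_clusters(
    normalized_feature: Callable[[int], float],
    k_min: int,
    k_max: int,
    target: float,
) -> int:
    """Return the smallest k in [k_min, k_max] whose feature reaches target.

    Assumes normalized_feature is non-decreasing in k. Each evaluation stands
    for one clustering run, so only O(log(k_max - k_min)) runs are needed.
    Returns k_max if no k in the range reaches target (hypothetical policy).
    """
    if k_min > k_max:
        raise ValueError("k_min must not exceed k_max")
    lo, hi, best = k_min, k_max, k_max
    while lo <= hi:
        mid = (lo + hi) // 2
        if normalized_feature(mid) >= target:
            best = mid      # requirement met: try fewer clusters
            hi = mid - 1
        else:
            lo = mid + 1    # requirement not met: more clusters needed
    return best
```

With a toy feature of `min(1.0, k / 40)`, a target of 0.75, and k in [1, 100], the search returns 30 after seven evaluations rather than up to 100 clustering runs.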
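Extracting a template from one cluster, and one possible normalized feature built on it, can be sketched as follows. The thesis does not give its extraction rule; this sketch assumes a cluster yields a template only when all of its logs have the same number of tokens, keeps tokens on which every log agrees, and masks the other positions. `extracted_fraction`, the share of logs that fall in clusters with a successful extraction, is a hypothetical stand-in for the thesis's normalized feature.

```python
from typing import List, Optional, Tuple

WILDCARD = "<*>"


def extract_template(cluster: List[str]) -> Optional[Tuple[str, ...]]:
    """Derive one template from a cluster of raw log lines (illustrative sketch).

    Assumption: extraction succeeds only if all logs have equal token counts.
    Positions where every log agrees keep their token; others become WILDCARD.
    Returns None when extraction fails.
    """
    if not cluster:
        return None
    rows = [log.split() for log in cluster]
    if len({len(r) for r in rows}) != 1:
        return None  # mixed lengths: treated as an unsuccessful extraction
    return tuple(
        col[0] if all(tok == col[0] for tok in col) else WILDCARD
        for col in zip(*rows)
    )


def extracted_fraction(clusters: List[List[str]]) -> float:
    """Hypothetical normalized feature: share of logs whose cluster yields a template."""
    total = sum(len(c) for c in clusters)
    ok = sum(len(c) for c in clusters if extract_template(c) is not None)
    return ok / total if total else 0.0
```

Splitting clusters further tends to make them more uniform, which is one way the positive relation between cluster count and successfully extracted logs could arise under these assumptions.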
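The hash-based online classification mechanism can be sketched as below. The abstract only says that a log template plus a keyword dictionary is used with a hash method; this sketch assumes the keyword dictionary is the set of constant tokens of known templates, that a log's lookup key keeps keyword tokens and masks everything else, and that a Python `dict` serves as the hash table. `OnlineLogClassifier` and its methods are hypothetical, and `_mine_template` (tokens containing a digit become variables) is a deliberately simple stand-in for the thesis's template mining process.

```python
import re
from typing import Dict, Set, Tuple

WILDCARD = "<*>"
Template = Tuple[str, ...]


class OnlineLogClassifier:
    """Hash-based log classification with online template mining (illustrative sketch)."""

    def __init__(self) -> None:
        self.keywords: Set[str] = set()             # constant tokens of known templates
        self.templates: Dict[Template, int] = {}    # template -> class id (hash table)

    def _key(self, log: str) -> Template:
        # Keep keyword tokens, mask all others, so matching logs share one key.
        return tuple(t if t in self.keywords else WILDCARD for t in log.split())

    @staticmethod
    def _mine_template(log: str) -> Template:
        # Hypothetical stand-in for template mining: digit-bearing tokens are variables.
        return tuple(WILDCARD if re.search(r"\d", t) else t for t in log.split())

    def add_template(self, template: Template) -> int:
        """Add a template and its constant tokens; return its class id."""
        if template not in self.templates:
            self.templates[template] = len(self.templates)
            self.keywords.update(t for t in template if t != WILDCARD)
        return self.templates[template]

    def classify(self, log: str) -> int:
        key = self._key(log)
        if key in self.templates:  # average O(1) hash lookup
            return self.templates[key]
        # No match: mine a template, add it to the collection, then reclassify.
        mined_id = self.add_template(self._mine_template(log))
        # Fall back to the mined id if the updated keywords still give no match.
        return self.templates.get(self._key(log), mined_id)
```

Because the keyword dictionary grows as templates are added, a log can map to a different key later than it did earlier; a production system would need a policy for that case.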
Keywords/Search Tags: template extraction, log analysis, log clustering, log classification