A Log Statement Level Recommendation Method Based On Large-scale Source Code Mining

Posted on: 2020-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Li
Full Text: PDF
GTID: 2428330590477061
Subject: Computer software and theory
Abstract/Summary:
Logging is a common coding practice in software development. Its purpose is to record important runtime information while the program executes. Such information is indispensable for post-mortem analysis of software, for example tracking and debugging defects, optimizing performance, or documenting critical business transactions. Logs therefore need to cover all the necessary runtime information without causing unintended problems such as performance overhead or log redundancy. A primary way to achieve this is to set a log level when inserting a log statement into the code. However, determining an appropriate log level not only depends on the developer's domain knowledge, experience and other personal capabilities, but is also influenced by various objective conditions during actual development. Choosing appropriate log levels is a challenging task, especially for novice developers. To assist developers with this problem, this work proposes a method that recommends the most appropriate level for a log statement, based on large-scale source code mining and program context analysis. The main contributions are as follows:

(1) We use data mining to analyze a variety of real projects from open source software repositories and obtain a large number of log statement instances annotated with levels. We then perform static program analysis on these logging instances based on the abstract syntax tree, and extract five kinds of program context features related to log level determination from multiple perspectives of the software system architecture. These features capture many of the factors that developers may consider when determining the level of a log statement.

(2) We conduct a side-by-side comparison of the most common third-party logging tools. According to the level definitions of the different logging tools, all obtained log statement instances are marked with a unified level label. For the unstructured program context features in the dataset, we combine natural language text processing with feature selection techniques to convert them into structured feature representations. In addition, we use an oversampling technique based on the SMOTE algorithm to address data imbalance, and a noise detection technique based on the K-Nearest-Neighbor algorithm to address data quality.

(3) We apply the random forest machine learning algorithm to the problem domain of logging practice. The aim is that, when a new log statement is added to the source code, the program context can be analyzed efficiently and an actionable suggestion for the log level can be given at once. By grading log messages from the source code perspective, we hope to avoid log redundancy effectively while improving the quality of logging.

Finally, we conduct a series of experiments to verify the feasibility and effectiveness of the method.
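The data preparation steps in contribution (2) can be illustrated with a small sketch. It is not the thesis's implementation: the level mapping table, the function names, and the toy feature vectors are all assumptions made for illustration, and the oversampling and noise filtering are simplified pure-Python stand-ins for SMOTE and for KNN-based noise detection.

```python
import math
import random
from collections import Counter

# Hypothetical mapping from tool-specific level names to one unified label set.
# The abstract does not list the actual mapping; these entries are illustrative.
UNIFIED_LEVEL = {
    "trace": "TRACE", "finest": "TRACE",
    "debug": "DEBUG", "fine": "DEBUG",
    "info": "INFO",
    "warn": "WARN", "warning": "WARN",
    "error": "ERROR", "severe": "ERROR", "fatal": "ERROR",
}

def unify_level(raw):
    """Map a tool-specific level name to the unified label."""
    return UNIFIED_LEVEL[raw.strip().lower()]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(index, points, k):
    """Indices of the k points closest to points[index], excluding itself."""
    others = [i for i in range(len(points)) if i != index]
    others.sort(key=lambda i: euclidean(points[index], points[i]))
    return others[:k]

def smote_like(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling: each synthetic sample is interpolated
    between a minority sample and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest_neighbors(i, minority, k))
        gap = rng.random()  # position along the segment between the two samples
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic

def drop_noisy(points, labels, k=3):
    """KNN-based noise filter: keep the indices of samples whose label
    agrees with the majority label of their k nearest neighbors."""
    keep = []
    for i in range(len(points)):
        votes = [labels[j] for j in nearest_neighbors(i, points, k)]
        majority = Counter(votes).most_common(1)[0][0]
        if majority == labels[i]:
            keep.append(i)
    return keep

if __name__ == "__main__":
    print(unify_level("WARNING"))  # level name as java.util.logging spells it
    print(smote_like([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=2))
```

In a fuller pipeline, the cleaned and rebalanced feature vectors would then be passed to a random forest classifier. That step is left out here because it would normally rely on a machine learning library rather than hand-written code.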
Keywords/Search Tags: Log Statement, Level Recommendation, Source Code Mining, Machine Learning