Log Enhancement For Large-Scale Open-Source Software

Posted on:2016-06-10

Degree:Master

Type:Thesis

Country:China

Candidate:Z Y Jia

Full Text:PDF

GTID:2348330536967728

Subject:Software engineering

Abstract/Summary:

With software scaling up continuously, logging has become indispensable to failure diagnosis: very similar symptoms may be caused by different software bugs, and log messages are often the most direct evidence available. Meanwhile, most large-scale software is developed by many people, so logs tend to be written casually according to each developer's habits rather than following an agreed specification. Although automated log management has received considerable attention, existing solutions only handle certain code patterns. Semantic bugs gradually dominate as software matures; these bugs, however, cannot be handled well by simple, manually summarized code patterns. In this paper, we design and implement SmartLog, an automated log-insertion tool that learns program-specific logging behavior and identifies error-prone locations from statistics on that behavior. Our work includes:

1. We characterize system logs in five widely used software systems (MySQL, Subversion, Apache HTTP Server, PostgreSQL, and Wireshark) and report six log-related observations: logging behavior is affected by context; semantic bugs gradually dominate; multi-developer software leads to file-specific logging styles; the wide use of error return codes influences system logs; developers do not always log at error-prone program points; and test modules affect log density.

2. SmartLog uses a machine learning method to recognize logging functions automatically, removing a limitation of existing log tools. With appropriate feature extraction and filtering, its ability to recognize logging functions is 76 times that of the keyword-based method, and the F-score reaches 0.93.

3. During log enhancement, SmartLog recognizes logged snippets based on the logging model, which significantly improves accuracy over existing methods. A sample test shows false positive and false negative rates of 4% and 13%, respectively. Additionally, SmartLog proposes the binary checking tree (BCT) to determine the semantic equivalence of different logging contexts. The evaluation shows that BCT scales well and reaches a recognition accuracy of 97%. Based on statistics of how often logging occurs under equivalent contexts, SmartLog enhances system logs automatically while taking performance overhead, code readability, and maintainability into account.

Because it relies on lightweight statistical analysis and machine learning, SmartLog's running time scales linearly with lines of code, at about 45 seconds per million lines. SmartLog adds 5% more logs relative to the existing logs while contributing less than 1% performance overhead. The validity evaluation shows that 86% of the new logs are at locations considered error-prone, according to evidence from developers.
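
For orientation, the keyword-based baseline that the abstract compares against can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not SmartLog's implementation: the keyword list, the regular expression, and the function names `keyword_logging_functions` and `f_score` are all hypothetical, the sample input is illustrative only, and the abstract does not say which F-score variant it reports (the balanced F1 is assumed here).

```python
import re

# Hypothetical keyword list; the abstract does not say which keywords
# the keyword-based baseline actually uses.
LOG_KEYWORDS = ("log", "err", "warn", "print", "fatal", "debug")

# Rough heuristic for a C-style call such as `name(`; this is not a parser
# and also matches control keywords like `if (`.
CALL_PATTERN = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def keyword_logging_functions(source):
    """Return the called names that contain any of the logging keywords."""
    names = set(CALL_PATTERN.findall(source))
    return {n for n in names if any(k in n.lower() for k in LOG_KEYWORDS)}


def f_score(true_positives, false_positives, false_negatives):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative input only; not taken from the thesis.
    sample = """
    if (fd < 0) {
        sql_print_error("cannot open %s", path);
        return -1;
    }
    ap_log_error(APLOG_MARK, APLOG_ERR, 0, s, "bad config");
    svn_pool_clear(iterpool);
    """
    print(sorted(keyword_logging_functions(sample)))
```

A baseline of this kind misses project-specific logging wrappers whose names contain none of the keywords, which is the limitation the learned recognizer is said to address.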

Keywords/Search Tags:

large-scale software, log enhancement, code quality, machine learning, static analysis

Related items

1	Quality Analysis And Improvement Of The Code Clone On Large Software Systems Maintenance
2	Research On GPU-based Parallelization Method Of Large-scale Program Static Analysis
3	Research On Machine Learning-Based Software Defect Identification
4	Research On Static Code Defect Detection Based On Multi-Engine Fusion
5	Research And Implementation Of An Anomaly Detection Platform For Large-scale Software Systems Based On Large Collections Of Log Messages
6	Research On Large-scale Regularized Machine Learning Algorithms
7	Research On Relationship Between Code Quality And Software Defects For Open Source Software
8	The Research Of Automation Technique Of Software Globalization To Large-Scale Legacy System
9	The Static Structure Measurement And Evolution Analysis Of The Large-scale Software In Complex Network Perspective
10	Methods for large-scale machine learning and computer vision