
Research On Code Representation Learning Towards Software Defect Mining

Posted on: 2020-12-10  Degree: Master  Type: Thesis
Country: China  Candidate: H Y Li  Full Text: PDF
GTID: 2428330575955158  Subject: Computer Science and Technology
Abstract/Summary:
Code representation learning is a key part of software defect mining, which leverages machine learning and data mining techniques to help identify software defects. Many recent studies address code representation learning, but their models depend on large amounts of defect data to learn code representations, ignoring the fact that defect data is severely insufficient in practice. In addition, most of them model static code, and no existing model captures the difference features introduced by code revision, which makes it hard to identify defects induced by code revision. Under these circumstances, this thesis studies code representation learning for defect mining from both static code and dynamic code, and makes the following contributions:

1. For defect mining from static code, this thesis proposes a novel model reuse framework based on code functional representation to alleviate the shortage of defect data in specific tasks. We first collect a large amount of freely available source code and textual comments from open source software repositories to train a general code functional representation model, and then adapt this model to different software defect mining tasks. We obtain two well-trained, text-enriched code functional representation models, called RUM and RAM, via direct regression learning and adversarial learning respectively. Experimental results on different software defect mining tasks show that reusing RUM and RAM for task-specific models achieves better performance than the counterparts trained from scratch, especially when training data is insufficient.

2. Unlike static code, dynamic code contains both source code and modification markers that reflect the code revision process. For defect mining from dynamic code, the differences introduced by code revision are the core features for defect identification. Based on this characteristic of dynamic code, this thesis proposes a novel deep model, called DeepReview, which uses multi-instance learning to capture revision features and predicts whether a code revision is approved or rejected in automatic code review tasks. Experimental results on real-world datasets verify the effectiveness of the proposed approach compared with both traditional models and state-of-the-art deep models.
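
The abstract does not describe DeepReview's architecture, so the following is only a minimal sketch of the general multi-instance learning idea it names, not the thesis's model. It assumes a code revision is treated as a "bag" of changed hunks (instances), that each hunk gets a score from a simple logistic scorer, and that the bag is labeled by max pooling over its instances (the standard multi-instance assumption). The function names, the feature vectors, the weights, and the choice of "rejected" as the positive label are all hypothetical.

```python
import math
from typing import Sequence


def instance_score(features: Sequence[float],
                   weights: Sequence[float],
                   bias: float) -> float:
    """Logistic score in (0, 1) for one changed hunk (one instance).

    Hypothetical scorer: a real system would use a learned deep encoder.
    """
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def bag_score(bag: Sequence[Sequence[float]],
              weights: Sequence[float],
              bias: float) -> float:
    """Revision-level score: max pooling over the hunk-level scores.

    Standard multi-instance assumption: a bag is positive if at least
    one of its instances is positive.
    """
    if not bag:
        raise ValueError("a revision must contain at least one hunk")
    return max(instance_score(x, weights, bias) for x in bag)


def predict(bag: Sequence[Sequence[float]],
            weights: Sequence[float],
            bias: float,
            threshold: float = 0.5) -> str:
    """Map the bag score to a review decision (assumed label mapping)."""
    return "rejected" if bag_score(bag, weights, bias) >= threshold else "approved"


if __name__ == "__main__":
    # Toy weights and two toy revisions, each a list of hunk feature vectors.
    w, b = [1.0, -1.0], 0.0
    risky_revision = [[0.0, 0.0], [3.0, 0.0]]
    clean_revision = [[0.0, 2.0], [0.0, 1.0]]
    print(predict(risky_revision, w, b))
    print(predict(clean_revision, w, b))
```

Max pooling is used here because it makes the bag-level decision depend on the single most suspicious hunk; other pooling choices (mean, attention) are equally plausible and the abstract does not say which one the thesis uses.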
Keywords/Search Tags: Software defect mining, Code representation learning, Model reuse, Multi-instance learning