Research On Code Representation Learning Towards Software Defect Mining

Posted on:2020-12-10

Degree:Master

Type:Thesis

Country:China

Candidate:H Y Li

GTID:2428330575955158

Subject:Computer Science and Technology

Abstract/Summary:

Code representation learning is a key part of software defect mining, which leverages machine learning and data mining techniques to help identify software defects. Many recent studies address code representation learning, but their models depend on a huge amount of defect data to learn code representations, ignoring the fact that defect data is heavily insufficient in practice. Additionally, most of them model static code, and no model captures the difference features arising from code revision, which makes it hard to identify defects induced by code revision. Under these circumstances, this thesis focuses on the problem of code representation learning for defect mining from both static code and dynamic code, and makes the following contributions:

1. For defect mining from static code, this thesis proposes a novel model reuse framework based on code functional representation to alleviate the burden of insufficient defect data in specific tasks. We first collect a huge amount of freely available source code and textual comments from open source software repositories to train a general code functional representation model, and then adapt this model to different software defect mining tasks. We obtain two well-trained, text-enriched code functional representation models, one via direct regression learning and one via adversarial learning, called RUM and RAM respectively. Experimental results on different software defect mining tasks show that reusing RUM and RAM in task-specific models achieves better performance than the counterpart trained from scratch, especially when training data is insufficient.

2. Unlike static code, dynamic code contains both source code and modification markers that indicate the code revision process. For defect mining from dynamic code, the differences introduced by code revision are the core features for defect identification. Based on this characteristic of dynamic code, this thesis proposes a novel deep model, called DeepReview, which uses multi-instance learning to capture revision features and predicts whether a code revision is approved or rejected in automatic code review tasks. Experimental results on real-world datasets verify the effectiveness of the proposed approach compared to both traditional models and state-of-the-art deep models.
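The multi-instance view of a code revision can be illustrated with a small sketch. It is not the DeepReview architecture, which the abstract does not specify. It only shows the standard multi-instance assumption: a revision is a bag of changed hunks, and the bag is flagged if its most suspicious hunk is. The feature vectors, the logistic scorer, the max pooling, and the names `instance_score`, `bag_score`, and `predict` are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def instance_score(features: Sequence[float],
                   weights: Sequence[float],
                   bias: float) -> float:
    """Logistic score in (0, 1) for one changed hunk (one instance).

    Assumption: each hunk is already encoded as a fixed-length feature
    vector; a real model would learn this encoding from the code itself.
    """
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def bag_score(hunks: Sequence[Sequence[float]],
              weights: Sequence[float],
              bias: float) -> float:
    """Max-pool instance scores: the bag is as suspicious as its worst hunk."""
    if not hunks:
        raise ValueError("a revision must contain at least one hunk")
    return max(instance_score(h, weights, bias) for h in hunks)


def predict(hunks: Sequence[Sequence[float]],
            weights: Sequence[float],
            bias: float,
            threshold: float = 0.5) -> str:
    """Label the whole revision from its pooled score."""
    score = bag_score(hunks, weights, bias)
    return "rejected" if score >= threshold else "approved"


# Toy weights and two-feature hunks, chosen by hand for the example.
weights, bias = [1.0, -1.0], 0.0
risky_revision = [[0.0, 1.0], [2.0, 0.0]]   # second hunk scores high
clean_revision = [[0.0, 1.0], [0.0, 2.0]]   # every hunk scores low
print(predict(risky_revision, weights, bias))
print(predict(clean_revision, weights, bias))
```

Max pooling is only one way to aggregate instances. Mean or attention-based pooling are common alternatives when several hunks jointly determine the outcome.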

Keywords/Search Tags:

Software defect mining, Code representation learning, Model reuse, Multi-instance learning

Related items

1	A Research On Fine-grained Software Defect Prediction Method
2	Specific Instance Detection Based Multi-Instance Learning And Its Applications To Virtual Props Recommendation
3	Research On Software Defect Prediction Based On Code Representation
4	Research On Multi-instance Multi-label Learning Based On Feature Learning
5	Research On Machine Learning-Based Software Defect Identification
6	Research On Machine Learning Based Software Defect Prediction
7	Research On Multi-Instance Learning Based On Covering Algorithm
8	Research On The Application Of Multi-instance Learning In Computer Vision
9	The Study Of Chinese Text Representation And Classification Based On Multi-Instance Learning
10	Software Defect Modeling And Prediction In Resource-constrained Scenarios