Research On Data Drought Key Techniques For Software Effort Data Based On Machine Learning

Posted on: 2019-08-22

Degree: Doctor

Type: Dissertation

Country: China

Candidate: F M Qi

GTID: 1368330545999871

Subject: Computer application technology

Abstract/Summary:

Software effort estimation (SEE) is a key step in developing a software project and has attracted considerable attention from researchers. Although many studies have addressed the problems involved in SEE, and existing methods have achieved promising results, many practical problems in the SEE process remain unsolved. One of the biggest obstacles in SEE research is the data drought issue, that is, the scarcity of usable effort data. This thesis puts forward a plan to alleviate the data drought issue from several aspects and obtains the following research results.

(1) Effort data with missing values usually still contain much useful information that can help train an estimator. From the perspective of making full use of the data, this thesis proposes a new imputation method based on low-rank recovery and semi-supervised regression. The thesis first divides missing data into three scenarios: missing values occur only in the independent variables, only in the dependent variables, or in both the independent and dependent variables. Then, for each scenario, low-rank recovery and semi-supervised regression techniques are introduced to impute the missing values. In addition, so that the effort data can be used effectively by these methods, this thesis designs a data structurization strategy that transforms unstructured data into structured data with class labels. Experiments on seven datasets demonstrate that the proposed method performs better than traditional methods.

(2) Data sharing is one of the main ways to relieve the data drought issue, yet privacy disclosure has become the main obstacle in the data sharing procedure. Hence, from the perspective of privacy preservation, this thesis proposes an Interval Covering Based Subclass Division and Manifold Learning Based Bi-directional Obfuscation (ICSD&MLBDO) method. In this procedure, the thesis designs a subclass division method based on interval covering theory to create 'class labels' for the effort data, and then introduces the ideas of classical privacy-preserving methods to protect the privacy of SEE data. In addition, a new bi-directional obfuscation method is designed that further enhances the privacy of the obfuscated data while retaining its utility. Experiments on seven datasets show that the proposed approach protects data privacy during data sharing.

(3) Missing data imputation and data sharing are passive ways to relieve the data drought issue. For this reason, this thesis proposes an active method based on open-source projects (OSPs), extracting effort data from OSPs by filtering the projects, designing cost metrics, and incrementally adding effort data online. In this procedure, an effort data incrementation method based on AdaBoost is proposed, namely the AdaBoost based estimator with CART (ABCART). ABCART modifies AdaBoost according to the characteristics of effort data, making it more suitable for incrementally adding effort data.
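The low-rank recovery idea in contribution (1) can be illustrated for the scenario where values are missing only in the independent variables. The following is a minimal sketch of a generic "hard-impute" loop, not the thesis's algorithm, and the semi-supervised regression part is not shown. It assumes a numeric project-by-feature matrix with `None` marking missing cells, fills those cells with column means, then repeatedly overwrites only the missing cells with a rank-`rank` approximation computed by power iteration. The names `low_rank_impute` and `_top_component` and the iteration counts are hypothetical; plain Python is used only to keep the example dependency-free.

```python
import math
from typing import List, Optional, Sequence, Tuple


def _top_component(a: List[List[float]], iters: int = 50) -> Tuple[float, List[float], List[float]]:
    """Leading singular triplet (sigma, u, v) of a dense matrix via power iteration."""
    m, n = len(a), len(a[0])
    v = [1.0] * n  # assumed start vector; adequate for the positive data used here
    u = [0.0] * m
    for _ in range(iters):
        u = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]
        nu = math.sqrt(sum(x * x for x in u)) or 1.0
        u = [x / nu for x in u]
        v = [sum(a[i][j] * u[i] for i in range(m)) for j in range(n)]
        nv = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / nv for x in v]
    sigma = sum(u[i] * a[i][j] * v[j] for i in range(m) for j in range(n))
    return sigma, u, v


def low_rank_impute(data: Sequence[Sequence[Optional[float]]], rank: int = 1,
                    outer_iters: int = 300) -> List[List[float]]:
    """Hard-impute: alternately fit a rank-`rank` approximation and refill missing cells."""
    m, n = len(data), len(data[0])
    means = []
    for j in range(n):
        obs = [row[j] for row in data if row[j] is not None]
        means.append(sum(obs) / len(obs) if obs else 0.0)
    # Start from observed column means in the missing cells.
    filled = [[data[i][j] if data[i][j] is not None else means[j] for j in range(n)]
              for i in range(m)]
    for _ in range(outer_iters):
        residual = [row[:] for row in filled]
        approx = [[0.0] * n for _ in range(m)]
        for _ in range(rank):  # greedy deflation, one component at a time
            sigma, u, v = _top_component(residual)
            for i in range(m):
                for j in range(n):
                    approx[i][j] += sigma * u[i] * v[j]
                    residual[i][j] -= sigma * u[i] * v[j]
        # Observed cells are never changed; only missing cells take the approximation.
        for i in range(m):
            for j in range(n):
                if data[i][j] is None:
                    filled[i][j] = approx[i][j]
    return filled
```

For example, with the rank-1 matrix `[[1, 2, 3], [2, 4, 6], [3, 6, None]]` the loop fills the missing cell with a value close to 9, the only value that keeps the matrix rank 1. In practice a library SVD would replace the hand-written power iteration.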
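Contribution (3) builds ABCART on AdaBoost with CART base learners, but the abstract does not say how AdaBoost is modified. As a point of reference only, this sketch implements standard AdaBoost.R2 (Drucker's regression variant: linear loss normalised by the largest error, weights updated by beta^(1 - loss), predictions combined by a weighted median) over depth-one CART regression trees. The stumps are fitted directly to the sample weights instead of by resampling, which keeps the example deterministic. It is not ABCART; the names `fit_stump`, `adaboost_r2` and `predict_ensemble` and the default of 20 rounds are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (feature index, threshold, value if x[f] <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]


def _wmean(y: Sequence[float], w: Sequence[float], idx) -> float:
    idx = list(idx)
    s = sum(w[i] for i in idx)
    return sum(w[i] * y[i] for i in idx) / s if s > 0 else 0.0


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[float], w: Sequence[float]) -> Stump:
    """Depth-one CART regression tree minimising weighted squared error."""
    n, d = len(X), len(X[0])
    best = None
    for f in range(d):
        for t in sorted({row[f] for row in X}):
            left = [i for i in range(n) if X[i][f] <= t]
            right = [i for i in range(n) if X[i][f] > t]
            if not right:
                continue  # the largest value gives no split
            lv, rv = _wmean(y, w, left), _wmean(y, w, right)
            err = sum(w[i] * (y[i] - lv) ** 2 for i in left) + sum(
                w[i] * (y[i] - rv) ** 2 for i in right)
            if best is None or err < best[0]:
                best = (err, (f, t, lv, rv))
    if best is None:  # every feature is constant: predict the weighted mean
        m = _wmean(y, w, range(n))
        return (0, math.inf, m, m)
    return best[1]


def predict_stump(s: Stump, x: Sequence[float]) -> float:
    f, t, lv, rv = s
    return lv if x[f] <= t else rv


def adaboost_r2(X, y, rounds: int = 20) -> Tuple[List[Stump], List[float]]:
    n = len(X)
    w = [1.0 / n] * n
    models: List[Stump] = []
    alphas: List[float] = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = [abs(predict_stump(stump, x) - target) for x, target in zip(X, y)]
        emax = max(err)
        avg = sum(wi * e / emax for wi, e in zip(w, err)) if emax > 0 else 0.0
        if avg <= 0.0:  # perfect fit on the training data: keep it and stop
            models.append(stump)
            alphas.append(1.0)
            break
        if avg >= 0.5:  # learner too weak: stop, keeping it only if it is the first
            if not models:
                models.append(stump)
                alphas.append(1.0)
            break
        beta = avg / (1.0 - avg)
        models.append(stump)
        alphas.append(math.log(1.0 / beta))
        # Down-weight well-predicted samples by beta ** (1 - loss), then renormalise.
        w = [wi * beta ** (1.0 - e / emax) for wi, e in zip(w, err)]
        total = sum(w)
        w = [wi / total for wi in w]
    return models, alphas


def predict_ensemble(models: List[Stump], alphas: List[float], x: Sequence[float]) -> float:
    """Weighted median of the stump predictions (AdaBoost.R2 combination rule)."""
    pairs = sorted((predict_stump(m, x), a) for m, a in zip(models, alphas))
    half = 0.5 * sum(alphas)
    acc = 0.0
    for p, a in pairs:
        acc += a
        if acc >= half:
            return p
    return pairs[-1][0]
```

The weighted median, rather than a weighted mean, is what distinguishes AdaBoost.R2's combination step; it also keeps every prediction inside the range of the training effort values.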

Keywords/Search Tags:

Machine Learning, Software Engineering, Software Effort Estimation, Data Drought Techniques, Open-source SEE Data

Related items

1	Early Software Effort Estimation Supported By Semantic Analysis Of Requirement Documents
2	Predicting open-source software quality using statistical and machine learning techniques
3	Research On Software Crowdsourcing In Open Source Community
4	FPA-Based Software Effort Estimation Research And Practice
5	An Experience-Based Model For Test Execution Effort Estimation
6	Research On Learning Models For Three Prediction Problems In Open-source Software
7	Extraction, integration, and analysis of software measurement and bug fix data from open source software projects
8	Study And Implementation Of Big Data Processing And Management Platform Based On Open Source Software
9	Research And Design On Open Source Community Data Mining Key Technologies
10	Mining software quality data from a large-scale open-source software system