
Analysing Correctness Of Implementations Of Machine Learning Algorithms By Machine Learning

Posted on: 2021-03-29 | Degree: Master | Type: Thesis
Country: China | Candidate: M Jiang | Full Text: PDF
GTID: 2428330647451046 | Subject: Computer Science and Technology
Abstract/Summary:
Machine learning (ML) programs play an important role in today's society. Like other kinds of programs, they frequently contain bugs. Because ML programs are so widely used and their bugs can cause severe losses, it is important to detect those bugs efficiently. However, identifying bugs in implementations of ML algorithms remains difficult. The output of an ML program is a model learned from data, so developers find it hard to produce test oracles for checking behavioral correctness. Without a test oracle, conventional software testing techniques cannot be applied to improve the quality of ML programs. This is known as the "no-oracle problem". Although researchers have proposed alternatives such as N-Version Testing and Metamorphic Testing, these still require skilled developers or high costs to detect bugs in implementations of ML algorithms. In this thesis, we propose a novel solution based on machine learning to address the no-oracle problem of ML programs. The main achievements are as follows:

1. We propose a novel generic machine learning approach, named (IBM)², to address the no-oracle problem of ML programs. Our insight is that implementations of different ML algorithms can be used, via machine learning, to form a pseudo oracle. The approach learns to gauge the common behavior of reference correct implementations on the target data sets. We extract behavior features for each implementation from the predictions of its output model and establish behavior standards that all the reference correct implementations have in common. Implementations that do not conform to the learned behavior standards are identified as buggy. An evaluation on implementations created from the widely used machine learning library WEKA demonstrates the effectiveness of the approach.

2. To improve the efficiency of (IBM)² when the behavior feature space is high-dimensional, we propose a behavior feature selection approach based on meta features of behavior. Selecting the most relevant and least redundant subset of behavior features reduces the cost of learning the oracle and of applying the learned oracle. The approach estimates the redundancy of each behavior feature from the meta feature of that behavior, exploiting the property that machine learning models often output similar predictions for similar examples. A simple one-class linear model is trained to estimate the relevance of each behavior feature. We evaluate our approaches on our proposed dataset of ML algorithm implementations. Compared with competing approaches, ours is more effective.
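
The pseudo-oracle idea in the first contribution can be illustrated with a deliberately simplified sketch. This is not the thesis's method: the toy data, the three reference learners, the probe points, the unanimous-agreement rule, and the `max_violation_rate` threshold are all assumptions made for illustration, and the rule stands in for the learned model that the abstract describes. The sketch only shows the general shape: behavior features are taken from model predictions, a standard is derived from what the reference implementations have in common, and a candidate that violates the standard is flagged.

```python
from statistics import mean

# Hypothetical toy training set: 1-D inputs with binary labels.
TRAIN = [(0.0, 0), (1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1), (10.0, 1)]
# Hypothetical probe inputs; predictions on them are the behavior features.
PROBES = [0.5, 1.5, 3.0, 7.0, 8.5, 9.5]


def nearest_centroid(train):
    """Reference learner: predict the class whose centroid is closer."""
    c0 = mean(x for x, y in train if y == 0)
    c1 = mean(x for x, y in train if y == 1)
    return lambda x: 0 if abs(x - c0) <= abs(x - c1) else 1


def one_nearest_neighbor(train):
    """Reference learner: predict the label of the closest training point."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def midpoint_threshold(train):
    """Reference learner: threshold halfway between the two classes."""
    lo = max(x for x, y in train if y == 0)
    hi = min(x for x, y in train if y == 1)
    t = (lo + hi) / 2
    return lambda x: 0 if x < t else 1


def buggy_nearest_centroid(train):
    """Candidate with an injected bug: the predicted labels are swapped."""
    c0 = mean(x for x, y in train if y == 0)
    c1 = mean(x for x, y in train if y == 1)
    return lambda x: 1 if abs(x - c0) <= abs(x - c1) else 0


def behavior_features(learner):
    """Train the learner and record its predictions on the probe inputs."""
    model = learner(TRAIN)
    return [model(x) for x in PROBES]


def learn_standard(reference_learners):
    """Keep only the probes on which every reference learner agrees."""
    rows = [behavior_features(learner) for learner in reference_learners]
    standard = {}
    for i in range(len(PROBES)):
        labels = {row[i] for row in rows}
        if len(labels) == 1:
            standard[i] = labels.pop()
    return standard


def is_flagged(learner, standard, max_violation_rate=0.2):
    """Flag a learner that breaks the standard on too many agreed probes."""
    if not standard:
        return False  # no common behavior was found, so nothing to check
    feats = behavior_features(learner)
    violations = sum(feats[i] != label for i, label in standard.items())
    return violations / len(standard) > max_violation_rate


if __name__ == "__main__":
    refs = [nearest_centroid, one_nearest_neighbor, midpoint_threshold]
    standard = learn_standard(refs)
    print("buggy candidate flagged:", is_flagged(buggy_nearest_centroid, standard))
    print("reference flagged:", is_flagged(nearest_centroid, standard))
```

Restricting the standard to probes where all references agree keeps legitimate differences between algorithms from being counted as bugs; a learned oracle, as in the thesis, would replace this hand-written rule.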
Keywords/Search Tags: Machine Learning, Software Mining, Test Oracle, Feature Selection