
Research on the Reward Mechanism of Reinforcement Learning-Based Continuous Integration Test Case Prioritization

Posted on: 2022-10-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Yang
Full Text: PDF
GTID: 1488306602957849
Subject: Control Science and Engineering
Abstract/Summary:
Continuous integration test case prioritization (CITCP) continuously adjusts the execution order of test cases according to the changes in each integrated code revision, making it a sequential decision-making process. Reinforcement learning (RL) is a machine learning method built on a reward mechanism: the reward function evaluates the agent's current behavior and feeds the evaluation back to the agent, which then chooses follow-up behavior to maximize its expected return. RL is therefore well suited to the sequential decision-making problem of CITCP and has been applied to achieve rapid feedback in continuous integration testing. The reward and the agent are the two components that determine the quality of RL; the reward function design method (for the reward) and the reward object selection strategy (for the agent) together constitute the reward mechanism of RL. In existing combinations of RL and CITCP, the reward function is based only on the current execution result of a test case, and the reward object selection strategy rewards only failing test cases, so the reward mechanism is simplistic. It lacks both fundamental theoretical research on reward function design and reward object selection, and practical application schemes for continuous integration testing of industrial programs. Since the reward mechanism is the key to applying RL to a specific problem, a systematic study of the reward mechanism of RL for CITCP has great theoretical significance and practical value.

This dissertation studies the reward mechanism of RL-based CITCP and proposes a reward mechanism based on the historical execution information of test cases: the historical execution information serves as the basis both for computing the reward function and for selecting the reward objects. Because test case execution information is updated as continuous integration proceeds, the complete history can be organized as a time sequence, which measures the potential fault detection capability of a test case more effectively than the current execution result alone. Based on a time series model, the feature extraction of historical execution information is studied and three reward functions are proposed.

Empirical study shows that continuous integration testing of industrial programs features high-frequency integration but low-failure testing, which poses new challenges for CITCP. On one hand, high-frequency integration accumulates a large amount of historical execution information, which both increases the computational resources required for reward calculation and delays the rapid feedback that continuous integration testing needs. Exploiting the temporal correlation of test failures, the reward function is therefore further designed over a time-series sliding window, studied in both static and dynamic variants. On the other hand, low-failure testing means there are few failing test cases and hence few reward objects; this is the sparse reward problem of RL. A reward object selection strategy based on failure effect is therefore studied: in addition to failing test cases, passing test cases with a high failure effect are also rewarded, increasing the number of reward objects and effectively mitigating the sparse reward problem.

The main research contributions of this dissertation are as follows:

1. A CITCP reward function based on the historical execution information of test cases. The time-ordered history provides complete execution information and thus measures fault detection ability more accurately. Based on a time series model, feature extraction from the historical information sequence is studied, and reward functions based on historical failure count, historical failure density, and average historical failure distribution are proposed. Compared with a reward function based on the current execution result, the history-based reward functions effectively improve CITCP with only a limited increase in computational overhead.

2. A CITCP reward function based on a sequential sliding window. In continuous integration environments, especially actual industrial software development, high-frequency integration accumulates a huge history, which increases the computational cost of reward calculation. To improve the efficiency of reward calculation in RL, sliding-window-based reward functions are proposed, exploiting the temporal correlation of test failures, with a static sliding window and an adaptive dynamic sliding window studied respectively. With more efficient reward calculation, the effect of CITCP is further improved.

3. A CITCP reward object selection strategy based on failure effect. When only failing test cases are rewarded, their scarcity leads to the sparse reward problem of RL. On the basis of rewarding failing test cases, passing test cases with a high failure effect are additionally selected as reward objects. Three strategies are proposed: a test-based total reward object selection strategy, a failure-rate-based fuzzy reward object selection strategy, and a test-frequency-based additional reward object selection strategy, which realize the effective convergence of RL.

4. Empirical studies on industrial programs. Fourteen industrial data sets are collected, including 4 public data sets used in previous studies and 10 data sets extracted from actual continuous integration test logs. The actual industrial data sets both pose challenges for empirical study and provide practical application cases for the proposed reward mechanism, supporting the effectiveness of the research results.
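As a concrete illustration of a history-based, sliding-window reward (a minimal sketch, not the dissertation's exact formulation; the window size, the failure-count and failure-density definitions, and the function name are illustrative assumptions):

```python
def windowed_reward(history, w):
    """Reward a test case based on its recent execution history.

    history : sequence of 0/1 flags, 1 = the test case failed in that
              integration cycle (most recent last).
    w       : sliding-window size; only the last w cycles are used,
              which bounds the cost of reward calculation even when
              high-frequency integration accumulates a long history.
    """
    window = list(history)[-w:] if w else list(history)
    if not window:
        return 0.0
    failure_count = sum(window)                    # historical failure count
    failure_density = failure_count / len(window)  # historical failure density
    return failure_density

# A test case that failed in 3 of its last 4 cycles gets a higher reward
# than one whose only failure lies outside the window.
recent_failer = [0, 0, 1, 1, 0, 1]
old_failer = [1, 0, 0, 0, 0, 0]
print(windowed_reward(recent_failer, 4))  # 0.75
print(windowed_reward(old_failer, 4))     # 0.0
```

A dynamic variant would adjust `w` per cycle (e.g. from the observed failure rate) instead of fixing it, trading a little bookkeeping for a window that tracks the current failure pattern.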
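The failure-effect-based widening of the reward object set can be sketched in the same spirit; the `failure_effect` scores and the threshold below are hypothetical stand-ins for the dissertation's three selection strategies:

```python
def select_reward_objects(results, failure_effect, threshold=0.5):
    """Select which test cases receive a reward in this cycle.

    results        : dict mapping test case id -> True if it failed.
    failure_effect : dict mapping test case id -> a score in [0, 1]
                     estimating how related a passing case is to failures.

    Returns the set of reward objects: all failing cases, plus passing
    cases whose failure effect exceeds the threshold. Widening the set
    this way densifies the reward signal when failures are rare.
    """
    rewarded = {tc for tc, failed in results.items() if failed}
    rewarded |= {tc for tc, score in failure_effect.items()
                 if not results.get(tc, False) and score > threshold}
    return rewarded

results = {"t1": True, "t2": False, "t3": False}
effects = {"t1": 1.0, "t2": 0.8, "t3": 0.1}
print(sorted(select_reward_objects(results, effects)))  # ['t1', 't2']
```

Here `t2` passes but is still rewarded because its failure effect is high, so the agent receives more than one reward signal per cycle even in a low-failure test environment.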
Keywords/Search Tags: test case prioritization, continuous integration testing, reinforcement learning, reward mechanism, sliding window, sparse reward