
Research On Host Penetration Test Of LAN Based On Partially Observable Markov Decision Process

Posted on: 2023-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: B W Jia
Full Text: PDF
GTID: 2558306845499694
Subject: Cyberspace security
Abstract/Summary:
As a key method for evaluating cyberspace security, penetration testing faces increasingly complex network topologies and critical assets, and the resources it consumes grow exponentially. Reducing the expert knowledge and manual experience required during a penetration test and improving its degree of automation by means of new technologies such as artificial-intelligence planning and decision-making has therefore become a focus of scholars and industry at home and abroad, and is of vital significance. This paper carries out research on the following two aspects.

First, hierarchical penetration planning based on the Partially Observable Markov Decision Process (POMDP) is proposed to address the state-space explosion that POMDPs suffer in large-scale network penetration testing. Existing POMDP-based penetration-test methods ignore the constraint that the network connections between LAN hosts impose on the state space, which degrades the overall penetration plan. This paper integrates host vulnerability exploitation and network connectivity into the state-space model. On this basis, a four-granularity hierarchical decomposition of the network topology (partition, subnet, host, and state-action) is designed, global penetration planning is achieved through a recursive mechanism, and the globally optimal POMDP solution is obtained by dynamic programming. This improves, to a certain extent, the practical operability of POMDP-based penetration testing.

Second, reinforcement-learning penetration planning based on the POMDP is proposed to improve the degree of automation of POMDP-based penetration testing. Existing reinforcement-learning approaches to automated penetration testing ignore the dynamics of the vulnerability-exploitation time window and the uncertainty of exploitation outcomes, which undermines the effective estimation of state-transition probabilities. Building on the hierarchical POMDP planning, this paper introduces reinforcement learning to construct an automated penetration method and its optimization. Double DRQN (DDRQN), an automated penetration-planning model that combines Long Short-Term Memory (LSTM) with Double DQN (DDQN), is proposed to address the instability and overestimation that arise during training. For optimization, bootstrap sampling is used to build a deep-exploration mechanism on top of the DDRQN model, so as to explore better penetration plans and provide a reference for improving the performance of automated penetration testing.

For the methods above, this paper builds a LAN penetration simulation environment with Tensorflow and generates POMDP models with the APPL toolkit for thorough experimental verification. For the hierarchical POMDP planning, the experimental results confirm the effectiveness of the hierarchical decomposition: the POMDP solver runs in about 63 seconds at a scale of 80 hosts, with an attack success rate of about 80%. For the reinforcement-learning planning, the results show that the DDRQN model runs in about 95 seconds on 145 hosts; with bootstrap sampling, the DDRQN model converges earlier when the K value is 20, and the model performs well.
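Both contributions rest on the standard POMDP belief update, in which the attacker maintains a probability distribution over the hidden network state (e.g. whether a host is vulnerable) and refines it after each action and observation. The following is a minimal illustrative sketch of that update; the two-state "patched/vulnerable" example and all transition and observation numbers are invented for illustration and are not taken from the thesis or its APPL models.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """One Bayes-filter step for a discrete POMDP.

    b: belief over states, shape (S,)
    T: transition probabilities, T[a, s, s'] = P(s' | s, a)
    Z: observation probabilities, Z[a, s', o] = P(o | s', a)
    """
    # Predict: push the belief through the transition model.
    predicted = b @ T[a]                  # shape (S,)
    # Correct: weight each state by the likelihood of the observation.
    unnormalized = predicted * Z[a, :, o]
    return unnormalized / unnormalized.sum()

# Toy 2-state host: state 0 = "patched", state 1 = "vulnerable".
T = np.array([[[1.0, 0.0],                # one action: an exploit attempt
               [0.3, 0.7]]])
Z = np.array([[[0.9, 0.1],                # noisy scan result (o = 0 or 1)
               [0.2, 0.8]]])
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, a=0, o=1, T=T, Z=Z)
# Observing o=1 shifts the belief toward "vulnerable" (b1[1] ≈ 0.81).
```

Because the full belief grows with the joint state of every host, exact updates like this become intractable on large networks, which is precisely the state-space explosion the thesis's four-granularity decomposition is designed to contain.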
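The DDRQN model combines an LSTM (to cope with partial observability) with the Double DQN target rule (to curb overestimation). The recurrent part is omitted here, but the Double DQN target computation that distinguishes DDQN from plain DQN can be sketched as follows; the function name and the toy batch values are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double DQN target values for a batch of transitions.

    The online network selects the greedy next action; the target network
    evaluates it. Decoupling selection from evaluation is what reduces the
    overestimation bias of plain DQN.
    """
    greedy_actions = np.argmax(q_online_next, axis=1)
    next_values = q_target_next[np.arange(len(rewards)), greedy_actions]
    # Terminal transitions (dones == 1) bootstrap no future value.
    return rewards + gamma * (1.0 - dones) * next_values

# Toy batch of two transitions over two actions.
q_online_next = np.array([[1.0, 2.0], [0.5, 0.1]])
q_target_next = np.array([[0.8, 1.5], [0.4, 0.2]])
targets = double_dqn_targets(q_online_next, q_target_next,
                             rewards=np.array([1.0, 0.0]),
                             dones=np.array([0.0, 1.0]))
# First transition: 1.0 + 0.99 * 1.5 = 2.485; second is terminal: 0.0
```

The bootstrap-sampling exploration the thesis adds on top of this would train K such value heads on resampled data and act greedily with respect to a randomly chosen head per episode, which drives deeper exploration than epsilon-greedy alone.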
Keywords/Search Tags:Penetration Testing, Partially Observable Markov Decision Process, Bootstrap Sampling, Reinforcement Learning