The wave of industrialization, informatization, and intelligentization has swept the world, creating an urgent need for bounded, low-delay network communication in delay-sensitive fields such as vehicular, airborne, and industrial control systems. Time-Sensitive Networking (TSN), developed by the IEEE 802.1 Working Group, appears to be the most likely solution. At present, the TSN standards define the mechanisms for traffic scheduling but specify neither the algorithms nor the routing policies needed to configure those mechanisms, so TSN traffic scheduling still relies largely on manual calculation. This thesis introduces the TSN time synchronization standard, the traffic shapers, and other related theories and protocols. It then analyzes the gate scheduling mechanism of the time-aware shaper standard, which is one of the core mechanisms that allow time-sensitive flows to be transmitted with bounded, low delay. The main task of the TSN traffic scheduling studied in this thesis is to plan the transmission paths of data streams with different priorities and to determine the transmission time slot at each port along each path, so as to generate the gate scheduling table. Two methods are proposed to this end.

The first is a traffic scheduling method based on satisfiability modulo theories (SMT). With the transmission path of each flow determined in advance, the problem of determining the flows' transmission time slots is formulated as a constraint-solving problem, and SMT techniques are then used to solve the constraints. The solution contains the transmission time slot of each stream at every port on its transmission path. However, as the number of streams and the scale of the network grow, the performance of this method declines sharply, and a solution can no longer be found in a reasonable time.

The second is a traffic scheduling method based on reinforcement learning, which solves for the transmission path and the time slots of each data flow jointly. The agent contains two policy networks, an edge selection policy network and a time-slot selection policy network, which output the edge and the slot, respectively. The time-slot selection policy network depends on the action output by the edge selection policy network, and the reward is obtained after the action output by the time-slot selection policy network has been selected. The action output by the edge selection policy network is not a complete path but a single edge on the path, namely the next edge that the stream will traverse. So that the input state of the policy networks includes global information about the whole TSN network, a graph neural network is used to extract the state of each edge and of all reachable edges. The edge selection policy network is then updated with a policy-gradient-based method, and the time-slot selection policy network is updated with the proximal policy optimization (PPO) algorithm.

Finally, a simulation environment was built to verify the two methods, and the results demonstrate the feasibility and good performance of both. As the scale of the network or the number of data streams to be scheduled increases, however, the reinforcement-learning-based method requires less solving time and produces better scheduling results than the SMT-based method.
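To make the constraint formulation behind the first method more concrete, the following minimal Python sketch schedules two flows with pre-determined paths and then derives a per-port gate scheduling table. It is an illustration under stated assumptions, not the formulation used in this thesis: the topology, the flow names `f1` and `f2`, the constants `CYCLE` and `HOP_DELAY`, and the helper functions are all hypothetical, and a plain backtracking search stands in for an SMT solver. The three constraint families it encodes (hop ordering along a path, an end-to-end deadline, and exclusive use of a shared link) are typical of such formulations.

```python
# Toy instance. All names and numbers are illustrative assumptions.
CYCLE = 8       # length of the schedule cycle, in time slots
HOP_DELAY = 1   # number of slots a frame occupies on each link

# Each flow has a fixed path (a list of directed links) and a deadline,
# measured in slots from the start of the cycle.
FLOWS = {
    "f1": {"path": [("A", "S1"), ("S1", "S2"), ("S2", "C")], "deadline": 4},
    "f2": {"path": [("B", "S1"), ("S1", "S2"), ("S2", "D")], "deadline": 5},
}


def schedule(flows, cycle, hop_delay):
    """Return {(flow, link): start_slot} meeting all constraints, or None.

    Backtracking search used here as a stand-in for an SMT solver.
    """
    names = list(flows)
    busy = {}  # link -> set of slots already reserved on that link

    def place(i, hop, earliest, assignment):
        if i == len(names):            # every flow is fully placed
            return dict(assignment)
        name = names[i]
        path = flows[name]["path"]
        if hop == len(path):           # this flow is done, move to the next
            return place(i + 1, 0, 0, assignment)
        link = path[hop]
        remaining = len(path) - hop    # hops still to place, including this one
        # Deadline constraint: the remaining hops must still fit before the
        # deadline and inside the cycle.
        latest = min(cycle, flows[name]["deadline"]) - remaining * hop_delay
        for start in range(earliest, latest + 1):
            slots = set(range(start, start + hop_delay))
            # Link exclusivity: no two frames may overlap on the same link.
            if slots & busy.get(link, set()):
                continue
            busy.setdefault(link, set()).update(slots)
            assignment[(name, link)] = start
            # Hop ordering: the next hop starts after this one has finished.
            result = place(i, hop + 1, start + hop_delay, assignment)
            if result is not None:
                return result
            busy[link] -= slots        # undo the reservation and try later slots
            del assignment[(name, link)]
        return None

    return place(0, 0, 0, {})


def gate_schedule_tables(assignment, hop_delay):
    """Group slot assignments by link into sorted (open, close, flow) windows."""
    tables = {}
    for (flow, link), start in assignment.items():
        tables.setdefault(link, []).append((start, start + hop_delay, flow))
    return {link: sorted(entries) for link, entries in tables.items()}


if __name__ == "__main__":
    slots = schedule(FLOWS, CYCLE, HOP_DELAY)
    if slots is None:
        print("no feasible schedule")
    else:
        for link, windows in gate_schedule_tables(slots, HOP_DELAY).items():
            print(link, windows)
```

In this toy instance both flows share the link from `S1` to `S2`, so the search has to place them in different slots on that link. Because the search is exhaustive, its running time grows quickly with the number of flows and links, which is consistent with the scalability limitation noted for the constraint-based method.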