Logic synthesis is the process of converting register-transfer-level (RTL) code into a gate-level netlist. As the input to back-end physical design, the delay and area of the netlist largely determine the delay and area of the final layout. For combinational logic circuits, the goal of logic optimization is the minimization of the conjunctive normal form of the logic. This optimization is a non-deterministic polynomial-time hard (NP-hard) problem and is usually carried out with heuristic algorithms driven by the designer's experience. Because logic-synthesis optimization can be abstracted as a sequential decision problem that must search an enormous solution space, traditional heuristic algorithms have difficulty finding the optimal solution.

This paper introduces a reinforcement-learning method, direct policy search, which makes optimization decisions by analyzing the massive data flows generated by the synthesis tool, searches the design space autonomously, and iteratively refines the design to find good solutions for logic synthesis. Combinational logic can be represented as a graph whose depth and node count characterize the logic, so this paper uses these graph features as the optimization objective for the overall logic.

Evaluation on the EPFL benchmark circuits shows that this scheme reduces the number of nodes by 19.60% on average relative to the original designs, the depth of the graph by 13.76%, the post-mapping area by approximately 29.65%, and the delay by 18.93%. By comparison, the generic ABC script resyn2 reduces the number of nodes by 6.70%, the depth by 7.76%, the mapped area by 14.49%, and the delay by 11.67% relative to the original designs; these results are weaker than those of the reinforcement-learning scheme.

A main challenge of policy-search reinforcement learning is that the algorithm is sensitive to the size of each policy update and has high variance, which hinders convergence. The magnitude of parameter updates during training should therefore be kept moderate; otherwise the new policy drifts too far from the old one, which harms policy learning and prevents further improvement of the overall optimization scheme. For this reason, an improved policy-gradient algorithm, Proximal Policy Optimization (PPO), was also evaluated. On the EPFL benchmark circuits, the PPO scheme reduces the number of nodes by 20.25%, the depth of the graph by 13.89%, the post-mapping area by about 29.96%, and the delay by 18.70% relative to the original designs. Relative to the generic ABC script resyn2, it reduces the number of nodes by 13.55%, the depth by 5.89%, the mapped area by 15.47%, and the delay by 7.03%. These results do not differ greatly from those of direct policy search, and the reasons for this are analyzed further.
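To make the update-amplitude constraint concrete, the sketch below shows PPO's clipped surrogate objective, which bounds how far the new policy may move away from the policy that collected the data in a single update. This is a minimal illustration written with PyTorch; the function name and the clipping threshold of 0.2 are assumptions for exposition, not the exact implementation used in this work.

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective (illustrative sketch)."""
    # Probability ratio pi_new(a|s) / pi_old(a|s).
    ratio = torch.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    # Clipping the ratio to [1 - eps, 1 + eps] limits the update amplitude.
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic term, then negate so the loss can be minimized.
    return -torch.min(unclipped, clipped).mean()
```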
In addition, the combinational logic within sequential circuits was analyzed and optimized, and the results were compared with those obtained without reinforcement-learning optimization. After mapping in ABC, the reinforcement-learning scheme improved the area by 21.6% and the delay by 27.5%. Finally, as a complete digital circuit design, the circuit optimized by the reinforcement-learning scheme was taken into back-end physical design. The experimental results show that, when there are no timing violations, the power consumption of the layout optimized with reinforcement learning is slightly lower than that of the original design.
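For reference, a minimal sketch of the kind of evaluation step used throughout this work is shown below: a candidate sequence of ABC transformations is applied to a benchmark, and the resulting graph statistics (node count and logic depth) are read back as the optimization objective. The helper name, the benchmark file, the specific command sequence, and the output-parsing pattern are illustrative assumptions; post-mapping area and delay would be obtained in the same way after loading a cell library and running ABC's mapping commands.

```python
import re
import subprocess

def evaluate_sequence(benchmark, commands, abc_bin="abc"):
    """Apply an optimization sequence in ABC and return (nodes, depth)."""
    script = f"read {benchmark}; strash; " + "; ".join(commands) + "; print_stats"
    out = subprocess.run([abc_bin, "-c", script],
                         capture_output=True, text=True, check=True).stdout
    # print_stats reports AIG size and depth as "and = <nodes> ... lev = <depth>".
    match = re.search(r"and\s*=\s*(\d+).*?lev\s*=\s*(\d+)", out)
    return (int(match.group(1)), int(match.group(2))) if match else None

# Example: one candidate action sequence chosen by the policy.
print(evaluate_sequence("adder.aig", ["balance", "rewrite", "refactor", "rewrite -z"]))
```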