With the rapid development of artificial intelligence technology, algorithms place ever higher demands on the computing power of chips. As Moore's Law slows down or even breaks down, it becomes increasingly difficult for chips to improve performance by relying on advanced manufacturing processes alone, and the resulting new forms of computing pose greater challenges to the power consumption and performance of data center servers. As a specialized heterogeneous computing approach, CPU+FPGA has substantial advantages over traditional computing architectures in computing performance, real-time behavior, and energy efficiency. However, a traditional FPGA is an external hardware device that communicates with the central processor mainly through the PCIe bus interface, which usually requires a lengthy process of establishing a connection to the device and therefore reduces data transmission efficiency.

On the other hand, with the development of deep learning, researchers have begun to apply deep learning methods to advertising recommendation problems. The DeepFM model is a recommendation algorithm that combines a factorization machine with a deep neural network; it can recommend content of interest to users based on their features, with better accuracy than traditional recommendation algorithms. As the amount of available information grows, the volume of data the DeepFM algorithm must process has increased sharply. In large-scale computation, the traditional central processor suffers from slow computation speed and high latency, making it difficult to meet the real-time requirements of practical application scenarios.

To address the above problems, this paper proposes an FPGA heterogeneous acceleration method for the DeepFM algorithm model based on CAPI technology. Building on the cache-coherence principle of CAPI, a hardware-software collaborative data communication framework is proposed. This framework is used for data transmission between the CPU and
the FPGA, and it greatly reduces the communication latency between the CPU and the FPGA. At the same time, developers can replace the algorithm kernel acceleration logic as needed, giving the framework scalability. On top of this hardware-software collaborative data communication framework, a parallel hardware architecture is designed to accelerate the inference stage of the DeepFM algorithm: an FPGA systolic array structure is used to accelerate the core internal computations of DeepFM in parallel, and an index-matrix-based FPGA computation method for the factorization machine is used to optimize the computation form and increase the computation rate. The final tests show that the CAPI-based hardware-software collaborative data communication framework achieves higher transmission bandwidth and lower communication latency for data transmission between the CPU and the FPGA, and that the heterogeneous CPU+FPGA implementation of DeepFM inference delivers significant hardware acceleration performance and robustness compared to the central processor alone.
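To make the accelerated computation concrete: the core of the factorization-machine part of DeepFM is the second-order feature interaction, which can be rewritten by a well-known sum-of-squares identity so that it costs O(nk) instead of O(n^2 k). The sketch below is a minimal NumPy reference of that standard FM computation, not the paper's index-matrix FPGA method; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def fm_second_order(x, V):
    """Second-order FM interaction via the sum-of-squares identity:
        sum_{i<j} <v_i, v_j> x_i x_j
          = 0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i (v_{i,f} x_i)^2 ]
    x: feature vector of shape (n,); V: embedding matrix of shape (n, k).
    Reduces O(n^2 k) pairwise work to O(n k)."""
    xv = x[:, None] * V                   # (n, k): each embedding scaled by its feature
    sum_sq = np.square(xv.sum(axis=0))    # (sum_i v_{i,f} x_i)^2, per factor f
    sq_sum = np.square(xv).sum(axis=0)    # sum_i (v_{i,f} x_i)^2, per factor f
    return 0.5 * float((sum_sq - sq_sum).sum())
```

Because the identity turns the pairwise interaction into independent per-factor reductions, it maps naturally onto parallel hardware, which is one reason FM inference is a good FPGA acceleration target.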
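The systolic array mentioned above is a grid of processing elements (PEs) in which operands flow between neighbors each clock cycle while every PE performs a multiply-accumulate, so a matrix multiplication finishes in roughly m+n+p cycles rather than m*n*p sequential steps. The following is a cycle-level software simulation of a generic output-stationary systolic matrix multiply, offered only to illustrate the dataflow; it is not the paper's FPGA design, and all names are illustrative.

```python
def systolic_matmul(A, B):
    """Simulate an m-by-p grid of PEs computing C = A @ B.
    A values flow left-to-right, B values top-to-bottom, each skewed by one
    cycle per row/column; every PE accumulates its own C[i][j] in place."""
    m, n, p = len(A), len(A[0]), len(B[0])
    C = [[0.0] * p for _ in range(m)]
    a_reg = [[0.0] * p for _ in range(m)]   # operand register in each PE (from A)
    b_reg = [[0.0] * p for _ in range(m)]   # operand register in each PE (from B)
    for t in range(m + n + p - 2):          # total cycles until the last PE drains
        # Shift operands to the neighboring PEs (reverse order avoids overwrite).
        for i in range(m):
            for j in range(p - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(p):
            for i in range(m - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Feed skewed inputs at the array boundary (zeros outside the schedule).
        for i in range(m):
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0.0
        for j in range(p):
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < n else 0.0
        # Every PE does one multiply-accumulate per cycle, fully in parallel.
        for i in range(m):
            for j in range(p):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C
```

In hardware, the three inner loop nests of each cycle all happen simultaneously in one clock tick, which is where the parallel speedup over a sequential processor comes from.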