| The construction of the UHV interconnected power grid in China has coupled cross-regional and cross-provincial power grids ever more tightly. At the same time, avoiding the curtailment of renewable energy requires unified optimization and control across the whole network. Power system dispatching therefore now demands unified analysis of the whole network, which inevitably increases computational complexity and run time substantially. To keep power system control real-time, high-performance computing must be used to accelerate every stage of network analysis. As the basis of online network analysis, power system state estimation needs a set of high-performance parallel computing methods suited to whole-network unified analysis. Traditional partitioned and block-parallel algorithms are limited by their degree of parallelism and by the parallel computing capability of the CPU, and cannot achieve the desired speedup. The Graphics Processing Unit (GPU) is a newer type of parallel processor with high memory bandwidth and very high floating-point throughput. It has successfully accelerated applications such as electromagnetic transient simulation and power flow optimization, and has the potential to accelerate state estimation. Based on the difference in how the GPU hardware pipeline handles compute-intensive and communication-intensive tasks, this paper establishes a Roofline model to guide the high-level design of GPU algorithms, and analyzes the irregularity and task attributes of parallel computing tasks such as graph search in power system topology analysis and sparse matrix operations in power system state estimation. A general set of optimization criteria for GPU parallel algorithm design is then proposed, including maximizing the extraction of
parallelism, minimizing warp (thread bundle) divergence, and improving thread utilization to raise the ceiling of the Roofline model, as well as improving coalesced memory access patterns and making full use of merged memory to increase the slope of the Roofline model. Following these design principles, this paper presents detailed optimized designs for the three main stages of state estimation, namely topology analysis, least-squares state estimation, and bad data detection, and makes the following contributions: 1. To fully extract the parallelism in power system topology analysis, and given that the topology of each plant can be analyzed independently, a general two-step GPU-accelerated strategy of plant topology analysis followed by network topology analysis is proposed. In the design of the plant topology analysis algorithm, the adjacency list storage format of the irregular graph is first restructured, and a compressed-array data structure for plant information that matches the GPU memory access pattern is designed. The algorithm fully exploits three levels of parallelism: between the elements of the adjacency list of each physical node in a plant, between physical nodes at the same level within a plant, and the natural parallelism between plants in the power grid. Taking into account the actual number of physical nodes in each plant, the thread configuration is optimized, which greatly improves the utilization of GPU computing resources. In the network topology step, a breadth-first search algorithm using a predecessor array and based on thread synchronization is designed. Test cases show that the plant topology analysis algorithm achieves a 6-fold speedup over an 8-thread CPU algorithm on a 30,000-node system, and that the entire topology analysis process achieves a speedup of more than 4 times. 2. Exploiting the
property that plant-level and network-level topology analysis can be carried out independently, a parallel plant and station topology analysis approach based on the natural partitioning of the grid into plants and stations is proposed. The non-contiguous adjacency list storage format of the irregular graph in the topology problem is optimized, and a compressed array structure containing the information of all plants and stations in the system is designed. On this basis, the traditional parallel breadth-first search algorithm based on a predecessor array is improved: a kernel function is used to perform the plant topology analysis in parallel, fully exploiting the parallelism between elements of the adjacency list of each physical node in a plant, between nodes, and between plants and stations in the power grid. With a reasonable thread configuration, thread utilization increases, load balancing is achieved, and overall performance improves. In network topology analysis, where the parallelism between plants and stations no longer exists, a GPU breadth-first search algorithm using a predecessor array and based on thread synchronization is designed. Experimental results show that, on a 30,000-node system, the proposed method achieves a 6-fold speedup over an 8-thread graph partition algorithm in plant topology analysis, and a speedup of more than 4 times over the entire process. 3. GPU-accelerated methods for sparse matrix inversion and for sparse-matrix-by-dense-matrix multiplication are proposed for the sensitivity matrix calculation that occurs frequently in bad data identification in power system state estimation. Building on a GPU-accelerated algorithm for solving a single sparse linear system, a GPU matrix inversion algorithm is proposed in two modes, matrix scale expansion and thread-task cooperation. Tests show that the thread-task cooperation mode gives better memory access behavior, which improves the performance of irregular
algorithms. The same mode is also used to accelerate sparse-matrix-by-dense-matrix multiplication. Experimental results show that the numerical acceleration method designed in this paper achieves a speedup of more than 30 times over an 8-thread CPU implementation, and brings the computation time of the robust state estimation algorithm down to the order of seconds for cases with more than 10,000 nodes, which gives it high practical value. |
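
The abstract describes packing the adjacency list into compressed arrays and running a breadth-first search that records a predecessor array and synchronizes after each level. The following is a minimal sequential sketch of that idea in Python, not the GPU kernel from the work itself. The function names (`build_csr`, `bfs_predecessors`, `label_islands`), the CSR-style layout, and the convention that each edge is a closed switch or branch are assumptions made for illustration. On a GPU, the two inner loops would plausibly be distributed across threads, with the end of each `while` iteration acting as the synchronization point.

```python
from typing import List, Tuple


def build_csr(num_nodes: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Pack an undirected edge list into two contiguous arrays (row_ptr, col_idx)."""
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # row_ptr[i]..row_ptr[i+1] delimits the neighbours of node i inside col_idx.
    row_ptr = [0] * (num_nodes + 1)
    for i in range(num_nodes):
        row_ptr[i + 1] = row_ptr[i] + degree[i]
    col_idx = [0] * row_ptr[num_nodes]
    cursor = row_ptr[:-1]  # next free slot for each node (slicing makes a copy)
    for u, v in edges:
        col_idx[cursor[u]] = v
        cursor[u] += 1
        col_idx[cursor[v]] = u
        cursor[v] += 1
    return row_ptr, col_idx


def bfs_predecessors(row_ptr: List[int], col_idx: List[int], source: int) -> List[int]:
    """Level-synchronous BFS; pred[v] is the node that first reached v, or -1."""
    num_nodes = len(row_ptr) - 1
    pred = [-1] * num_nodes
    pred[source] = source  # the source is its own predecessor
    frontier = [source]
    while frontier:
        next_frontier = []
        for u in frontier:  # parallelism between nodes of the same level
            for k in range(row_ptr[u], row_ptr[u + 1]):  # parallelism within one adjacency list
                v = col_idx[k]
                if pred[v] == -1:
                    pred[v] = u
                    next_frontier.append(v)
        frontier = next_frontier  # level boundary: where threads would synchronize
    return pred


def label_islands(row_ptr: List[int], col_idx: List[int]) -> List[int]:
    """Assign an island index to every node by repeated BFS from unlabelled nodes."""
    num_nodes = len(row_ptr) - 1
    island = [-1] * num_nodes
    count = 0
    for start in range(num_nodes):
        if island[start] == -1:
            pred = bfs_predecessors(row_ptr, col_idx, start)
            for v, p in enumerate(pred):
                if p != -1:
                    island[v] = count
            count += 1
    return island
```

For example, with six nodes and closed connections `(0, 1)`, `(1, 2)`, and `(3, 4)`, `label_islands(*build_csr(6, [(0, 1), (1, 2), (3, 4)]))` groups nodes 0 to 2 into one island, nodes 3 and 4 into a second, and leaves node 5 on its own. Storing neighbours contiguously is what lets adjacent threads read adjacent memory locations, which is the motivation the abstract gives for replacing the non-contiguous adjacency list.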