| The LoongArch Instruction Set Architecture (ISA) is China’s first instruction set architecture with fully independent intellectual property rights. Its software ecosystem is still in its early stages, and many foundational software components have not yet been optimized for it. As a result, software cannot fully exploit the processor’s performance, the user experience on the platform suffers, and the commercialization and market promotion of the new architecture are hindered. Meanwhile, glibc is the lowest-level library in a Linux system, so the runtime efficiency of its functions is critical to overall system performance. The main research work of this article is as follows: (1) To address the low efficiency of the general-purpose library functions on LoongArch, the string and memory manipulation functions in glibc were studied. Each stage of their implementation was analyzed, yielding a set of optimization methods for low-level functions. These methods take the CPU’s own optimization mechanisms into account and make effective use of hardware features such as caching, branch prediction, and out-of-order execution. Scalar assembly optimization of the string and memory manipulation functions was carried out on two platforms, the Loongson 3A5000 and the Loongson 2K1000LA. (2) To address the fact that the general-purpose functions in glibc do not take full advantage of vector instructions, optimization algorithms from other architectures were surveyed. Building on the key issues of vectorization, a vector optimization algorithm for string and memory manipulation functions on LoongArch was designed. The algorithm centers on memory alignment analysis and improves function performance by ensuring aligned memory access and efficient instruction
execution within the core loop. It also uses shuffle instructions to reduce the number of instructions executed, which relieves pressure on both memory access and computational instructions. (3) To evaluate the optimized library functions, two test suites, microbenchmark and unixbench, were used. The microbenchmark suite measures the actual performance of individual functions, while unixbench measures the overall system performance improvement delivered by the optimized glibc library. The optimized functions were tested with both suites on the Loongson 3A5000 and 2K1000LA platforms. The experimental results indicate that the optimized functions achieve an optimization rate of over 80% compared to the general-purpose library functions. On the Loongson 3A5000 platform, the unixbench score improved by around 106 points on average in the single-core test and by around 252 points on average in the multi-core test. (4) To address the performance differences that library functions show across LoongArch processors with different microarchitectural designs, the IFUNC feature of the compilation toolchain was used to provide IFUNC support for the optimized library functions. By analyzing how other architectures support IFUNC for library functions, two approaches for implementing IFUNC in string functions were proposed: Processor Identification (PRID) and Hardware Capabilities (HWCAP). The PRID approach reads the processor identification and selects functions optimized specifically for that processor. The HWCAP approach queries the hardware capabilities supported by the processor and chooses the best function implementation for each call. |
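
The HWCAP approach described in item (4) can be illustrated with a minimal C sketch. It is not the implementation from this work: it imitates the selection step with an ordinary function pointer instead of the toolchain's real IFUNC mechanism, so that it builds on any Linux system. The names `select_strlen`, `my_strlen`, `strlen_scalar`, `strlen_lsx`, and `strlen_lasx` are hypothetical, the three variants are plain C stand-ins for assembly routines, and the two capability bits are assumed values that should be checked against `<asm/hwcap.h>` on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval, AT_HWCAP (Linux) */

/* ASSUMPTION: bit positions of the LoongArch 128-bit (LSX) and 256-bit (LASX)
   vector extensions in AT_HWCAP. Verify against <asm/hwcap.h> on the target. */
#define SKETCH_HWCAP_LSX  (1UL << 4)
#define SKETCH_HWCAP_LASX (1UL << 5)

typedef size_t (*strlen_fn)(const char *);

/* Hypothetical stand-ins for the scalar, LSX, and LASX variants.
   All three are plain C here so the sketch runs on any machine. */
static size_t strlen_scalar(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
static size_t strlen_lsx(const char *s)  { return strlen_scalar(s); }
static size_t strlen_lasx(const char *s) { return strlen_scalar(s); }

/* Selection logic: prefer the widest vector variant the hardware reports. */
static strlen_fn select_strlen(unsigned long hwcap)
{
    if (hwcap & SKETCH_HWCAP_LASX)
        return strlen_lasx;
    if (hwcap & SKETCH_HWCAP_LSX)
        return strlen_lsx;
    return strlen_scalar;
}

/* Public entry point: resolve once on first use, then call through the pointer.
   Not thread-safe as written; a real IFUNC resolver runs at relocation time. */
size_t my_strlen(const char *s)
{
    static strlen_fn impl;          /* NULL until the first call */
    if (impl == NULL)
        impl = select_strlen(getauxval(AT_HWCAP));
    assert(impl != NULL);
    return impl(s);
}
```

Keeping the selection in a function that takes the capability word as a parameter makes the policy easy to check in isolation. A PRID-based selector would have the same shape, but would match on a processor identifier rather than on capability bits.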