| As the Internet has penetrated various industries since the start of the 21st century, we have entered a data-driven era. In this era, the data generated by the Internet is growing explosively, and socially related graph-structured data accounts for a significant portion of it. Accessing large amounts of graph data efficiently is a highly challenging task. In current mainstream graph databases, graph storage is mostly either built on top of key-value stores or implemented as a native graph storage architecture on top of a file system. The core requirement of a typical database system is to persist data to disk correctly and efficiently. Many file system features are not essential for database systems, and some even conflict with database functionality. When a file system is used, the database must coordinate with the operating system’s cache, which may overlap with the database system’s internal caching strategy. The file system’s I/O call chain is also relatively long, which lowers the efficiency of database systems built on file system storage. To address these issues, this paper proposes a hybrid graph storage engine that bypasses the file system for graph data storage, based on the disk I/O theory of the Linux operating system. The push-down operators are accelerated, and an upper-layer distributed traversal framework is designed. The main contributions of this paper are as follows: 1. Based on the structure and characteristics of graph data storage, this paper designs distributed transactions and achieves the snapshot isolation level for database reads. To address the high communication cost between push-down graph traversal operators and the computation layer, this paper designs a distributed graph traversal framework at the storage layer, reducing the communication cost between the storage layer and the computation layer and achieving efficient distributed transactions and graph traversal
modules. 2. To address the inefficiency of building storage systems on file systems, this paper designs a native hybrid graph storage engine that bypasses the file system for graph data storage. It interacts directly with the block device layer during I/O and is paired with an object cache and a block cache to improve the cache hit rate during graph data traversal. 3. This paper abstracts the push-down filtering operators in the storage layer as set intersection and union operations. Because data filtering in graph databases often involves sparse sets, this paper designs an operator that uses CPU parallel resources efficiently and optimizes memory allocation during operator computation. 4. In the experimental stage, this paper conducts detailed performance tests on the hybrid graph storage engine and the storage-layer push-down operators, and compares the overall traversal performance of the graph database with that of mainstream graph databases. The experimental results confirm that the proposed hybrid graph storage engine effectively improves the storage and retrieval performance of the graph database, and that the optimization of the push-down operators significantly improves the performance of filtering computation in the graph database. |