
Design And Implementation Of Distributed Storage System For Social Network Userdata

Posted on: 2022-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Yang
GTID: 2518306764980179
Subject: Computer Software and Application of Computer
Abstract/Summary:
With the development of the Internet, the amount of data generated by social network applications has grown exponentially. This data is an intuitive representation of natural human social relations. It is characterized by a complex mesh topology and cross-media heterogeneity, and it carries deeper semantic information. However, several key problems in storing and processing social network user data remain to be solved:
1) How to efficiently crawl the mesh topology and user attribute information of a large number of social network users so that it can be imported as graph data;
2) How to design an efficient storage engine for mesh topology data that supports high-performance querying of social graph data;
3) How to partition and optimize large-scale mesh topology data so that graph data is distributed efficiently, thereby reducing the cost of cross-node communication in graph node relationship queries.

This thesis proposes solutions by analyzing the requirements and existing problems of social network user data acquisition and distributed storage. The main contributions and innovations are as follows:
1) The overall architecture of the system is designed. The system is divided into two modules: data crawling and import, and distributed data storage. This effectively decouples large-scale social network data acquisition, preprocessing, storage and query. Social network data is preprocessed and stored on a distributed, scalable architecture;
2) An efficient crawling method for social network data is designed. The method uses random user nodes as starting points, evaluates the popularity of user relationships based on content interaction behavior between users, and performs breadth-first crawling over each user's following (attention) list. On this basis, the data crawling and import module is implemented. It supports full and incremental crawling of social network data, globally unique ID generation for user data, graph partitioning, and data import;
3) A social network graph partitioning algorithm is proposed. To improve query efficiency after data import and reduce the communication cost between storage nodes, the large-scale social network graph topology is partitioned based on the idea of minimizing subgraph cut points;
4) A graph storage module for social network data is designed. The module includes a physical storage model for graph data, a multi-copy fault-tolerance mechanism combined with graph partitioning, a distributed consensus protocol optimized for graph computing, and a near-data computing mechanism based on delivering graph traversal operators. These designs meet the proximity storage and hierarchical storage requirements of distributed graph data and further improve the graph traversal query performance of the system.

Functional and performance tests of the distributed storage system for social network user data show that the system meets the requirements of social network user data storage and management. Compared with similar systems, its graph traversal query performance is better, its graph partitioning algorithm distributes graph data in a more balanced way, and the communication cost between subgraphs during graph traversal queries is effectively reduced.
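The abstract judges a partitioning by two properties: how much cross-subgraph communication it causes and how evenly it distributes the graph data. The thesis's own algorithm and metrics are not given here, so the following is only a minimal sketch of how such properties could be measured. It assumes that communication cost is approximated by the number of edges whose endpoints land in different partitions and that balance is the largest partition size divided by the mean size. The function name `partition_quality` and the toy graph are hypothetical.

```python
from collections import Counter


def partition_quality(edges, assignment):
    """Return (cut_edges, balance) for a vertex-to-partition assignment.

    cut_edges: number of edges whose endpoints sit in different partitions
               (an assumed proxy for cross-node communication cost).
    balance:   largest partition size divided by the mean partition size;
               1.0 means the vertices are spread perfectly evenly.
    """
    cut_edges = sum(1 for u, v in edges if assignment[u] != assignment[v])
    sizes = Counter(assignment.values())
    mean_size = len(assignment) / len(sizes)
    balance = max(sizes.values()) / mean_size
    return cut_edges, balance


# Hypothetical toy graph: two triangles joined by a single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

# Keeps each triangle on one storage node.
community_split = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
# Alternates vertices between the two nodes.
alternating_split = {1: 0, 2: 1, 3: 0, 4: 1, 5: 0, 6: 1}

print(partition_quality(edges, community_split))
print(partition_quality(edges, alternating_split))
```

Both assignments place three vertices on each node, so they are equally balanced, but the split that follows the community structure cuts far fewer edges. This is the kind of trade-off a partitioner for social graphs has to optimize.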
Keywords/Search Tags: Social Network, Graph Partitioning, Proximity Storage