
Research on Sub-Library Strategy of Social Network Database Based on Cluster Analysis

Posted on: 2017-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: S Liang
GTID: 2278330488466904
Subject: Computer system architecture
Abstract/Summary:
With the development of technology and society, social networks play an increasingly important role in people's daily lives. Many people use them as their main way to obtain and share information and to communicate with others. The database is the underlying data carrier of a social network, and as social networks develop and expand, the storage and access of rapidly growing massive data has become the bottleneck of system design. To improve system performance, horizontal scaling of the data layer has become the preferred approach of the major Internet companies.

Conventional horizontal database sharding is mainly physical, including segmented sharding and hash sharding. It achieves load balancing across the databases and scales well in large distributed server arrays, so it is widely used by major companies.

Social networks, however, are a special case: many events show similarities, in that similar users are more prone to certain behaviors. Under conventional horizontal sharding, similar subjects are dispersed evenly across the databases, so a query for information about such events must access multiple databases, which takes too much time and wastes database connection resources.
For a huge database, this cost in time and connection resources cannot be ignored.

With cluster analysis, data is aggregated according to similarity, so that the actors of an event in the social network reside in one or a few databases as far as possible. This reduces the number of database accesses and improves query efficiency. In addition, sharding based on cluster analysis improves query efficiency from the data storage side: it only requires the corresponding computation on the attributes rather than a detailed grasp of the system's business logic, so it is relatively simple to operate. Finally, cluster analysis has many mature algorithms, which provide the necessary support for efficient processing of massive data.

In this thesis, cluster analysis is applied to database sharding, achieving load balancing and expansion without data migration while also improving database query efficiency. The main work is as follows. First, the thesis presents the ideas and process of conventional vertical and horizontal database sharding, together with the theory and common algorithms of cluster analysis. Second, it clusters the initial data with the K-means algorithm and loads the data into the databases. Third, based on the "cluster centroid distance" and a newly defined "weighted database distance", it sets a "fuzzy distance limit" for new data, so that the target database is selected with regard to both clustering quality and load balancing. Fourth, it achieves database expansion without data migration by using an "authentication table" and by maintaining a "classified information table" and a "databases information table". Finally, experiments show that database sharding based on cluster analysis not only achieves load balancing and migration-free expansion, but also increases the similarity of the data within the same database (clustering), which reduces database connections and improves the efficiency of data processing.
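
The clustering and target-database selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the definitions of the "weighted database distance" or the "fuzzy distance limit", so this sketch assumes plain Euclidean distance to each cluster centroid and treats the limit as a tolerance above the nearest distance. The function names (`init_centroids`, `kmeans`, `choose_shard`), the farthest-point seeding, and the two-dimensional user features are all hypothetical.

```python
import math


def init_centroids(points, k):
    """Deterministic farthest-point seeding (an illustrative choice)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its closest existing seed.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=10):
    """Plain Lloyd-style K-means; returns (centroids, labels)."""
    centroids = init_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                n = len(members)
                centroids[j] = tuple(sum(col) / n for col in zip(*members))
    return centroids, labels


def choose_shard(record, centroids, loads, fuzzy_limit):
    """Pick a target database (shard) for a new record.

    Assumption: every shard whose centroid distance exceeds the nearest
    distance by at most `fuzzy_limit` counts as an acceptable match;
    among those, the least-loaded shard wins (ties go to the closer one).
    """
    d = [math.dist(record, c) for c in centroids]
    nearest = min(d)
    candidates = [j for j, dj in enumerate(d) if dj - nearest <= fuzzy_limit]
    return min(candidates, key=lambda j: (loads[j], d[j]))


# Hypothetical user feature vectors forming two obvious groups.
users = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(users, k=2)
loads = [labels.count(j) for j in range(2)]  # rows currently held per shard
target = choose_shard((1.1, 1.0), centroids, loads, fuzzy_limit=0.5)
```

In this sketch a small `fuzzy_limit` favors clustering quality, because a new record almost always goes to its nearest cluster, while a larger value lets load balancing override proximity when one database is much fuller than another.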
Keywords/Search Tags:social network, database sharding, clustering analysis