Research On Data Storage And Optimization In Decentralized Online Social Networks

Posted on: 2015-12-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S L Fu
GTID: 1108330479479557
Subject: Computer Science and Technology
Abstract/Summary:
In the last decade, Online Social Networks (OSNs) such as Facebook, Twitter, Sina Microblog and Tencent MicroMsg have become extremely popular, with more than a billion users worldwide. OSNs allow a user to publish data to the friends in his or her friend circle. Currently, OSN platforms are typically centralized: users store their data on servers deployed by the OSN service providers. The service providers can analyze these data to learn users’ private information, such as interests and personal affairs, and in the worst case may sell this information to third parties. As a result, serious privacy concerns have been raised about the current Centralized Online Social Networks (COSNs).

Decentralized Online Social Networks (DOSNs) have recently been proposed as a promising solution to protect data privacy. Although DOSN products are not yet as popular or mature as centralized OSN products, DOSNs are under active research and development. In a DOSN, the centralized servers are bypassed, and the data published by a user are stored and disseminated only within the user’s friend circle. Although DOSNs help protect data privacy, maintaining Data Availability (DA) becomes a major challenge, because if a friend of the user is offline, the data stored on that friend’s device cannot be accessed by other friends.

DOSNs have the following characteristics: 1) users churn highly; 2) the accessing devices have limited storage capacity; 3) most users have a moderate number of friends (fewer than 200); 4) small data items dominate and are seldom modified. Our investigation found that existing work focuses mainly on the effect of churn on data availability and ignores the other characteristics.

The main contributions of this thesis are as follows.

1. Modeling and Predicting the Data Availability in Decentralized Online Social Networks

Maintaining data availability is one of the biggest challenges in DOSNs. Existing work on improving data availability in DOSNs often assumes that the friends of a user can always contribute enough storage capacity to store all the data published by the user. However, this assumption does not always hold in today’s OSNs. To protect privacy, user data are stored only in the friend circle, and the total storage capacity of the friend circle is limited for two reasons: 1) friends in a DOSN are highly volatile, so the number of online friends is limited; 2) users now often access OSNs from smart mobile devices, which typically have less storage capacity than desktops. This limited storage capacity may jeopardize data availability. It is therefore desirable to know the relation between the storage capacity contributed by OSN users and the level of data availability that the OSN can achieve. How to build a data availability model over storage capacity is the first research challenge this thesis addresses.

In this thesis, a data availability model over storage capacity is established. OSN designers can use the model to determine the storage capacity required for the published data in order to achieve a desired level of data availability. Furthermore, because users churn highly, the number of online friends varies over time, which affects the amount of available storage capacity. To tackle this issue, the thesis proposes a novel method to predict data availability on the fly. The on-the-fly prediction method gives a deeper understanding of the relation between data availability and storage capacity.
Finally, extensive simulation experiments have been conducted to evaluate the effectiveness of the data availability model.

2. Cadros: Cloud-Assisted Data Storage Optimization in Decentralized Online Social Networks

In DOSNs, the data published by a user and the replicas of those data are stored only in the friend circle of the user. Unfortunately, the total storage capacity of the friend circle is limited. Although data replication can improve data availability, pure DOSNs may not be able to deliver sustainable data availability. How to further improve data availability in DOSNs is the second challenge tackled in this thesis.

In this thesis, a Cloud-Assisted Data Storage Optimization scheme, called Cadros, is proposed. In Cadros, the Cloud is integrated into DOSNs to improve data availability, and an erasure code is used to protect data privacy in the Cloud. When the friend circle cannot meet the data storage requirement, the data are split and encoded into many segments using the erasure code and then migrated into the Cloud. The number of data segments stored in the Cloud is smaller than the number needed to reconstruct the original data, so the Cloud alone cannot recover the data and data privacy is protected. This thesis quantitatively analyzes the storage capacity that Cadros gains by integrating the Cloud into DOSNs, predicts the data storage capacity and the storage requirement in the friend circle, and further models the level of data availability that Cadros can achieve.

3. Research on Data Storage Optimization in Decentralized Online Social Networks

The predicted future data availability only indicates that Cadros has the capacity to achieve a certain level of data availability. Whether that level is actually realized depends on the underlying data storage strategy, which determines how the user data are stored.
If the data storage strategy is poor, the desired level of DA will not be realized even if our probabilistic analysis shows that Cadros is capable of it. How to optimize data storage in DOSNs is the third research challenge in this thesis.

To address this issue, this thesis proposes a cost-aware data partition approach, which partitions the user data into two parts: one part is stored using full replication and the other using the erasure code. The data are partitioned in such a manner that the overhead caused by erasure coding is minimized. This thesis also develops a placement strategy for data replicas so that the predicted data availability can be realized. Subject to satisfying the data availability requirement, this work further proposes a number of heuristic placement strategies to optimize other performance metrics in Cadros, such as the data availability repair cost and load balance.

4. Research on Data Storage Optimization in the Cloud

The Cloud can be regarded as permanently available, so the data availability in the Cloud can be regarded as 100%. However, access performance is low when users access DOSN data stored in the Cloud, because OSN data items are mostly small and seldom modified, and they are usually stored and managed in the Cloud by traditional distributed file systems, which perform poorly when handling massive numbers of small data items. How to improve the access performance of massive small DOSN data in the Cloud is the fourth challenge in this thesis.

To address this issue, this thesis first analyzes the bottleneck of handling massive small data in the Cloud. A Flat Lightweight File System (iFlatLFS) is then proposed to manage small data, based on a simple metadata scheme and a flat storage architecture. We have implemented iFlatLFS on CentOS 5.5 and integrated it into an open source distributed file system called Taobao FileSystem (TFS).
We have conducted extensive experiments to verify the performance of iFlatLFS. The results show that iFlatLFS greatly improves access performance.

To summarize, this thesis studies the data storage optimization problem in DOSNs, aiming to improve data availability. We build a data availability model over storage capacity and propose a Cloud-Assisted Data Storage Optimization scheme (Cadros) to improve data availability in DOSNs. We also study data storage optimization in the friend circle and in the Cloud.
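
As a rough illustration of the ideas summarized above, the following Python sketch shows how availability could be estimated under full replication and under erasure coding, and why holding too few coded segments prevents reconstruction. It is not the model developed in the thesis. The function names, the assumption that friends are online independently with fixed probabilities, and the (n, k) threshold erasure code are all assumptions made for this example.

```python
from math import comb


def replication_availability(online_probs):
    """Probability that at least one replica holder is online.

    Assumption (not from the thesis): each friend holding a full replica
    is online independently with the given probability.
    """
    p_all_offline = 1.0
    for p in online_probs:
        p_all_offline *= (1.0 - p)
    return 1.0 - p_all_offline


def erasure_availability(n, k, p):
    """Probability that at least k of n segment holders are online.

    Assumption: an (n, k) threshold erasure code, where any k of the n
    segments suffice to rebuild the data, and every holder is online
    independently with the same probability p.
    """
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def can_reconstruct(segments_held, k):
    """A party holding fewer than k segments cannot rebuild the data."""
    return segments_held >= k


if __name__ == "__main__":
    # Three friends, each online half of the time, each holding a full replica.
    print(replication_availability([0.5, 0.5, 0.5]))
    # Hypothetical (6, 3) code with the same online probability.
    print(erasure_availability(6, 3, 0.5))
    # A cloud holding 2 segments of a code that needs 3 cannot recover the data.
    print(can_reconstruct(2, 3))
```

Under these assumptions, adding replicas or segments raises availability but also consumes more of the limited storage in the friend circle, which is the trade-off the abstract describes.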
Keywords/Search Tags: Decentralized Online Social Network, Friend Circle, Data Storage Optimization, Data Privacy Protection, Data Availability, Data Replica Placement, Erasure Code, Cloud Server