A Thesis Submitted To University Of Science And Technology Liaoning

Posted on: 2016-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Qin
Full Text: PDF
GTID: 2308330470980894
Subject: Software engineering
Abstract/Summary:
As people’s demands on the Internet grow, major Internet enterprises consider every aspect of their products, from functions and usage to users’ habits, so that everything from the product interface to the user experience approaches perfection. For example, Amazon recommends books, Google recommends related websites, Taobao knows our favorite products, QQ can guess whom we know, and the recently popular WeChat can add friends through the contact list and QQ friend recommendations. Some software can even predict stock market performance from data on the social network Twitter. All of this depends on analyzing large amounts of data, and that data, which has grown from megabytes and gigabytes to today’s petabytes, still poses a storage problem. Traditional databases are not suited to storing such massive, diverse, and decentralized data. Industry giants at home and abroad, such as Google, Microsoft, and Amazon overseas and BAT (Baidu, Alibaba, Tencent) in China, treat massive data processing as a core back-end technology. Providing more stable and more available services has become a bottleneck for these enterprises, and the problems of data loss, corruption, and delay urgently need to be solved.
This thesis designs a clustering system based on LevelDB that is suitable for enterprise-class data. By combining the efficiency and stability of LevelDB with ZooKeeper and Twemproxy, it realizes an enterprise data storage system for massive data with high reliability and availability. A master-slave deployment is adopted to avoid a single point of failure under highly concurrent request load. The system uses a proxy server to split one large database into many sub-databases, each of which is stored on a different server.
If one sub-database goes down, only part of the data becomes unavailable. Each sub-cluster is deployed in a “one primary, two secondaries” arrangement, so every piece of data is stored redundantly in three copies to guard against loss. In other words, even if two servers in a sub-cluster fail, the remaining server still holds the complete data set, so the whole cluster keeps working. The system uses the efficient and reliable ZooKeeper coordination service, built on a Paxos-like atomic broadcast protocol (Zab), to maintain configuration information and to elect a leader, which guarantees consistent writes in the distributed environment.
Tests in the production environment of a well-known Internet enterprise, together with analysis of the resulting data, show that the system’s throughput and stability meet expectations. Every coin has two sides, however: Twemproxy improves high availability, but it also costs some LevelDB performance, because the proxy itself consumes hardware resources. Although the experimental data meets expectations, further research is needed to reduce this performance loss and refine the whole system.
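The storage layout described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the class names, the CRC32-modulo routing rule, and the in-memory dicts standing in for LevelDB instances are all assumptions made for illustration, since the abstract does not say how keys are mapped to sub-databases. ZooKeeper-based leader election and the resynchronization of a recovered replica are left out.

```python
import zlib


class Replica:
    """One storage node; a plain dict stands in for a LevelDB instance."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class SubCluster:
    """Three nodes (one primary, two secondaries), each with a full shard copy."""

    def __init__(self, shard_id):
        self.replicas = [Replica(f"shard{shard_id}-node{i}") for i in range(3)]

    def put(self, key, value):
        # Simplification: the write is copied synchronously to every live node.
        live = [r for r in self.replicas if r.alive]
        if not live:
            raise RuntimeError("sub-cluster unavailable")
        for replica in live:
            replica.data[key] = value

    def get(self, key):
        # Any surviving node can serve the read.
        for replica in self.replicas:
            if replica.alive:
                return replica.data.get(key)
        raise RuntimeError("sub-cluster unavailable")


class Proxy:
    """Routes each key to one sub-cluster (hypothetical CRC32-modulo rule)."""

    def __init__(self, shard_count):
        self.shards = [SubCluster(i) for i in range(shard_count)]

    def shard_for(self, key):
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self.shard_for(key).put(key, value)

    def get(self, key):
        return self.shard_for(key).get(key)


proxy = Proxy(shard_count=4)
for i in range(100):
    proxy.put(f"user:{i}", f"value-{i}")

# Fail two of the three nodes that hold "user:7"; the read still succeeds.
shard = proxy.shard_for("user:7")
shard.replicas[0].alive = False
shard.replicas[1].alive = False
print(proxy.get("user:7"))  # prints "value-7"
```

If the third node of that sub-cluster also fails, only the keys routed to it become unavailable, while keys on the other sub-clusters remain readable. This matches the abstract's point that losing one sub-database affects only part of the data.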
Keywords/Search Tags: Distributed storage, LevelDB, Twemproxy, Clusters