
Parallel Access Strategy For Big Data Objects Based On RAMCloud

Posted on: 2019-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z Chu
Full Text: PDF
GTID: 2428330566467029
Subject: Software engineering
Abstract/Summary:
As the era of big data deepens, data volumes continue to grow exponentially and data dimensionality continues to increase, so the demand for fast, real-time data analysis faces new challenges. Improving the real-time performance, accuracy, and interactivity of data storage services is an important requirement in the field of data analysis. However, most current big data storage technologies are built on HDFS-based distributed file systems, and the performance of applications on top of these storage layers is limited by the high latency of the file system and of disk access. RAMCloud is a new type of storage system that keeps all data in the memory of the cluster's servers. It speeds up data access and effectively removes the bottleneck of high disk access latency. In addition, the steady decline in memory prices in recent years has made it feasible for RAMCloud to mature and become commercially available.

RAMCloud, however, limits the size of the objects it can store: it supports only small data objects of up to 1 MB, and an object larger than 1 MB cannot be stored directly in a RAMCloud cluster. Yet most objects that need to be accessed in practice are larger than 1 MB. To make full use of RAMCloud's fast access while overcoming this size limit, a parallel access strategy for big data objects based on RAMCloud is proposed. The method consists of two modules, one for storing and one for reading large data objects. The storage module is based on data segmentation. First, the large object is divided into a number of small data objects that RAMCloud can store directly. Segmentation is performed at the client, and a corresponding data digest is generated at the same time. Then a parallel strategy is used to store all the segmented data objects in the RAMCloud cluster. The reading module reverses the storage process. First, all the small data objects are read from RAMCloud in parallel, guided by the data digest; then the data that has been read is merged to reconstruct the original large data object.

Experimental analysis shows that, under the original RAMCloud cluster architecture, the method achieves a storage time of 16-18 microseconds and a read time of 6-7 microseconds. Under an InfiniBand network architecture, the parallel strategy allows large data objects to be accessed at the same level of speed as small objects. In addition, the linearly increasing speedup ratio indicates that the method is highly efficient.

With the rapid development of the mobile Internet, extracting useful description information from a large number of mobile applications has become an urgent problem. Such description information can support effective and accurate recommendation strategies for users. Current recommendation strategies are relatively traditional and mostly rely on a single attribute, such as download count, application name, or application category. To address the problem that recommendations are too coarse-grained and inaccurate, a method that combines RAMCloud with the LDA topic model is proposed. Starting from application labels, the method constructs a topic distribution matrix for the applications and uses this matrix to construct a mobile application similarity matrix. A method for converting the mobile application similarity matrix into a viable storage structure is also proposed. Extensive experiments demonstrate the feasibility of this method: it achieves 130 percent higher similarity than the applications recommended by existing application markets. The proposed method resolves the coarse granularity of mobile application recommendation and makes the recommendation results more accurate. It also shows that RAMCloud accelerates the training of machine learning models and provides online application hot-switching capabilities, which makes it feasible to combine RAMCloud with machine learning applications.
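
The storage and reading modules described in the abstract (client-side segmentation, a data digest, parallel writes, then parallel reads and a merge) can be illustrated with a minimal Python sketch. It is not the thesis implementation and does not use the RAMCloud client API. `InMemoryStore` is a hypothetical stand-in for a RAMCloud table, the function names are invented for illustration, and the contents of the digest (chunk keys, total size, and a SHA-256 checksum) are assumptions, since the abstract does not say what the digest contains.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024 * 1024  # the 1 MB per-object limit stated in the abstract


class InMemoryStore:
    """Hypothetical stand-in for a RAMCloud table; NOT the RAMCloud client API."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        # Mimic the size limit: reject anything larger than one chunk.
        if len(value) > CHUNK_SIZE:
            raise ValueError("object exceeds the 1 MB limit")
        self._data[key] = value

    def read(self, key):
        return self._data[key]


def store_large_object(store, name, payload, workers=4):
    """Split payload into chunks of at most 1 MB, write them in parallel,
    and return a digest describing how to reassemble them."""
    chunks = [payload[i:i + CHUNK_SIZE] for i in range(0, len(payload), CHUNK_SIZE)]
    keys = [f"{name}/chunk-{i}" for i in range(len(chunks))]
    # Assumed digest layout: ordered chunk keys, total size, checksum.
    digest = {
        "name": name,
        "size": len(payload),
        "keys": keys,
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any write error.
        list(pool.map(store.write, keys, chunks))
    return digest


def read_large_object(store, digest, workers=4):
    """Read all chunks listed in the digest in parallel and merge them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(store.read, digest["keys"]))  # map preserves key order
    payload = b"".join(chunks)
    if hashlib.sha256(payload).hexdigest() != digest["sha256"]:
        raise ValueError("checksum mismatch after merge")
    return payload
```

In this sketch the digest is simply returned to the caller, which passes it back to `read_large_object`; a real deployment would also need to decide where the digest itself is kept. Threads are used here only to show the parallel structure, and the timings reported in the abstract depend on the RAMCloud cluster and network, not on anything this sketch measures.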
Keywords/Search Tags:RAMCloud, big object, parallel algorithm, topic model, label