
Research On Big Data Service In Cloud Environment And Its Key Technologies

Posted on: 2016-10-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W M Lin
Full Text: PDF
GTID: 1228330461460561
Subject: Computer Science and Technology
Abstract/Summary:
Attracted by its rich inherent value, big data has drawn increasing attention from both academia and industry in recent years. Managing and utilizing big data and building big data services are key ways to mine its value. Thanks to the elasticity, flexibility, and efficiency of its computing model, cloud computing provides strong technical support for developing big data services. On one hand, since resources are provisioned on demand and charged in a pay-as-you-go manner, cloud computing holds the promise of saving the huge cost of processing big data with respect to infrastructure investment and maintenance. On the other hand, a range of cloud-based big data processing techniques (e.g., cloud storage tools and big data analysis tools) provide technical support for developing big data services efficiently.

With the support of cloud computing technology, researchers have made great progress in the field of developing big data services. Meanwhile, factors such as the distributed nature of data resources, the uncertain QoS values of cloud services, and diverse application requirements pose new challenges for developing big data services in the cloud environment. For example: 1) current related work lacks an application model that can serve as a technical reference for developing big data services efficiently; 2) when the data resources to be processed for a big data service are distributed across the cloud environment, a scalable management mechanism for data resource nodes is required, as well as an efficient searching algorithm for collecting those data resources; 3) to deploy a big data service on several cloud services (such as cloud-based data centers and Hadoop platforms), we need to select a QoS-optimal cloud service composition plan, since many candidate cloud services provide the same functionality for a given functional requirement. Traditional QoS-aware service optimization methods conduct optimization using the QoS values published by service providers; however, factors such as the dynamic cloud environment and providers' possible commercial speculation greatly undermine the credibility of the composition plans these methods produce.

In view of these challenges, we propose solutions for developing and deploying big data services in the cloud environment, covering an application model for developing big data services, scalable management of data resource nodes with an efficient data resource searching method, and a set of credible service optimization methods. More specifically, the main contributions of our work are four-fold:

1) To build a big data service efficiently from existing data sources in the cloud environment, cloud services, and big data processing tools, we propose an application model for developing big data services in the cloud environment (a minimal sketch is given after the abstract). The application model contains five levels: the data resource level, the data resource collecting level, the task planning level, the credible service optimization level, and the big data analyzing algorithm implementation level. Concretely, the data resource level refers to data resources distributed in the cloud environment; such data resources can usually be encapsulated as services, so that users can access them by matching service descriptions.
The data resource collecting level is responsible for collecting the data resources distributed in the cloud environment that serve as the data inputs for a given big data service. The task planning level divides a complex big data processing task into several functionally independent sub-tasks. Given the composition plan consisting of these sub-tasks, the credible service optimization level maps each sub-task onto a cloud service that provides the necessary IT resources (in the form of IaaS, PaaS, or SaaS); it is also this level's responsibility to select a QoS-optimal cloud service composition plan from the huge number of candidate cloud services. Finally, at the top level (i.e., the big data analyzing algorithm implementation level), we design and implement algorithms for big data processing and analysis, so as to complete the implementation and deployment of the given big data service.

2) To provide scalable management of data resource nodes and an efficient data resource searching method in the cloud environment, we study how to apply P2P techniques to the management of data resource nodes for developing a big data service. Concretely, we adopt an unstructured P2P network as the topology for organizing data resource nodes, and we use services to manipulate data resources so that searching for them is easier. To improve search efficiency, we propose a resource information replication protocol among neighbor nodes in the unstructured P2P network. With the resulting resource information replicas, we propose a probabilistic random-walk resource searching method (sketched after the abstract), which implements a scalable mechanism for searching data resources in the cloud environment.

3) To improve the credibility of a cloud service composition plan, which serves as the infrastructure or platform for deploying a big data service, we propose a credible service optimization method that exploits a cloud service's historical QoS records rather than the tentative QoS values published by its provider. To improve computational efficiency, we use only part of the QoS-record-based composition plans to conduct service optimization; the method is denoted the History QoS Records based Service Optimization Method (HireSome-I). HireSome-I reduces the computation space and thus the time cost of computing a QoS-optimal service composition plan. Based on HireSome-I, we briefly introduce an improved version proposed by Dou et al., denoted HireSome-II, in which the K-means algorithm selects representative QoS records for each cloud service (sketched after the abstract). Conducting service optimization with representative QoS records greatly reduces the computational complexity and thus improves efficiency.

4) To verify the feasibility of our proposal, we study how to apply it (i.e., the key techniques of big data services in the cloud environment) to the medical domain. More specifically, we study how to develop a big data service named the self-diagnosis service, which processes big medical data (i.e., medical records) to analyze the relations between diseases and symptoms.
First, with the application model for developing a big data service proposed in this paper, we obtain the application requirements of the self-diagnosis service. Then, according to the credible service optimization method, we select the QoS-optimal cloud service composition plan to provide the storage and computing resources required by the self-diagnosis service. Moreover, we design a self-diagnosis service framework to respond to online user requests. Furthermore, a formal concept analysis based big data processing method is proposed to analyze big medical records. By searching for similar medical records and computing a self-diagnosis model (a simplified sketch is given below), we implement the self-diagnosis service, which helps users judge which disease they may be suffering from.
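As a rough illustration of the five-level application model of contribution 1, the Python sketch below treats the levels as an ordered pipeline in which each level's output feeds the next. The context dictionary, the stage functions, and their outputs are illustrative assumptions, not an API defined in the dissertation.

    # Minimal sketch of the five-level application model as an ordered pipeline.
    # Level names follow the abstract; stage bodies are placeholders.
    from typing import Callable, Dict, List, Tuple

    Stage = Callable[[Dict], Dict]

    def build_pipeline() -> List[Tuple[str, Stage]]:
        return [
            ("data resource level",
             lambda ctx: {**ctx, "resources": ["encapsulated data services"]}),
            ("data resource collecting level",
             lambda ctx: {**ctx, "inputs": ctx["resources"]}),
            ("task planning level",
             lambda ctx: {**ctx, "sub_tasks": ["store", "analyze"]}),
            ("credible service optimization level",
             lambda ctx: {**ctx, "plan": dict.fromkeys(ctx["sub_tasks"], "selected cloud service")}),
            ("big data analyzing algorithm level",
             lambda ctx: {**ctx, "service": "deployed big data service"}),
        ]

    context: Dict = {}
    for name, stage in build_pipeline():
        context = stage(context)          # each level consumes the previous level's output
        print(f"{name}: done")
    print(context["service"])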
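The probabilistic random-walk search of contribution 2 can be sketched as follows, assuming a simple in-memory model of the unstructured P2P overlay. The node class, the one-hop replication step, and the walk parameters (TTL and forwarding probability) are illustrative assumptions rather than the dissertation's exact protocol.

    # Unstructured P2P overlay with neighbor replication and probabilistic random-walk search.
    import random

    class Node:
        def __init__(self, node_id):
            self.node_id = node_id
            self.neighbors = []        # adjacent nodes in the unstructured overlay
            self.local_index = set()   # descriptions of data resources hosted locally
            self.replica_index = set() # resource descriptions replicated from neighbors

        def replicate_to_neighbors(self):
            """Push local resource descriptions to direct neighbors (replication protocol)."""
            for n in self.neighbors:
                n.replica_index.update(self.local_index)

        def knows(self, keyword):
            return keyword in self.local_index or keyword in self.replica_index

    def random_walk_search(start, keyword, ttl=6, forward_prob=0.8):
        """Each hop forwards the query to a random neighbor with probability forward_prob."""
        current = start
        for _ in range(ttl):
            if current.knows(keyword):
                return current.node_id            # hit: resource info found on this node
            if not current.neighbors or random.random() > forward_prob:
                break                             # walk terminates probabilistically
            current = random.choice(current.neighbors)
        return None                               # search failed within the TTL

    # Usage: three nodes in a line; "weather-data" lives on node c and is replicated to b,
    # so a walk starting at a can already stop at b (forward_prob=1.0 for a deterministic demo).
    a, b, c = Node("a"), Node("b"), Node("c")
    a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
    c.local_index.add("weather-data")
    c.replicate_to_neighbors()
    print(random_walk_search(a, "weather-data", forward_prob=1.0))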
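The idea behind HireSome-II in contribution 3 can be sketched as below: cluster each candidate service's historical QoS records with K-means and score composition plans using the resulting representative records. The single response-time QoS dimension, the toy candidate sets, and the averaging-based score are illustrative assumptions, not the method's exact formulation.

    # Representative-QoS-record selection via K-means, then brute-force composition scoring.
    from itertools import product
    import numpy as np
    from sklearn.cluster import KMeans

    def representative_records(history, k=3):
        """Replace a service's full QoS history with k cluster centers (representatives)."""
        X = np.asarray(history, dtype=float).reshape(-1, 1)
        k = min(k, len(X))
        return KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).cluster_centers_.ravel()

    # Hypothetical candidates per sub-task: {sub_task: {service_name: [response-time records]}}
    candidates = {
        "storage": {"s1": [120, 130, 900, 125], "s2": [200, 210, 205]},
        "compute": {"c1": [300, 320, 310],      "c2": [250, 800, 260, 255]},
    }

    # Score each composition plan by the sum of the services' mean representative response time.
    reps = {t: {s: representative_records(h) for s, h in svcs.items()}
            for t, svcs in candidates.items()}
    tasks = list(candidates)
    best_plan, best_score = None, float("inf")
    for plan in product(*(candidates[t] for t in tasks)):
        score = sum(reps[t][s].mean() for t, s in zip(tasks, plan))
        if score < best_score:
            best_plan, best_score = plan, score
    print(dict(zip(tasks, best_plan)), best_score)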
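For contribution 4, the record-retrieval step of the self-diagnosis service can be illustrated with a simplified stand-in: rank diseases by how often they appear among the medical records whose symptom sets are most similar to the user's symptoms. The dissertation's method is based on formal concept analysis; the Jaccard-similarity retrieval and the toy records below are illustrative substitutes.

    # Similarity-based retrieval of medical records as a simplified self-diagnosis step.
    from collections import Counter

    records = [  # hypothetical (symptoms, diagnosed disease) pairs mined from medical records
        ({"fever", "cough", "fatigue"}, "flu"),
        ({"fever", "cough"}, "common cold"),
        ({"headache", "nausea"}, "migraine"),
        ({"fever", "cough", "shortness of breath"}, "pneumonia"),
    ]

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    def self_diagnose(symptoms, top_k=3):
        """Rank diseases by frequency among the top_k most similar records."""
        ranked = sorted(records, key=lambda r: jaccard(symptoms, r[0]), reverse=True)[:top_k]
        return Counter(disease for _, disease in ranked).most_common()

    print(self_diagnose({"fever", "cough"}))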
Keywords/Search Tags: cloud computing, big data service, credible service optimization, big medical data