Research On Cloud Computing For Massive Data Process And Its Key Technologies

Posted on:2014-08-01

Degree:Doctor

Type:Dissertation

Country:China

Candidate:C G Ren


GTID:1228330467980194

Subject:Computer application technology

Abstract/Summary:

With the rapid development of information technology, data explosion has become a prominent problem in many scientific fields. Although massive data can provide a wealth of information and broaden people's horizons, it also creates many problems for data processing and storage in real-world applications. These problems lie mainly in the following aspects: data from heterogeneous sources usually differ in structure or measurement; there is no unified organization for data standardization; in some fields, huge amounts of data exist in the form of a large number of small files; and massive data must be stored efficiently. In recent years, the continuing development of cloud computing has provided a new and effective approach to massive data processing. This dissertation studies cloud computing theory for massive data in depth. Drawing on related frontier ideas, it achieves breakthroughs in several key technologies of cloud-computing-based massive data processing and, on this basis, builds a set of effective technologies for massive data processing. The main work and results are as follows:

(1) By integrating open-source platforms for processing and storing massive data into existing cloud platforms according to their respective characteristics, a new model for processing massive small files in the cloud computing environment, C-MSFPM (Cloud computing-Massive Small Files Process Model), is proposed. Targeting the characteristics of small-file processing, the model builds a file indexing mechanism through document classification based on an improved KNN algorithm, and handles massive small files with a weight-based file merging algorithm that follows the nearest-value similarity principle.

(2) Within the C-MSFPM model, to address the complex mapping involved in processing file content and querying files, an improved MapReduce model based on XML and multiple Values is proposed.
The model uses XML to mark up the content, coordinates, and action-mapping information of the data. For complex massive data processing, positioning during query mapping retrieves all the information associated with the data, and processing the XML tags together with emitting multiple Values in the Map phase greatly improves data processing efficiency. On this basis, the model was applied to content query mapping and sorting of massive PDF files and to vehicle information data processing; comparative tests on the test platform with multiple data sets showed that the model's algorithm is correct and its performance reliable.

(3) For the coordination mechanism and virtualization of cloud storage, the concept of the virtual storage node is analyzed, a storage efficiency value is derived from the performance of the virtual nodes, and the cloud storage mechanism and task scheduling are discussed. A storage task allocation mechanism based on an improved genetic algorithm and a cloud storage data allocation strategy based on improved dynamic programming are proposed. These two algorithms significantly improve the utilization of storage nodes and optimize system load balancing.
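Item (1) combines KNN-based document classification with merging small files into larger units. The Python sketch below illustrates that general idea under stated assumptions; it is not the C-MSFPM implementation. It uses a plain KNN with cosine similarity over term-frequency vectors (not the improved KNN described in the abstract), groups files by predicted class instead of applying the weight-based nearest-value merging algorithm, and packs each group into one byte container with an (offset, length) index per file. The names `knn_classify`, `merge_by_class`, and `read_file`, and the whitespace tokenizer, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Term-frequency vector; assumes a simple lowercase whitespace tokenizer."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text, labeled, k=3):
    """Plain KNN: majority label among the k labeled documents most similar to text."""
    vec = tf_vector(text)
    ranked = sorted(labeled, key=lambda doc: cosine(vec, tf_vector(doc[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def merge_by_class(files, labeled, k=3):
    """Pack small files of the same predicted class into one container.

    Returns {label: (container_bytes, index)}, where index maps each file name
    to its (offset, length) inside the container.
    """
    groups = defaultdict(list)
    for name, text in files.items():
        groups[knn_classify(text, labeled, k)].append((name, text.encode("utf-8")))

    containers = {}
    for label, members in groups.items():
        blob = bytearray()
        index = {}
        for name, data in members:
            index[name] = (len(blob), len(data))  # record position before appending
            blob.extend(data)
        containers[label] = (bytes(blob), index)
    return containers


def read_file(containers, label, name):
    """Read one small file back out of its container using the index."""
    blob, index = containers[label]
    offset, length = index[name]
    return blob[offset:offset + length].decode("utf-8")
```

The per-file index lets any single small file be read back with one slice instead of a scan, while the storage system only has to track one container object per class rather than one entry per small file.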

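Item (3) mentions a storage task allocation mechanism based on an improved genetic algorithm. As a rough illustration only, the sketch below uses a plain genetic algorithm (tournament selection, one-point crossover, random-reset mutation, elitism) to assign storage tasks to nodes so that the largest per-node completion time is minimized, treating each node's storage efficiency value as a throughput divisor. The cost model, the greedy seed individual, and all function names and parameters (`ga_allocate`, `pop_size`, `mutation_rate`, and so on) are assumptions for illustration, not the dissertation's method.

```python
import random


def makespan(assign, sizes, efficiency):
    """Largest per-node completion time.

    Assumption: a node's completion time is its assigned data volume divided
    by its storage efficiency value.
    """
    load = [0.0] * len(efficiency)
    for task, node in enumerate(assign):
        load[node] += sizes[task]
    return max(node_load / eff for node_load, eff in zip(load, efficiency))


def greedy_seed(sizes, efficiency):
    """Largest task first, each to the node that would finish it earliest."""
    load = [0.0] * len(efficiency)
    assign = [0] * len(sizes)
    for task in sorted(range(len(sizes)), key=lambda t: -sizes[t]):
        node = min(range(len(efficiency)),
                   key=lambda n: (load[n] + sizes[task]) / efficiency[n])
        load[node] += sizes[task]
        assign[task] = node
    return assign


def ga_allocate(sizes, efficiency, pop_size=30, generations=200,
                mutation_rate=0.1, seed=0):
    """Plain GA over task-to-node assignments; returns (best_assignment, makespan)."""
    rng = random.Random(seed)
    n_tasks, n_nodes = len(sizes), len(efficiency)

    def fitness(ind):
        return makespan(ind, sizes, efficiency)  # lower is better

    # Initial population: one greedy individual plus random assignments.
    pop = [greedy_seed(sizes, efficiency)] + [
        [rng.randrange(n_nodes) for _ in range(n_tasks)] for _ in range(pop_size - 1)
    ]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    for _ in range(generations):
        pop.sort(key=fitness)
        next_pop = pop[:2]  # elitism: the two best survive unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_tasks) if n_tasks > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            for i in range(n_tasks):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_nodes)  # random-reset mutation
            next_pop.append(child)
        pop = next_pop

    best = min(pop, key=fitness)
    return best, fitness(best)
```

Seeding one individual with a largest-task-first greedy assignment and carrying the two best individuals into every generation means the returned allocation is never worse than the greedy baseline; the genetic search can only improve on it.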
Keywords/Search Tags:

Massive data processing, cloud computing, task scheduling, load balancing, cloud storage

Related items

1	Research On Load Balancing And QoS Oriented Multi-objective Cooperative Task Scheduling In Cloud Environment
2	Research Of Task Scheduling And Link Load Balancing In Cloud Data Centers
3	Research On Load Balancing Of Task Scheduling In Cloud Service System
4	Research On Load-balanced Strategy In Cloud Computing
5	Research On The Cloud Computing Task Scheduling Strategy Based On Multidimensional QoS Constraints
6	Research And Implementation Of Task Scheduling Algorithm In The Cloud Environment
7	Research On Cloud Task Scheduling Strategy Based On Min-Min And Max-Min Algorithm
8	Research On Task Scheduling Algorithm Based On Load-Balance Aware In Cloud Environment
9	Research And Implementation Of Optimized Load Balancing Algorithm In Task Scheduling System
10	Research Of Task Scheduling Strategy On Cloud Computing Environment