
Research On Cloud Computing For Massive Data Process And Its Key Technologies

Posted on: 2014-08-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C G Ren
GTID: 1228330467980194
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, data explosion has become a prominent problem in many scientific fields. Massive data provide rich information and broaden people's horizons, but they also create many problems for data processing and storage in real-world applications. The main ones are: data from heterogeneous sources usually differ in structure or measurement; a unified standard for organizing data is lacking; in some fields, huge amounts of data exist as large numbers of small files; and massive data must be stored efficiently. In recent years, cloud computing has continued to develop and offers a new and effective approach to massive data processing. This dissertation studies cloud computing theory for massive data in depth. Drawing on related frontier ideas, it makes breakthroughs in several key technologies for cloud-based massive data processing and, on this basis, builds a set of effective techniques for massive data processing. The main work and results are as follows:

(1) By integrating open-source platforms for processing and storing massive data, and building on the characteristics of existing cloud platforms, we propose C-MSFPM (Cloud computing-Massive Small Files Process Model), a new model for processing massive small files in a cloud computing environment. To handle massive small files, the model classifies files with an improved KNN algorithm, builds a file indexing mechanism, and merges files using a nearest-value similarity principle and a weight-based file merging algorithm.

(2) Building on C-MSFPM, and to handle the complex mappings that arise in content-based file queries, we propose an improved MapReduce model based on XML and multiple Value values. The model uses XML tags to mark up data content, coordinates, and action mapping information. In complex massive data processing, once a mapping query is located, all information associated with the data can be retrieved; processing XML tags and emitting multiple Values in the Map phase greatly improves data processing efficiency. On this basis, the model was applied to content query mapping and sorting of massive PDF files and to vehicle information data processing. Comparative tests on the test platform with multiple data sets show that the model's algorithms are correct and perform reliably.

(3) For cloud storage, we analyze coordination mechanisms and storage virtualization, derive the concept of a storage efficiency value for virtual storage nodes from the performance of the virtual nodes, and discuss cloud storage mechanisms and task scheduling. We propose a storage task allocation mechanism based on an improved genetic algorithm and a cloud storage data allocation strategy based on improved dynamic programming. These two algorithms significantly improve the utilization of storage nodes and optimize system load balancing.
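The small-file merging and indexing idea in item (1) can be illustrated with a minimal Python sketch. This is not the C-MSFPM algorithm: the abstract does not give its details, so the sketch simply packs files into fixed-size blocks in arrival order and omits the KNN classification and similarity-based grouping. The names `MergedStore`, `IndexEntry`, and `block_size` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    block_id: int  # which merged block holds the file
    offset: int    # byte offset of the file inside that block
    length: int    # file size in bytes


@dataclass
class MergedStore:
    """Hypothetical store that packs many small files into larger blocks."""
    block_size: int
    blocks: list = field(default_factory=list)  # list of bytearray
    index: dict = field(default_factory=dict)   # file name -> IndexEntry

    def add(self, name: str, data: bytes) -> None:
        # Open a new block if none exists yet, or if the current block is
        # non-empty and appending this file would exceed block_size.
        # A file larger than block_size therefore gets a block to itself.
        if not self.blocks or (
            self.blocks[-1]
            and len(self.blocks[-1]) + len(data) > self.block_size
        ):
            self.blocks.append(bytearray())
        block = self.blocks[-1]
        self.index[name] = IndexEntry(len(self.blocks) - 1, len(block), len(data))
        block.extend(data)

    def read(self, name: str) -> bytes:
        # One index lookup locates the file inside its merged block.
        entry = self.index[name]
        block = self.blocks[entry.block_id]
        return bytes(block[entry.offset:entry.offset + entry.length])


store = MergedStore(block_size=10)
store.add("a.txt", b"12345")
store.add("b.txt", b"678")
store.add("c.txt", b"abcdef")  # does not fit in the first block
```

The point of the design is that the storage layer tracks a few large blocks instead of many small files, while the index keeps each file individually addressable. A similarity-aware variant would choose the target block per file rather than always using the most recent one.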
Keywords/Search Tags: Massive data processing, cloud computing, task scheduling, load balancing, cloud storage