Research On Management Of Logistics Massive Data And Its Application In Cloud Environment

Posted on:2015-08-01

Degree:Master

Type:Thesis

Country:China

Candidate:Q Y Pan

GTID:2298330467472389

Subject:Logistics engineering

Abstract/Summary:

In recent years, the Internet, the mobile Internet and the Internet of Things have developed rapidly, and the growing number of Internet users has driven a matching growth in the amount of data. A single machine does not have the capacity to store such huge amounts of data, so building a large-scale, highly efficient and scalable storage system has become particularly important. Cloud computing has become a research focus and has given rise to cloud storage, which researchers in China and abroad have studied in depth. The standard reference model for cloud computing and cloud storage research is HDFS (Hadoop Distributed File System), the open-source implementation of the Google File System. HDFS has many shortcomings, however, the most prominent being that its single NameNode easily becomes the performance bottleneck of the entire cluster. Building on existing research on HDFS, this paper proposes a solution based on a directing NameNode, which effectively resolves the single-NameNode performance bottleneck of HDFS. Experiments show that the scheme can expand the namespace of an HDFS cluster.

At the same time, with the growth of large logistics enterprises, extracting useful information from massive amounts of data has become a key research problem in this field. Cloud computing offers elastic computation, large storage capacity, cost savings and improved efficiency, and has therefore become an effective way of dealing with the problems faced by data mining technology.
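The abstract does not describe how the directing NameNode works internally. One common way to lift the single-NameNode limit, used by HDFS Federation with a ViewFs-style mount table, is to split the namespace by path prefix across several NameNodes and route each request to the NameNode that owns the path. The minimal sketch below illustrates only that assumed idea; `NamespaceRouter`, the mount points and the `nnX:8020` addresses are hypothetical and are not the thesis's design.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NamespaceRouter:
    """Hypothetical directing layer: maps path prefixes to NameNode addresses."""
    mounts: Dict[str, str] = field(default_factory=dict)

    def mount(self, prefix: str, namenode: str) -> None:
        # Normalise so "/logistics" and "/logistics/" name the same mount point.
        self.mounts[prefix.rstrip("/") or "/"] = namenode

    def resolve(self, path: str) -> str:
        # Longest-prefix match on whole path components, so "/data" does not
        # capture "/database".
        best = None
        for prefix in self.mounts:
            if prefix == "/" or path == prefix or path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no NameNode serves {path!r}")
        return self.mounts[best]


router = NamespaceRouter()
router.mount("/", "nn0:8020")                  # default namespace
router.mount("/logistics/orders", "nn1:8020")  # hypothetical order data
router.mount("/logistics/gps/", "nn2:8020")    # hypothetical vehicle tracks
```

Because each NameNode holds metadata only for its own subtree, adding a NameNode and a mount point grows the total namespace without enlarging any single NameNode's memory footprint.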
This paper first analyzes the MapReduce programming model and the Hadoop platform, then examines Mahout in depth, describing and discussing Mahout's internal data representation model. It analyzes how the K-Means algorithm can be parallelized and explains in detail its MapReduce implementation and its application in Mahout. Finally, focusing on the specific situation of the logistics industry in China, the paper proposes two modes of data mining, serial and parallel, and compares the efficiency of K-Means in both modes when mining huge amounts of data. The K-Means clustering results are evaluated in terms of different distance measures, running time and number of iterations, and the efficiency differences found provide useful guidance for mining huge amounts of data.

By combining HDFS cloud storage based on the directing NameNode with K-Means data mining based on the MapReduce programming model, this paper addresses the information storage and computing problems of the logistics industry: huge amounts of data are stored in HDFS, and Mahout running on top of it performs parallel data mining to extract useful information for the logistics industry.
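To make the parallel K-Means step concrete, here is a minimal sketch of one common way to express a K-Means iteration as map, combine and reduce phases: each map task assigns the points of its input split to the nearest centroid and emits per-cluster partial sums, and the reduce phase merges those sums into new centroids. It runs in a single Python process, with each list in `splits` standing in for an HDFS input split. The function names, the convergence threshold `tol` and the three distance measures are illustrative assumptions, not Mahout's API or the thesis's code.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Distance = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0  # zero vectors count as maximally distant


def map_split(split, centroids, dist) -> Dict[int, Tuple[List[float], int]]:
    """Map + combine for one split: assign points, emit per-cluster (sum, count)."""
    partial: Dict[int, Tuple[List[float], int]] = {}
    for point in split:
        k = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
        sums, count = partial.get(k, ([0.0] * len(point), 0))
        partial[k] = ([s + x for s, x in zip(sums, point)], count + 1)
    return partial


def reduce_partials(partials, centroids) -> List[Vector]:
    """Reduce: merge partial sums per cluster and divide to get new centroids."""
    totals: Dict[int, Tuple[List[float], int]] = {}
    for partial in partials:
        for k, (sums, count) in partial.items():
            acc, n = totals.get(k, ([0.0] * len(sums), 0))
            totals[k] = ([a + s for a, s in zip(acc, sums)], n + count)
    new_centroids: List[Vector] = []
    for k, old in enumerate(centroids):
        if k in totals:
            sums, count = totals[k]
            new_centroids.append(tuple(s / count for s in sums))
        else:
            new_centroids.append(tuple(old))  # an empty cluster keeps its old centroid
    return new_centroids


def kmeans(splits, centroids, dist: Distance = euclidean,
           max_iter: int = 20, tol: float = 1e-6):
    """Run map/reduce rounds until no centroid moves more than tol.

    Returns (centroids, number of iterations run).
    """
    centroids = [tuple(c) for c in centroids]
    iteration = 0
    for iteration in range(1, max_iter + 1):
        partials = [map_split(split, centroids, dist) for split in splits]  # one "map task" per split
        new_centroids = reduce_partials(partials, centroids)
        shift = max(euclidean(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, iteration
```

Passing all points as a single split gives the serial variant, which yields the same centroids as the multi-split run. On a real cluster the map tasks run concurrently, so timing both variants while swapping `dist` loosely mirrors the serial versus parallel and distance-measure comparisons described above.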

Keywords/Search Tags:

Huge amounts of data, Cloud storage, Distributed file system, Hadoop, Logistics analysis, K-Means

Related items

1	Huge Amounts Of Digital Image Processing Platform Based On Hadoop
2	Research On The Key Techniques For Parallel File Storage System
3	The Design And Implementation Of Small File Storage System Based On FastDFS Architecture
4	Research And Application Of Multidimensional Data Analysis Based On The Cloud Platform
5	Based On The Hadoop Mass File Storage System Analysis And Design
6	Design And Implementation Of Big Data Cloud Storage Platform Based On Hadoop And SSM
7	The Design And Implementation Of The Traffic Data Management Platform Based On HBase
8	Design And Implementation Of Massive Audio File Storage System Based On HADOOP
9	Huge Amounts Of Sensing Data Management System Based On Hadoop
10	Research Of Digital Museum Architecture Based On Hadoop