Research On Carpooling Method Of Massive Location Data Based On Hadoop

Posted on:2017-11-14

Degree:Master

Type:Thesis

Country:China

Candidate:M Q Tan

Full Text:PDF

GTID:2348330503460607

Subject:Computer software and theory

Abstract/Summary:

We live in an age of information explosion. With the development of the Internet, mobile devices, and the Internet of Vehicles, massive amounts of data about people's daily lives are being recorded. This rapid growth of data offers a data-driven perspective for understanding how the world works, and it also poses a major challenge for computer technology. As a consequence, big data technology has become one of the most active areas in recent years. Big data technology can not only store hundreds of terabytes or even hundreds of petabytes of data, but also offers several computing frameworks, including an offline (batch) computing framework (Hadoop), a stream computing framework (Storm), and an in-memory computing framework (Spark), to meet different data processing demands. Big data has several typical features: massive data volume, a variety of data types, fast data transfer with dynamically changing data, and great data value. Moreover, as the number of cars grows rapidly, massive amounts of vehicle location data are recorded every day. How to take full advantage of this vehicle location data to discover the patterns hidden in it has become an important research direction in big data. This thesis first studies the Hadoop Distributed File System (HDFS), the distributed computing framework (MapReduce), and the resource management system (YARN); it then introduces the distributed data warehouse Hive, which can store massive data; finally, it presents Mahout, which encapsulates a variety of machine learning methods. Building on this study of big data technology, I set up a big data analytics platform consisting of 5 computers. The thesis analyzes vehicle driving location data on this platform.
I chose the data warehouse Hive to store the massive location data and used MapReduce and HQL to clean it. Based on a study of the vehicle location data on workdays, the thesis uses MapReduce to obtain the home location and the company location of each car owner, and then applies k-means clustering, a classic method in Mahout, to recommend carpooling schemes in which the passenger and the car owner share the same route from home to company. Based on a further study of the vehicle location data, the thesis employs a clustering method based on Hausdorff distance to assign travel demands to specific vehicles, and then proposes a clustering method based on matching degree to choose the most appropriate travel demands for each car.
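
The Hausdorff-distance step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how points are compared or how assignment is decided, so the haversine point distance, the `max_km` threshold, and the function names (`hausdorff`, `assign_demands`) are all assumptions made for illustration. A simple nearest-route assignment also stands in here for the clustering procedure the thesis describes.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def directed_hausdorff(a, b, dist=haversine_km):
    """Largest distance from any point of path a to its nearest point on path b."""
    return max(min(dist(p, q) for q in b) for p in a)


def hausdorff(a, b, dist=haversine_km):
    """Symmetric Hausdorff distance between two non-empty point sequences."""
    return max(directed_hausdorff(a, b, dist), directed_hausdorff(b, a, dist))


def assign_demands(demands, vehicle_routes, max_km=1.0):
    """Assign each travel demand to the vehicle whose route is closest in
    Hausdorff distance, skipping demands with no route within max_km.

    max_km is a hypothetical threshold, not a value taken from the thesis.
    """
    assignment = {}
    for demand_id, path in demands.items():
        best_id, best_d = None, float("inf")
        for vehicle_id, route in vehicle_routes.items():
            d = hausdorff(path, route)
            if d < best_d:
                best_id, best_d = vehicle_id, d
        if best_id is not None and best_d <= max_km:
            assignment[demand_id] = best_id
    return assignment
```

Because the Hausdorff distance takes the worst-case nearest-point gap in both directions, two paths are considered close only when every point of each lies near the other, which is why it suits whole-route comparison better than comparing endpoints alone.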

Keywords/Search Tags:

Hadoop, Mahout, location data, k-means clustering, carpooling

Related items

1	Research Of Clustering Algorithm Based On Mahout
2	Kmeans Analysis Of Massive Book Circulation Data Based On Hadoop
3	K-Means Algorithm Design And Implementation Based On Hadoop And Mahout
4	Design And Implementation Of Data Mining Algorithm Under Big Data Platform
5	Oneof Text Clustering Algorithm Based On Big Data
6	The Research Of Parallel Clustering Algorithm Based On Hadoop Platform
7	Application Of Improved Clustering Algorithm Based On Hadoop In Web Log Clustering
8	The Research Of Clustering Mining Based On Logistics History Data On The Hadoop
9	A Research And Implementation Of Recommender System Based On Mahout And Hadoop
10	Research On Parallel Clustering Algorithm On Hadoop Platform