Research On Carpooling Method Of Massive Location Data Based On Hadoop

Posted on: 2017-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: M Q Tan
GTID: 2348330503460607
Subject: Computer software and theory
Abstract/Summary:
We live in an age of information explosion. With the development of the Internet, mobile devices, and the Internet of Vehicles, massive amounts of data about people's daily lives are being recorded. This rapid growth of data offers a data-driven perspective for understanding how the world works, and it also poses a major challenge for computing technology. As a consequence, big data technology has become one of the most active fields in recent years. Big data technology can not only store hundreds of terabytes or even hundreds of petabytes of data, but also offers several computing frameworks for different processing demands, including the offline computing framework Hadoop, the stream computing framework Storm, and the in-memory computing framework Spark. Big data, meaning massive data, has several typical features: massive volume, a variety of data types, fast data transfer and dynamically changing data, and great value. Moreover, as the number of cars grows rapidly, massive amounts of vehicle location data are recorded every day. How to make full use of this vehicle location data to discover the patterns hidden in it has become an important research direction in big data.

This thesis first studies the Hadoop Distributed File System (HDFS), the distributed computing framework MapReduce, and the resource management system YARN. It then introduces Hive, a distributed data warehouse that can store massive data, and finally presents Mahout, which encapsulates various machine learning methods. Building on this study of big data technology, I set up a big data analytics platform consisting of 5 computers and used it to analyze vehicle driving location data.

I chose the Hive data warehouse to store the massive location data and used MapReduce and HQL to clean it. By studying vehicle location data on workdays, the thesis used MapReduce to obtain the home and company locations of car owners, and then chose k-means clustering, a classic method in Mahout, to recommend carpooling schemes in which the passenger and the car owner share the same path from home to company. By studying the vehicle location data, the thesis also employed a clustering method based on Hausdorff distance to assign travel demands to specific vehicles, and then proposed a clustering method based on matching degree to choose the most appropriate travel demands for each car.
Keywords/Search Tags: Hadoop, Mahout, location data, k-means clustering, carpooling
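
The abstract mentions assigning travel demands to vehicles using Hausdorff distance but does not give details. As a rough illustration only, the following Python sketch computes the symmetric Hausdorff distance between two point sequences and assigns each demand to the closest vehicle route. The function names, the sample coordinates, the use of planar Euclidean distance, and the nearest-route assignment rule are all assumptions made for this example; they are not taken from the thesis, which describes a clustering method rather than this simple rule.

```python
import math


def euclidean(p, q):
    """Planar Euclidean distance between two (x, y) points (an assumption;
    real GPS data would normally need a geodesic distance)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(euclidean(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sequences."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def assign_demands(routes, demands):
    """Hypothetical rule: map each demand id to the id of the vehicle route
    with the smallest Hausdorff distance to it."""
    return {
        demand_id: min(routes, key=lambda rid: hausdorff(routes[rid], path))
        for demand_id, path in demands.items()
    }


# Made-up sample data: two vehicle routes and two travel demands.
routes = {
    "car_a": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "car_b": [(0.0, 3.0), (1.0, 3.0), (2.0, 3.0)],
}
demands = {
    "demand_1": [(0.0, 0.5), (2.0, 0.5)],
    "demand_2": [(0.0, 2.5), (2.0, 2.5)],
}

assignment = assign_demands(routes, demands)
print(assignment)
```

With this toy data each demand is matched to the route it runs alongside. A real pipeline on the scale described in the abstract would distribute this computation, for example with MapReduce, rather than run it in a single process.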