Bike-sharing System Usage Analysis Based On Apache Spark

Posted on: 2018-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Jia
Full Text: PDF
GTID: 2322330536465892
Subject: Control Science and Engineering
Abstract/Summary:
Urban transportation faces unprecedented pressure as urbanization advances worldwide, especially in China. On the one hand, traffic congestion wastes a great deal of time and causes substantial economic loss; on the other hand, the large number of vehicles on the roads causes serious environmental pollution. In recent years, the bike-sharing system (BSS), a newly emerged form of carbon-free public transportation, has been widely adopted in cities, helping to solve the “last mile” problem in urban transportation. However, because of the “tide phenomenon” in urban travel, bike stations are often either full or empty, so a user cannot return a bicycle when no docks are free or pick one up when no bikes are available. This may lead to the loss of subscribers and even to the bankruptcy of the program. Given the large volume of historical BSS trip data and the complexity of visualizing it, this thesis builds a big data platform based on Apache Spark to analyze a third-generation BSS. It focuses on BSS usage patterns and the prediction of daily usage. The main work is as follows:
(1) The background and significance of BSS research are described first, followed by a discussion of the necessity and feasibility of applying big data technology to BSS research.
(2) A big data analysis platform based on Apache Spark is built for BSS research after a study of big data architecture, including the distributed storage system, data warehouse, workload management system, big data processing engine, and machine learning library. Data visualization is implemented with D3.js, Carto, Python, R, and other tools.
(3) Using Citi Bike open data as the raw data, the patterns of BSS users and BSS stations are analyzed with Spark SQL and Spark DataFrames. User patterns are analyzed by dividing users into male annual subscribers, female annual subscribers, and customers; station patterns are analyzed with the K-Means algorithm.
(4) Daily BSS usage is predicted with Random Forest (RF) and Gradient Boosting Decision Tree (GBDT) models from the Spark MLlib machine learning library, using historical weather data for New York City as input features. The results indicate that both algorithms fit the training data well.
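The three-way user split described in item (3) can be illustrated with a minimal sketch. It is written in plain Python rather than Spark SQL so that it runs without a cluster, and it is not the thesis's implementation. It assumes the field conventions of the public Citi Bike trip files (`usertype` is "Subscriber" or "Customer"; `gender` is 0 for unknown, 1 for male, 2 for female). The sample rows, the function names, and the extra "annual, gender unknown" group are hypothetical.

```python
from collections import Counter

# Hypothetical sample rows. Field names and codes are assumed to follow the
# public Citi Bike trip files: usertype in {"Subscriber", "Customer"},
# gender in {0: unknown, 1: male, 2: female}.
TRIPS = [
    {"usertype": "Subscriber", "gender": 1},
    {"usertype": "Subscriber", "gender": 2},
    {"usertype": "Customer", "gender": 0},
    {"usertype": "Subscriber", "gender": 1},
    {"usertype": "Subscriber", "gender": 0},
]


def segment(trip):
    """Map one trip record to a user group (the grouping rule is an assumption)."""
    if trip["usertype"] == "Customer":
        return "customer"
    if trip["gender"] == 1:
        return "male annual"
    if trip["gender"] == 2:
        return "female annual"
    # Annual subscribers without a recorded gender get their own bucket.
    return "annual, gender unknown"


def count_segments(trips):
    """Count trips per user group."""
    return Counter(segment(t) for t in trips)


counts = count_segments(TRIPS)
for group, n in sorted(counts.items()):
    print(group, n)
```

On a Spark platform the same rule would more likely be expressed as a conditional column followed by a group-by count; the plain-Python form is used here only to keep the example self-contained.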
Keywords/Search Tags:Bike sharing system, Big data analysis, Apache Spark, Data visualization, Cluster analysis, Decision tree regression analysis, Random Forest, Gradient Boosting Regression Tree