The popularity of GPS systems and the widespread use of smart devices have produced huge amounts of data containing spatial-temporal attributes. According to statistics, 90% of real-world big data contains spatial and temporal information. Analyzing and mining such datasets is a challenging task that has been explored for decades and has received extensive attention in the fields of big data analytics and data mining. The main challenge of spatial-temporal data analysis is to discover relationships and patterns across the spatial and temporal dimensions, a problem that big data analytics is well suited to address. To explore big data analytics with spatial-temporal attributes, this dissertation presents our work on two types of spatial-temporal data, i.e., crime data from three US cities and traffic accident data from the UK, and explores big data analytics from four aspects.

Exploring crime and traffic accident data with spatial-temporal information is also of great practical importance. Crime and traffic accidents are major problems in society. They are major causes of traffic congestion, loss of life, health problems, environmental pollution, economic loss, and social instability. Faced with these fatal and unexpected emergencies, knowing what happened, discovering the related factors, and then raising alarms in advance is vital to maintaining social stability and reducing losses. This dissertation explores and mines these two types of data using key technologies of big data analytics and mining. The main contributions are as follows:

(1) In terms of the visualization of spatial-temporal data, state-of-the-art visualization techniques are used to effectively visualize and display crime and traffic accident data. First, an interactive map is designed to cluster events based on geographic location so as to highlight hotspots. Then narrative visualization techniques are applied to visualize each attribute in detail, combined with a high-order Markov chain to
calculate transitions between events and areas, which accomplishes multi-scale exploration of the data in the spatial and temporal dimensions. Finally, we integrate the models and algorithms and accomplish interactive visualization of large multivariate data on multiple scales.

(2) For spatio-temporal data classification, the datasets are highly imbalanced and classification accuracy is low. To overcome this, we first preprocess the data, i.e., we merge similar categories and apply re-sampling methods to balance the data. Because the data have heavy spatial and temporal coverage, rough set theory is used to simplify the attributes of the dataset and reduce the coverage rate of the data. We then explored different classification algorithms on the datasets and selected tree-based methods to predict crime and traffic accident severity. On this basis we proposed a prediction model that combines rough sets with tree-based classification algorithms. With the proposed method, we solved the classification problems of highly imbalanced data, improved prediction accuracy, and also saved time because the dataset had been simplified.

(3) For association analysis of spatial-temporal data, we first ran the Apriori algorithm on the two datasets and then visualized the rules in terms of high-support and high-confidence rules, respectively, to evaluate them. Because the datasets are heavily skewed, which causes a series of problems when mining association rules, we proposed a K-Means based association rules algorithm. Using the clustering algorithm, we obtained a small sample of the data containing the attributes of interest and then mined association rules on that sample. This revealed rules hidden in the data that cannot be found by using the whole dataset. For example, we found that fatal accidents often happen in rural areas when the road is covered with ice and it is dark.

(4) To forecast trends in the number of crimes and accidents, we explored neural networks, time series models, and deep
neural network models. We tuned the parameters of each model according to MSE (mean squared error) and Spearman correlation. We found that the Prophet model and LSTM outperform the neural network model. We also found that the best amount of training data is three years. Based on these findings, we proposed a method called PL-GAN (Prophet and LSTM based generative adversarial network), in which the LSTM serves as the generative network and the Prophet model as the adversarial network, and which predicts the number of crimes and traffic accidents with reasonable accuracy.

With the novel methods and models proposed in this dissertation, we have effectively visualized, analyzed, and predicted huge datasets with time and space elements. These promising results will help government agencies and law enforcement organizations to better understand crime and traffic accident issues and provide insights that will enable them to track activities, predict the likelihood of incidents, deploy resources effectively, and support decision-making.
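
The two evaluation criteria named in contribution (4), MSE and a rank correlation (assumed here to be Spearman's), can be sketched as follows. This is a minimal illustration in plain Python, not the dissertation's implementation; the function names and the sample monthly counts are hypothetical and chosen only for the example.

```python
from math import sqrt


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(actual, predicted):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    ra, rp = average_ranks(actual), average_ranks(predicted)
    n = len(ra)
    mean_a, mean_p = sum(ra) / n, sum(rp) / n
    cov = sum((x - mean_a) * (y - mean_p) for x, y in zip(ra, rp))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_p = sum((y - mean_p) ** 2 for y in rp)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_a * var_p)


# Hypothetical monthly incident counts and a model's forecasts for them.
actual_counts = [120, 135, 128, 150, 142]
forecast_counts = [118, 130, 131, 147, 145]

print("MSE:", mse(actual_counts, forecast_counts))
print("Spearman:", spearman(actual_counts, forecast_counts))
```

For these sample values the MSE is 11.2 and the Spearman correlation is 0.9. A lower MSE means the forecast counts are closer to the observed ones, while a Spearman value near 1 means the forecast follows the same rise-and-fall ordering as the observed series, so the two criteria complement each other when comparing candidate models.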