
Research On Machine Learning Based Multi-source Heterogeneous Data Mining For Risk Prediction

Posted on: 2019-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: F Yang
Full Text: PDF
GTID: 2428330566977940
Subject: Electronic Science and Technology
Abstract/Summary:
Rapidly developing Internet technologies and innovative devices are changing our lives, and as a result, large amounts of data are generated every day. However, much of the value contained in these data has not been fully exploited. One of the most important challenges is how to extract value from heterogeneous data and combine it with artificial intelligence techniques to improve people's lives. This thesis presents three surveillance and prediction applications that use heterogeneous data. The main contributions are as follows.

First, the prevention of infectious diseases is a global health priority, yet conventional surveillance systems publish their results weeks after an epidemic outbreak. To enable earlier detection of outbreaks, we build a hidden Markov model (HMM) that predicts epidemic trends from disease-related Google search volume. The proposed HMM achieves estimation accuracies of 91.9% and 98.2% for hepatitis A and hepatitis B in America. Its accuracies for influenza and Lyme disease are 91.7% and 84.7%, respectively.

Second, this thesis presents a spatial-temporal method that incorporates heterogeneous data collected from the Internet to detect global influenza epidemics in real time. Specifically, influenza morbidity data, influenza-related Google query data and news data, and international air transportation data are integrated in a multivariate hidden Markov model. A separate model is built for each of 106 countries and regions worldwide. The proposed method achieves an average accuracy of 93.34% for real-time detection of global influenza epidemics and 89.20% for prediction at the next time step.

Third, this thesis predicts the prices of wheat, stocks, and bitcoin with a deep learning Markov model. The average prediction error rates are 0.35% for wheat prices, 2.90% for stock prices, and 4.03% for bitcoin prices. Compared with parametric-model-based prediction methods, the deep learning Markov method can capture the relationships among heterogeneous data with little prior knowledge, and it can easily incorporate additional heterogeneous data for more extensive research.
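The abstract does not specify the structure of its hidden Markov models, so the following is only a minimal sketch of how an HMM can turn weekly indicator data into real-time detection and next-step prediction. It assumes two hidden states ("no epidemic" and "epidemic"), standardized input features such as search volume and news volume, and independent Gaussian emissions per feature (a diagonal-covariance multivariate HMM). The function names `forward_filter` and `predict_next` and all parameter values are hypothetical. The filtered probability P(state_t | observations up to t) plays the role of real-time detection, and propagating it one step through the transition matrix gives the prediction for the next time step.

```python
import math

# Minimal sketch of HMM forward filtering for epidemic detection.
# Assumptions (not from the thesis): two hidden states, 0 = "no epidemic" and
# 1 = "epidemic"; each week's observation is a vector of standardized features
# (e.g. search volume, news volume) with independent Gaussian emissions per state.
# All names and parameter values below are hypothetical.


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def emission_prob(obs, means, stds):
    """Diagonal-covariance multivariate Gaussian: product of per-feature densities."""
    p = 1.0
    for x, m, s in zip(obs, means, stds):
        p *= gaussian_pdf(x, m, s)
    return p


def predict_next(belief, trans):
    """One-step-ahead state distribution: sum_i belief[i] * trans[i][j]."""
    n = len(belief)
    return [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]


def forward_filter(observations, start, trans, means, stds):
    """Normalized forward algorithm: returns P(state_t | obs_1..t) for every t."""
    filtered = []
    belief = None
    for obs in observations:
        # Predict: use the initial distribution at t = 0, else propagate the last belief.
        prior = list(start) if belief is None else predict_next(belief, trans)
        # Update: weight each state by how well it explains this week's observation.
        unnorm = [prior[j] * emission_prob(obs, means[j], stds[j])
                  for j in range(len(prior))]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        filtered.append(belief)
    return filtered


# Hypothetical parameters; in practice these would be estimated from historical data.
START = [0.9, 0.1]
TRANS = [[0.95, 0.05],   # from "no epidemic" to (no epidemic, epidemic)
         [0.10, 0.90]]   # from "epidemic" to (no epidemic, epidemic)
MEANS = [[0.0, 0.0], [2.0, 1.5]]
STDS = [[1.0, 1.0], [1.0, 1.0]]
WEEKS = [[0.1, -0.2], [0.3, 0.1], [2.2, 1.8], [2.5, 1.6]]

if __name__ == "__main__":
    filtered = forward_filter(WEEKS, START, TRANS, MEANS, STDS)
    for week, belief in enumerate(filtered):
        label = "epidemic" if belief[1] > 0.5 else "no epidemic"
        print(f"week {week}: P(epidemic) = {belief[1]:.3f} -> {label}")
    print(f"next week: P(epidemic) = {predict_next(filtered[-1], TRANS)[1]:.3f}")
```

The 0.5 decision threshold is also an assumption. Keeping the filter normalized at every step avoids numerical underflow on long series, and adding another data source only requires adding one more feature column to the observations and to the per-state means and standard deviations.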
Keywords/Search Tags: Heterogeneous Big Data, Machine Learning, Artificial Intelligence, Epidemic Surveillance, Trend Prediction