Research on Machine Learning Algorithms for Environmental Data Prediction

Posted on: 2019-09-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Qu
Full Text: PDF
GTID: 1368330572482134
Subject: Computer application technology
Abstract/Summary:
With the rapid development of machine learning in recent years, applying machine learning in interdisciplinary fields to solve problems of data analysis and prediction has become a new research hotspot. Meanwhile, with the development of highly efficient information acquisition and transmission technologies such as the Internet of Things and the mobile Internet, environmental data are becoming increasingly multi-source, high-dimensional and sequential. A large share of environmental data also shows unobvious physical characteristics. Traditional statistical modeling and prediction methods struggle to make full use of such data. This dissertation studies machine learning algorithms for environmental data prediction. According to the characteristics of environmental data, the following research work has been carried out.

(1) A DFSA-based prediction method for environmental data with unobvious physical characteristics: Feature selection is a part of data mining, and applying it correctly is key to traditional machine learning algorithms such as Back Propagation (BP), Support Vector Machine (SVM) and decision trees. Predicting environmental parameters from data with unobvious characteristics is a common problem in environmental data prediction, and it makes feature selection difficult. To address this problem, this dissertation proposes a Divergence-based Feature Selection Algorithm (DFSA) and designs a machine learning framework based on it. Taking the prediction of soil moisture content from remote sensing image data as an example, DFSA is compared with other feature selection algorithms. BP, SVM and other classifiers are then used to predict the distribution of soil moisture content in the Beijing area from the feature set produced by each feature selection algorithm. The results show that prediction accuracy with the feature set output by DFSA exceeds 70%, higher than with the other feature selection algorithms.

(2) A prediction method for environmental data based on interpolation completion and LSTM sequences: Uneven sampling rates and missing data lead to an imbalance in the number of samples, a common problem when predicting environmental parameters from multidimensional data. To address this, this dissertation proposes a prediction method based on interpolation completion and LSTM sequences: an interpolation method first completes the data, the data are then preprocessed by normalization and regularization, and an LSTM finally makes the prediction. Taking the prediction of PM2.5 concentration in Beijing as an example, a multidimensional dataset with uneven sampling rates is built from meteorological data and PM2.5 data, and the algorithm is verified on it. The equivalent method, linear interpolation, Newton interpolation and Lagrange interpolation are each used for frequency matching in the temporal dimension and data supplementation in the spatial dimension. The results show that all four methods clearly improve prediction accuracy. With Lagrange interpolation the prediction accuracy reaches up to 82.73%, which is 20% higher than without interpolation.

(3) A prediction method for multidimensional sequence data based on ConvLSTM-ELM: For multidimensional sequential environmental data, CNN and LSTM can be used respectively to solve the problems of automatic feature extraction and time-series data utilization. Building on CNN and LSTM, this dissertation proposes ConvLSTM-ELM, a deep learning network structure. An extreme learning machine (ELM) replaces the softmax classifier of the traditional network, which tends to fall into local optima, and outputs the final result. To verify the effectiveness of the algorithm on typical multidimensional sequential environmental data, an experimental platform for formaldehyde concentration prediction is designed and implemented using multiple cheap gas sensors, and a large number of samples are collected. The results show that the prediction accuracy of ConvLSTM-ELM is better than that of the traditional CNN+LSTM method and the LSTM method.
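
Item (2) of the abstract completes missing samples by interpolation before the LSTM stage. As a rough illustration only, the Python sketch below fills gaps in a one-dimensional series with a local Lagrange polynomial. The function names `lagrange_eval` and `fill_missing`, the `window` parameter and the choice of fitting through the nearest known samples are assumptions made for this example; the abstract does not describe the dissertation's actual implementation.

```python
from typing import List, Optional, Sequence, Tuple


def lagrange_eval(points: Sequence[Tuple[float, float]], x: float) -> float:
    """Evaluate the Lagrange polynomial through `points` at position x."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        weight = 1.0
        for j, (xj, _) in enumerate(points):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def fill_missing(series: Sequence[Optional[float]], window: int = 4) -> List[float]:
    """Fill None entries from the `window` nearest known samples.

    Assumption for this sketch: samples sit at integer positions 0, 1, 2, ...
    window=2 fits a straight line through the two nearest known samples;
    larger windows fit a higher-degree local Lagrange polynomial.
    """
    known = [(float(i), float(v)) for i, v in enumerate(series) if v is not None]
    if len(known) < 2:
        raise ValueError("need at least two known samples")
    filled: List[float] = []
    for i, value in enumerate(series):
        if value is not None:
            filled.append(float(value))
            continue
        # Pick the known samples closest to the gap and interpolate through them.
        nearest = sorted(known, key=lambda p: abs(p[0] - i))[:window]
        filled.append(lagrange_eval(nearest, float(i)))
    return filled
```

For instance, calling `fill_missing([0.0, 1.0, None, 9.0, 16.0], window=3)` fits a quadratic through the three nearest samples of y = x^2 and so recovers a value of about 4 at index 2. A small local window is used here because a single high-degree polynomial through every known sample tends to oscillate between samples.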
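
Item (3) of the abstract uses an ELM in place of a softmax output layer. The following is a minimal sketch of a generic ELM classifier, not the ConvLSTM-ELM network itself: hidden weights are drawn at random and kept fixed, and only the output weights are obtained in closed form by ridge-regularized least squares. The class name `ELMClassifier`, the helper `solve`, the tanh activation and the default hyperparameters are all assumptions for illustration. In the architecture the abstract describes, the inputs would presumably be features extracted by the convolutional and LSTM layers; here they are plain feature vectors.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n:] for row in m]


class ELMClassifier:
    """Single-hidden-layer ELM: fixed random hidden layer, closed-form readout."""

    def __init__(self, n_hidden: int = 16, ridge: float = 1e-2, seed: int = 0) -> None:
        self.n_hidden = n_hidden
        self.ridge = ridge
        self.rng = random.Random(seed)

    def _hidden(self, x: Sequence[float]) -> List[float]:
        # Random projection followed by tanh; these weights are never trained.
        return [
            math.tanh(sum(w * v for w, v in zip(ws, x)) + b)
            for ws, b in zip(self.weights, self.biases)
        ]

    def fit(self, xs: Sequence[Sequence[float]], ys: Sequence[int]) -> "ELMClassifier":
        n_features = len(xs[0])
        self.classes = sorted(set(ys))
        self.weights = [
            [self.rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for _ in range(self.n_hidden)
        ]
        self.biases = [self.rng.gauss(0.0, 1.0) for _ in range(self.n_hidden)]
        h = [self._hidden(x) for x in xs]
        targets = [[1.0 if y == c else 0.0 for c in self.classes] for y in ys]
        # Ridge normal equations: (H^T H + ridge * I) beta = H^T T
        gram = [
            [
                sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                for j in range(self.n_hidden)
            ]
            for i in range(self.n_hidden)
        ]
        rhs = [
            [sum(row[i] * t[k] for row, t in zip(h, targets)) for k in range(len(self.classes))]
            for i in range(self.n_hidden)
        ]
        self.beta = solve(gram, rhs)
        return self

    def predict(self, xs: Sequence[Sequence[float]]) -> List[int]:
        predictions = []
        for x in xs:
            hx = self._hidden(x)
            scores = [
                sum(hv * self.beta[i][k] for i, hv in enumerate(hx))
                for k in range(len(self.classes))
            ]
            predictions.append(self.classes[scores.index(max(scores))])
        return predictions
```

Because the output weights come from a single linear solve rather than iterative gradient descent, this readout involves no iterative optimization that could stall in a local optimum, which matches the motivation the abstract gives for replacing softmax. Typical use is `ELMClassifier().fit(train_x, train_y).predict(test_x)`.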
Keywords/Search Tags: Machine Learning, Environmental Data, Feature Selection, Space-Time Interpolation, Support Vector Machine, Long Short-Term Memory, Convolutional Neural Network, Extreme Learning Machine