With the gradual development and application of IoT (Internet of Things) devices, the physical environment can be sensed and quantified as sensor readings, which are then used in most aspects of our lives. Currently, the main approach to environmental monitoring is to deploy WSNs (Wireless Sensor Networks). Owing to the inherent limitations of WSNs, data loss during collection and transmission is inevitable. Moreover, since sensor power is limited, working from a sampled subset of readings rather than readings from the whole sensing network can prolong the monitoring cycle of the network. Therefore, data imputation algorithms that reconstruct missing values from samples are of great significance for application-oriented environmental monitoring. The main work of this thesis is as follows:

(1) Datasets in the field of environmental monitoring contain many missing values, which makes it impractical to apply complicated neural networks. To solve this problem, this thesis proposes a dataset subset extraction algorithm based on distance-metric clustering to filter and process the data in the original dataset. The proposed algorithm has four parts: time window determination, sensor aggregation, sensor exclusion, and data frame shape selection. Time window determination synchronizes the working time of all sensors to obtain the largest number of adjacent working hours without missing data; sensor aggregation merges sensors whose working periods do not intersect, so that working time is distributed more uniformly across the sensor network; sensor exclusion filters the sensors in the dataset by quality; and data frame shape selection takes both the quantity and the shape of the generated data frames into account, making the dataset more applicable to neural networks.

(2) Existing algorithms have difficulty interpolating missing values effectively at a high missing ratio. To solve this problem, this thesis proposes a deep generative model based data imputation approach, which uses a pre-trained generative model to generate the data with the least error relative to the sampled data and takes it as the interpolated data. To apply the deep generative model to missing value imputation, this thesis introduces three representation matrices: the environment matrix, the sensory matrix, and the binary index matrix, which represent the original data in the environmental monitoring network, the collected data with missing values, and the indices of the corresponding missing locations, respectively. With these matrices, the missing value imputation problem is transformed into an optimization problem of solving for the environment matrix. To make the deep generative model better capture the spatiotemporal characteristics of the sensor network, this thesis adopts the generative adversarial network training method and stores the prior information of the spatiotemporal data frames in the deep generative model, so that the imputation error is lower and less affected by the missing ratio and the missing pattern.

In this thesis, we test the proposed dataset subset extraction algorithm and the imputation algorithm separately on SensorScope, a public large-scale dataset in the field of environmental monitoring. Experiments suggest that the subset extracted by the proposed algorithm generates more than 30% more spatiotemporal data frames than the original dataset, and that the performance of the trained interpolation model improves by up to 20% on average. In addition, the experiments show that the deep generative model based imputation algorithm proposed in this thesis outperforms existing algorithms at high missing ratios, with a mean absolute error below 1 at missing ratios of 50% and above, indicating that it better utilizes the spatiotemporal characteristics of the sensor network for data frame reconstruction.
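
As an illustration of how the three representation matrices turn imputation into an optimization problem, the following minimal sketch may help. It is not the thesis implementation: the pre-trained generative model is replaced by a hypothetical linear stand-in (`basis @ z`), the latent code is fitted by least squares rather than by gradient-based search, and the names `sample_readings`, `impute`, and `basis`, as well as the convention that 1 in the index matrix marks an observed entry, are assumptions made only for this example.

```python
import numpy as np


def sample_readings(env, missing_ratio, rng):
    """Simulate data loss: return (sensory matrix, binary index matrix)."""
    # Assumed convention: 1 marks an observed entry, 0 a missing one.
    index = (rng.random(env.shape) >= missing_ratio).astype(float)
    sensory = env * index  # missing entries are stored as 0
    return sensory, index


def impute(sensory, index, basis):
    """Fit a latent code to the observed entries only, then fill the gaps.

    `basis` is a hypothetical linear stand-in for a pre-trained generator:
    generate(z) = basis @ z, reshaped to the data frame shape.
    """
    observed = index.ravel().astype(bool)
    # Least-squares solution of min_z || index * (generate(z) - sensory) ||^2,
    # which only involves the rows of `basis` at observed positions.
    z, *_ = np.linalg.lstsq(
        basis[observed], sensory.ravel()[observed], rcond=None
    )
    generated = (basis @ z).reshape(sensory.shape)
    # Keep real readings where available; use generated values elsewhere.
    return index * sensory + (1.0 - index) * generated


rng = np.random.default_rng(0)
n_sensors, n_steps, latent_dim = 6, 8, 4

# Toy environment matrix that lies exactly in the stand-in generator's range.
basis = rng.normal(size=(n_sensors * n_steps, latent_dim))
env = (basis @ rng.normal(size=latent_dim)).reshape(n_sensors, n_steps)

sensory, index = sample_readings(env, missing_ratio=0.5, rng=rng)
estimate = impute(sensory, index, basis)
mae = np.abs(estimate - env).mean()
```

Because the toy environment matrix lies exactly in the range of the stand-in generator, `mae` should be close to zero here. With a real neural generator the masked objective would be the same, but the latent code would typically be found by iterative optimization, and the residual error would depend on how well the learned prior matches the data.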