
Research on Data Preprocessing for Data-Driven Modeling

Posted on: 2014-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Ren
Full Text: PDF
GTID: 2268330422456405
Subject: Control theory and control engineering
Abstract/Summary:
As production processes, technologies, and equipment grow more complex, classical control methods, which rely on accurate mathematical models derived from physical and chemical mechanisms, find it increasingly difficult to control the production process. Against this background, data-driven approaches have developed rapidly. However, the monitoring data collected from production processes often suffer from quality problems such as missing values and outliers. If the raw data are used directly for forecasting and decision-making, model accuracy is greatly reduced and the analysis may even produce wrong results. Data preprocessing should therefore be carried out before data-driven modeling.

This thesis first reviews and summarizes the development of data-driven methods and data preprocessing. It then introduces the relevant theory of data preprocessing and discusses the tasks of data preprocessing and the corresponding methods in detail.

Next, the problem of missing value imputation is studied. Based on the characteristics of industrial process monitoring data and an analysis of existing methods, an adaptive imputation algorithm based on genetic optimization is proposed and successfully applied to imputing missing values in the monitoring data of a power station boiler. The algorithm achieves high accuracy and stability under different working conditions and at high missing rates.

For outliers in the raw data, the thesis focuses on the problem of outlier detection. After comparison with existing algorithms, an outlier detection algorithm based on the sum of global distance is constructed. The algorithm not only removes the parameter sensitivity of classic distance-based outlier detection algorithms, but also reduces the influence of the data distribution and quantifies the outliers at the same time. Simulation analysis on the monitoring data of the power station boiler shows that the new detection algorithm has a higher recognition rate and a lower false alarm rate.

Finally, a soft sensor model for the oxygen content of flue gas is established with a least squares support vector machine (LSSVM), using the monitoring data of the power station boiler. The results with and without preprocessing by the proposed methods are compared. They show that the two new preprocessing algorithms, the adaptive imputation algorithm based on genetic optimization and the outlier detection algorithm based on the sum of global distance, can effectively improve data quality. This lays a foundation for data-based optimization of unit operation and can increase combustion efficiency and reduce pollutant emissions.
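The abstract does not give the formulation of the outlier detection algorithm, so the following is only a minimal sketch of one plausible reading of "sum of global distance": each sample is scored by the sum of its Euclidean distances to all other samples, which yields a quantitative outlier score without a neighborhood radius or neighbor count. The function names, the example data, and the median-plus-MAD threshold rule are assumptions made for illustration and are not taken from the thesis.

```python
import math
import statistics


def global_distance_scores(points):
    """Score each point by the sum of its Euclidean distances to all points.

    The distance of a point to itself is zero, so including it does not
    change the sum. A larger score means the point lies farther from the
    bulk of the data.
    """
    return [sum(math.dist(p, q) for q in points) for p in points]


def flag_outliers(points, k=3.0):
    """Return indices of points whose score exceeds median + k * MAD.

    This threshold rule is an illustrative assumption, not the thesis's
    criterion. If the MAD is zero, every score above the median is flagged.
    """
    scores = global_distance_scores(points)
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores])
    threshold = med + k * mad
    return [i for i, s in enumerate(scores) if s > threshold]


if __name__ == "__main__":
    # Hypothetical two-dimensional samples: four clustered, one far away.
    samples = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.0), (1.0, 1.2), (8.0, 8.0)]
    print(global_distance_scores(samples))
    print(flag_outliers(samples))
```

Because the score is a continuous quantity, samples can be ranked by it instead of only being labeled as normal or abnormal. The pairwise computation costs O(n^2) distance evaluations, which may need approximation on large monitoring data sets.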
Keywords/Search Tags:Data-driven modeling, Data preprocessing, Power station boiler, Missing value imputation, Outlier detection