
Research On Corrosion Prediction Of Materials In Natural Environment Via Data Mining

Posted on: 2016-06-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Lu
GTID: 1228330470959046
Subject: Control Science and Engineering
Abstract/Summary:
Corrosion has been studied for hundreds of years, yet predicting the long-term corrosion behavior of materials in natural environments remains a major challenge. Corrosion tests in natural environments typically span many test sites and run for long periods, so the resulting data tend to have small sample sizes, a hierarchical structure, and high noise. The data are also often high dimensional because many factors influence corrosion. These characteristics make corrosion test data hard to analyze with traditional methods, so new techniques and algorithms are needed to predict the corrosion behavior of materials in natural environments.

Prediction is one of the key tasks of data mining. As the theory of data mining has developed, many excellent prediction algorithms have emerged, but existing models and algorithms do not meet practical needs on real-world data that are of poor quality or that are high dimensional, small in sample size, or hierarchically structured.

This dissertation proposes new prediction algorithms and techniques that deal effectively with data sets that are high dimensional, small in sample size, hierarchically structured, and noisy, and applies them to corrosion prediction for materials in natural environments. The main contents and innovations are as follows:

(1) For small-sample data of poor quality, this dissertation proposes the I-BRT algorithm, which modifies the Gradient Boosting Machine in three ways. First, it adopts the ε-insensitive loss function, which grounds the algorithm in structural risk minimization theory and improves its generalization performance. Second, guided by selective ensemble theory, it uses dynamic shrinkage coefficients. Third, borrowing from random forests, it increases the diversity among the base models to improve the performance of the ensemble. Experimental results show that I-BRT is suitable for data sets that are high dimensional, small in sample size, and contain missing values, and that it predicts better than the Gradient Boosting Machine on small-sample data.

(2) For high-dimensional, small-sample data, a Lasso-based method named SALP is proposed. To reduce the influence of noise and outliers, SALP uses the Bayesian bootstrap to reconstruct data sets and fits a model on each. The predictors are then prescreened by combining the results of those models. To cope with the small sample size, Partial Least Squares weighting factors are applied. Experimental results show that SALP is suitable for model training and variable selection on high-dimensional data, and that it is feasible and of practical value for corrosion prediction in natural environments and similar research fields.

(3) This dissertation introduces hierarchical linear model theory into corrosion prediction. To describe the hierarchical structure of corrosion data, a hierarchical linear model of corrosion behavior is established. Experimental results show that the model describes the data structure reasonably and provides robust predictions, and that it handles imbalanced or small-sample data easily. The approach is practical and worth extending to the analysis of scientific data with hierarchical structure and to similar research fields.

(4) For longitudinal data with small sample sizes, this dissertation proposes the RE-BET algorithm within the mixed-effects model framework. The algorithm uses a tree-based method to estimate the fixed effects of the mixed-effects model, so it can select important variables automatically and discover relationships between variables. To handle small sample sizes, a Bayesian method based on a Dirichlet process prior is used to estimate the random effects. Experimental results show that the algorithm is flexible and adaptable on real-world data and is suitable for longitudinal data with small sample sizes.

This research contributes to the development of prediction methods in data mining and provides a useful reference for research on corrosion prediction of materials in natural environments.
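
The prescreening idea in item (2), refitting a Lasso model on Bayesian-bootstrap reconstructions of the data and combining the results, can be illustrated with a minimal sketch. This is not the SALP algorithm itself: the abstract does not specify its details, and the Partial Least Squares weighting is omitted here. The function names, the penalty `lam`, the 0.8 selection threshold, and the toy data are all assumptions made for illustration. The sketch draws Dirichlet(1, ..., 1) observation weights, fits a weighted Lasso by coordinate descent, and records how often each predictor receives a nonzero coefficient.

```python
import random

def bayesian_bootstrap_weights(n, rng):
    """One Dirichlet(1, ..., 1) draw (normalized Exp(1) variates), rescaled to sum to n."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [n * d / total for d in draws]

def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def weighted_lasso(X, y, w, lam, n_iter=100):
    """Coordinate descent for (1/2n) * sum_i w_i (y_i - x_i . b)^2 + lam * ||b||_1.

    No intercept is fitted, so X and y are assumed to be roughly centred.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    z = [sum(w[i] * X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Weighted correlation of feature j with the partial residual.
            rho = sum(w[i] * X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new = soft_threshold(rho, lam) / z[j] if z[j] > 0 else 0.0
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta

def selection_frequency(X, y, lam=0.3, n_boot=30, seed=0):
    """Fraction of Bayesian-bootstrap fits in which each predictor is nonzero."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        w = bayesian_bootstrap_weights(n, rng)
        beta = weighted_lasso(X, y, w, lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]

def make_toy_data(n=30, p=6, seed=1):
    """Hypothetical small-sample data: only predictors 0 and 1 drive the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.3) for row in X]
    return X, y

X, y = make_toy_data()
freq = selection_frequency(X, y)
kept = [j for j, f in enumerate(freq) if f >= 0.8]  # assumed prescreening threshold
print("selection frequencies:", [round(f, 2) for f in freq])
print("prescreened predictors:", kept)
```

Reweighting observations instead of resampling them keeps every data point in every fit, which is one plausible reason to prefer the Bayesian bootstrap when the sample is small. The selection frequency is only one way of combining the fitted models; the dissertation may combine them differently.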
Keywords/Search Tags: data mining, corrosion prediction, high dimensional data, small sample size, hierarchical structure