
A Study On Process Industrial Data Mining Based On Support Vector Machines

Posted on: 2006-12-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhang
GTID: 1118360152996424
Subject: Control theory and control engineering
Abstract/Summary:
This dissertation discusses several problems in data mining based on support vector machines (SVM) and proposes corresponding solutions. Several SVM-based data mining algorithms are developed and then applied to a practical para-xylene (PX) industrial process. The main contributions are as follows.

(1) A new incremental SVM learning algorithm, FS-SVM, is proposed. When incremental samples are added to the current working set, the existing training samples and the incremental samples influence each other. FS-SVM therefore selects as many support vectors as possible into the current working set in order to improve prediction accuracy. Simulation results on the UCI Adult data set indicate that the proposed algorithm effectively improves both accuracy and training speed.

(2) To overcome the model failure problem, a soft sensor modeling method based on incremental SVM (ISVM) is presented. In ISVM, each incremental sample that represents a new operating condition is introduced into the model, while an old sample is discarded at the same time to keep the size of the working set under control. The method is applied to predict the purity of PX in a PX adsorption fractionation process. Simulation results indicate that the proposed soft sensor model adapts better to varying operating conditions and resolves the model failure caused by changes in operating conditions or load.

(3) To overcome the overfitting caused by a fixed penalty factor, fuzzy support vector regression (FSVR) and fuzzy least squares support vector machines (FLS-SVM) are proposed. Strategies based on k-nearest neighbors (kNN) and support vector data description (SVDD) are adopted to set the fuzzy membership values of the data points.
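
The kNN membership strategy in contribution (3) can be illustrated with a small sketch. The abstract does not give the membership formula, so the one below is an assumption: each point's membership falls linearly with its mean distance to its k nearest neighbors, from 1 for the densest point down to a small floor `sigma` for the most isolated one. In fuzzy SVR such memberships s_i typically scale the per-sample penalty to C·s_i, so likely outliers cost less during training. The function name `knn_fuzzy_memberships` and its parameters are hypothetical, and the SVDD-based strategy is not shown.

```python
import math
from typing import List, Sequence


def knn_fuzzy_memberships(points: Sequence[Sequence[float]], k: int = 3,
                          sigma: float = 1e-3) -> List[float]:
    """Hypothetical helper: one fuzzy membership in [sigma, 1] per point.

    Assumed heuristic, not the dissertation's exact formula: the larger a
    point's mean distance to its k nearest neighbours, the more it looks
    like an outlier and the smaller its membership.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # Mean Euclidean distance from each point to its k nearest neighbours.
    mean_dists = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_dists.append(sum(dists[:k]) / k)

    d_min, d_max = min(mean_dists), max(mean_dists)
    if d_max == d_min:
        # All points equally dense: no evidence of outliers.
        return [1.0] * n

    # Linear rescaling: densest point -> 1.0, most isolated point -> sigma.
    return [1.0 - (1.0 - sigma) * (d - d_min) / (d_max - d_min)
            for d in mean_dists]


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
    print(knn_fuzzy_memberships(data, k=2))
```

With four clustered points and one distant point, k = 2 gives a membership of 1.0 to each clustered point and about 0.001 to the outlier, so the outlier would carry almost no weight in the fuzzy penalty term.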
The proposed FSVR and FLS-SVM algorithms, with kNN- and SVDD-based memberships, are applied to predict the concentration of 4-carboxybenzaldehyde (4-CBA) in a practical purified terephthalic acid (PTA) oxidation process. Simulation results indicate that the proposed methods reduce the influence of outliers and achieve higher accuracy.

(4) SVM is applied in many research fields because of its good generalization ability and solid theoretical foundation. However, because an SVM model behaves like a black box, it is difficult for users to interpret and understand how it makes its decisions. A hyperrectangle rule extraction (HRE) algorithm is therefore proposed to extract rules from a trained SVM. The support vector clustering (SVC) algorithm is used to find the prototypes of each class; hyperrectangles are then constructed from these prototypes and the support vectors under certain heuristic conditions. Projecting the hyperrectangles onto the coordinate axes yields if-then rules. Experimental results indicate that HRE extracts rules efficiently from a trained SVM, and that the number and support of the obtained rules can be easily controlled through a user-defined minimum support threshold.

(5) A novel data mining method is introduced to solve multi-objective optimization problems in the process industry: a hyperrectangle association rule mining (HARM) algorithm based on SVM. Hyperrectangle rules are constructed from prototypes and support vectors under certain heuristic constraints. The algorithm is applied to a simulated moving bed (SMB) para-xylene adsorption process, yielding relationships between the key process variables and objective variables such as the purity and recovery rate of PX.
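
The final step of HRE, turning a hyperrectangle into an if-then rule and checking its support against a threshold, can be sketched as follows. This is a minimal sketch under stated assumptions: the boxes are taken as already built (the SVC prototype search and the heuristic box construction are not reproduced), and support is assumed to be the fraction of all samples that fall inside a box and carry its label; the dissertation's exact definition may differ. `HyperrectangleRule`, `rule_support` and `filter_rules` are hypothetical names. HARM rules presumably read the same way, as interval conditions on process variables.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class HyperrectangleRule:
    """Hypothetical rule type: an axis-aligned box labelled with one class.

    lower[i] and upper[i] bound input variable i; label is the class
    (or, for association rules, the outcome) the box predicts.
    """
    lower: Sequence[float]
    upper: Sequence[float]
    label: str

    def covers(self, x: Sequence[float]) -> bool:
        # A sample is inside the box if every coordinate lies in its interval.
        return all(lo <= v <= hi for lo, v, hi in zip(self.lower, x, self.upper))

    def to_text(self, names: Sequence[str]) -> str:
        # Projecting the box onto each coordinate axis gives one interval
        # condition per variable; their conjunction is the IF part.
        conds = [f"{lo:g} <= {name} <= {hi:g}"
                 for name, lo, hi in zip(names, self.lower, self.upper)]
        return "IF " + " AND ".join(conds) + f" THEN class = {self.label}"


def rule_support(rule: HyperrectangleRule,
                 samples: Sequence[Sequence[float]],
                 labels: Sequence[str]) -> float:
    """Assumed definition: share of all samples inside the box with its label."""
    if not samples:
        return 0.0
    hits = sum(1 for x, y in zip(samples, labels)
               if rule.covers(x) and y == rule.label)
    return hits / len(samples)


def filter_rules(rules: Sequence[HyperrectangleRule],
                 samples: Sequence[Sequence[float]],
                 labels: Sequence[str],
                 min_support: float) -> List[HyperrectangleRule]:
    """Keep only the rules whose support reaches a user-defined threshold."""
    return [r for r in rules
            if rule_support(r, samples, labels) >= min_support]
```

For instance, the box [0, 2] × [1, 3] labelled A renders as `IF 0 <= x1 <= 2 AND 1 <= x2 <= 3 THEN class = A`. Raising `min_support` prunes rules whose boxes cover too few correctly labelled samples, which is one way the number of extracted rules could be controlled by a single threshold.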
Using existing domain knowledge of the PX adsorption process, most of the obtained association rules can be explained.

(6) To simplify the data mining process, a "5P" data mining model for the process industry is presented, and a data mining software system, ESP-PIDMS, is developed. Several data mining models built with ESP-PIDMS are used to solve real industrial problems.
Keywords/Search Tags: process industry, data mining, support vector machines, para-xylene, pure terephthalic acid