
Data Mining Based Data Fusion Theory And Its Applications

Posted on: 2003-05-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Han
GTID: 1118360092475613
Subject: Control Science and Engineering
Abstract/Summary:
Centering on knowledge retrieval, this thesis examines several data mining approaches, in particular rough set theory, the causal model and multivariate statistical analysis, and applies them to pattern classification, knowledge discovery, reasoning under uncertainty and attribute analysis in data fusion systems. Such systems (a water monitoring system, for instance) typically work with incomplete information, and their data are cross-disciplinary, polluted by noise and masked by redundant information.

Three kinds of approaches, originating from rough set theory, the causal model method and PCA techniques, are investigated to tackle a set of problems involved in data mining based data fusion. Firstly, because a data fusion system typically suffers from incomplete data (inconsistent or even mutually contradictory) and a lack of prior knowledge, a rough set based method for rule-based modeling is proposed. For comparison with the rough set reduct, a reduct based on information entropy is presented as well, to remove the barrier to constructing the database dynamically and to keep rough set based modeling applicable when the data are polluted by noise. Secondly, a causal model integrating symbolic structural knowledge and numeric measures of uncertainty is employed to describe the monitoring system, which helps to solve the multi-membership classification problem and to express indeterministic causation. The causal model is a kind of Bayesian network and has been shown to perform quite well in practice even when strong attribute dependences are present. However, it is sensitive to the number of features, and the search space grows exponentially as the number of features increases. To solve this problem, the "Reduct" method based on rough set theory is used to remove redundant features.
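
The idea of using a rough set reduct to discard redundant features can be illustrated with a minimal sketch. It is not the algorithm of the thesis: it is a textbook-style greedy search driven by the rough set dependency degree (the share of samples in the positive region), and the attribute names and the small water-quality table are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, cond_attrs, decision):
    """Dependency degree: fraction of rows whose equivalence class
    (under cond_attrs) carries a single decision value."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, cond_attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def greedy_reduct(rows, cond_attrs, decision):
    """Add the attribute that raises the dependency degree most until it
    matches that of the full attribute set, then prune superfluous ones."""
    full = dependency(rows, cond_attrs, decision)
    reduct = []
    while dependency(rows, reduct, decision) < full:
        best = max((a for a in cond_attrs if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    for a in list(reduct):
        trial = [b for b in reduct if b != a]
        if dependency(rows, trial, decision) == full:
            reduct = trial
    return reduct

# Hypothetical discretized water-quality records (illustration only).
rows = [
    {"ph": "low", "turbidity": "high", "oxygen": "low", "state": "polluted"},
    {"ph": "low", "turbidity": "low", "oxygen": "low", "state": "polluted"},
    {"ph": "normal", "turbidity": "high", "oxygen": "high", "state": "clean"},
    {"ph": "normal", "turbidity": "low", "oxygen": "high", "state": "clean"},
    {"ph": "normal", "turbidity": "high", "oxygen": "low", "state": "polluted"},
]
reduct = greedy_reduct(rows, ["ph", "turbidity", "oxygen"], "state")
```

In this toy table the single attribute `oxygen` already separates the two states, so the other attributes are dropped. A greedy search of this kind returns one reduct that is not guaranteed to be the smallest.
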
Thirdly, the concept of the rough set reduct is broadened and the problem of information retrieval is examined from the viewpoint of statistics. Instead of the rough set reduct, some pattern classification approaches based on statistical feature selection and PCA are discussed. The relations among the features in the data fusion system are examined quantitatively by means of factor analysis and factor rotation. Correspondence analysis is also introduced to analyze the relations between the features and the monitored states of the data fusion system, so that these relations can be depicted graphically.

In general, the research work in this thesis can be summarized as follows:
1. A rule-based model based on rough set theory is derived for a water-pollution monitoring data fusion system. Entropy theory is employed to analyze the attribute sets.
2. An information entropy based Reduct searching algorithm is proposed to support the application of rough set theory to small data sets.
3. Compared with the rough set theory based Reduct, a statistical definition of Reduct is presented that makes it easy to discover indeterministic causation.
4. The causal model integrating symbolic structural knowledge and numeric measures of uncertainty helps to solve the multi-membership classification problem and to express indeterministic causation. However, the search space grows exponentially with the number of nodes when the Causal Network based searching algorithm is applied. It is also difficult to estimate the causal strength, especially when no prior knowledge is available. Based on rough set theory, a searching algorithm with node selection is proposed that greatly compresses the search space while maintaining classification accuracy; a measure of the causal strength is also introduced.
5. The problem of information retrieval is examined from the viewpoint of statistics. Instead of the Reduct, some pattern classification approaches based on statistical feature selection and PCA are discussed.
6. A new pattern classification method is developed...
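
As a rough illustration of what an information entropy based Reduct search might look like, the sketch below greedily adds attributes until the conditional entropy of the decision given the selected attributes matches that of the full attribute set. This is an assumption about the general technique, not the algorithm proposed in the thesis; the function names, the tolerance `tol` and the noisy sample table are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def conditional_entropy(rows, attrs, decision):
    """H(decision | attrs) in bits, estimated from relative frequencies."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[a] for a in attrs)][row[decision]] += 1
    n = len(rows)
    h = 0.0
    for counts in groups.values():
        size = sum(counts.values())
        for c in counts.values():
            h -= (size / n) * (c / size) * math.log2(c / size)
    return h

def entropy_reduct(rows, cond_attrs, decision, tol=1e-9):
    """Greedy forward selection: stop once the selected attributes explain
    the decision as well as the full attribute set does (within tol)."""
    target = conditional_entropy(rows, cond_attrs, decision)
    reduct = []
    while conditional_entropy(rows, reduct, decision) > target + tol:
        best = min((a for a in cond_attrs if a not in reduct),
                   key=lambda a: conditional_entropy(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct

def make_rows(triples):
    """Build records from (oxygen, turbidity, state) triples."""
    return [{"oxygen": o, "turbidity": t, "state": s} for o, t, s in triples]

# Hypothetical noisy table: identical measurements with conflicting labels.
noisy_rows = make_rows([
    ("low", "high", "polluted"), ("low", "high", "polluted"),
    ("low", "high", "clean"),      # contradicts the two rows above
    ("low", "low", "polluted"), ("low", "low", "polluted"),
    ("low", "low", "clean"),       # contradicts the two rows above
    ("high", "high", "clean"), ("high", "low", "clean"),
])
entropy_reduct_result = entropy_reduct(noisy_rows, ["oxygen", "turbidity"], "state")
```

Because the criterion is a graded measure and not a strict consistency test, the search still yields a usable attribute subset when contradictory samples are present, which is the situation the abstract associates with noisy data. Here `turbidity` adds no information once `oxygen` is known, so only `oxygen` is kept.
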
Keywords/Search Tags: data fusion, rough set, causal model, PCA, factor analysis, correspondence analysis