
Study On Rough Set Based Data Mining Methods

Posted on: 2006-06-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q D Wang
GTID: 1118360182490583
Subject: Control Science and Engineering
Abstract/Summary:
Data mining combines machine learning, database technology and statistical theory. It seeks interesting or valuable information in large databases whose data may be incomplete, noisy, fuzzy and random. Rough set theory (RST), introduced by Z. Pawlak in the early 1980s, is a mathematical tool for dealing with vagueness and uncertainty. In recent years it has attracted great attention from researchers around the world and has been applied successfully in many areas, such as artificial intelligence (AI), knowledge discovery in databases (KDD), pattern recognition, fault diagnosis and expert systems.

Drawing on a summary of experts' experience, this dissertation studies data mining techniques with the aim of solving problems that frequently arise in the data mining (DM) process. Mining raw databases directly rarely yields good results, especially when the data volume is large, whereas extracting rules from transformed datasets is an effective alternative. The dissertation therefore proposes dataset decomposition methods to address four problems in data mining: ultra-large data, noisy data, incomplete data and the weak comprehensibility of models. It also studies clustering by combining information theory with RST. The main contents are as follows:

1. The concepts, background, research content, main methods and current hotspots of data mining are introduced. The development of RST is reviewed, and its preliminary knowledge and present research status are presented in detail.

2. To extract rules from huge databases efficiently and promptly, a feature selection measure is proposed that selects appropriate features for classification. The measure is defined on the basis of RST so as to increase the classification rate and the purity of each sub-database, and a novel database decomposition method is built on it. The information characteristics of the method are analyzed, and it is proved that the feature set chosen by the feature selection measure is a reduct of the original information system. The time complexity of this decomposition method is far lower than that of the classical reduct method in RST.

3. Developing models for high-dimensional databases is very difficult. This dissertation presents a new rough set based machine learning method, called the feature decomposition method, which discovers concept hierarchies and builds a multi-hierarchy model of the database. Using several rough set measures, the objects described by each proposed feature group are labeled with a new intermediate concept. The resulting concept hierarchies have a specific meaning, which increases the transparency of the data mining process and improves the comprehensibility of the model. Together, the feature groups and their corresponding intermediate concepts make up the structure of the database.

4. To extract rules from incomplete data without distorting the information, a decomposition approach for incomplete data is proposed. First, templates are selected with a rough set based template evaluation function, so that subsets without missing values can be extracted from the incomplete data step by step. Second, intermediate concepts are developed using rough set theory, which allow the incomplete information system to be decomposed and the rule set to be simplified. Third, the rule set obtained in this way makes layered decision analysis more efficient. Finally, a real-world fault diagnosis example on a steam turbine illustrates the decomposition process and verifies that the method is feasible for incomplete data.

5. In many clustering processes, adding more information does not usually bring a corresponding improvement in clustering performance, because all attributes are treated as equally important. To improve clustering quality, this dissertation proposes an attribute-weighted clustering algorithm based on RST and an information-theoretic refinement process.

6. The complicated structure and complex vibration behavior of turbines give their faults multi-hierarchical, random and incomplete-information characteristics. The RST-based feature selection measure is used to select appropriate features and to build a multi-hierarchy fault diagnosis model. Compared with the rough set fault diagnosis model, the multi-hierarchy model yields rules with a high degree of support. The hierarchical fault diagnosis method is also easy to understand, because it resembles the way human beings reason.

Finally, the dissertation gives a brief summary and highlights some directions for future research.
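For readers new to the rough set machinery behind items 2 and 6, the sketch below computes the classical dependency degree gamma_B(D) = |POS_B(D)| / |U| of a decision attribute D on a condition attribute subset B, then uses it in a greedy forward search often called QuickReduct. This is a standard textbook construction shown only for orientation; it is not the dissertation's feature selection measure, which the abstract does not define. The function names and the toy decision table are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# A decision table: each object is a dict mapping attribute name -> value.
Row = Dict[str, Hashable]


def partition(table: Sequence[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Equivalence classes U/IND(attrs): objects with identical values on attrs."""
    blocks: Dict[Tuple[Hashable, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(table: Sequence[Row], cond: Sequence[str], decision: str) -> float:
    """Dependency degree gamma_B(D) = |POS_B(D)| / |U|."""
    positive = 0
    for block in partition(table, cond):
        # A class lies in the positive region (the lower approximation of some
        # decision class) iff all of its objects share one decision value.
        if len({table[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(table)


def quick_reduct(table: Sequence[Row], cond: Sequence[str], decision: str) -> List[str]:
    """Greedily add attributes until gamma equals that of the full attribute set.

    The result preserves the dependency degree but, being greedy, is not
    guaranteed to be a minimal reduct.
    """
    target = dependency(table, cond, decision)
    chosen: List[str] = []
    current = dependency(table, chosen, decision)
    while current < target:
        remaining = [a for a in cond if a not in chosen]
        # Pick the attribute whose addition raises gamma the most.
        best = max(remaining, key=lambda a: dependency(table, chosen + [a], decision))
        chosen.append(best)
        current = dependency(table, chosen, decision)
    return chosen


# Hypothetical toy decision table: decision d is "yes" exactly when b == 0.
TABLE: List[Row] = [
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": 0, "b": 0, "c": 1, "d": "yes"},
    {"a": 0, "b": 1, "c": 1, "d": "no"},
    {"a": 1, "b": 0, "c": 0, "d": "yes"},
]

if __name__ == "__main__":
    print(dependency(TABLE, ["a", "c"], "d"))         # 0.2
    print(quick_reduct(TABLE, ["a", "b", "c"], "d"))  # ['b']
```

Because gamma is monotone in B, the loop always terminates; each step evaluates every remaining attribute, which is the cost that decomposition-based methods such as the one in item 2 aim to reduce.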
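Item 5 does not say how the attribute weights are derived, so the following is only an illustration of the general idea of attribute-weighted clustering, under an assumed weighting rule. Each categorical attribute is weighted by the Shannon entropy of the partition U/IND({a}) it induces (a constant attribute carries no information and gets weight zero), and the weights are used in a k-modes style clustering with a weighted mismatch distance. The weighting rule, function names and data are hypothetical stand-ins, not the dissertation's algorithm or its information-theoretic refinement step.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Obj = Sequence[Hashable]


def partition_entropy(data: Sequence[Obj], j: int) -> float:
    """Shannon entropy (bits) of the partition U/IND({a_j}) induced by attribute j."""
    n = len(data)
    counts = Counter(row[j] for row in data)
    return sum(c / n * math.log2(n / c) for c in counts.values())


def entropy_weights(data: Sequence[Obj]) -> List[float]:
    """Hypothetical weighting rule: partition entropies normalised to sum to 1."""
    h = [partition_entropy(data, j) for j in range(len(data[0]))]
    total = sum(h)
    if total == 0:  # every attribute is constant: fall back to equal weights
        return [1.0 / len(h)] * len(h)
    return [x / total for x in h]


def weighted_mismatch(x: Obj, y: Obj, w: Sequence[float]) -> float:
    """Weighted simple-matching dissimilarity: sum of weights where values differ."""
    return sum(wj for xj, yj, wj in zip(x, y, w) if xj != yj)


def assign(data: Sequence[Obj], modes: Sequence[Obj], w: Sequence[float]) -> List[int]:
    """Index of the nearest mode for every object."""
    return [min(range(len(modes)), key=lambda k: weighted_mismatch(x, modes[k], w))
            for x in data]


def weighted_k_modes(data: Sequence[Obj], init_modes: Sequence[Obj],
                     w: Sequence[float], max_iter: int = 20
                     ) -> Tuple[List[int], List[List[Hashable]]]:
    modes = [list(m) for m in init_modes]
    for _ in range(max_iter):
        labels = assign(data, modes, w)
        new_modes: List[List[Hashable]] = []
        for k, mode in enumerate(modes):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if not members:  # keep an empty cluster's mode unchanged
                new_modes.append(mode)
                continue
            # Most frequent value of each attribute among the cluster's members
            new_modes.append([Counter(col).most_common(1)[0][0] for col in zip(*members)])
        if new_modes == modes:  # converged
            break
        modes = new_modes
    return assign(data, modes, w), modes


# Hypothetical categorical data; the third attribute is constant.
DATA = [
    ("red", "small", "x"),
    ("red", "small", "x"),
    ("red", "medium", "x"),
    ("blue", "large", "x"),
    ("blue", "large", "x"),
    ("blue", "medium", "x"),
]

if __name__ == "__main__":
    w = entropy_weights(DATA)
    labels, modes = weighted_k_modes(DATA, [DATA[0], DATA[3]], w)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Entropy weighting is a simple choice that zeroes out uninformative attributes, but it also favors high-cardinality attributes such as identifiers, which is one reason a refinement step like the one mentioned in item 5 may be needed.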
Keywords/Search Tags: data mining, rough set theory, information theory, database decomposition, feature construction, clustering, incomplete information system, fault diagnosis, turbine vibration