Research On Classification Methods Based On Rough Set

Posted on: 2012-05-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y J Sun
GTID: 1118330368978944
Subject: Computer Science and Technology
Abstract/Summary:
Artificial intelligence (AI) is the study of how computers can simulate human thinking processes and intelligent behavior, such as learning, reasoning, thinking, and planning. It is a branch of computer science and has been regarded since the 1970s as one of the world's three leading technologies, alongside space technology and energy technology. AI is also considered one of the three cutting-edge technologies of the 21st century, together with genetic engineering and nanoscience. One of the main goals of AI is to have machines carry out complex tasks that usually require human intelligence. Machine learning is therefore very important in AI research. It studies how computers can simulate and realize the human learning process in order to acquire new knowledge or skills, and how they can continually reorganize their existing knowledge structures to improve their performance. A system without learning ability can hardly be called truly intelligent, yet earlier intelligent systems generally lacked this ability. With the development of computer technology, the human capacity to collect and store data has increased greatly, and large amounts of data have accumulated in both scientific research and everyday social life. Mining and analyzing these data to find the rules hidden in them has become a common need in almost every research field. Against this background, machine learning has attracted more and more attention and has become one of the cores of artificial intelligence.

Rough set theory, first proposed by the Polish mathematician Z. Pawlak in 1982, is a mathematical theory for analyzing uncertain data. Its advantage is that it requires no prior description of the data's characteristics: it finds correlations between attributes using only the attribute values of the given data, discovers the regularities in the data, and ultimately produces decision rules.
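As an illustration of how rough set theory works from the data alone, the standard construction is the pair of lower and upper approximations of a target set under the indiscernibility relation. The sketch below is a minimal example on a made-up table; the attribute names (`temp`, `cough`, `flu`), the values, and the helper functions are hypothetical and are not taken from the dissertation.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group object indices that agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def approximations(rows, attrs, target):
    """Return (lower, upper) approximations of the index set `target`."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper

# Hypothetical toy table: objects 0 and 1 are indiscernible on the
# condition attributes but disagree on the decision "flu".
rows = [
    {"temp": "high", "cough": "yes", "flu": "yes"},
    {"temp": "high", "cough": "yes", "flu": "no"},
    {"temp": "low",  "cough": "no",  "flu": "no"},
    {"temp": "high", "cough": "no",  "flu": "yes"},
]
target = {i for i, r in enumerate(rows) if r["flu"] == "yes"}
lower, upper = approximations(rows, ["temp", "cough"], target)
print(lower, upper)  # {3} and {0, 1, 3}
```

Objects in the lower approximation certainly belong to the target class, while the gap between the two sets marks the objects the available attributes cannot classify with certainty.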
Rough set technology is one of the important methods of machine learning. As a theory of data analysis, it is a new type of mathematical tool for dealing with fuzzy and uncertain knowledge. Since its introduction, rough set theory has steadily matured through the continued work of computer scientists and data analysts, and it has been widely used in pattern recognition, machine learning, data mining, knowledge acquisition, and knowledge discovery. Rough set theory gives a formal definition of knowledge and requires that knowledge be based on the ability to classify objects. An object is anything that can be described, whether concrete or abstract. In rough set theory, knowledge is interpreted as a way of partitioning the domain into classes, together with the explicit facts about the domain and the ability to reason about the facts implicit in them.

In this dissertation we apply rough set technology to intelligent classification. Starting from rough set theory and algorithm design, we study discretization and attribute reduction in depth, and we present methods for calculating attribute significance, discretizing continuous attributes, and classifying with rough sets. Finally, we discuss the relationship between rough set reduction and decision rules.

(1) Calculation of attribute significance
In a decision system, the condition attributes are not equally important to the classification result, and the mutual information between the condition attributes and the decision attributes reflects the significance of the condition attributes.
Therefore, when a condition attribute takes a given value, the number of decision attribute values that can occur with it indicates the significance of that condition attribute relative to the decision attribute. If only one decision value can occur when the condition attribute takes a particular value, then that value of the condition attribute determines the decision attribute uniquely. As a result, whenever the condition attribute takes that value, the other condition attributes need not be considered during rule generation. Based on this observation, we propose a novel method for calculating attribute significance, given as Definition 3.1, and demonstrate its feasibility in our experiments.

(2) Research on discretization of continuous attributes
The sample data in a decision system may be continuous or discrete. Rough set methods, however, can only process discrete condition data, so discretizing continuous attributes is essential for a decision system. To simplify the decision system further, attributes that are already discrete may still need their values merged into more abstract values, which exposes more general features of the sample data. The main idea of the attribute discretization algorithm proposed in this dissertation is that decision rules generally depend more strongly on the more significant condition attributes in a decision system. To discretize a continuous attribute, the algorithm first clusters it with the traditional fuzzy c-means (FCM) clustering method to obtain an initial discretization and calculates the significance of each condition attribute. It then discretizes each attribute in combination with the more significant attributes, taking full account of the classification target. The experimental results show that the proposed algorithm generates fewer discrete attribute values and an optimal rule set.
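The value-level idea in (1), that a condition value is decisive when only one decision value occurs with it, can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not reproduce Definition 3.1, so the `determinacy_score` below is a hypothetical stand-in, not the dissertation's measure, and the toy table and attribute names are invented.

```python
from collections import defaultdict

def decision_values_per_value(rows, cond, dec):
    """Map each value of `cond` to the set of decision values seen with it."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[cond]].add(row[dec])
    return dict(seen)

def determinacy_score(rows, cond, dec):
    """ASSUMPTION: illustrative score, not Definition 3.1.

    Fraction of objects whose value of `cond` occurs with exactly one
    decision value, i.e. objects that `cond` alone can classify.
    """
    seen = decision_values_per_value(rows, cond, dec)
    decided = sum(1 for row in rows if len(seen[row[cond]]) == 1)
    return decided / len(rows)

# Hypothetical toy decision table
rows = [
    {"outlook": "sunny",    "windy": "no",  "play": "yes"},
    {"outlook": "sunny",    "windy": "yes", "play": "yes"},
    {"outlook": "rainy",    "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
ranking = sorted(["outlook", "windy"],
                 key=lambda a: determinacy_score(rows, a, "play"),
                 reverse=True)
print(ranking)  # ['outlook', 'windy']
```

Here `outlook` scores 0.6 and `windy` scores 0.4, so under this assumed score `outlook` would be processed first when attributes are handled in descending order of significance.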
Experimental comparisons with other algorithms have verified the feasibility and effectiveness of our algorithm.

(3) Research on classification methods based on rough sets
The defining characteristic of rough set theory, and its main advantage, is that it needs no prior information about the target beyond the data set itself, and it is good at handling imprecise, inconsistent, and incomplete data. The rough-set-based classification algorithm first preprocesses the decision system by discretizing the condition attributes in descending order of significance, until the resulting decision system is consistent or all condition attributes have been discretized. It then removes duplicate objects, and finally generates the decision rule set. In traditional rough set methods, discretization considers only the characteristics of the condition attributes themselves, and the system usually requires attribute reduction to obtain the final rule set. The classification algorithm in this dissertation instead discretizes all condition attributes in descending order of significance while fully considering both the condition attributes already discretized and the class attribute. The final decision rule set generated by our algorithm contains fewer rules and requires no further attribute reduction. We tested the algorithm from several angles in our experiments, and comparisons with other classical algorithms demonstrate its superiority and feasibility.

(4) Research on a classification algorithm with breakpoint processing in rough sets
In rough set theory, the number of discretization breakpoints is directly related to the division of attribute intervals. The rough set classification algorithm discussed in (3) does not carefully handle certain special breakpoints during attribute discretization.
We therefore propose a rough set classification algorithm with breakpoint processing, whose accuracy is improved by further analysis and processing of the breakpoints. The algorithm also reduces the number of rules in a decision system. When two intervals in a group are merged, we refine their end-points: if the left end-point of a new interval is the right end-point of the old interval, the left end-point value is expanded by ? times to avoid the data inconsistency caused by shared end-points. We tested the revised algorithm and the original one on tea taste signal data, and the results show that the rough set algorithm with breakpoint processing performs better.

(5) Discussion of the relationship between rough set reduction and decision rules
The main idea of rough set theory is to derive decision or classification rules through knowledge reduction without changing classification ability. Reduction means deleting irrelevant or unimportant knowledge while preserving the classification ability of the knowledge base. The rule set is a decision table generated after each attribute in the original decision table has been classified by a knowledge classification method. The advantages of attribute reduction are that it simplifies the decision system, reduces the number of decision rules, and shortens the rules. Generally speaking, the length of the rules is positively related to their number. We verified the relationship between reduction and decision rules in many experiments and confirmed the importance of reduction to rough set methods.

Rough-set-based learning is a powerful tool in machine learning. We have carried out extensive theoretical research and algorithm design for data classification with it. Although our research results are limited, they are enough to reflect the great appeal of rough set technology.
In an information age in which many new methods are emerging, rough set technology and its related algorithms will play an increasingly important role.
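The relationship between reduction and decision rules discussed in (5) can be illustrated with a small sketch: attributes are dropped as long as the decision table stays consistent, and the rule set is then read off the reduced table. This is a minimal example, assuming a simple greedy removal order; it is not the dissertation's reduction algorithm, and the table, attribute names, and helper functions are hypothetical.

```python
def is_consistent(rows, attrs, dec):
    """True if objects that agree on attrs always share the same decision."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, row[dec]) != row[dec]:
            return False
    return True

def greedy_reduct(rows, attrs, dec):
    """Drop attributes one by one while the table stays consistent.

    Assumes the full table is consistent. The result depends on the
    order of attrs and is not guaranteed to be a minimal reduct.
    """
    kept = list(attrs)
    for a in attrs:
        trial = [x for x in kept if x != a]
        if trial and is_consistent(rows, trial, dec):
            kept = trial
    return kept

def rules(rows, attrs, dec):
    """One rule per distinct (condition values, decision) pair."""
    return sorted({(tuple((a, row[a]) for a in attrs), row[dec])
                   for row in rows})

# Hypothetical decision table: attribute "c" alone determines "d".
rows = [
    {"a": 0, "b": 0, "c": 0, "d": "x"},
    {"a": 0, "b": 1, "c": 0, "d": "x"},
    {"a": 1, "b": 0, "c": 1, "d": "y"},
    {"a": 1, "b": 1, "c": 1, "d": "y"},
]
full = rules(rows, ["a", "b", "c"], "d")
reduct = greedy_reduct(rows, ["a", "b", "c"], "d")
reduced = rules(rows, reduct, "d")
print(len(full), reduct, len(reduced))  # 4 ['c'] 2
```

On this toy table the reduction shortens each rule from three conditions to one and halves the number of rules, which matches the tendency described above for rule length and rule count to fall together.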
Keywords/Search Tags: Rough set, Attribute significance, Discretization, Reduction, Decision rule