Font Size: a A A

The Research Of Rough Set Approach On Containing Order And Incomplete Information

Posted on:2008-05-17Degree:MasterType:Thesis
Country:ChinaCandidate:C X FuFull Text:PDF
GTID:2178360212995887Subject:Computer application technology
Abstract/Summary:
The rapid development of computer technology makes that people can obtain more data. How to use these mass data effectively becomes a new demand. The Data Mining technology is appered for this request. Data Mining is a process that abstract or mine effective, novel or usefull information or knowledge form mass data. It can describe the general property of data; or do forecast based on current data. Data mining, as a general concept, contains lots of implement methods, such as: Genetic Algorithm, Rough Set, Neural Network, Decision Tree, Support Vector Machine (SVM) and so on. In this paper, this paper discusses the rough set method primary.Rough set theory is brough forward by Poland scholar Z. Pawlak in 1982. This is a math tool which is used for deal with uncertain and fuzzy knowledge. It can do analysis and reasoning of data, and discover the knowledge implieded in the data, in the same time, it alse have the ability to analyse imperfection information effectively. Rough Set theory is based on classification principle, it comprehends the classification as an equivalence relation in the given training data set, and equivalence relation form the partition of the space. It is objective relatively to describe and deal with the undetermined problems.This paper has done following work to research the Rough Set theory:1. The outline of the original Rough Set theoryIntroduce the brackgroud, development history and recent years'research of the original Rough Set theory; give the Rough Set theory's basic principle, the field it can deal with, and its advantage of don't need pre-order knowledge. The paper also give the definition of some basic notions which Rough Set theory involved, such as: equivalent relation, indiscernibility relation, knowledge, upper and lower approximateon, rough set, information system, decision table, reduction, core and so on.2. The research of Containing Order Rough Set Problem and the complement of GRs algorithm and IGRs algorithm(1) The introduction of Containing Order Rough Set Problem.In an information system, the attribute which contain preferential information know as criteria, because of the appearance of the criteria, the description of attribute and the decision of example are inconsistentsometimes, so lead to problems of inconsistentency. Moreover, decisions may be oridinal too, so decision attribute should be considered as an oridinal attribute. In the original Rough Set approach, attributes used to describe the characteristics of object; classification is achieved based on equivalent relation. The equivalent relation means that objects have the same value on the given attribute and not involves preference-orders. This led to the inconsistency concerning the criteria will be ignored, and it can't used for induce the preference-orders rules also.In order to solve the disadvantage of the original Rough Set theory, the dominance relation has been introduced in Containing Order Rough Set (CORS) method. The indiscernibility relation is replaced by the dominance relation. The knowledge used for approximateon is some objects'dominance set or be dominated set. This paper gives some Containing Order Rough Set method's notions: preference-orders decision table, dominance relation, dominance set, and the upward and downward unions of decision class. Furthermore, give the new notion of approximateon.(2) The implement and performance analysis of GRs and IGRs algorithm.This paper implements two CORS algorithms: GRs algorithm and IGRs algorithm. Give each algorithm's informal description and do some explanation and analysis.GRs algorithm usesed all the subsets of the condition attribute set C to gengrate rule, and covered all the possible rule's pre-condition, it ensure the completeness of the rules set. But if there are a lot of attributes, the number of subsets will be large, increase the space algorithm used. Furthermore, deal with such a large-scale attribute subset will take up a lot of time, at the same time, it will also have a lot of redundancy, and the time used to deal with the redundancy is prolonged, so GRs algorithm's space and time expenses may be all very great.GRs algorithm and AllRules algorithm have the same essence, they both use all the attribute subsets to generate complete rule, but they have different implement program. This paper analyse their similarities and dissimilarities; analyse performance from algorithm's design perspective.IGRs algorithm is not for all the possible rule's pre-condition, but for all the objects in the data set, when rules set cover all the objects, algorithm can be concluded. This algorithm introduces a heuristic formula, uses heuristic knowledge to judge whether a rule can be generated, and should generate what rule. Compared with algorithm GRs, the IGRs approach cansave a lot of time and space apparently, increasing efficiency and better suited to handl more data attributes. But it can't ensure the completeness of the rules set and algorithm's accuracy worse than GRs algorithm.This paper implements IGRs algorithm's improvement program. It research two aspects: mutuality degree and conflict solution. Use two weight values when calculate the mutuality degree, the problem of heuristic knowledge is too simple, too absolute has been solved. The paper proposes a variable precision of the conflict solution method, add a precision to each rule, when there is conflict, rule which have bigger precision value can be saved. These two methods are all rational and good for rule set's quality.3. The algorithms'testing, validation and comparisonThis paper gives five groups of data, compare the AllRules algorithm, GRs algorithm and IGRs algorithm on the aspect of the number of rules, running time and algorithm's accuracy, verify the result of theoretical analysis. Using three calculate methods to do the further research. Test the improved IGRs, the test proves that the the algorithm's improvement program has superiority.4. The research of Rough Set approach for incomplete dataThe data in the practical application usually have missing value; original Rough Set approach can't deal with such incomplete information, hence an extended model to solve this type of problem is needed. This paper presents four existing extended models'brief introduction. And highlight the Confined Valued Tolerance Relation model. It is a new model which combines the Valued Tolerance Relation and Confined Tolerance Relation. Confined Valued Tolerance Relation confine the tolerance relation farther and is a more rational extended model.Based on Confined Valued Tolerance Relation model, this paper proposes and completes the RQCR algorithm. In the aspect of data structure and algorithm design, this algorithm uses a different idea form original Rough Set approach. The paper gives the algorithm's informal description, and gives the interpretation and analysis for key steps in the algorithm.For the assignment of threshold valueαandβ, this paper proposes the Variable Threshold Confined Valued Tolerance Relation model, implement the dynamoic assignment of threshold value.
Keywords/Search Tags:Information
Related items