
Research On Uncertain Information Processing Method Based On Dominance-Based Rough Set Approach

Posted on: 2016-05-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W B Deng
Full Text: PDF
GTID: 1228330461474263
Subject: Computer application technology
Abstract/Summary:
Owing to the inherent complexity and instability of the objective world, and to humans’ incomplete understanding of it, errors of all kinds arise in procedures such as data collection, entry, editing, expression, processing and analysis. In addition, transforming concepts between qualitative and quantitative representations introduces randomness, fuzziness and uncertainty. These phenomena are common in finance, the military, economics, commerce, industrial control, telecommunications and many other practical fields. Uncertainty in data tends to lead to unreliable or even wrong results in data mining. The theory and application of uncertainty processing have therefore attracted the attention of more and more experts and scholars, and the topic has become an important field of intelligent information processing.

Rough set theory, introduced by Pawlak in 1982, has proved to be an excellent tool for dealing with inconsistent, imprecise and incomplete information by means of exact mathematical methods. It has become a hot topic in the intelligent information field in recent years. The Pawlak rough set is based on the indiscernibility relation. To deal with information systems that have preference-ordered and continuous attributes, Salvatore Greco and Roman Slowinski presented the dominance-based rough set approach (DRSA), in which the indiscernibility relation is replaced by a dominance relation. DRSA can handle not only monotonic decision problems but also problems based on equivalence relations. It is an important generalization of Pawlak rough set theory. It has been applied successfully in many fields, such as investment risk evaluation, customer relationship management (CRM), earthquake magnitude evaluation, military target threat assessment and train comfort evaluation. Up to now, variable precision has been the main method for dealing with uncertain information in DRSA.
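As an illustration of how a dominance relation replaces indiscernibility, the following is a minimal sketch of the classical (strict) DRSA lower and upper approximations of an upward class union. The toy table, the object names and all function names are hypothetical and are not taken from the dissertation; every criterion is assumed to be gain-type (larger is better).

```python
from typing import Dict, Set, Tuple

# Hypothetical toy decision table: object -> (criteria values, decision class).
# Assumption: all criteria are gain-type and classes are ordered integers.
Table = Dict[str, Tuple[Tuple[int, ...], int]]

TABLE: Table = {
    "a": ((3, 3), 3),
    "b": ((3, 2), 2),
    "c": ((2, 3), 3),
    "d": ((2, 2), 1),  # d dominates e but has a worse class: inconsistent
    "e": ((1, 2), 2),
    "f": ((1, 1), 1),
}


def dominates(x: Tuple[int, ...], y: Tuple[int, ...]) -> bool:
    """x dominates y if x is at least as good as y on every criterion."""
    return all(xi >= yi for xi, yi in zip(x, y))


def dominating_set(table: Table, obj: str) -> Set[str]:
    """Objects that dominate obj (the positive dominance cone)."""
    return {o for o, (v, _) in table.items() if dominates(v, table[obj][0])}


def dominated_set(table: Table, obj: str) -> Set[str]:
    """Objects dominated by obj (the negative dominance cone)."""
    return {o for o, (v, _) in table.items() if dominates(table[obj][0], v)}


def upward_union(table: Table, t: int) -> Set[str]:
    """Objects whose class is at least t."""
    return {o for o, (_, cls) in table.items() if cls >= t}


def lower_approx(table: Table, t: int) -> Set[str]:
    """Objects whose whole dominating set lies inside the upward union."""
    target = upward_union(table, t)
    return {o for o in table if dominating_set(table, o) <= target}


def upper_approx(table: Table, t: int) -> Set[str]:
    """Objects whose dominated set intersects the upward union."""
    target = upward_union(table, t)
    return {o for o in table if dominated_set(table, o) & target}


lower = lower_approx(TABLE, 2)
upper = upper_approx(TABLE, 2)
boundary = upper - lower
```

In this toy table, objects d and e fall in the boundary of the union "class at least 2", because d dominates e yet is assigned a worse class. Variable precision models relax the strict inclusion test so that some such objects can still enter a lower approximation under a threshold.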
Therefore, it is very important to design a scientific and reasonable variable precision model and to set a proper variable precision threshold. Furthermore, research on new theories and applications of uncertain information processing is becoming more and more urgent.

In this dissertation, the theory and methods for processing uncertain information based on the dominance-based rough set approach are studied. Specifically, the dissertation concentrates on three aspects: the construction of a variable consistency model, a method for transforming inconsistent information systems into consistent ones, and a self-learning method based on data-driven data mining. Furthermore, the theoretical results are applied to the practical problem of telecom customer value evaluation. The main contributions of the dissertation are detailed as follows.

(1) An improved variable precision model of DRSA is proposed, which is based on inclusion degree and support degree. This model overcomes the inadequacies of the existing models while making full use of the original information.

Owing to the existence of inconsistencies, VC-DRSA (Variable Consistency Dominance-based Rough Set Approach) produces contradictions when lower approximations are calculated. In VP-DRSA (Variable Precision Dominance-based Rough Set Approach), marginal objects are usually excluded from the lower approximations, so these objects cannot be used effectively. First, the inadequacies of the existing variable consistency DRSA models, including VC-DRSA and VP-DRSA, in treating inconsistencies are analyzed. Then an improved variable precision model of DRSA based on inclusion degree and support degree, named ISVP-DRSA, is proposed. Next, basic concepts are defined and some mathematical properties are discussed.
Furthermore, it is found that the lower approximations obtained by the ISVP-DRSA model are the unions of those obtained by VC-DRSA and VP-DRSA, and the upper approximations obtained by the ISVP-DRSA model are the intersections of those obtained by the two models. As a result, more objects are included in the lower approximations and fewer objects in the upper approximations. The uncertain area is therefore reduced and the quality of approximation of the classification may be improved, so the uncertainty of the information system is reduced to a certain extent during processing. Lastly, the efficiency of ISVP-DRSA is illustrated by an example of comprehensive student evaluation and by experiments on UCI and other data sets. (See Chapter 2 for more details.)

(2) An algorithm for transforming inconsistent preference-ordered systems into consistent ones, named TIPStoC, is proposed. It is a novel method for handling inconsistencies in monotonic information systems.

After the downward inconsistency and upward inconsistency of an object are measured, the concept of overall object inconsistency is proposed and three overall inconsistency measures, α, ε and μ, are defined. Based on these, the TIPStoC algorithm is proposed. TIPStoC iteratively eliminates the most inconsistent object from the data table with respect to a chosen overall object consistency measure. Decision rules are then extracted from the resulting consistent system, and classification prediction is carried out with these rules. It is a new method of dealing with inconsistencies. Compared with other methods, its outstanding characteristic is that it can effectively identify inconsistent information in a preference-ordered information system. After that, various methods can be adopted for different purposes.
For instance, this method can detect outliers effectively in fields such as the military and information security. (See Chapter 3 for more details.)

(3) A self-learning method for data-driven data mining based on the dominance-based rough set approach is proposed, which avoids dependence on prior knowledge. This method enhances the adaptability of DRSA in processing inconsistent information.

First, several uncertainty measures of an ordered information system are defined, including the integral certainty measure, the maximum integral certainty measure, the integral uncertainty measure and the minimum integral uncertainty measure. After that, the maximum certainty of every class union is measured and the corresponding computing algorithm is proposed. Then a self-learning model based on DRSA is proposed, in which the maximum certainty coefficient of each class union is used as its consistency threshold. Compared with variable precision models, this method can learn knowledge automatically, without setting the consistency-level thresholds from prior domain knowledge or through a complex trial-and-error procedure. Experiments on UCI and other data sets show that this method outperforms other models by setting the variable precision threshold of each decision class according to the data. In particular, the method is found to have advantages in dealing with highly inconsistent preference-ordered information systems. (See Chapter 4 for more details.)

(4) A novel method for telecom customer value evaluation, based on domain-oriented data-driven data mining, is proposed.
This method helps improve telecom customer relationship management by effectively combining domain experts’ prior knowledge with the characteristics of the data sets.

First, in the data collection process, the value features are extracted according to the telecom experts’ prior knowledge, and the training customers’ categories are labeled according to two criteria, current value and potential value. As a result, two customer value decision tables are obtained: the current value decision table and the potential value decision table. After that, the maximum certainty of every class union is used as the corresponding variable threshold to control the decision rule extraction process. The proposed method can thus combine domain prior knowledge and the characteristics of the data sets effectively. Owing to the huge amount of telecom customer data and the difficulty of labeling customer categories, an active learning method based on neighbor entropy is adopted to select and label the training data, which minimizes the experts’ workload and improves the quality of the selected training data. Simulation experiments in a real application setting show that the method proposed in the dissertation can effectively evaluate telecom customers’ value. This method can establish a firm foundation for telecom operators’ CRM. Furthermore, it can provide a useful reference for solving other practical problems. (See Chapter 5 for more details.)...
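The iterative elimination idea described under contribution (2) can be sketched roughly as follows. This is only an illustrative sketch and not the TIPStoC algorithm itself: the abstract does not define the measures α, ε and μ, so a simple stand-in score is assumed here (the number of objects with which an object forms an inconsistent pair, counting downward and upward violations together). The table, the tie-breaking rule and all names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy table: object -> (gain-type criteria values, ordered class).
Table = Dict[str, Tuple[Tuple[int, ...], int]]

TABLE: Table = {
    "a": ((3, 3), 3),
    "b": ((3, 2), 2),
    "c": ((2, 2), 1),
    "d": ((1, 2), 2),
    "e": ((1, 1), 1),
    "o": ((3, 3), 1),  # best on every criterion, yet in the worst class
}


def dominates(x: Tuple[int, ...], y: Tuple[int, ...]) -> bool:
    """x dominates y if x is at least as good as y on every criterion."""
    return all(xi >= yi for xi, yi in zip(x, y))


def inconsistency(table: Table, obj: str) -> int:
    """Stand-in overall inconsistency score (an assumption, not alpha/epsilon/mu).

    Downward part: objects dominated by obj that have a better class.
    Upward part: objects dominating obj that have a worse class.
    """
    values, cls = table[obj]
    downward = sum(1 for v, c in table.values() if dominates(values, v) and c > cls)
    upward = sum(1 for v, c in table.values() if dominates(v, values) and c < cls)
    return downward + upward


def make_consistent(table: Table) -> Tuple[Table, List[str]]:
    """Repeatedly drop the most inconsistent object until none remains."""
    remaining = dict(table)
    removed: List[str] = []
    while True:
        scores = {o: inconsistency(remaining, o) for o in remaining}
        # Ties are broken alphabetically: max returns the first maximal item.
        worst = max(sorted(scores), key=scores.get, default=None)
        if worst is None or scores[worst] == 0:
            return remaining, removed
        removed.append(worst)
        del remaining[worst]


consistent, removed = make_consistent(TABLE)
```

The removed objects are the candidates that a later step could inspect as outliers, while decision rules would be induced from the remaining consistent table. A different score or tie-breaking rule can change which objects are removed.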
Keywords/Search Tags: data mining, rough sets, dominance relation, uncertainty, variable precision model, self-learning, telecom customer value evaluation