Research On Multi-attribute Discretization Method On Variable Precision Rough Set Theory

Posted on: 2020-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Hu
Full Text: PDF
GTID: 2428330590450634
Subject: Software engineering
Abstract/Summary:
With the rapid rise of machine learning and data mining, demand for big data applications keeps growing. Researchers have proposed many learning algorithms for a wide range of fields and scenarios, such as the C4.5 decision tree algorithm and support vector machines, both well suited to classification. Because it can improve learning quality and prediction accuracy, discretization of continuous-valued attributes is an important preprocessing step in data mining and machine learning. Many learning algorithms, such as decision trees, can only be applied to discretized data sets; discretization also makes the data easier to understand, improves the accuracy of learning algorithms, and makes them more efficient.

Currently, most discretization methods consider only the relationship between a continuous attribute and the class attribute when discretizing that attribute. Such single-attribute methods ignore attribute importance, and the order in which attributes are discretized is mostly chosen at random. A discretization method that combines single-attribute and multi-attribute evaluation can solve these problems: the importance of a condition attribute relative to the other attributes serves as the multi-attribute evaluation criterion, and the information length based on minimum length theory serves as the single-attribute evaluation criterion. Research has shown that this method makes effective use of attribute importance and improves the discretization result.

The combined method still has a defect: its stopping rule is based on the consistency level of classical rough set theory, which defines the inclusion relationship between sets too strictly. Samples are treated as inconsistent even when the large majority of them agree, which inevitably loses information. Variable precision rough set theory relaxes this strict definition by introducing a noise threshold β: when the degree of inclusion is not below the threshold, one set is regarded as included in the other. To improve the original method, this thesis proposes an inconsistency rate based on the inclusion relationship of variable precision rough set theory and uses it to replace the original stopping criterion, yielding a new discretization algorithm. Experimental results show that the improved method behaves as theoretically expected and effectively avoids the loss of hidden information, and that it performs significantly better than the original method. Although variable precision rough set theory gives a more reasonable stopping criterion, the value of the noise threshold β is still set empirically during calculation.
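
The relaxed stopping criterion described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definition of the inconsistency rate, so everything below is an assumption for illustration: the function name `inconsistency_rate`, the parameter `beta` (the noise threshold, taken here as an inclusion degree in (0.5, 1]), and the rule that a block of indistinguishable samples counts as consistent when the share of its majority class is not below `beta`. With `beta = 1.0` the sketch falls back to the strict classical behaviour.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, beta=1.0):
    """Fraction of samples counted as inconsistent under threshold beta.

    rows   -- discretized condition-attribute tuples, one per sample
    labels -- class label of each sample
    beta   -- hypothetical inclusion threshold in (0.5, 1]; 1.0 gives the
              classical (strict) rough-set behaviour
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    if not rows:
        return 0.0

    # Group samples that are indistinguishable on the condition attributes.
    blocks = defaultdict(Counter)
    for row, label in zip(rows, labels):
        blocks[tuple(row)][label] += 1

    inconsistent = 0
    for counts in blocks.values():
        size = sum(counts.values())
        majority = max(counts.values())
        # Inclusion degree of the block in its majority decision class.
        # Assumed rule: only blocks below the threshold contribute
        # their minority samples to the inconsistency count.
        if majority / size < beta:
            inconsistent += size - majority
    return inconsistent / len(rows)


# Made-up toy data: one block is 90% class "a", the other is split evenly.
rows = [(0,)] * 10 + [(1,)] * 4
labels = ["a"] * 9 + ["b"] + ["a", "a", "b", "b"]
strict_rate = inconsistency_rate(rows, labels, beta=1.0)
relaxed_rate = inconsistency_rate(rows, labels, beta=0.8)
```

On this toy data the strict setting counts the single outlier in the 90% block as an inconsistency, while the relaxed setting tolerates it as noise and only the evenly split block remains inconsistent. A discretization loop could stop merging intervals once this rate exceeds a chosen limit; how the thesis actually wires the rate into its stopping rule is not stated in the abstract.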
Keywords/Search Tags: Continuous value discretization, Discretization stopping criterion, Variable precision rough set theory