Research On Discretization Algorithm Based On Class-attribute Association

Posted on: 2020-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: J B Zheng
Full Text: PDF
GTID: 2518306104495554
Subject: Software engineering
Abstract/Summary:
Discretization is an important part of data preprocessing and a key technique in data mining. Among the families of discretization algorithms, those based on class-attribute association are among the better-performing: they do well in both running efficiency and the prediction accuracy obtained on the discretized data. However, when they evaluate whether to split an interval during discretization, they often show two shortcomings: first, classes with few instances in an interval are given insufficient consideration; second, the number of intervals generated is unreasonable. To address these problems, this thesis analyzes the characteristics and defects of the CAIM and ur-CAIM algorithms. Building on the characteristics of existing class-attribute association discretization algorithms, it proposes two new ideas. The first takes the notion of attribute importance from rough set theory and combines it with the CAIM algorithm to give a new method for calculating continuous attribute weights. The second proposes the concept of a minimum standard value, derived from the characteristics of the ur-CAIM algorithm, and introduces a variable standard parameter e into it. Finally, using the new attribute weight calculation and the minimum standard value, the stopping criterion of the ur-CAIM algorithm is improved, yielding a new discretization algorithm. The improved algorithm takes full account of the classes in an interval regardless of how many instances they have, and produces a more reasonable number of intervals and hence a more reasonable discretization scheme. Experimental results show that the improved algorithm obtains a more reasonable average number of intervals and that, after classification, its prediction accuracy is higher than that of the other discretization algorithms compared. The improved algorithm is then applied to wine quality detection, which demonstrates its practical usability. Although the improved algorithm yields a more reasonable discretization scheme, the value of the standard parameter e in this thesis was chosen empirically. How to determine e from the characteristics of the data set is therefore a direction for future work.
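For orientation, the baseline CAIM criterion that the thesis builds on can be sketched as follows. This is the standard criterion from the original CAIM algorithm, not the improved stopping criterion proposed in the thesis, whose formula the abstract does not give. For n intervals it averages max_r^2 / M_r, where max_r is the count of the most frequent class in interval r and M_r is the number of values in that interval. The function name `caim` and its arguments are illustrative assumptions.

```python
from bisect import bisect_left
from collections import Counter


def caim(values, labels, cuts):
    """Standard CAIM criterion for one continuous attribute.

    values: attribute values; labels: class label of each value;
    cuts: inner cut points (hypothetical argument name). A value equal
    to a cut point is assigned to the interval on its left.
    """
    cuts = sorted(cuts)
    n_intervals = len(cuts) + 1
    # One class-count table per interval (a column of the quanta matrix).
    counts = [Counter() for _ in range(n_intervals)]
    for value, label in zip(values, labels):
        counts[bisect_left(cuts, value)][label] += 1

    total = 0.0
    for column in counts:
        size = sum(column.values())  # M_r: values falling in interval r
        if size:                     # skip empty intervals
            total += max(column.values()) ** 2 / size
    return total / n_intervals


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6]
    labels = ["a", "a", "a", "b", "b", "b"]
    for cuts in ([], [2.5], [3.5]):
        print(cuts, caim(values, labels, cuts))
```

A higher value indicates a stronger association between the classes and the intervals. Because the criterion scores each interval only by its dominant class, minority classes inside an interval do not contribute, which is the first shortcoming the abstract describes.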
Keywords/Search Tags:Discretization, Class-Attribute association, Rough set theory, Attribute weight, Minimum standard value