Rough set theory is a relatively new data analysis tool for dealing with incomplete information, such as uncertain and inconsistent data. It is widely used in data mining, decision analysis, fault diagnosis, and related fields. The classical Pawlak rough set model is suitable only for discrete variables and cannot handle continuous variables directly. To solve this problem, the neighborhood rough set model introduces the concepts of neighborhood granulation and rough approximation. It supports both continuous and discrete data types, which broadens the application range of rough set theory.

Attribute reduction is one of the core topics of rough set theory. It refers to deleting redundant attributes while preserving the classification and decision-making ability of the original decision system. This thesis improves existing attribute reduction algorithms and verifies the improvements through experiments. In addition, the improved algorithm is combined with a decision tree classification algorithm to achieve more efficient classification. The research work of this thesis is as follows:

1. In attribute reduction algorithms based on the neighborhood rough set model, the calculation of the positive region is the basis of their effectiveness and also accounts for most of their running time. To reduce this overhead, a matrix preservation strategy is adopted: a matrix records the squared values calculated between samples, so that the metric computation over the original n dimensions is reduced to a computation over one dimension. Based on this method, the F2 HARNRS algorithm is improved, and a forward search attribute reduction algorithm based on the matrix preservation strategy is proposed. Experiments show that the proposed algorithm improves the efficiency of attribute reduction.

2. When computing the neighborhood of a sample, δ is a key parameter, and its value affects the reduction result. In this thesis, the FHARA algorithm is improved on the basis of the matrix preservation strategy. In the experiments, δ is set using the standard deviation: first, the standard deviation of the attribute values in each column is computed, and then the standard deviation of these standard deviations is taken as the value of δ. A fast attribute reduction algorithm based on the matrix preservation strategy is proposed. The experimental results show that the algorithm is efficient and fast on most data sets.

3. The current state of research on decision tree classification algorithms is analyzed. The improved positive region calculation method is combined with the decision tree classification algorithm to improve the classical ID3 algorithm, and a decision tree algorithm based on the positive region is proposed. Experiments on multiple data sets show that the algorithm improves the efficiency of building decision trees.
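
The matrix preservation idea and the choice of δ described in this abstract can be illustrated with a small sketch. It is not the thesis's implementation, and it rests on several assumptions that the abstract does not state: the metric is taken to be Euclidean, the standard deviation is the population form, and attribute significance is measured by the size of the positive region. The function names (`delta_from_columns`, `add_attribute`, `positive_region`, `forward_reduct`) are hypothetical. The key point is that the stored matrix of squared distances for the current attribute subset lets a candidate attribute be evaluated with a 1-dimensional update instead of a full n-dimensional distance computation.

```python
from statistics import pstdev


def delta_from_columns(rows):
    """Standard deviation of the per-column standard deviations.

    Assumption: population standard deviation (the abstract does not say which).
    """
    columns = list(zip(*rows))
    column_stds = [pstdev(col) for col in columns]
    return pstdev(column_stds)


def add_attribute(sq_dist, rows, attr):
    """Return the stored squared distances plus the 1-D term for one attribute."""
    n = len(rows)
    return [
        [sq_dist[i][j] + (rows[i][attr] - rows[j][attr]) ** 2 for j in range(n)]
        for i in range(n)
    ]


def positive_region(sq_dist, labels, delta):
    """Indices of samples whose delta-neighborhood carries a single decision label."""
    threshold = delta * delta  # compare squared values, so no square root is needed
    n = len(labels)
    return [
        i
        for i in range(n)
        if all(labels[j] == labels[i] for j in range(n) if sq_dist[i][j] <= threshold)
    ]


def forward_reduct(rows, labels, delta):
    """Greedy forward search: keep adding the attribute that most enlarges
    the positive region, and stop when no attribute enlarges it."""
    n, m = len(rows), len(rows[0])
    sq_dist = [[0.0] * n for _ in range(n)]  # preserved matrix for the empty subset
    selected, best_size = [], 0
    while len(selected) < m:
        candidates = []
        for a in range(m):
            if a in selected:
                continue
            trial = add_attribute(sq_dist, rows, a)
            size = len(positive_region(trial, labels, delta))
            candidates.append((size, a, trial))
        # Ties are broken in favor of the lower attribute index.
        size, attr, trial = max(candidates, key=lambda c: (c[0], -c[1]))
        if size <= best_size:
            break
        selected.append(attr)
        sq_dist, best_size = trial, size  # preserve the matrix for the next round
    return selected


# Toy data: attribute 0 separates the two classes, attribute 1 does not.
rows = [[0.1, 0.5], [0.2, 0.1], [0.8, 0.45], [0.9, 0.12]]
labels = [0, 0, 1, 1]
print(forward_reduct(rows, labels, delta=0.3))  # prints [0]
```

In this sketch δ is passed in by hand for the toy data; `delta_from_columns(rows)` shows how the standard-deviation rule could supply it instead. Each candidate evaluation costs one pass over the n × n matrix regardless of how many attributes are already selected, which is the saving the matrix preservation strategy aims for.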