Research On Data Analysis Technology Based On Fuzzy Rough Set Theory

Posted on: 2022-01-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z F Liu
GTID: 1528306836977339
Subject: Communication and Information System
Abstract/Summary:
Artificial intelligence has gone through numerous phases since its inception in 1956. In the twenty-first century, the introduction of deep learning algorithms has ushered in a new era of progress. Data, being a fundamental component of artificial intelligence research, has a direct impact on the performance of AI algorithms and models. There is a widely held belief in industry that data and features determine the upper limit of machine learning, whereas models and algorithms merely approach it. Data and features are therefore both critical to obtaining a high-performance model. This thesis examines the research value and academic relevance of fuzzy rough set theory, a method that has been successfully utilized in industry, in the field of data analysis as represented by feature selection and instance selection.

Application scenarios such as machine learning and data mining frequently require processing enormous volumes of high-dimensional data. First, not all features have the same influence on predicting the target variable, and a data set frequently contains redundant and irrelevant features. Second, a data set contains noisy or even erroneous data as a result of human or other factors, which has a significant impact on the performance of downstream classifiers. Correctly recognizing redundant features and noisy data can therefore yield high-quality reduced data sets while lowering data storage pressure and wasted computation, which is of practical significance in big data settings. The data modeling technique based on fuzzy rough set theory is analogous to natural human cognition. It is less sensitive to external parameters and therefore more robust. The thesis applies fuzzy rough sets to data preprocessing in the following three respects:

(1) An improved fuzzy rough set-based instance selection algorithm is developed to reduce the computation of the FRIS-III algorithm. The proposed method treats the instances in the data set differently: it employs the FRIS-I algorithm to quickly filter out suspected noisy instances. It first assesses the impact of the suspected noisy instances on the data set's dependency degree, rather than traversing the complete data set. The remaining instances are traversed only if the data set does not belong entirely to the positive region after all suspected noisy instances have been removed. The proposed technique efficiently decreases computation in the early stage of instance selection while achieving model performance comparable to that of the original algorithm.

(2) The application of a hybrid instance selection method based on fuzzy rough sets to credit rating is investigated. Traditional cluster-based hybrid credit scoring models reduce inconsistent data instances and produce high-quality data sets for downstream model training. However, an unreasonable number of clusters or a poor choice of initial cluster centers has a significant impact on the clustering results. Instance selection using a fuzzy rough set, by contrast, identifies noisy or outlier data from the structural characteristics of the data set and is unaffected by external parameters. For credit scoring, a two-stage hybrid algorithm architecture is proposed, with FRIS-I and FRIS-II used separately in the preprocessing stage to identify core data and remove noisy data. In the classification stage, the preprocessed data set is fed into a classifier such as SVM to build the hybrid classifier. The experiments show that the two hybrid classifiers perform much better than benchmark classifiers such as LDA, LR, NN and SVM. Because the two instance selection principles differ, the first hybrid classifier is more suitable for dispersed data sets, while the second is more effective for relatively concentrated data sets.

(3) A bireduct algorithm based on particle swarm optimization is proposed for simultaneous fuzzy rough set feature selection and instance selection. The proposed algorithm introduces a fitness function based on the ε-bireduct to evaluate the quality of a bireduct and to guide the search toward the optimal solution. Compared with its counterparts, the proposed algorithm adopts an update mechanism based on particle swarm optimization to avoid greedy search and random feature selection. Drawing on the exploration experience of each particle and its neighbors, it can identify a high-quality reduct in fewer iterations. The experimental results indicate that, under the same experimental conditions, the proposed algorithm significantly reduces the number of features and instances. Compared with the SFRIS algorithm, classification performance improves by nearly 20% on some data sets. Compared with the HSFSBR algorithm, better classification accuracy is obtained in fewer iterations.
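The abstract refers to the positive region, the dependency degree, and threshold-based filtering of suspected noisy instances without giving their definitions. The following is a minimal sketch of those standard fuzzy rough set quantities, not the thesis's algorithm. The similarity relation, the granularity parameter `alpha`, the threshold `tau`, and all function names are assumptions chosen for illustration. Decision classes are taken to be crisp.

```python
def similarity(x, y, ranges, alpha=1.0):
    """Fuzzy similarity of two instances (assumed form): the minimum over
    features of max(0, 1 - alpha * |difference| / feature range)."""
    sims = []
    for a, b, r in zip(x, y, ranges):
        d = abs(a - b) / r if r > 0 else 0.0
        sims.append(max(0.0, 1.0 - alpha * d))
    return min(sims)


def positive_region(X, labels, alpha=1.0):
    """Membership of each instance in the fuzzy positive region.

    With crisp classes, the fuzzy lower approximation of an instance's own
    class reduces to min over instances of other classes of (1 - similarity).
    """
    ranges = [max(col) - min(col) for col in zip(*X)]
    pos = []
    for i, xi in enumerate(X):
        m = 1.0
        for j, xj in enumerate(X):
            if labels[j] != labels[i]:
                m = min(m, 1.0 - similarity(xi, xj, ranges, alpha))
        pos.append(m)
    return pos


def dependency_degree(pos):
    """Average positive-region membership over the data set."""
    return sum(pos) / len(pos)


def select_instances(X, labels, tau, alpha=1.0):
    """Threshold filter in the spirit of FRIS-I (an assumption): keep the
    indices whose positive-region membership is at least tau."""
    pos = positive_region(X, labels, alpha)
    return [i for i, p in enumerate(pos) if p >= tau]


# Toy data: one feature, two classes; the last instance carries label 1
# but lies inside the class-0 cluster, so it acts as a suspected noisy point.
X = [[0.0], [0.1], [0.9], [1.0], [0.15]]
labels = [0, 0, 1, 1, 1]

pos = positive_region(X, labels)
gamma = dependency_degree(pos)
kept = select_instances(X, labels, tau=0.1)
```

In this toy data the mislabeled instance receives a low positive-region membership, and so does its nearest neighbor from the other class, so a pure threshold filter removes both. A dependency degree below 1 indicates that the data set does not lie entirely in the positive region.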
Keywords/Search Tags: Fuzzy Rough Set Theory, Feature Selection, Instance Selection, Bireduct