
Research On Neighborhood-based Efficient Classification Algorithm And Its Applications

Posted on: 2022-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Chen
Full Text: PDF
GTID: 2518306575966939
Subject: Computer technology
Abstract/Summary:
With the explosive growth of data, efficiently mining knowledge from complex data has become an important research goal of artificial intelligence. Neighborhood-based data mining methods are often used for classification and feature selection. They have been widely studied and applied because of their interpretability and their ability to characterize the distribution of complex data. However, existing neighborhood-based algorithms remain inefficient when faced with massive, high-dimensional data, and several problems are still worth studying. First, constructing the covering in the neighborhood covering model is inefficient, so improving the construction efficiency of the neighborhood covering is an important research direction. Second, existing neighborhood covering models rarely consider the influence of features on classification effectiveness and efficiency, so adding feature analysis to improve the classification performance of the neighborhood covering model is also worth studying. Third, combining neighborhood algorithms with other machine learning algorithms to handle real-world problems is another issue worth studying. This thesis therefore studies neighborhood-based algorithms and proposes efficient neighborhood-based classification models from different perspectives. The main work of this thesis is as follows:

(1) The neighborhood covering model has been widely used in classification tasks because of its simple mechanism and its ability to process complex data. Most existing neighborhood covering models focus on improving classification ability, and few studies have examined how to improve the efficiency of the model. In view of this inefficiency, this thesis introduces the triangle inequality relationship between distances and a local strategy into the neighborhood covering model to improve its efficiency from these two perspectives. In addition, sample information within the neighborhood is introduced, and a new classification algorithm is designed to improve the classification performance of the neighborhood covering model.

(2) The influence of features on classification algorithms based on neighborhood covering models is rarely discussed. Different features affect a classification task differently. Features with strong classification ability provide more effective support for the task, features with weak classification ability provide very limited support, and some features even have a negative effect. Using fewer features can also improve the efficiency of the model in the classification stage. To strengthen the role of features with strong classification ability, reduce the role of features with weak classification ability, and remove features that have a negative effect, this thesis proposes a feature weighting algorithm based on neighborhood covering reduction. The resulting feature weights are then applied to the classification algorithm based on the neighborhood covering model.

(3) When banks and financial institutions conduct credit risk management, they face a large amount of user credit information, and manual review is inefficient and error-prone. Using classification algorithms to predict credit scores is a preferred solution to this problem, and improving the accuracy of credit scoring is a very important issue when dealing with credit data sets. Therefore, an ensemble SVM model based on neighborhood and shadowed sets is proposed and applied to a credit data set. The proposed credit scoring method provides an effective solution for credit risk management.
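
The triangle inequality check mentioned in item (1) can be illustrated with a short sketch. The abstract does not describe the thesis's actual procedure, so the following is only an assumption about how such a check might work: distances from every sample to a single reference sample (a hypothetical `pivot`) are computed once, and the bound d(q, x) >= |d(q, p) - d(p, x)| is used to skip samples that cannot lie inside a neighborhood of a given radius. The function names, the Euclidean metric, and the toy data are all illustrative choices, not taken from the thesis.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighborhood_with_pruning(samples, pivot, pivot_dists, query, radius):
    """Find the samples within `radius` of `query`.

    `pivot_dists[i]` must hold the precomputed distance from `pivot`
    to `samples[i]`. Returns the neighbor indices and the number of
    distance computations actually performed for this query.
    """
    d_qp = euclidean(query, pivot)
    computed = 1  # the query-to-pivot distance
    neighbors = []
    for i, x in enumerate(samples):
        # Triangle inequality: d(query, x) >= |d(query, pivot) - d(pivot, x)|.
        # If this lower bound already exceeds the radius, x cannot be a neighbor.
        if abs(d_qp - pivot_dists[i]) > radius:
            continue
        computed += 1
        if euclidean(query, x) <= radius:
            neighbors.append(i)
    return neighbors, computed


# Toy data (illustrative only): three points near the origin, two far away.
samples = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0), (0.5, 0.5)]
pivot = samples[0]
pivot_dists = [euclidean(pivot, x) for x in samples]

neighbors, computed = neighborhood_with_pruning(
    samples, pivot, pivot_dists, query=(0.9, 0.1), radius=1.0
)
print(neighbors, computed)
```

In this toy run the two distant samples are rejected by the bound alone, so the query needs four distance computations (one to the pivot, three to candidates) instead of five, and the neighbors found are the samples at indices 0, 1 and 4. The saving grows when most samples lie far from the query, which is the situation the abstract describes for massive data sets.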
Keywords/Search Tags: neighborhood rough set, neighborhood covering, triangle inequality check, feature weighting algorithm, credit scoring