
Research On Multi-View Classification With Cost-Sensitive

Posted on: 2021-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: M L Tan
GTID: 2428330629953130
Subject: Software engineering
Abstract/Summary:
Classification algorithms are widely used in industry, commerce, and scientific research as a means of data analysis in machine learning. Because data have diverse structural features, many data sets are naturally multi-view. Multi-view data often plays a more important role than single-view data, mainly because it describes the underlying information more comprehensively and because the views follow the principles of consistency and complementarity. As a result, more suitable models can be obtained during data mining. Multi-view data analysis has already achieved significant results in biometric recognition tasks such as face recognition.

As multi-view learning has been studied in greater depth, researchers have identified several problems. First, for high-dimensional complete data, noise encountered while learning the model makes the time complexity of the algorithms too high. Second, when the collected data is imbalanced, the classifier is inevitably biased toward the classes with many samples, which ultimately reduces classification performance. Third, classifying multi-view data incurs misclassification costs. To address these problems, this dissertation introduces cost-sensitive learning into multi-view data analysis.

Missing data occurs frequently because of collection difficulties, high cost, and equipment failures during data collection. It not only makes data analysis more difficult but also affects model construction and the analysis results. Current methods for classifying data sets with missing values mainly rely on feature-independence assumptions or on the assumption that values are missing at random; missing-value imputation is another effective way to handle missing data. The former approaches still suffer from high time complexity and unsatisfactory efficiency. Therefore, this dissertation uses missing-value imputation to deal with the missing-data problem, and the proposed method further reduces the impact of missing data on model training.

The core content and original contributions are as follows:

1. To deal with class imbalance and noise in classification, this dissertation proposes a cost-sensitive classification method for complete multi-view data. The method uses the L2,1-norm and combines weights at the sample level, class level, and view level, automatically assigning large weights to important samples and small weights to unimportant ones, which reduces the influence of noise at these levels. Experiments that measure the total misclassification cost verify that the proposed method outperforms the comparison algorithms.

2. For multi-view data with missing values, this dissertation proposes a classification method that combines missing-value imputation with sample constraints and cost-sensitive learning. The method uses imputation to turn the incomplete data into complete data. The comparative experiments use both single imputation and multiple imputation before carrying out classification and analysis. In addition, a weight analysis strategy is adopted, which in theory also eliminates noise at different levels of the multi-view data. Experiments show that the proposed classification method has better robustness and classification performance.

In conclusion, the methods proposed in this dissertation are based on cost-sensitive multi-view classification and can effectively reduce the misclassification of imbalanced data. Meanwhile, by using missing-value imputation to fill in missing feature values, the proposed algorithm obtains better classification performance on different evaluation metrics. In the future, I will consider combining deep learning methods with classification analysis to further optimize and improve the performance of the proposed classification algorithm.
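The abstract names three quantities without giving formulas: the total misclassification cost, the L2,1-norm, and single imputation. The Python sketch below illustrates the standard textbook definitions of each; it is not the dissertation's algorithm. The function names, the example cost matrix, and the choice of column-mean imputation as the single-imputation strategy are all assumptions made for illustration.

```python
import numpy as np


def total_misclassification_cost(y_true, y_pred, cost_matrix):
    """Sum cost_matrix[true_class, predicted_class] over all samples.

    cost_matrix is a hypothetical (n_classes x n_classes) array whose
    diagonal is zero (correct predictions cost nothing).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cost_matrix = np.asarray(cost_matrix, dtype=float)
    return float(cost_matrix[y_true, y_pred].sum())


def l21_norm(W):
    """L2,1-norm: the sum of the Euclidean norms of the rows of W."""
    W = np.asarray(W, dtype=float)
    return float(np.sqrt((W ** 2).sum(axis=1)).sum())


def mean_impute(X):
    """Single imputation (assumed strategy): fill NaN with the column mean."""
    X = np.array(X, dtype=float)  # copy so the caller's data is untouched
    col_means = np.nanmean(X, axis=0)  # means over observed values only
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


# Hypothetical costs: missing a minority-class sample (true 1, predicted 0)
# is five times as expensive as a false alarm (true 0, predicted 1).
costs = [[0, 1],
         [5, 0]]
print(total_misclassification_cost([0, 0, 1, 1], [0, 1, 0, 1], costs))
print(l21_norm([[3, 4], [0, 0]]))
print(mean_impute([[1, np.nan], [3, 4]]))
```

An asymmetric cost matrix like this is what makes an evaluation cost-sensitive: two classifiers with the same accuracy can differ widely in total cost when their errors fall on different classes.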
Keywords/Search Tags: Imbalanced Data, Cost-sensitive Learning, Misclassification Cost, Missing Value Imputing, Robustness