
A Progressive Fitting Classification Method Based On Ensemble Learning

Posted on: 2022-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
GTID: 2518306482993649
Subject: Master of Engineering
Abstract/Summary:
Big data now permeates every industry and business function and has become an important factor of production. The related processing technology is developing rapidly, every industry is trying to extract more value from the data it already holds, and accurate, efficient classification of such data has become an active research topic. Big data is characterized by large volume, fast growth, a low density of useful information, and diverse types, all of which challenge earlier processing methods. Big data processing also differs from the traditional view of data processing: building on traditional algorithms, it abstracts useful information or regularities from the data according to the data's characteristics, in order to support judgments and decisions and even to predict answers to similar problems.

This thesis studies classification methods for domain-specific big data and addresses classification problems within a domain through in-depth analysis of large-scale domain data. It first introduces the relevant background and then surveys related research in China and abroad. The survey identifies two shortcomings of traditional methods: classification within a specific domain often considers only the features that are obviously relevant to the problem, and existing methods train poorly on small-sample datasets. The traditional approach typically selects some of the features from the domain in which the problem lies, derives the relationships among them by statistical methods, and builds a learner-based classification model. Analyzing only the problem's own domain reaches good accuracy quickly, but the restricted scope of analysis makes further gains in classification accuracy difficult.

To address these shortcomings, the thesis first proposes a pre-fit random-forest classification algorithm (PF-RF), which improves the noise resistance of traditional classification models on small-sample data. The method first extracts full feature information by pre-fitting and then classifies that information with an ensemble of base learners. Building on this algorithm and targeting data with multiple feature domains, the thesis then proposes a progressive fitting classification method based on ensemble learning, called multi-domain data depth analysis (MODE). MODE focuses on the associations among the feature domains of the original data: it first performs orthogonal feature extraction on the data of each domain, then expands the feature dimension to suit small datasets, and finally uses the expanded, progressively strengthened features to obtain a better classification model. The method is applied to real sociological big data for the specific task of classifying a person's annual income from census data. The experimental results show that it classifies small-sample, multi-domain data more effectively than existing methods.
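The two-stage idea in the abstract (pre-fit each feature domain, expand the feature vector with the pre-fit outputs, then train an ensemble on the expanded features) can be illustrated with a toy sketch. This is not the thesis's PF-RF or MODE implementation, whose details the abstract does not give. Everything below is an assumption made for illustration: the nearest-centroid pre-fit, the bagged decision stumps standing in for a random forest, the function names, the two hypothetical domains, and the tiny hand-made dataset. The orthogonal feature extraction step is omitted.

```python
import random

def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def prefit_domain(X, y, cols):
    """Pre-fit step (assumed form): nearest-centroid scorer on one domain."""
    sub = [[row[c] for c in cols] for row in X]
    c0 = centroid([r for r, label in zip(sub, y) if label == 0])
    c1 = centroid([r for r, label in zip(sub, y) if label == 1])
    def score(row):
        part = [row[c] for c in cols]
        # Positive score: the row lies closer to the class-1 centroid.
        return sq_dist(part, c0) - sq_dist(part, c1)
    return score

def expand(X, scorers):
    """Append one pre-fit score per domain to every original row."""
    return [list(row) + [s(row) for s in scorers] for row in X]

def fit_stump(X, y):
    """Base learner: best single-feature threshold rule (feature, threshold, polarity)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            for pol in (0, 1):
                # pol=1 predicts class 1 when feature > t; pol=0 when feature <= t.
                errs = sum(((row[j] > t) == bool(pol)) != bool(label)
                           for row, label in zip(X, y))
                if best is None or errs < best[0]:
                    best = (errs, j, t, pol)
    return best[1:]

def stump_predict(stump, row):
    j, t, pol = stump
    return int((row[j] > t) == bool(pol))

def fit_ensemble(X, y, n_learners=15, seed=0):
    """Bagging: each stump is trained on a bootstrap resample of the rows."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def predict(stumps, row):
    """Majority vote over the base learners."""
    votes = sum(stump_predict(s, row) for s in stumps)
    return int(2 * votes > len(stumps))

# Hand-made toy data: columns 0-1 form hypothetical domain A, column 2 domain B.
# No single raw column separates the classes, but the domain-A score does.
X = [[0, 0, 0], [2, -1, 1], [-1, 2, 0], [1, 0, 0],
     [2, 2, 1], [0, 3, 0], [3, 0, 1], [1, 2, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
domains = [(0, 1), (2,)]

scorers = [prefit_domain(X, y, cols) for cols in domains]
X_exp = expand(X, scorers)          # original features + one score per domain
stumps = fit_ensemble(X_exp, y)     # ensemble trained on the expanded features
predictions = [predict(stumps, row) for row in X_exp]
```

In this toy data the appended domain-A score is the only single column that separates the two classes, which is the point of the expansion step. The scores here are computed on the same rows used for training, so they are optimistic; a practical pipeline would normally compute them on held-out folds.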
Keywords/Search Tags: Domain big data, Ensemble learning, Pre-fitting, Multi-domain data, Progressive fitting