A Research Of Multiple Instance Deep Forest

Posted on:2021-04-24

Degree:Master

Type:Thesis

Country:China

Candidate:J Ren

GTID:2428330647451057

Subject:Computer Science and Technology

Abstract/Summary:

Multi-instance learning is a machine learning problem that evolved from weakly supervised learning. It has a wide range of applications in fields such as image retrieval, text classification, and medical detection. Deep forest can achieve good performance with few hyperparameters; it works well on image tasks and achieves outstanding performance on non-image tasks. This thesis studies the multi-instance deep forest model and obtains the following results.

First, this thesis proposes a new deep forest framework called Multiple Instance Deep Forest (MIDF) and two bag-level multi-instance forests called Multiple Instance Random Forest and Multiple Instance Extra Trees. The framework uses a cascade structure in which each layer consists of bag-level forests. To make concatenation possible, the algorithm treats each instance in the training set as a bag of its own. This ensures that the class probability distribution produced by each layer can be concatenated with the original features, which keeps the whole cascade structure effective. The framework also determines the number of layers of the deep forest automatically, which greatly reduces the cost of manual design and the time spent on hyperparameter tuning. Experimental results show that MIDF is robust to its hyperparameter settings and achieves good results in drug activation prediction, automatic image annotation, and text categorization.

Second, this thesis proposes an acceleration method for MIDF at two levels: algorithm design and code implementation. At the algorithm design level, an online sorting algorithm is introduced into split selection, the most time-consuming part of the algorithm, so that memory is accessed contiguously during computation. In addition, the attributes of the parent node that are unsuitable for further splitting are recorded, which reduces the cost of CPU-intensive impurity calculations. At the code implementation level, the computation-intensive tasks (decision trees) are accelerated with Cython, and the global interpreter lock (GIL) is avoided in order to take full advantage of multi-core CPUs while also improving single-task efficiency. The package structure of the code is also redefined, and NumPy vectorized operations replace loops to speed up the algorithm. Experimental results show that the optimized MIDF code is faster than the original code by 20 to 106 in training time and 7 to 55 in test time.
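The cascade idea in the abstract can be illustrated with a short Python sketch. It is a toy illustration under stated assumptions, not the thesis's implementation: `CentroidBagModel` is a hypothetical stand-in for a bag-level forest, each layer holds a single model rather than several forests, and the stopping rule (stop adding layers once validation accuracy no longer improves) is assumed from common deep forest practice, since the abstract does not state how the number of layers is chosen. All function names are illustrative.

```python
import math


def bag_mean(bag):
    """Mean feature vector of a bag (a list of instance feature lists)."""
    dim = len(bag[0])
    return [sum(inst[j] for inst in bag) / len(bag) for j in range(dim)]


class CentroidBagModel:
    """Hypothetical stand-in for a bag-level forest (assumption, not MIDF):
    scores a bag by the distance of its mean instance to class centroids."""

    def fit(self, bags, labels):
        self.classes_ = sorted(set(labels))
        dim = len(bags[0][0])
        self.centroids_ = {}
        for c in self.classes_:
            means = [bag_mean(b) for b, y in zip(bags, labels) if y == c]
            self.centroids_[c] = [
                sum(m[j] for m in means) / len(means) for j in range(dim)
            ]
        return self

    def predict_proba(self, bag):
        """Class probability vector for one bag (softmax of -distance)."""
        m = bag_mean(bag)
        scores = [-math.dist(m, self.centroids_[c]) for c in self.classes_]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


def augment(model, bags, original_bags):
    """Treat every instance as a one-instance bag, then append its class
    probability vector to that instance's ORIGINAL features."""
    return [
        [list(o) + model.predict_proba([inst]) for inst, o in zip(bag, orig)]
        for bag, orig in zip(bags, original_bags)
    ]


def accuracy(model, bags, labels):
    hits = 0
    for bag, y in zip(bags, labels):
        proba = model.predict_proba(bag)
        hits += model.classes_[proba.index(max(proba))] == y
    return hits / len(bags)


def fit_cascade(train_bags, train_y, val_bags, val_y, max_layers=10):
    """Grow layers until validation accuracy stops improving (assumed rule)."""
    layers, best = [], -1.0
    cur_train, cur_val = train_bags, val_bags
    for _ in range(max_layers):
        model = CentroidBagModel().fit(cur_train, train_y)
        score = accuracy(model, cur_val, val_y)
        if score <= best:  # no improvement, so the cascade stops growing
            break
        layers.append(model)
        best = score
        # Next layer sees original features plus this layer's probabilities.
        cur_train = augment(model, cur_train, train_bags)
        cur_val = augment(model, cur_val, val_bags)
    return layers, best
```

Calling `fit_cascade` with lists of bags (each bag a list of instance feature lists) and bag labels returns the fitted layers and the best validation accuracy. The point of the one-instance-bag trick in `augment` is that a bag-level model only produces one probability vector per bag, so wrapping each instance as its own bag is what yields a per-instance vector that can be appended to that instance's features.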

Keywords/Search Tags:

Machine Learning, Deep Forest, Deep Learning, Ensemble Learning, Multi-instance Learning

Related items

1	A Research Of Multi-Label Learning Method Based On Deep Forest
2	Algorithm And System For Distributed Deep Ensemble Learning And Architecture Search
3	Application For Homologous And Heterogeneous Multimodal Data Based On Multiple Deep Learning Blocks
4	Research On Novel Deep Forest Models
5	Research On Prediction And Decision-making Methods Based On Multi-source Information Fusion
6	Research On Model Optimization Of Restricted Boltzmann Machine And Deep Representation Learning
7	Research On Multi-instance Multi-label Learning Based On Feature Learning
8	Research On Deep Multi-instance Learning Models And Algorithms
9	Research On Deep Learning-Based Representation Learning Algorithms
10	Research On Deep Multiple-instance Algorithms Based On Channel And Spatial Attention