A Research Of Multiple Instance Deep Forest

Posted on: 2021-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Ren
GTID: 2428330647451057
Subject: Computer Science and Technology
Abstract/Summary:
Multi-instance learning is a machine learning problem that evolved from weakly supervised learning. It has a wide range of applications in fields such as image retrieval, text classification, and medical detection. Deep forest achieves good performance with few hyperparameters; it works well on image tasks and achieves outstanding performance on non-image tasks. This thesis studies the multi-instance deep forest model and obtains the following results.

First, this thesis proposes a new deep forest framework called Multiple Instance Deep Forest (MIDF), together with two bag-level multi-instance forests called Multiple Instance Random Forest and Multiple Instance Extra Trees. The framework uses a cascade structure in which each layer is built from bag-level forests. In addition, the algorithm treats each instance in the training set as a bag for the purpose of concatenation. This ensures that the probability distribution produced by each layer can be concatenated with the original features, which makes the whole cascade structure effective. The framework also determines automatically how many layers the deep forest needs, which greatly reduces the cost of manual design and the time spent on hyperparameter tuning. The experimental results show that MIDF is robust to its hyperparameter settings and achieves good results in drug activation prediction, automatic image annotation, and text categorization.

Second, this thesis proposes a new acceleration scheme for MIDF at two levels: algorithm design and code implementation. At the algorithm design level, it introduces an online sorting algorithm into split selection, the most time-consuming part of the algorithm, so that memory is accessed contiguously during computation. It also records the attributes of the parent node that are not suitable for further splitting, which reduces the cost of the CPU-intensive impurity calculations. At the code implementation level, it accelerates the computation-intensive tasks (the decision trees) with Cython and avoids the Global Interpreter Lock (GIL) to take full advantage of multi-core CPUs, while also improving single-task efficiency. It also redefines the package structure in the code and replaces loops with NumPy vectorized operations, which further speeds up the algorithm. The experimental results show that the optimized MIDF code is 20 to 106 times faster than the original code in training and 7 to 55 times faster in testing.
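To make the idea of replacing loops with NumPy vectorized operations in split selection more concrete, here is a minimal sketch. It is not the thesis's code. It assumes Gini impurity as the criterion, which the abstract does not specify, and the function name `best_split_gini` is hypothetical. The feature is sorted once, and cumulative class counts then give the impurity of every candidate threshold without a Python loop.

```python
import numpy as np

def best_split_gini(feature, labels, n_classes):
    """Return (threshold, weighted_gini) for the best binary split on one
    feature, or None if the feature is constant. Illustrative only."""
    # Sort once so that all later passes read memory contiguously.
    order = np.argsort(feature, kind="stable")
    x = feature[order]
    y = labels[order]
    n = x.shape[0]

    # One-hot labels; their cumulative sum is the left-child class
    # histogram for a split placed after position i (i = 0 .. n-2).
    onehot = np.eye(n_classes, dtype=np.int64)[y]
    left_counts = np.cumsum(onehot, axis=0)[:-1]
    right_counts = onehot.sum(axis=0) - left_counts

    n_left = np.arange(1, n)
    n_right = n - n_left

    # Gini impurity of both children for every split position at once.
    gini_left = 1.0 - ((left_counts / n_left[:, None]) ** 2).sum(axis=1)
    gini_right = 1.0 - ((right_counts / n_right[:, None]) ** 2).sum(axis=1)
    weighted = (n_left * gini_left + n_right * gini_right) / n

    # A threshold can only sit between two distinct feature values.
    valid = x[1:] != x[:-1]
    if not valid.any():
        return None
    weighted = np.where(valid, weighted, np.inf)

    i = int(np.argmin(weighted))
    threshold = (x[i] + x[i + 1]) / 2.0
    return threshold, float(weighted[i])
```

For example, `best_split_gini(np.array([4.0, 1.0, 3.0, 2.0]), np.array([1, 0, 1, 0]), 2)` separates the two classes perfectly with a threshold midway between 2.0 and 3.0. A Cython version of the same routine would typically keep the sorted arrays as typed memoryviews, but that is beyond this sketch.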
Keywords/Search Tags:Machine Learning, Deep Forest, Deep Learning, Ensemble Learning, Multi-instance Learning