Research On Multi-instance Learning Algorithm Based On Graph Structure Features

Posted on: 2022-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Bi
GTID: 2518306512953379
Subject: Computer application technology
Abstract/Summary:
With the rapid development of machine learning, research on multi-instance learning has deepened, and its real-world applications have become more extensive and diverse. At present, the main research direction in multi-instance learning is to improve model performance while ensuring that the algorithm correctly classifies unlabeled bags. The correlation between instances and the representative instances in each bag are both important for optimizing model performance, so using a graph data structure to represent the relationships among the instances in each bag has become one of the mainstream ideas. By level of analysis, existing research can be divided into two categories, bag-feature-based and instance-feature-based methods, and instance-feature-based methods can be further divided into vector-feature-based and graph-structure-based methods. Based on an analysis of the advantages and disadvantages of these methods, this thesis proposes two multi-instance learning algorithms, MIL-GCC and MIL-FTSBN, which work at two levels: mining the correlation between instances, and building a graph classifier directly on the bag graph structure. Both aim to improve model performance in different application scenarios. The main work and innovations of this thesis are as follows:

(1) A multi-instance learning algorithm of graph convolution based on clustering (MIL-GCC) is proposed. It uses a graph structure to represent the correlation between the instances in a bag and builds a graph classifier directly on the bag graph structure. This addresses two problems: the random selection of instances from bags to construct the graph structure, and low classification accuracy. First, the super-instances in each bag are obtained by clustering and serve as the nodes of the bag graph. Second, the relationships between the super-instances are mined to construct the edges of the bag graph, which determines its structure. Finally, an importance score for each node in the bag graph is learned by graph convolution, and the top-N nodes by importance score, together with the bag graph structure formed by these nodes, are selected as the basis for classification.

(2) A bag graph structure reduction algorithm based on DFS is designed. First, the node with the highest influence degree in the labeled graph is taken as the root node of the maximal frequent item subgraph. Then, the Depth First Search (DFS) algorithm traverses the nodes connected to the root node by an edge in order to obtain the support of the current graph structure, and graph structures whose support exceeds the preset minimum support threshold are selected as frequent item subgraphs. Finally, whether the current frequent item subgraph is a maximal frequent item subgraph is judged according to the support range of the nodes adjacent to the new frequent item subgraph, and the set of maximal frequent item subgraphs that meet the conditions is output as input to the graph classification model.

(3) A multi-instance algorithm of frequent term subgraph based on Bayesian network (MIL-FTSBN) is proposed. It reduces the bag graph structure while still using a graph structure to represent the correlation among the instances in a bag, so as to limit the influence of complex bag graph structures on the generated model and improve classification accuracy. First, the weights of the instances in each bag are learned by a Bayesian network, and the instances whose weights exceed a set threshold are selected as the representative instances of the bag. Second, the representative instances are taken as the nodes of the bag graph, the adjacency relationships between representative instances are taken as the edges, and a user-defined importance between a node and the other nodes is taken as the edge weight, which together define the bag graph structure. Then, the bag graph is reduced by the DFS-based bag graph structure reduction algorithm, and the maximal frequent item subgraph is generated to eliminate structural information in the bag graph that is unrelated to classification.

The proposed algorithms are verified and analyzed experimentally on public datasets. The experimental results show that MIL-GCC and MIL-FTSBN can improve the classification accuracy of the classifier and effectively improve the quality of the model.
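
The first two MIL-GCC steps described in the abstract (clustering the instances of a bag into super-instances, then linking super-instances to form the bag graph) can be illustrated with a minimal sketch. The abstract does not name the clustering method or the rule for creating edges, so this example assumes plain k-means and a Euclidean distance threshold; the names `kmeans`, `build_bag_graph`, and `edge_threshold` are hypothetical and not taken from the thesis. The graph-convolution scoring step is omitted.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns k centroids (used here as 'super-instances')."""
    if k > len(points):
        raise ValueError("k must not exceed the number of instances in the bag")
    # Deterministic initialisation (an assumption): the first k instances.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_bag_graph(bag, k, edge_threshold):
    """Return (nodes, edges): nodes are super-instances, edges link nearby ones.

    Assumption: two super-instances are connected when their Euclidean
    distance is at most edge_threshold. The thesis may use another criterion.
    """
    nodes = kmeans(bag, k)
    edges = set()
    for i in range(k):
        for j in range(i + 1, k):
            if math.dist(nodes[i], nodes[j]) <= edge_threshold:
                edges.add((i, j))
    return nodes, edges

# A toy bag of six 2-D instances forming three loose groups.
bag = [(0.0, 0.0), (5.0, 5.0), (20.0, 20.0), (0.0, 1.0), (5.0, 6.0), (1.0, 0.0)]
nodes, edges = build_bag_graph(bag, k=3, edge_threshold=8.0)
print(nodes)
print(edges)
```

In this toy bag, the two nearby groups end up connected while the distant group remains an isolated node. A node-scoring step such as the graph convolution mentioned in the abstract would then operate on `nodes` and `edges`.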
Keywords/Search Tags: multi-instance learning, clustering, graph structure, Bayesian network, frequent term subgraph