Feature Selection Algorithm Research for Multi-label Graph Data Based on HSIC

Posted on: 2018-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: C W Li
Full Text: PDF
GTID: 2348330536470601
Subject: Control Science and Engineering
Abstract/Summary:
As one of the most commonly used data structures, graphs can represent complex relationships between data objects, so they are widely used in many fields. Given these advantages, graph classification is an important branch of data mining. In multi-label tasks, each graph can be assigned a set of labels simultaneously. Multi-label data usually contain many redundant and irrelevant features; processing them is time-consuming and degrades classification performance. Feature selection should therefore usually be performed before classification. However, traditional graph classification research focuses on the single-label (binary) classification problem and assumes that each graph has only one label. For single-label classification, traditional feature selection methods can be extended to find the best (most robust) subgraph features in a single-label graph dataset. In the multi-label classification problem, however, each graph has multiple labels and many feature subgraphs need to be mined, so traditional feature selection algorithms cannot be used directly for multi-label feature extraction. Based on the HSIC evaluation criterion, and exploiting the correlation that exists between multiple labels, this thesis proposes a criterion for evaluating the usefulness of a set of multi-label subgraph features. The main research contents are as follows:

(1) In practical applications, the number of subgraph features in the training graph set is very large. To avoid exhaustively enumerating subgraphs, which would make the time complexity of the algorithm too high, the evaluation criterion is integrated into the subgraph pattern mining step: an upper bound is set as a constraint, and the search space is pruned by this constraint. Accordingly, the fourth chapter proposes a method for calculating the upper bound based on the correlation between labels.

(2) Multi-label data are characterized by the fact that a sample can belong to several categories. For such data, this thesis proposes a feature selection algorithm based on the HSIC evaluation criterion, which is used to evaluate the correlation between samples and labels. The criterion is introduced into the subgraph pattern mining step, and subgraphs are evaluated while the search space is being pruned, so that feature selection is achieved during mining.

(3) An alternating optimization algorithm is proposed to solve the resulting optimization problem. The objective function of the algorithm contains two variables, the subgraph g and the label weight, and previous research shows that no globally optimal solution is available. This thesis therefore optimizes them alternately. First, the label weight is fixed and g is optimized to select the optimal t subgraphs. Second, g is fixed and the label weight is optimized, so that the label weights are tuned on the basis of the selected subgraphs.

The proposed feature selection algorithm is combined with the binary classifier SVM and the multi-label classifier BoosTexter, and it is compared with traditional feature selection methods on the NCI and PTC datasets. The experiments show that the proposed method achieves higher accuracy.
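
As a rough illustration of how an HSIC criterion can rank subgraph features against multiple labels, the following Python sketch scores each candidate subgraph with the standard empirical HSIC estimate tr(KHLH)/(n-1)^2 under linear kernels. It is not the algorithm of the thesis: the function names, the binary indicator matrix F, the label matrix Y, and the label-weight vector w are all assumptions made for this example, and the branch-and-bound pruning and the label-weight update are omitted.

```python
import numpy as np


def centering_matrix(n):
    """Return H = I - (1/n) * 1 1^T, which centers kernel matrices."""
    return np.eye(n) - np.ones((n, n)) / n


def hsic_scores(F, Y, w=None):
    """Score each candidate subgraph by its empirical HSIC with the labels.

    F : (n, m) array, F[i, g] = 1 if graph i contains subgraph g (assumed encoding).
    Y : (n, q) array, Y[i, k] = 1 if graph i carries label k (assumed encoding).
    w : (q,) array of label weights; defaults to all ones (hypothetical parameter).
    """
    F = np.asarray(F, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = F.shape[0]
    if w is None:
        w = np.ones(Y.shape[1])
    H = centering_matrix(n)
    L = (Y * w) @ Y.T                  # weighted linear kernel on the label vectors
    M = H @ L @ H                      # centered label kernel
    # With a linear kernel on a single indicator column f_g,
    # tr(K H L H) reduces to f_g^T M f_g.
    return np.einsum("ig,ij,jg->g", F, M, F) / (n - 1) ** 2


def select_top_t(F, Y, t, w=None):
    """Return the indices of the t highest-scoring subgraphs and all scores."""
    scores = hsic_scores(F, Y, w)
    order = np.argsort(-scores, kind="stable")[:t]
    return order, scores


# Toy usage: 4 graphs, 3 candidate subgraphs, 2 labels.
F_demo = np.array([[1, 1, 1], [1, 1, 0], [0, 1, 1], [0, 1, 0]])
Y_demo = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
print(select_top_t(F_demo, Y_demo, t=1))
```

With w held fixed, picking the t highest-scoring columns corresponds to the "fix the label weight, select subgraphs" half of an alternating scheme. The abstract does not say how the label weights are updated in the other half, so that step is left out here rather than guessed.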
Keywords/Search Tags: Graph Data, Evaluation Criteria, Multi-label, Feature Selection, Branch-and-Bound