Research On Combining Collective Classification With Active Learning

Posted on: 2012-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Chai
GTID: 2178330332999592
Subject: Computer application technology
Abstract/Summary:
Many real-world systems take the form of networks, such as communication networks, financial transaction networks, networks describing physical systems, and social networks. Classifying the objects in such networks therefore has broad practical value. Network data consist of entities of different types and the rich links between them. When mining network data, using the information carried by these links in addition to the features of the data themselves allows the whole network to be analyzed more completely and precisely. Classifying network data by exploiting its links is therefore of practical significance.

Collective classification is an effective method for classifying network data. It exploits three kinds of dependency: between a node's label and its own features, between a node and its labeled neighbors, and between a node and its unlabeled neighbors. Active learning is a method for improving accuracy: by analyzing the graph structure of the network data, it seeks the smallest set of nodes whose labels would maximize classification performance, and labels those nodes in advance. In general, when labeled samples are very sparse, collective classification accuracy is low; by further exploiting the links in the network data, active learning can raise it.

This thesis studies how to combine collective classification with active learning in order to improve the accuracy of collective classification when labeled samples are sparse. It examines the iterative classification algorithm (ICA), which is based on a vector model; the loopy belief propagation algorithm (LBP), which is based on a graph model; and several graph-structure centrality measures for use in active learning. On this basis, it proposes an approach that combines collective classification with active learning.
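As a rough illustration of how ICA uses all three kinds of dependency, the Python sketch below assumes a graph stored as an adjacency dict, a hypothetical `local_scores` input standing in for a content-only classifier over each node's own features, and a simple count of neighbor labels as the relational feature, weighted by an assumed parameter `alpha`. It is not the thesis's implementation: unlabeled nodes are first bootstrapped from their own features, then relabeled repeatedly using the current labels of both labeled and unlabeled neighbors until an iteration changes no label or `max_iter` is reached (one possible termination condition).

```python
from collections import Counter


def ica(adj, labels, local_scores, classes, alpha=1.0, max_iter=10):
    """Minimal iterative classification (ICA) sketch.

    adj: dict mapping node -> list of neighbor nodes (undirected graph).
    labels: dict mapping node -> class for the already-labeled nodes.
    local_scores: hypothetical dict mapping each unlabeled node -> {class: score},
        standing in for a local classifier trained on node features.
    alpha: assumed weight on the neighbor-label counts.
    """
    unlabeled = [v for v in adj if v not in labels]
    current = dict(labels)
    # Bootstrap: initial guess for each unlabeled node from its own features only.
    for v in unlabeled:
        current[v] = max(classes, key=lambda c: local_scores[v].get(c, 0.0))
    for _ in range(max_iter):
        changed = 0
        for v in unlabeled:
            # Count aggregation over the current labels of all neighbors,
            # including neighbors whose labels are themselves predictions.
            counts = Counter(current[u] for u in adj[v])
            new = max(
                classes,
                key=lambda c: local_scores[v].get(c, 0.0) + alpha * counts[c],
            )
            if new != current[v]:
                current[v] = new
                changed += 1
        # Termination condition: stop once a full pass changes no label.
        if changed == 0:
            break
    return current
```

For example, on the path a-b-c-d with `a` labeled `x` and `d` labeled `y`, node `b` is bootstrapped to `y` from weak local scores `{"x": 0.4, "y": 0.6}`, but switches to `x` in the first iteration because both of its neighbors currently carry `x`. Replacing the count aggregation or the stopping rule gives the other ICA variants the abstract alludes to.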
The experiments use the CITESEER and CORA data sets, which are drawn from real bibliographic databases and contain rich citation links.

First, this thesis investigates the choice of local classifier and the iteration termination conditions for ICA, as well as the learning process of Relational Markov Networks (RMN). Experiments with different proportions of labeled samples show that accuracy increases as the number of labeled nodes increases, and that when labeled nodes are sparse, classification performance drops significantly because nodes lack a sufficient number of labeled neighbors.

Second, this thesis investigates degree centrality, betweenness centrality, closeness centrality, and a k-means-based heuristic on the graph structure. In active learning, each of these is used to find the smallest set of nodes that maximizes classification performance, and those nodes are labeled in advance. Two collective classification algorithms were each combined with the different active learning methods and tested on sparse samples. The results show that active learning increases accuracy on sparse samples and that the methods differ clearly, with the k-means heuristic performing best.

In summary, this thesis studies several problems in combining collective classification with active learning, with practical significance for improving the classification accuracy of network data when labeled samples are very sparse.
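The centrality-based node selection step can be sketched as follows, assuming an unweighted, undirected graph stored as an adjacency dict and a labeling budget `budget` (hypothetical names). Degree and closeness are computed with breadth-first search, betweenness with Brandes' algorithm, and the query set is simply the top-scoring nodes. Closeness here is measured within each node's connected component, which is one of several conventions. The thesis's k-means heuristic is not reproduced because the abstract does not specify it.

```python
from collections import deque


def bfs_distances(adj, s):
    """Hop distances from s to every node reachable from it."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(adj):
    # Number of direct neighbors of each node.
    return {v: len(adj[v]) for v in adj}


def closeness_centrality(adj):
    scores = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        # (reachable nodes - 1) / total distance; isolated nodes score 0.
        scores[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


def betweenness_centrality(adj):
    """Brandes' algorithm, unweighted and undirected, unnormalized."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


def select_query_nodes(adj, budget, centrality):
    """Return the `budget` most central nodes (ties broken by name) to label first."""
    scores = centrality(adj)
    return sorted(adj, key=lambda v: (-scores[v], str(v)))[:budget]
```

On the five-node path a-b-c-d-e, betweenness and closeness both pick the middle node `c`, while degree ties `b`, `c`, and `d`, so the tie-break decides. Differences like this are one reason the choice of heuristic can change which nodes get labeled. The selected nodes would then be labeled and passed, with the rest of the graph, to a collective classifier such as ICA.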
Keywords/Search Tags: Collective Classification, Active Learning, Markov Networks, Iterative Classification Algorithm, Loopy Belief Propagation