
Research On Semi-supervised Classification Methods For Relational Network Data

Posted on: 2014-04-13 | Degree: Master | Type: Thesis
Country: China | Candidate: R C Shi | Full Text: PDF
GTID: 2298330422490434 | Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet, and especially as online social media and mobile networks become increasingly common, a large amount of relational network data has emerged in which instances are no longer independent of each other but are linked together. For instance, a microblog system can be viewed as a relational network in which nodes represent users and relations represent friendships.

The classification task for network data is known as the Collective Classification (CC) problem, and it has drawn much attention over the past ten years. Exploiting the dependencies between instances in network data can substantially increase classification accuracy. For instance, friendship relations can be used to find microblog users with similar interests, and hyperlinked web pages tend to share the same topics. However, conventional CC algorithms cannot deal with sparsely labeled network data, which is very common in real-world applications because labeling is expensive.

In this thesis we focus on the semi-supervised CC (SSCC) problem and discuss issues in dealing with sparsely labeled network data:

(1) We divide the SSCC problem into three sub-problems: learning on content features, learning on the relational network, and combining content features with network information for classification.

(2) We propose a heuristic algorithm that constructs a homogeneous network in order to utilize network information in the semi-supervised setting. This heuristic algorithm addresses the second sub-problem of SSCC for network data with weak homophily.

(3) We propose a generative model with network regularization (GMNR) for the SSCC problem in homogeneous network data. We develop a new generative model, based on the Probabilistic Latent Semantic Analysis (PLSA) method, that uses the attribute features of all instances. A network regularizer is then employed to smooth the label probability distributions over the network topology of the data, so that linked instances tend to have the same labels. Finally, we develop an effective EM algorithm to compute the label probability distributions for label prediction. Experimental results on three real sparsely labeled network datasets show that the proposed model GMNR outperforms state-of-the-art CC algorithms and other SSCC algorithms.
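The idea of smoothing label probability distributions over the network can be illustrated with a minimal sketch. This is not the thesis's GMNR model or its EM algorithm: it is a simpler neighbor-averaging scheme, used here only to show how a network term pulls linked instances toward the same label. The function name `smooth_labels`, the mixing weight `alpha`, and the toy graph are all assumptions made for illustration.

```python
def smooth_labels(prior, edges, clamped, alpha=0.5, iters=50):
    """Illustrative network smoothing of label distributions (not GMNR).

    prior   : dict node -> list of class probabilities, e.g. from a
              content-only model (hypothetical input)
    edges   : list of undirected (u, v) pairs
    clamped : dict node -> known class index; labeled nodes stay fixed
    alpha   : assumed weight of the neighborhood term, in [0, 1]
    """
    n_classes = len(next(iter(prior.values())))

    # Build an adjacency list for the undirected graph.
    neighbors = {u: [] for u in prior}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    def one_hot(k):
        return [1.0 if i == k else 0.0 for i in range(n_classes)]

    # Labeled nodes start (and stay) at their known label.
    probs = {u: one_hot(clamped[u]) if u in clamped else list(p)
             for u, p in prior.items()}

    for _ in range(iters):
        updated = {}
        for u in probs:
            if u in clamped or not neighbors[u]:
                updated[u] = probs[u]
                continue
            # Average the current distributions of the neighbors.
            avg = [sum(probs[v][k] for v in neighbors[u]) / len(neighbors[u])
                   for k in range(n_classes)]
            # Mix the content-based prior with the neighborhood average.
            updated[u] = [(1 - alpha) * prior[u][k] + alpha * avg[k]
                          for k in range(n_classes)]
        probs = updated
    return probs


# Toy chain a - b - c - d: only the two end nodes are labeled.
prior = {n: [0.5, 0.5] for n in "abcd"}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
result = smooth_labels(prior, edges, clamped={"a": 0, "d": 1})
predicted = {n: max(range(2), key=lambda k: result[n][k]) for n in result}
```

In this toy chain the content-based priors of the unlabeled nodes are uninformative, so the prediction is driven entirely by the links: `b` leans toward the label of `a` and `c` toward the label of `d`. In a sparsely labeled network, this is the kind of information a network regularizer contributes alongside the attribute features.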
Keywords/Search Tags: relational network data, semi-supervised collective classification, generative model, network regularization