Learning in Relational Networks

Posted on: 2015-10-03

Degree: Ph.D.

Type: Thesis

University: George Mason University

Candidate: Saha, Tanwistha

GTID: 2478390020452280

Subject: Computer Science

Abstract/Summary:

Classifying nodes in relational networks is an important task because it has applications in many areas that affect people's daily lives. The inability to use traditional classification algorithms for classifying nodes in relational networks has encouraged researchers to develop a special class of methods, known as collective classification algorithms. During inference, collective classification exploits the structural information embedded in the network to jointly predict the labels of all test nodes. Any relational model needs good training samples in order to make good predictions on unseen test data. Hence, to evaluate a model fairly, we should always make sure that the samples on which the model is trained are a good representation of the original dataset. However, unlike traditional machine learning on non-relational data, where randomly selected samples are considered good enough for training a model, relational learning relies heavily on the method of sample selection. This is because, in relational learning, information propagates from the training samples to the test samples through the link structure. Hence, a sampling method that is specifically tailored for evaluating collective classification algorithms is required. A concept related to sampling is the process of acquiring informative labeled data for training. Labeled data comes at a cost because it involves human interaction. To minimize this cost, researchers have proposed numerous active learning algorithms. Although active learning methods have evolved over the years, not much has been done to deal with relational networks, which are a very common representation of many real-world datasets.

Relational network analysis can also be applied to improve the recommendation system framework.
In the traditional setting, a recommender system learns a model based on past ratings of users in the user-item rating matrix, and predicts the items to be rated by target users in the future. The most popular recommender system models are either neighborhood-based or latent-factor-based, both of which have witnessed many advancements over the past few decades. Although the idea of incorporating concepts from relational classification into recommender systems seems far-fetched, in my thesis I have shown that it is quite feasible and effective in improving the overall performance of the system.

There are a few research gaps in the areas related to collective classification, which I have aimed to address in my thesis. To begin with, there have been a few models for multi-label collective classification, but they treat all the neighbors of a target node equally during label prediction. This is unfair because labels from influential training nodes are bound to have more effect on the target nodes than labels from other neighboring nodes. Sampling algorithms in networks are often aimed at collecting samples that inherit the key characteristics of the original network. However, which feature is "key" depends on the task we are trying to solve. Hence, instead of developing a generalized sampling method, we should propose an algorithm that can propagate label information from training nodes to test nodes more effectively, thereby improving classification performance. Active learning for network datasets has mostly been studied on single-labeled networks, without any insight into how those methods can be extended to deal with multi-labeled networks. Additionally, querying multiple labels of a node in a multi-labeled network incurs more cost. Finally, the use of short texts or tags in recommender systems has been well explored, except that researchers have often assumed that these tags are available when training a model.
In my thesis, I have predicted tags for users as "preferences" and tags for items as "descriptors" using collective classification in order to improve the overall performance of the system. In the past, there has been no work on incorporating concepts from relational learning into this paradigm to improve item recommendations. I aimed to bridge this gap by integrating predicted tags into state-of-the-art recommender systems.

Although a lot of research has been pursued in different directions to address each of these concerns individually, in my thesis I have tried to come up with a unified approach. I have developed a rank-based influential neighbor selection method for collective classification in multi-labeled networks. This method ensures that the labeled neighbors are not all treated equally when assigning labels to target nodes. To address the sampling issue, I have developed an approach tailored for improving collective classification in single-labeled networks. The active learning algorithms proposed in my thesis work for both single- and multi-labeled networks that do not have node features accessible during training. This is an important property of the algorithms because sample features are often not accessible in networks due to privacy issues. I have also developed algorithms that use a cost-per-label concept for querying labels from multi-labeled nodes during active learning. Going further, I have used collective classification of tags for user preference prediction and item descriptor prediction in recommender systems. Often the number of users and items with known tags is very small. In such situations, I have successfully used my active learning models to learn classifiers for predicting multiple tags for both the users and the items in the system.
I have tested all the models on several real-world networks, and the extensive experimental results show statistically significant improvements over multiple state-of-the-art baseline methods.
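
The idea of not treating all labeled neighbors equally can be illustrated with a minimal sketch. This is not the thesis's algorithm: it is a simple single-label, iterative weighted-vote scheme that uses only the link structure (no node features), and the function name `collective_classify`, the `influence` dictionary, and the toy graph are all assumptions made for illustration. How influence scores would actually be ranked or computed is not specified in the abstract, so here they are simply supplied by the caller.

```python
from collections import defaultdict

def collective_classify(edges, seed_labels, influence=None, n_iter=10):
    """Illustrative iterative weighted-vote labeling over an undirected graph.

    edges       -- iterable of (u, v) node pairs
    seed_labels -- dict mapping training nodes to their known labels
    influence   -- hypothetical per-node vote weight (defaults to 1.0 for all)
    """
    # Build an undirected adjacency list from the edge list.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    if influence is None:
        influence = {}  # every neighbor then votes with weight 1.0

    labels = dict(seed_labels)  # current estimates, seeded with training labels
    unlabeled = sorted(n for n in neighbors if n not in seed_labels)

    for _ in range(n_iter):
        changed = False
        for node in unlabeled:
            # Sum the weighted votes of neighbors that currently carry a label.
            votes = defaultdict(float)
            for nb in neighbors[node]:
                if nb in labels:
                    votes[labels[nb]] += influence.get(nb, 1.0)
            if not votes:
                continue
            # Heaviest label wins; ties are broken by label order for determinism.
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if labels.get(node) != best:
                labels[node] = best  # updated in place, so later nodes see it
                changed = True
        if not changed:
            break  # estimates have stabilized
    return labels

# Toy path graph a - b - c - d - e with one training node at each end.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
seeds = {"a": "X", "e": "Y"}
uniform = collective_classify(edges, seeds)
weighted = collective_classify(edges, seeds, influence={"e": 3.0})
print(uniform["d"], weighted["d"])
```

With uniform weights, node `d` sees a tie between its two neighbors and falls back to the tie-break rule; once training node `e` is given a larger (assumed) influence score, its label dominates the vote at `d`. A multi-label variant would keep a set of labels per node rather than a single winner.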

Keywords/Search Tags:

Networks, Relational, Classification, Nodes, Multiple, Active learning, Model, Method

Related items

1	Research On Methods Of Learning Statistical Relational Model
2	Multi-relational Classification: An Ensemble Learning Approach Based On Multiple Views
3	Learning Bounds And Applications Of Relational Classification Model
4	Research On Strategies Of Active Learning And Its Application To Image Classification
5	Research On Combining Collective Classification With Active Learning
6	The Classification Method Based On The Shortest Path Between Nodes Of Network Data
7	Research And Application Of Chinese Text Classification Based On Active Learning
8	Study On Key Technologies Of Active Learning In Division Classification Model
9	Research On Image Classification Based On LDA And Active Learning
10	Statistical learning from relational databases