Study Of Inferring The Missing Information In The Information Network

Posted on: 2016-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: L L Wu
GTID: 2298330467492843
Subject: Computer Science and Technology
Abstract/Summary:
Information networks are the interactive networks generated when individuals interact through internet applications. Newman first used the term "information networks" to refer to networks whose nodes carry a wealth of information. Users of information networks expect the network to provide convenient services such as content recommendation and friend recommendation, yet for privacy reasons they often decline to fill in their personal information or reveal their social relationships. Attribute inference and link prediction have therefore become two major tasks in information network mining that address this tension. Link prediction aims to predict whether a link will form between two nodes, or to recover missing links. Attribute inference aims to infer missing attributes or to determine the authenticity of known attributes.

So far, most attribute inference algorithms use structural or content information to build a model that requires sufficient labeled training data. However, labeling data is usually expensive and time-consuming, which makes supervised machine learning models hard to apply in real applications. Moreover, most existing work treats link prediction and attribute inference as two separate problems, although homophily (the tendency of similar individuals to be connected) implies a strong connection between them.

To address these problems, we first propose a two-phase model that tackles attribute inference when labeled training data is limited. In the first phase, the model uses a community-detection-like algorithm to extend the labeled data. In the second phase, the model adopts a supervised random walk that exploits both structural and content information to infer the missing attributes.

Second, we propose using the community information in the information network to solve the attribute inference problem and the link prediction problem at the same time. Our method uses a social-attribute network (SAN) to combine network structure with user attributes and tackles both problems through community information. According to homophily, users' attribute information and link information are mutually reinforcing. We therefore propose an iterative framework in which the attribute inference process and the link prediction process enhance each other.

Finally, experiments on two real datasets show that our model performs better than other well-established methods. The experimental results also indicate that community information can be used not only to mitigate the shortage of labeled data but also to integrate attribute inference and link prediction into a unified framework. Moreover, both the supervised random walk and the random walk on the SAN can measure the similarity of nodes based on network structure and node-generated content.
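
As a rough illustration of how a random walk on a social-attribute network can serve both tasks, the sketch below builds a small augmented graph in which users and attribute values are both nodes, then runs a plain random walk with restart from one user. This is not the thesis's method: the supervised random walk learns edge weights from labeled data and the thesis also uses community information, neither of which is modeled here. The function names (`build_san`, `random_walk_with_restart`), the restart probability of 0.15, and the toy graph are all assumptions made for the example.

```python
import numpy as np


def build_san(n_users, n_attrs, friend_edges, attr_edges):
    """Adjacency matrix of a toy social-attribute network (hypothetical helper).

    Users occupy indices 0..n_users-1; attribute nodes follow them.
    """
    n = n_users + n_attrs
    adj = np.zeros((n, n))
    for u, v in friend_edges:            # undirected user-user links
        adj[u, v] = adj[v, u] = 1.0
    for u, a in attr_edges:              # undirected user-attribute links
        adj[u, n_users + a] = adj[n_users + a, u] = 1.0
    return adj


def random_walk_with_restart(adj, start, restart=0.15, tol=1e-10, max_iter=1000):
    """Visiting probabilities of a walker that jumps back to `start`
    with probability `restart` at every step (unweighted edges)."""
    degree = adj.sum(axis=0)
    degree[degree == 0] = 1.0            # avoid division by zero for isolated nodes
    transition = adj / degree            # column j holds the move probabilities out of node j
    restart_vec = np.zeros(adj.shape[0])
    restart_vec[start] = 1.0
    scores = restart_vec.copy()
    for _ in range(max_iter):
        updated = (1.0 - restart) * transition @ scores + restart * restart_vec
        if np.abs(updated - scores).sum() < tol:
            return updated
        scores = updated
    return scores


# Toy data (invented for illustration): 5 users, 2 attribute values.
N_USERS, N_ATTRS = 5, 2
friend_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
attr_edges = [(0, 0), (1, 0), (3, 1), (4, 1)]   # user 2 has no known attribute
san = build_san(N_USERS, N_ATTRS, friend_edges, attr_edges)

# Attribute inference: rank attribute nodes by their score seen from user 2.
scores_u2 = random_walk_with_restart(san, start=2)
inferred_attr = int(np.argmax(scores_u2[N_USERS:]))

# Link prediction: rank users not yet linked to user 4 by their score.
scores_u4 = random_walk_with_restart(san, start=4)
candidates = [v for v in range(N_USERS) if v != 4 and san[4, v] == 0]
predicted_friend = max(candidates, key=lambda v: scores_u4[v])
```

On this toy graph the walk from user 2 gives the higher score to attribute 0, which two of its three friends share, and user 2 is the top-ranked new link for user 4. The same score vector serves both tasks because attribute nodes and user nodes live in one graph, which is the property of the SAN that the abstract relies on.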
Keywords/Search Tags: attribute inference, link prediction, community detection, supervised random walk