Several Key Technologies And Applications Of Entity Alignment And Information Correlation Between Multiple Social Networks

Posted on: 2019-12-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J X Zhu
GTID: 1368330623450385
Subject: Computer Science and Technology

Abstract/Summary:
Online social networks have become popular in recent years, and information entities from different social networks may represent a common object (a user or an item) in the real world. For example, two user entities, one from Sina Weibo and one from Twitter, can represent the same real-world user. The process of predicting whether two given information entities represent the same object is often called “entity alignment”. By using inter-network entity alignment relationships, we can fuse the information from multiple social networks and thus study social network problems more precisely, in more detail, and from more dimensions and perspectives. This is very important for applications such as recommending entities across multiple networks or tracking criminals online, and the basic techniques behind these applications are very similar. Based on a broad study and analysis of existing work, and focusing on aligning entities and fusing their information, this thesis first studies techniques for aligning entities between multiple social networks, and then studies techniques and related applications that use inter-network entity alignment relationships to fuse the information of multiple networks. The main contributions are as follows:

1. Proposing a multi-view approach, based on naming behavior modeling, for aligning Chinese user entities across multiple networks. Chinese naming behavior is very complex: it includes using Chinese characters or English letters, using traditional or simplified Chinese characters, using polyphones or homophones, inserting numbers or special characters, and so on. Considering these behaviors, we first divide the pairs of Chinese usernames to be matched into three types: both usernames contain Chinese characters, only one of them contains Chinese characters, and neither contains Chinese characters. For the different types of Chinese username pairs, we preprocess them in different ways, such as
converting Chinese characters to their Pinyin forms, converting traditional Chinese characters to their simplified forms, or deleting special characters. In this way we obtain diverse converted forms of each username. Second, for each type of Chinese username pair, we create a username matching model that matches the pair by jointly considering many different similarities computed from these converted forms. These similarities include the Levenshtein-distance-based similarity, cosine similarity, the Jaccard index, etc. We note that a user entity may have several different usernames, e.g., a user ID, a nickname, and a prior name. So for each kind of username, we use the username matching models to compute the username similarities between user entities. By assigning different weights to the different kinds of Chinese username matching, we construct a multi-view model, MCUA, to align Chinese user entities across multiple networks. The experiments show that, when aligning Chinese user entities between the data sets collected from Sina Weibo and Twitter, MCUA outperforms many well-known username matching methods, such as OM-LR and the content-based method.

2. Designing constrained active learning methods for aligning user entities across multiple heterogeneous social networks. Here, “constrained” means that each user can have at most one entity in each social network; even if a user has multiple entities in one network, these entities have been aggregated in advance into one unique virtual entity by intra-network user entity alignment methods. Given a user entity alignment model, active learning aims to acquire the most effective training sample set at a limited cost, by computing the entropy (which denotes the amount of information contained) of each unlabeled inter-network user entity alignment relationship sample and selecting the samples with the largest entropies as training samples. And
in our constrained active learning methods, according to the constraint, within a given set of unlabeled user entity alignment relationship samples between two networks, when one sample is labeled as positive (i.e., the user entities it aligns represent the same user), the other samples incident to it are automatically labeled as negative (i.e., the user entities they align represent different users) and added to the training sample set. Moreover, when computing the entropies of the predicted positive samples, the entropies of their related negative samples can also be taken into consideration. We use randomly collected user entity alignment relationships between Twitter and Foursquare as the experimental data sets, and the experimental results show that our constrained active learning methods outperform traditional training sample collection methods (e.g., random sampling) and acquire the most effective training sample set at a limited cost. The related work is published in the SCI journal Sensors and at WSDM 2017, one of the top data mining conferences in the world.

3. Proposing an information fusion method for cross-network collaborative recommendation. Between any two item entities aligned by an inter-network entity alignment relationship, this method transfers the item information they need, thereby bridging and fusing the information of two different networks. The transferred information includes the item entity similarity information and the item entity latent semantic information. Fusing the item entity similarity information is relatively simple: for any two items in the real world, we compute their entity similarities in each of the two networks and select the similarity computed from more user-item ratings as the similarity of the two items, thereby fusing the two networks' item similarity information into real-world item similarities. However, fusing the entity latent
semantic information is more complicated. We first conduct matrix factorization on the user-item rating information in each network to obtain the item entity latent semantic information, which reflects the main features of the item entities and is suitable for transfer between networks. We then fuse the item latent semantic information of the two networks by constraining the latent semantic information of any two item entities that come from different networks but represent the same item to be as similar as possible. In the process of fusing the item latent semantic information, we use a domain adaptation matrix to overcome the domain differences between the two networks, such as different official languages. Finally, we apply the information fusion method to the cross-network recommendation problem and propose a Cross-network Collaborative Matrix Factorization (CCMF) model, which alleviates the information sparsity problem that may exist when only one network's information is used for recommendation. We conduct experiments on user and item information crawled from Douban Movie and IMDb, and the results show that, by properly fusing the information of different networks, CCMF outperforms many well-known recommendation methods, such as LMF, CST, and SimMF-I(i), on the sparse information problems we study in recommendation systems. The related work is published at CIKM 2017, one of the top data mining conferences in the world.

4. Proposing an information fusion method for cold start recommendation across multiple heterogeneous social networks. To tackle cold start recommendation for newly imported items that have no user-item ratings in the target network, we first conduct matrix factorization on the sufficient user-item rating information in the other network to extract the latent semantic information of these items' entities, which has lower dimensionality, and transfer it to the target
network via the inter-network item entity alignment relationships. Second, we fuse the item latent semantic information of the two networks by constraining the latent semantic information of any two item entities that come from different networks but represent the same item to be as similar as possible. Finally, we apply this information fusion process to cold start recommendation for the newly imported item entities in the target network, and additionally constrain the latent semantic information of two similar item entities to be similar. Here, the item entity similarity information is extracted from the sufficient heterogeneous relationship information in the target network by a meta-path-based similarity computing method. In this way, we create our cross-network cold start recommendation model, CHRS. We conduct experiments on user and item information crawled from Douban Movie and IMDb, and the results show that, on the cold start and semi-cold start recommendation problems we study, CHRS outperforms many well-known recommendation methods, such as Amp-MF, CST, and SimMF-I(i). The related work is published in the SCI journal IEEE Access.
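
As an illustration of the username similarities named in contribution 1 (Levenshtein-distance-based similarity, Jaccard index, cosine similarity), the following is a minimal sketch and not the MCUA model itself. It assumes the usernames have already been converted to a common form (for example Pinyin) by some upstream step that is not shown; the character-bigram features, the function names, and the weights are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def bigrams(s: str) -> list:
    """Character bigrams; strings shorter than 2 fall back to the string itself."""
    return [s[i:i + 2] for i in range(len(s) - 1)] or [s]


def jaccard(a: str, b: str) -> float:
    """Jaccard index over the sets of character bigrams."""
    x, y = set(bigrams(a)), set(bigrams(b))
    return len(x & y) / len(x | y)


def cosine(a: str, b: str) -> float:
    """Cosine similarity over bigram count vectors."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    dot = sum(x[g] * y[g] for g in x)
    norm = sqrt(sum(v * v for v in x.values())) * sqrt(sum(v * v for v in y.values()))
    return dot / norm


def username_similarity(a: str, b: str, weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted combination of the three similarities (weights are hypothetical)."""
    a, b = a.lower(), b.lower()
    scores = (levenshtein_similarity(a, b), jaccard(a, b), cosine(a, b))
    return sum(w * s for w, s in zip(weights, scores))


# A decorated variant of a name should score higher than an unrelated name.
print(username_similarity("zhangwei", "zhang_wei88"))
print(username_similarity("zhangwei", "liming"))
```

In the thesis the combination is learned per username-pair type and per username kind; the fixed weights here only show where such a combination would plug in.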
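
The constrained active learning idea in contribution 2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the published method: the predicted probabilities are held fixed instead of being re-estimated after each query, only the entropy of the candidate itself is used for selection, and `oracle` is a hypothetical stand-in for the human labeler. What it does show is the one-to-one constraint: once a candidate pair is confirmed positive, every other candidate sharing one of its two entities is labeled negative at no extra labeling cost.

```python
from math import log2


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a binary prediction with positive probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def constrained_query(probs, oracle, budget):
    """Select up to `budget` pairs by maximum entropy and propagate the constraint.

    probs  : dict mapping (entity_in_net_A, entity_in_net_B) -> P(aligned)
    oracle : callable (a, b) -> bool, hypothetical labeler
    Returns a dict of pair -> label, including the inferred negatives.
    """
    labels = {}
    unlabeled = dict(probs)
    for _ in range(budget):
        if not unlabeled:
            break
        # Query the most uncertain candidate pair.
        pair = max(unlabeled, key=lambda k: binary_entropy(unlabeled[k]))
        del unlabeled[pair]
        labels[pair] = oracle(*pair)
        if labels[pair]:
            a, b = pair
            # One-to-one constraint: pairs sharing a or b must be negative.
            for other in [k for k in unlabeled if k[0] == a or k[1] == b]:
                labels[other] = False
                del unlabeled[other]
    return labels


# Toy candidates between two networks; the true alignment is a1 <-> b1.
probs = {("a1", "b1"): 0.5, ("a1", "b2"): 0.45,
         ("a2", "b1"): 0.9, ("a2", "b2"): 0.6}
truth = {("a1", "b1")}
print(constrained_query(probs, lambda a, b: (a, b) in truth, budget=1))
```

With a budget of one query, the sketch obtains three labels in this toy case: the queried positive plus two inferred negatives.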
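
Contributions 3 and 4 both rest on factorizing each network's rating matrix while constraining the latent vectors of aligned items to be similar. The sketch below illustrates only that shared idea and is not the CCMF or CHRS model: it omits the domain adaptation matrix (treating it as the identity) and the meta-path similarity term, and every name, hyperparameter, and rating in it is a made-up assumption. It uses plain stochastic gradient descent with an extra step that pulls aligned item vectors toward each other, so an item with no ratings in network B (a cold start item) inherits latent information from its counterpart in network A.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_aligned_mf(ratings_a, ratings_b, aligned, n_factors=2,
                     alpha=2.0, reg=0.01, lr=0.02, epochs=500, seed=0):
    """Factorize two rating sets with an alignment penalty on item vectors.

    ratings_* : list of (user, item, rating) triples for each network
    aligned   : list of (item_in_a, item_in_b) pairs denoting the same item
    alpha     : strength of the pull between aligned item vectors
    """
    rng = random.Random(seed)

    def init(keys):
        # sorted() keeps initialization reproducible across runs
        return {k: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
                for k in sorted(keys)}

    users_a = init({u for u, _, _ in ratings_a})
    items_a = init({i for _, i, _ in ratings_a} | {a for a, _ in aligned})
    users_b = init({u for u, _, _ in ratings_b})
    items_b = init({i for _, i, _ in ratings_b} | {b for _, b in aligned})

    def sgd_pass(ratings, users, items):
        for u, i, r in ratings:
            p, q = users[u], items[i]
            err = r - dot(p, q)
            for f in range(n_factors):
                pf, qf = p[f], q[f]
                p[f] += lr * (err * qf - reg * pf)
                q[f] += lr * (err * pf - reg * qf)

    for _ in range(epochs):
        sgd_pass(ratings_a, users_a, items_a)
        sgd_pass(ratings_b, users_b, items_b)
        # Alignment step: move each aligned pair of item vectors together.
        for ia, ib in aligned:
            qa, qb = items_a[ia], items_b[ib]
            for f in range(n_factors):
                diff = qa[f] - qb[f]
                qa[f] -= lr * alpha * diff
                qb[f] += lr * alpha * diff
    return users_a, items_a, users_b, items_b


# Toy data: item n3 has no ratings in network B but is aligned with m3 in A.
ratings_a = [("u1", "m1", 5), ("u1", "m2", 1), ("u1", "m3", 5),
             ("u2", "m1", 4), ("u2", "m2", 2), ("u2", "m3", 4),
             ("u3", "m1", 1), ("u3", "m2", 5), ("u3", "m3", 1)]
ratings_b = [("v1", "n1", 5), ("v1", "n2", 1), ("v2", "n1", 1), ("v2", "n2", 5)]
aligned = [("m1", "n1"), ("m2", "n2"), ("m3", "n3")]

users_a, items_a, users_b, items_b = train_aligned_mf(ratings_a, ratings_b, aligned)
# Predicted rating of user v1 for the cold start item n3 in network B.
print(dot(users_b["v1"], items_b["n3"]))
```

Setting `alpha=0.0` disables the alignment step, in which case the cold start item's vector stays at its random initialization; comparing the two settings is a quick way to see what the alignment constraint contributes.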
Keywords/Search Tags: Multiple Social Networks, Entity Alignment, Active Learning, Recommendation System, Matrix Factorization