
Research On Cross-domain Data Mining Based On User Interests For Social Media

Posted on: 2021-01-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Q Wang
GTID: 1368330605481237
Subject: Information and Communication Engineering
Abstract/Summary:
With the popularity of social media and the growing number of users, the amount of data has exploded, and it is difficult for users to extract useful information from such massive data. Meanwhile, users often interact with multiple social media platforms to enjoy different types of services, so the data they generate are distributed across domains (i.e., social media platforms) and are related to each other. Traditional data mining methods are mainly designed for single-domain scenarios; because they ignore the data in other domains, they suffer from data sparsity. By aggregating fragmented data from different domains, cross-domain data mining methods not only supplement the information missing in a single domain but also mine the value of social media data more comprehensively. Existing cross-domain data mining methods mainly use a single type of data or simplify the correlations between cross-domain information to achieve joint cross-domain mining. Because of the heterogeneity, anonymization, and dynamicity of cross-domain data in real applications, these methods are difficult to apply to scenarios where cross-domain correlations are more complex. User interests are the intrinsic driver of data generation. Modeling user interests can help social media applications fully describe the complex correlations between cross-domain information and comprehensively integrate cross-domain data. Therefore, studying new cross-domain data mining methods based on user interests in social media can not only effectively address the challenges posed by the heterogeneity, anonymization, and dynamicity of cross-domain data, but also play an important role in promoting practical applications of cross-domain data mining.

The thesis is supported by sub-topics of the Beijing Municipal Education Commission Co-construction Project, i.e., “Research on information dissemination and evolution mechanism of heterogeneous information network based on big data” and “Research on social-sensed cross-media data analysis and mining”. Focusing on the key issues that limit the performance of cross-domain data mining methods, namely the heterogeneity, anonymization, and dynamicity of cross-domain data in real cross-domain application scenarios, the thesis studies new cross-domain data mining methods based on user interests in social media. By modeling cross-domain user interests and mining complex cross-domain correlations, the thesis improves the performance of cross-domain data mining methods and lays a foundation for the practical application of cross-domain data mining research. The detailed research and achievements are as follows:

(1) To address the complex correlations between cross-domain information caused by the heterogeneity of cross-domain data, a cross-domain user identity linkage method based on user interests for heterogeneous cross-domain data is studied. To uniformly capture the correlations between cross-domain data drawn from heterogeneous feature spaces, a cross-domain user identity linkage method based on linked heterogeneous network embedding is proposed. The method designs a linked heterogeneous network to describe the complex cross-domain correlations, and fuses heterogeneous data by jointly capturing intra-network and inter-network user-interest information, based on content topics and social relations, in the same interest space, which helps learn complete representations of cross-domain user interests. Meanwhile, a joint training algorithm based on negative sampling is designed to train the heterogeneous relations alternately, further improving model performance and training efficiency. The experimental results show that the proposed method improves performance by at least 19% compared with a method that assumes different types of cross-domain information are independent of each other, which shows that fully mining the correlations between different types of cross-domain information can effectively improve the performance of cross-domain user identity linkage methods.

(2) To address the insufficient correlations between cross-domain information caused by the anonymization of cross-domain data, a cross-domain recommendation method based on user interests for anonymous cross-domain data is studied. Since only a small amount of behavioral data is available when users are anonymous, a cross-domain recommendation method based on cross-domain heterogeneous relation embedding is proposed to supplement the missing related information. First, the method uses a co-clustering algorithm to mine cluster-level inter-domain links that supplement the inter-domain correlations. Then, the correlations between cross-domain information are further enriched by jointly embedding the cross-domain heterogeneous item-item and item-cluster relations. Finally, a complete cross-domain interest representation is obtained by combining the fragmented cross-domain information. In addition, a joint training strategy based on heterogeneous relations is designed so that the model can be optimized and trained efficiently. The experimental results show that, even when 50% of the information is lost and users are anonymous, the proposed method improves performance by at least 11% compared with traditional single-domain recommendation methods and cross-domain recommendation methods for anonymous users. This shows that fully mining hidden cross-domain correlations can compensate for missing cross-domain information and improve recommendation performance.

(3) To address the complex dynamic correlations between cross-domain information caused by the dynamicity of cross-domain data, a cross-domain recommendation method based on user interests for dynamic cross-domain data is studied. First, considering the differences between the cross-domain global dynamics and the single-domain local dynamics of cross-domain information, a cross-domain recommendation method based on hierarchical recurrent neural networks is designed; it jointly models the cross-domain global evolution patterns and the single-domain local dynamic patterns of user interests to fully capture their dynamic correlations. Second, to solve the problem that some links between single-domain behaviors are lost when capturing the dynamics of cross-domain information, a cross-domain recommendation method based on cross-domain recurrent-gated neural networks is proposed. This method effectively recovers the lost single-domain behavioral links from long-term and short-term interest perspectives while capturing the global evolution patterns and local dynamic patterns of user interests. The experimental results show that the two proposed methods improve performance by at least 8% compared with existing single-domain recommendation methods and cross-domain recommendation methods based on dynamic data, which shows that accurately capturing the dynamic correlations of cross-domain information can effectively improve recommendation performance.
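Contribution (1) trains several heterogeneous relations alternately with negative sampling. The abstract does not give the model's exact form, so the following is only a minimal Python sketch under stated assumptions: every node (users in two hypothetical networks `a` and `b`, plus content topics `t:*`) has one vector in a shared interest space, each relation type is a list of edges, and training cycles through the relation types using the standard skip-gram negative-sampling loss. The node names, relation names, and hyperparameters (`dim`, `epochs`, `lr`, `k`) are all illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sgns_step(emb, src, dst, negatives, lr):
    """One negative-sampling SGD step on the edge (src, dst).

    Minimizes -log s(u.v) - sum_n log s(-u.n), where s is the sigmoid.
    `negatives` must not contain src or dst. Returns the loss before the update.
    """
    u, v = emb[src], emb[dst]
    s_pos = sigmoid(dot(u, v))
    loss = -math.log(s_pos)
    grad_u = [(s_pos - 1.0) * x for x in v]
    grad_v = [(s_pos - 1.0) * x for x in u]
    neg_grads = []
    for n in negatives:
        w = emb[n]
        score = dot(u, w)
        loss -= math.log(sigmoid(-score))
        s_neg = sigmoid(score)
        grad_u = [g + s_neg * x for g, x in zip(grad_u, w)]
        neg_grads.append((n, [s_neg * x for x in u]))
    # Apply all updates using gradients computed from the pre-update vectors.
    emb[src] = [x - lr * g for x, g in zip(u, grad_u)]
    emb[dst] = [x - lr * g for x, g in zip(v, grad_v)]
    for n, g in neg_grads:
        emb[n] = [x - lr * gi for x, gi in zip(emb[n], g)]
    return loss


def train_jointly(relations, dim=8, epochs=100, lr=0.1, k=2, seed=0):
    """Alternate over relation types each epoch, sharing one embedding table.

    relations: dict mapping a (hypothetical) relation name to (src, dst) pairs.
    Negatives for an edge are drawn from the targets of the same relation.
    """
    rng = random.Random(seed)
    nodes = sorted({n for edges in relations.values() for edge in edges for n in edge})
    emb = {n: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for n in nodes}
    targets = {name: sorted({d for _, d in edges}) for name, edges in relations.items()}
    history = []
    for _ in range(epochs):
        total = 0.0
        for name in sorted(relations):  # one pass per relation type, in turn
            for src, dst in relations[name]:
                pool = [t for t in targets[name] if t not in (src, dst)]
                negatives = rng.sample(pool, min(k, len(pool)))
                total += sgns_step(emb, src, dst, negatives, lr)
        history.append(total)
    return emb, history


# Toy linked network: social edges inside networks a and b, user-topic
# content edges, and one known anchor link between the two networks.
relations = {
    "social_a": [("a:u1", "a:u2"), ("a:u2", "a:u3")],
    "social_b": [("b:u1", "b:u2"), ("b:u2", "b:u3")],
    "content": [("a:u1", "t:sports"), ("b:u1", "t:sports"),
                ("a:u3", "t:music"), ("b:u3", "t:music")],
    "anchor": [("a:u1", "b:u1")],
}
emb, history = train_jointly(relations)
print(round(history[0], 3), round(history[-1], 3))
```

Drawing negatives only from the targets of the same relation keeps, for example, topic nodes from serving as negatives for a user-user social edge. This is one plausible design choice for alternating training, not necessarily the one used in the thesis; after training, a candidate cross-network pair could be scored with `sigmoid(dot(emb[x], emb[y]))`.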
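Contribution (3) distinguishes the cross-domain global dynamics of user interests from the single-domain local dynamics, and notes that links between single-domain behaviors can be lost when behaviors from different domains are interleaved. The sketch below is a hypothetical preprocessing step, not the thesis's recurrent networks: for one user's toy history over assumed `movie` and `book` domains, it builds the time-ordered global sequence, the per-domain local sequences, and, for every event, the position of the previous event in the same domain.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    time: int
    domain: str
    item: str


def split_dynamics(
    events: List[Event],
) -> Tuple[List[str], Dict[str, List[str]], List[Optional[int]]]:
    """Split one user's cross-domain history into global and local views.

    Returns:
      global_seq: items from all domains in time order (global view)
      local_seqs: per-domain item sequences (single-domain local views)
      prev_same:  for each position in global_seq, the index of the previous
                  event from the same domain, or None if there is none
    """
    ordered = sorted(events, key=lambda e: e.time)
    local_seqs: Dict[str, List[str]] = {}
    prev_same: List[Optional[int]] = []
    last_seen: Dict[str, int] = {}
    for i, e in enumerate(ordered):
        local_seqs.setdefault(e.domain, []).append(e.item)
        prev_same.append(last_seen.get(e.domain))
        last_seen[e.domain] = i
    return [e.item for e in ordered], local_seqs, prev_same


user_events = [
    Event(1, "movie", "m1"),
    Event(2, "book", "b1"),
    Event(3, "book", "b2"),
    Event(4, "movie", "m2"),
]
global_seq, local_seqs, prev_same = split_dynamics(user_events)
print(global_seq)  # ['m1', 'b1', 'b2', 'm2']
print(local_seqs)  # {'movie': ['m1', 'm2'], 'book': ['b1', 'b2']}
print(prev_same)   # [None, None, 1, 0]
```

In this example `m1` and `m2` are not adjacent in the global sequence, yet `prev_same[3] == 0` restores the single-domain link from `m2` back to `m1`. A hierarchical model could feed `global_seq` to an upper recurrent level and each entry of `local_seqs` to a domain-level recurrence; how the thesis combines the two levels is not described in this abstract.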
Keywords/Search Tags: social media, user interests, cross-domain data mining, cross-domain recommendations, cross-domain user identity linkage