User Profiling Technology Based On Social Media User Content And Behavior Data

Posted on: 2019-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Wei
GTID: 2428330545482438
Subject: Computer technology
Abstract/Summary:
With the development of the Internet, a large amount of network information has been generated. User Profiling is greatly needed to help users select and filter this information and to improve user experience and satisfaction. User Profiling portrays a user's characteristics based on the content, behavior, and other information the user leaves on the Internet. In short, User Profiling is the tagging of user information, where a tag is a refined feature produced by analyzing that information. User information on the Internet comes mainly from two sources: social media content data, i.e. the text users publish on various online social media, and social media behavior data, i.e. records of users' interactions on those platforms. To tag user information more effectively, this thesis completes two User Profiling tasks using users' social media content and behavior data:

(1) Classification-based automatic extraction of user-content topic words. Under supervised learning, automatic topic-word extraction can be treated as a binary classification problem. N-gram candidate recognition, combined with non-controlled word sampling, is used to filter the candidate words. Appropriate features are selected according to the document set, and a support vector machine (SVM) model is trained to obtain the classifier. The feature vectors are generated with a weighted "Feature Set", a composition of element features whose number can vary.

(2) User interest tagging based on an improved word co-occurrence degree and a Behavior Cloud. All content a user publishes on social networks is combined into a pseudo-content document set, and the candidate words are filtered with two extraction factors: the relative document set frequency and the inverse document set frequency. The topic words of this pseudo-content document set, which convey the user's main ideas, are then extracted from a connected graph built on the co-occurrence degree among the candidate words. Finally, the user's interest tags are obtained from these topic words together with the related-user tables generated from the user's behavior data, and are presented as a Behavior Cloud.

Compared with the model trained on element features, the model trained on feature vectors generated by the weighted Feature Set shows significant improvements. Compared with the traditional word co-occurrence method, combining the pseudo-content topic words extracted with the improved co-occurrence degree with the interest tags obtained from behavior data achieves a higher accuracy. In addition, representing the user's interest tags with a Behavior Cloud has a significant advantage over traditional statistical methods for text visualization.
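The N-gram candidate recognition step of task (1) can be illustrated with a small sketch. It is not the thesis's implementation: the thesis works on Chinese social media text and does not describe its tokenizer, stop-word list, or the details of its non-controlled word sampling, so the English regex tokenizer and the hypothetical `STOPWORDS` set below are assumptions. A candidate here is any n-gram of up to `max_n` tokens that does not cross punctuation and does not begin or end with a stop word.

```python
import re

# Hypothetical stop-word list; the thesis does not publish the one it used.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for", "with"}


def ngram_candidates(text, max_n=3, stopwords=STOPWORDS):
    """Return unique candidate n-grams (1..max_n tokens) in first-seen order.

    N-grams never cross punctuation, and any n-gram that starts or ends
    with a stop word is discarded.
    """
    candidates, seen = [], set()
    # Split on punctuation so that no n-gram spans two clauses.
    for fragment in re.split(r"[.,;:!?]", text.lower()):
        tokens = re.findall(r"[a-z]+", fragment)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in stopwords or gram[-1] in stopwords:
                    continue
                phrase = " ".join(gram)
                if phrase not in seen:
                    seen.add(phrase)
                    candidates.append(phrase)
    return candidates
```

For example, with `max_n=2` the sentence "Support vector machines classify the topic words." yields `support vector` and `topic words` as bigram candidates but drops `classify the` and `the topic`, because they touch a stop word.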
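Task (1) treats topic-word extraction as binary classification with an SVM trained on weighted feature vectors. The abstract does not define the element features or their weights, so the sketch below assumes that each candidate's "Feature Set" is a dict of named element features whose size may vary, and that a hypothetical `feature_vector` helper flattens it into a fixed-length vector by multiplying each present feature by its weight and filling missing ones with 0. For the classifier it swaps in a small Pegasos-style sub-gradient solver for a linear SVM using only the standard library; the function names `train_linear_svm` and `predict` are illustrative, and a production system would more likely rely on an SVM library.

```python
import random


def feature_vector(feature_set, weights, names):
    """Flatten a variable-size feature set into a fixed-length weighted vector.

    feature_set: dict of element-feature name -> value (may omit some names).
    weights: dict of name -> weight (hypothetical; defaults to 1.0).
    names: fixed feature order shared by all candidates.
    """
    return [weights.get(name, 1.0) * feature_set.get(name, 0.0) for name in names]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM with Pegasos-style sub-gradient steps.

    Labels must be +1 (topic word) or -1 (not a topic word). A constant
    1.0 feature is appended so the bias is learned as an ordinary weight
    (and is therefore also regularised, a common simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Regularisation step: shrink every weight.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss step: only when the example violates the margin.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 if the candidate is classified as a topic word, else -1."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In use, every candidate's feature dict (for instance hypothetical `tf` and `first_pos` entries) would be passed through `feature_vector` with one shared `names` list before training, so candidates with different numbers of element features still produce vectors of equal length.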
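Task (2) begins by merging each user's posts into a pseudo-content document and filtering candidates with a relative document set frequency and an inverse document set frequency. The abstract names these two factors without giving formulas, so the sketch below assumes TF-IDF-like definitions, stated in the docstring, with each user's pseudo-document counted as one "document set". Posts are assumed to be already tokenised, and all function names are hypothetical.

```python
import math
from collections import Counter


def build_pseudo_documents(posts_by_user):
    """Merge each user's tokenised posts into a single pseudo-content document."""
    return {
        user: [token for post in posts for token in post]
        for user, posts in posts_by_user.items()
    }


def filter_candidates(pseudo_docs, user, top_k=5):
    """Rank one user's words by rdsf * idsf and keep the top_k positive scores.

    Assumed definitions (not given in the abstract):
      rdsf(w) = count of w in the user's pseudo-document / its length
      idsf(w) = log(N / n_w), N = number of users, n_w = users whose
                pseudo-document contains w
    """
    n_users = len(pseudo_docs)
    doc_freq = Counter()
    for tokens in pseudo_docs.values():
        doc_freq.update(set(tokens))
    counts = Counter(pseudo_docs[user])
    total = sum(counts.values())
    scores = {
        word: (count / total) * math.log(n_users / doc_freq[word])
        for word, count in counts.items()
    }
    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, score in ranked[:top_k] if score > 0]
```

Under these definitions a word that every user writes (such as "today") gets an inverse factor of log(1) = 0 and is filtered out, while words concentrated in one user's posts rise to the top.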
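The graph step of task (2) extracts topic words from a connected graph of candidate-word co-occurrences. The thesis's improved co-occurrence degree is not specified in the abstract, so this sketch substitutes the plain count of posts in which two candidates appear together as the edge weight, keeps the largest connected component, and ranks its words by weighted degree. The ranking rule and all names are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(posts, candidates):
    """Undirected graph: edge weight = number of posts containing both words."""
    graph = defaultdict(lambda: defaultdict(int))
    wanted = set(candidates)
    for post in posts:
        present = sorted(wanted.intersection(post))
        for a, b in combinations(present, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def largest_component(graph):
    """Return the node set of the largest connected component (iterative DFS)."""
    seen, best = set(), set()
    for start in list(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node])
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def topic_words(posts, candidates, top_k=3):
    """Rank words in the largest component by weighted degree."""
    graph = cooccurrence_graph(posts, candidates)
    component = largest_component(graph)
    degree = {word: sum(graph[word].values()) for word in component}
    return sorted(degree, key=lambda word: (-degree[word], word))[:top_k]
```

Restricting the ranking to the largest component is one way to read "extracted from the connected graph": isolated clusters of words that never co-occur with the user's main vocabulary are ignored.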
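The final step combines the content-derived topic words with tags from related users found in the behavior data and presents the result as a Behavior Cloud. The abstract does not say how the two sources are weighted or how the cloud is drawn, so the sketch below assumes a linear blend controlled by a hypothetical `alpha` parameter and renders the cloud as a mapping from tag to font size, as ordinary tag clouds do. `interest_tags` and `behavior_cloud` are illustrative names, not the thesis's.

```python
from collections import Counter


def interest_tags(topic_words, related_user_tags, alpha=0.5, top_k=5):
    """Blend content evidence and behaviour evidence into scored interest tags.

    topic_words: the user's topic words, best first (content data).
    related_user_tags: one tag list per related user (behaviour data).
    alpha: hypothetical weight of content versus behaviour evidence.
    """
    scores = Counter()
    n = len(topic_words)
    for rank, word in enumerate(topic_words):
        # Higher-ranked topic words contribute more.
        scores[word] += alpha * (n - rank) / n
    tag_counts = Counter(tag for tags in related_user_tags for tag in tags)
    for tag, count in tag_counts.items():
        # Fraction of related users carrying this tag (assumes no duplicate
        # tags within one user's list).
        scores[tag] += (1 - alpha) * count / len(related_user_tags)
    return scores.most_common(top_k)


def behavior_cloud(tag_scores, min_pt=12, max_pt=36):
    """Map each (tag, score) pair linearly onto a font size in points."""
    values = [score for _, score in tag_scores]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero when all scores tie
    return {
        tag: round(min_pt + (score - low) / span * (max_pt - min_pt))
        for tag, score in tag_scores
    }
```

Here a tag such as "travel" can enter the profile purely from behavior data even though it never appears among the user's own topic words, which is the kind of complementary evidence the abstract attributes to the behavior data.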
Keywords: User Profiling, Topic Word, Interest Tag, Feature Set, Word Co-occurrence Degree