
Discovering User Interest On Twitter With A Hierarchical Clustering Model

Posted on: 2016-03-21  Degree: Master  Type: Thesis
Country: China  Candidate: C L Zhang
GTID: 2348330536967421  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of online social networks, platforms such as Twitter, Facebook, and Weibo have grown explosively in recent years and now significantly influence every aspect of people's daily lives. People use them to communicate with each other, record their lives, publish blog posts, and share photos and videos, so that social networks have become a map of real life. However, as micro-blog platforms have developed, the amount of information has grown so quickly that it has led to information overload. Users are surrounded by too much information, much of it useless to them, and must pick out what they want to read from a very large volume of content. This thesis tries to find the posts that match a user's interests and rank them first. The main work is as follows.

First, to address the noise that is characteristic of micro-blog platforms, the thesis introduces the concept of a topic tweet. An LDA model trained on Twitter data supplies the lexical features of a tweet, and a support vector machine (SVM) classifier is built on a combination of these lexical features, the social features, and the grammatical features of the tweets. Experimental results indicate that the classifier achieves high accuracy and recall, which meets the requirements of the system.

Second, using this method, the user's topic tweets are extracted and their keywords are expanded with a search engine and an external knowledge base. A Word2Vec model is then trained on an offline Wikipedia corpus, and the user's tweets are mapped into a high-dimensional vector space. Hierarchical clustering is applied to these vectors, and three word clusters are selected, according to cluster purity and cluster weight, to characterize the user's interests. Experiments verify the validity of the algorithm.

Finally, building on the non-topic tweet filtering, keyword expansion, vocabulary vectorization, and user interest discovery techniques, an online user interest discovery and personalized recommendation system is designed. The thesis analyzes the design and implementation of each module of the system, including the data acquisition module, the data preprocessing module, and the user interest discovery module.
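
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the three-dimensional vectors are made-up stand-ins for Word2Vec embeddings, average linkage with cosine distance and the `max_distance` threshold are assumptions, and the ranking score (relative cluster size multiplied by mean pairwise cosine similarity) is a hypothetical substitute for the purity and weight criteria, which the abstract does not define.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def average_linkage(c1, c2, vectors):
    """Mean cosine distance over all cross-cluster pairs of indices."""
    total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerative_clusters(vectors, max_distance):
    """Bottom-up clustering: merge the two closest clusters until the
    closest pair is farther apart than max_distance (an assumed cutoff)."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        (a, b), dist = min(
            (((a, b), average_linkage(clusters[a], clusters[b], vectors))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > max_distance:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters

def cohesion(cluster, vectors):
    """Mean pairwise cosine similarity; singletons score 0 by convention here."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine_distance(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

def top_clusters(clusters, vectors, k=3):
    """Rank clusters by (relative size * cohesion), a hypothetical stand-in
    for the purity/weight criteria, and return the k best as sorted index lists."""
    total = sum(len(c) for c in clusters)
    scored = [(len(c) / total * cohesion(c, vectors), sorted(c)) for c in clusters]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [members for _, members in scored[:k]]

# Toy keywords with made-up 3-d vectors (real Word2Vec vectors have far more dimensions).
words = ["football", "soccer", "goal", "python", "java", "guitar"]
vectors = [
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.85, 0.15, 0.05],
    [0.10, 0.90, 0.10],
    [0.00, 0.95, 0.05],
    [0.10, 0.10, 0.90],
]

clusters = agglomerative_clusters(vectors, max_distance=0.2)
interests = top_clusters(clusters, vectors, k=2)
for members in interests:
    print([words[i] for i in members])
```

In this toy data the sports and programming keywords form tight groups while the isolated keyword stays a singleton and ranks last, which shows how a size-and-cohesion score can push noise words out of the interest profile.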
Keywords/Search Tags:interest discovery, tweet filtering, feature extension, short text clustering