The Study And Implementation Of A Micro-blog User Interest Detection System Based On Labeled LDA

Posted on: 2015-02-09  Degree: Master  Type: Thesis
Country: China  Candidate: X Wang  Full Text: PDF
GTID: 2298330434950612  Subject: Software engineering
Abstract/Summary:
Micro-blog is a relationship-based platform for sharing, communicating, and accessing information. Its simple content, strong interactivity, and low barrier to use have driven rapid growth in China. Because micro-blog is a popular social networking service, research on user interests based on it has quickly become a major research topic, for two reasons. First, finding interesting accounts and information is the most important activity for most micro-blog users, so the platform should recommend such content accurately according to each user's interests. Second, a user interest detection system is the basis for precise advertising, and the accuracy of interest mining directly affects advertising effectiveness and the platform's profits.

In this project, the author studied traditional text classification algorithms, which often use the vector space model to represent text features, extended the unsupervised, non-hierarchical topic model LDA, and implemented the supervised, non-hierarchical topic model Labeled LDA to identify the interests of Sina micro-blog users. This thesis addresses the key issues in detecting user interests, mainly in the following three aspects:
(1) Customizing Scrapy, a crawler framework written in Python, to build a web crawler for Sina micro-blog. The crawler works around the API limits, fetches micro-blog text concurrently, and collects ample experimental data for the research.
(2) Studying text mining and text classification techniques, and using Labeled LDA, a supervised, non-hierarchical topic model, in the system. The model is trained on texts from micro-blog theme accounts and then used to predict the interests of other micro-blog users.
(3) To handle massive data, using distributed frameworks such as Hadoop and Hive to implement distributed word segmentation and preprocessing.
The resulting user interest data has been successfully applied to generating word clouds for individual users and to adjusting and optimizing search results and advertising, among other uses.
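
The train-then-predict idea in aspect (2) can be illustrated with a minimal sketch of Labeled LDA. It is not the thesis's implementation, which the abstract does not describe in detail. The function names (train_labeled_lda, predict_interests), the hyperparameters, and the toy English documents are all assumptions made for illustration. Training uses collapsed Gibbs sampling in which each token may only be assigned to one of its document's labels, which is the defining constraint of Labeled LDA. Prediction for an unlabeled user uses a simple fold-in approximation over all labels, chosen here for brevity.

```python
import random
from collections import defaultdict


def train_labeled_lda(docs, labels, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling where topics are restricted to each doc's labels."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    topics = sorted({k for ls in labels for k in ls})
    n_kw = {k: defaultdict(int) for k in topics}   # topic-word counts
    n_k = dict.fromkeys(topics, 0)                 # tokens per topic
    n_dk = [defaultdict(int) for _ in docs]        # doc-topic counts

    def bump(d, k, w, delta):
        n_kw[k][w] += delta
        n_k[k] += delta
        n_dk[d][k] += delta

    # Random initial assignment, drawn only from the document's own labels.
    z = [[rng.choice(labels[d]) for _ in doc] for d, doc in enumerate(docs)]
    for d, doc in enumerate(docs):
        for k, w in zip(z[d], doc):
            bump(d, k, w, +1)

    smooth = len(vocab) * beta
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                bump(d, z[d][i], w, -1)  # remove the token's current assignment
                weights = [
                    (n_dk[d][k] + alpha) * (n_kw[k][w] + beta) / (n_k[k] + smooth)
                    for k in labels[d]
                ]
                z[d][i] = rng.choices(labels[d], weights=weights)[0]
                bump(d, z[d][i], w, +1)

    # Smoothed per-label word distributions.
    phi = {
        k: {w: (n_kw[k][w] + beta) / (n_k[k] + smooth) for w in vocab}
        for k in topics
    }
    return phi, z


def predict_interests(phi, tokens, alpha=0.1, iters=50):
    """Fold-in approximation: estimate a label mixture with phi held fixed."""
    topics = sorted(phi)
    known = [w for w in tokens if w in phi[topics[0]]]  # skip unseen words
    theta = {k: 1.0 / len(topics) for k in topics}
    for _ in range(iters):
        counts = {k: alpha for k in topics}
        for w in known:
            post = {k: theta[k] * phi[k][w] for k in topics}
            total = sum(post.values())
            for k in topics:
                counts[k] += post[k] / total
        norm = sum(counts.values())
        theta = {k: counts[k] / norm for k in topics}
    return theta


# Toy stand-ins for segmented posts from labeled theme accounts (invented data).
docs = [
    ["football", "goal", "league", "match"],
    ["match", "goal", "coach", "football"],
    ["phone", "chip", "android", "app"],
    ["app", "chip", "phone", "release"],
    ["football", "app", "match", "phone"],
]
labels = [["sports"], ["sports"], ["tech"], ["tech"], ["sports", "tech"]]

phi, z = train_labeled_lda(docs, labels)
theta = predict_interests(phi, ["goal", "football", "league"])
print(theta)
```

In this sketch the labels of the theme accounts play the role of interest categories, and the returned mixture ranks those categories for a user whose posts carry no label. A production system would operate on word-segmented Chinese text and far larger corpora, as aspects (1) and (3) suggest.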
Keywords/Search Tags: Interest Point Detection, Text Classification, LDA, Labeled LDA