
Implementation Of Microblog User Classification And Recommendation System Based On Flask Framework

Posted on: 2018-07-07; Degree: Master; Type: Thesis
Country: China; Candidate: S L Zeng; Full Text: PDF
GTID: 2348330515960117; Subject: Computer technology
Abstract/Summary:
With the advent of the Internet information age, data has grown explosively and the channels through which it spreads have become more diverse. Microblogs, as representative social media, have become an important platform for the spread of trending social events (hot events). When studying how hot events spread, characterizing users' behavior and attributes within an event has become an increasingly important issue for public opinion analysis and for discovering potential customers. Because hot events are updated rapidly, involve large amounts of data, and span complex content categories, a fast and efficient data acquisition model is needed to track them and, at the same time, to store the relevant user information and microblog content in a database. Microblog user classification can be regarded as a problem of user tag mining and tag similarity calculation, so generating user tags from user-related features is the key to user classification. In addition, to keep machine-operated accounts from distorting the classification results, unrelated users, especially spammers, must be filtered out to improve the accuracy of user classification and recommendation. Based on the above, the main work and research in implementing the system are as follows:

1. By comparing the performance of various web crawler frameworks and studying the anti-crawling strategies of the microblog platform, a distributed crawler combining Selenium and the Scrapy framework is proposed to improve the stability of data acquisition. Multithreading is also used to increase the speed of data acquisition.

2. Existing research on spammer recognition has the following main problems: (a) the traditional communication model is not accurate enough; (b) as spammers become better at disguising themselves, their behavior and attribute characteristics increasingly resemble those of normal users; (c) the volume of microblog data produced is too large and its dimensionality too high. To address these problems, a new identification model based on sender characteristics is proposed to filter out spammers; the feature factors it introduces also include information integrity and an emotional volatility index. The effect of the added features on spammer recognition is compared using classification algorithms including Naive Bayes, Bayesian networks, and random forest.

3. A user classification model based on personality tags and relation chains is proposed. The user's interests are obtained through a user interest mining method and form one part of the user's features; tags taken from the user's relation chain form the other part. The two parts are combined into a new user tag vector, and the similarity between user tag vectors is then calculated to classify users, providing a more reliable basis for customized user recommendations.

4. A system for mining network hot events and for user classification and recommendation is designed with the Flask framework to implement the whole process. It displays the analysis results more intuitively and provides a convenient platform for customer mining.
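
The tag-vector step described in item 3 can be illustrated with a minimal sketch. The abstract does not say which similarity measure or tag weighting the thesis uses, so cosine similarity over sparse tag counts is an assumption here, and the names `tag_vector`, `cosine_similarity`, `classify`, and `relation_weight` are hypothetical and not taken from the thesis.

```python
from collections import Counter
from math import sqrt


def tag_vector(interest_tags, relation_tags, relation_weight=1.0):
    """Merge interest tags and relation-chain tags into one sparse vector.

    Assumption: each tag occurrence adds to that tag's weight; the thesis
    may weight the two sources differently.
    """
    vector = Counter(interest_tags)
    for tag in relation_tags:
        vector[tag] += relation_weight
    return vector


def cosine_similarity(a, b):
    """Cosine similarity of two sparse tag vectors (tag -> weight)."""
    dot = sum(weight * b[tag] for tag, weight in a.items() if tag in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def classify(user_vector, class_vectors):
    """Return the label whose reference vector is most similar to the user."""
    return max(
        class_vectors,
        key=lambda label: cosine_similarity(user_vector, class_vectors[label]),
    )


# Hypothetical example data, for illustration only.
user = tag_vector(["sports", "football"], ["sports"])
classes = {
    "sports": Counter({"sports": 1, "football": 1}),
    "tech": Counter({"python": 1}),
}
label = classify(user, classes)
```

In this sketch a user is assigned to whichever reference class has the closest tag vector; a real system would more likely compare users against each other or against learned class centroids.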
Keywords/Search Tags: Microblog crawler, Selenium tools, Spammer detection, User classification, Flask framework