
The Design And Implementation Of On-time Social Media Analytics System

Posted on: 2014-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: T Si
GTID: 2268330422452008
Subject: Software engineering
Abstract/Summary:
With the continued spread of the Internet and the growing volume of information on it, the Internet has indisputably become a new form of media. Many social networking sites receive no fewer visits than traditional media outlets. Facebook and Twitter have become the representative social networking sites and a new force in the dissemination of information, and their tremendous influence has given the field of communication a new term: social media. Meanwhile, the data generated by social media contains a huge quantity of information. In this thesis, we built a real-time social media analytics system that performs statistical analysis on users' social media data from Twitter. The system used Twitter Storm as its platform, together with the Twitter APIs, Python NLTK, and other technologies; it extracted keywords from users' social media data and performed sentiment analysis on it, providing users with recommendations and references.

First, the system requirements were defined from the system's design goals: the functional goals were made clear, and performance goals were then proposed. Technical solutions were chosen according to the system's scenarios and requirements: the system used the Twitter Storm real-time computation system to achieve rapid, continuous, real-time processing of social media data, and used Python NLTK for the keyword extraction and sentiment analysis tasks.

Then, according to the requirements definition, the system was divided into 7 modules: the Twitter Streaming API adapter module, the stream computing and short text analysis module, the data maintenance module, the network structure analysis module, the content pushing module, the configuration module, and the log module. The first five modules mainly served the system's functional goals, while the configuration and log modules improved system availability.
The stream computing and short text analysis module contained the execution logic of Twitter Storm and the text analysis logic based on Python NLTK. Taking advantage of Twitter Storm's strength in processing stream data, the system could process social media data rapidly. It used Python NLTK to extract keywords from Twitter data and a Naive Bayes classifier to perform sentiment analysis on social media data; because its computing model is simple and efficient and its results are relatively reasonable, the Naive Bayes classifier can improve the system's computing quality and response time. Using network node similarity and the Q value algorithm, the network structure analysis module achieved community division for user groups: the node similarity calculation used matrix operations to obtain a topology based on the similarity of all nodes, and the Q value calculation used this topology to obtain a relatively reasonable community division. The system configuration module used the locking mechanisms supported by Zookeeper to ensure configuration completeness.

Finally, each module and the whole system were tested for both function and performance. The results showed that each module met the requirements and that the system's response speed and operating results met the design goals.
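
The community division step described in the abstract can be illustrated with a short sketch. It assumes that the "Q value" is Newman's modularity and that node similarity is the common-neighbor count obtained from the matrix product of the adjacency matrix with itself; the abstract specifies neither, and the function names and the toy graph below are hypothetical.

```python
def modularity(adj, communities):
    """Newman modularity Q of an undirected, unweighted graph (assumed Q value).

    adj: symmetric 0/1 adjacency matrix as a list of lists.
    communities: list giving the community label of each node.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # twice the number of edges
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                # Observed edge minus the expected number under random wiring.
                q += adj[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


def common_neighbor_similarity(adj):
    """Similarity matrix S = A * A; S[i][j] counts neighbors shared by i and j."""
    n = len(adj)
    return [[sum(adj[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical toy graph: triangles (0, 1, 2) and (3, 4, 5) joined by edge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
N = 6
ADJ = [[0] * N for _ in range(N)]
for a, b in EDGES:
    ADJ[a][b] = ADJ[b][a] = 1

SIM = common_neighbor_similarity(ADJ)
Q_SPLIT = modularity(ADJ, [0, 0, 0, 1, 1, 1])  # one community per triangle
Q_SINGLE = modularity(ADJ, [0] * N)            # all nodes in one community

print(f"Q (two communities) = {Q_SPLIT:.3f}")
print(f"Q (one community)   = {Q_SINGLE:.3f}")
```

On this toy graph, placing each triangle in its own community gives Q = 5/14 (about 0.357), while putting all nodes in a single community gives Q = 0, so a search that maximizes Q would prefer the two-community division.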
Keywords/Search Tags: Social Media, Twitter Storm, Python NLTK, Community Partitioning Algorithm