Design And Implementation Of Weibo Data Mining System Based On Hadoop Platform

Posted on:2019-11-07

Degree:Master

Type:Thesis

Country:China

Candidate:Y N Liu

Full Text:PDF

GTID:2428330551960312

Subject:Computer technology

Abstract/Summary:

Big data, a new stage in the development of informatization, has been widely applied in government public services, finance, judicial decisions, medicine, gaming, tourism, and other popular industries. As a software framework for the distributed processing of big data, the Hadoop computing platform helps users build and develop distributed programs easily, enabling high-speed computation and storage of big data. Weibo generates billions of text records each day, which makes it a most valuable source of big data. This thesis therefore takes Weibo data as its research object, designs and implements an automated Weibo data mining system based on the Hadoop platform, and analyzes massive volumes of Weibo data to uncover the valuable information hidden in it. The main research contents of the thesis are as follows:

(1) Data collection. The thesis develops a distributed concurrent framework based on the producer-consumer model, and designs and implements a Python-based Weibo data collection system. Deployed on Linux, the system collects, in real time, the most recent original posts from a large number of high-quality Weibo users. After preprocessing, the data is saved to local disk as text files at regular intervals.

(2) Data storage. For the batch data saved locally, the thesis uses the distributed file system HDFS and the data warehouse Hive: it designs partitioned tables, writes Linux user scheduled tasks that create the partitions automatically, and writes scripts that automatically upload the locally saved Weibo data to the Hadoop cluster and import it into the corresponding Hive partition. Data collected in real time is placed in a Kafka message queue.

(3) Data analysis. The heat of a Weibo post is measured by the number of comments, reposts, and likes it receives. To find hot posts, the Weibo data in HDFS is batch processed with Hive SQL, and the Weibo data in the Kafka message queue is processed in real time with Spark Streaming. Using the LDA topic model algorithm, the system identifies the hot topics people are currently discussing.

(4) Result presentation. With the help of the Kdp-report system, the thesis designs the report structure and extracts the analysis results from the Hive data warehouse. The results are presented visually on web pages, so that users can browse hot posts and hot topics in a timely manner.
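
The heat measure described in item (3) can be illustrated with a minimal Python sketch. The abstract says only that heat is based on the counts of comments, reposts, and likes; it does not say how they are combined. The weighted sum, the equal default weights, and the names `Post`, `WEIGHTS`, `heat`, and `hottest` are therefore assumptions made for illustration, not the thesis's actual formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """One Weibo post with its interaction counts (hypothetical record layout)."""
    post_id: str
    comments: int
    reposts: int
    likes: int


# Assumed weights: the source does not specify how the three counts are combined.
WEIGHTS = {"comments": 1.0, "reposts": 1.0, "likes": 1.0}


def heat(post, weights=WEIGHTS):
    """Return a weighted sum of comments, reposts, and likes."""
    return (weights["comments"] * post.comments
            + weights["reposts"] * post.reposts
            + weights["likes"] * post.likes)


def hottest(posts, k, weights=WEIGHTS):
    """Return the k posts with the highest heat, hottest first."""
    return sorted(posts, key=lambda p: heat(p, weights), reverse=True)[:k]
```

In the system described, this ranking would run as a Hive SQL batch query or a Spark Streaming job rather than in plain Python; the sketch only shows the arithmetic on a small in-memory list.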

Keywords/Search Tags:

Weibo, Hadoop, Hive, Spark, LDA

Related items

1	Agricultural Product Price Analysis And Forecast System Design Based On Hadoop+Spark Platform
2	Design And Implementation Of Massive Web Log Analysis System Based On Hadoop/Hive
3	Spectral Clustering Algorithm Based On Spark And The Application On QAR Data
4	Design And Implementation Of Advertising Business Data Management Platform
5	The Design And Implementation Of Network Authentication System Based On Hadoop/Hive
6	The Research Of Weibo User And Weibo Influence Ranking Based On Hadoop
7	Design And Implementation Of Agricultural Product E-commerce Data Warehouse Analysis And Evaluation System Based On Hive On Spark
8	Design And Implementation Of Hive On Spark Dynamic Partition Pruning
9	Design And Implementation Of NetEase Mobile Big Data Support Platform Based On Spark And Hive
10	Design And Implementation Of Recommender System Based On Hadoop Platform And Spark Framework