Analysis Of Micro-blog User Influences Based On Hadoop

Posted on:2019-09-04

Degree:Master

Type:Thesis

Country:China

Candidate:Y Wang

Full Text:PDF

GTID:2428330542996910

Subject:Computer science and technology

Abstract/Summary:

As the number of Internet users grows and social media becomes more influential, micro-blogging has attracted ever wider public participation. The arrival of industry celebrities and popular stars in particular has increased both the intensity and the breadth of its social influence. With Internet traffic surging, using big data techniques to acquire and analyze data has become a research hot spot. Sina micro-blog holds a large amount of raw data, and analyzing that data to uncover its potential value is a foundation for understanding users better, which in turn offers technical support for enterprises' precision marketing and commercial promotion. This project collects micro-blog data sets and classifies users by their influence, providing a theoretical basis for personalized services.

The project consists of three modules: data crawling, data import, and data analysis. The data crawling module designs the crawling architecture and collects user data and micro-blog data. The data import module studies how to load data from sources with differing formats and structures into a Hadoop cluster, with the aim of improving the robustness and efficiency of data acquisition. The data analysis module first extracts user features, then designs a user influence model, and finally clusters users with the K-means algorithm along three dimensions: CMI (the content charm index of a micro-blog), a user behavior factor, and a forwarding depth factor.

The collector is built on the Python-based Scrapy framework and includes a proxy IP module to work around the anti-crawling limits of target websites. The data import module handles heterogeneous data sources, using Sqoop, shell scripts, and Apache Flume to load data from relational databases, non-relational databases, and file systems into HDFS or Hive. The data analysis module extracts user and micro-blog features, including the numbers of fans, followers, likes, comments, and forwards, and the forwarding depth. It then chooses the number of clusters K using the computeCost interface and obtains the clustering results with the K-means implementation in Spark MLlib.

The main contribution of this thesis is a distributed data crawling and analysis platform, on top of which data access procedures load data into Hadoop clusters and the influence of Sina micro-blog users is analyzed. The thesis provides methods for data crawling and data access, and a theoretical basis for business marketing.
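The step of choosing K from a cost curve can be illustrated with a small sketch. It is written in plain Python rather than Spark, and it is not the thesis's implementation: the `kmeans` and `compute_cost` functions, the sample `users` rows, and their values are all hypothetical. `compute_cost` is meant to mirror what `computeCost` on a Spark MLlib K-means model reports, namely the sum of squared distances from each point to its nearest cluster center.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns a list of k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: put every point in the group of its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        new_centers = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers

def compute_cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)

# Hypothetical users as (CMI, behavior factor, forwarding depth factor),
# assumed to be already scaled to comparable ranges.
users = [
    (0.10, 0.20, 0.10), (0.15, 0.25, 0.05), (0.12, 0.18, 0.12),
    (0.50, 0.55, 0.45), (0.48, 0.60, 0.50), (0.52, 0.50, 0.47),
    (0.90, 0.85, 0.95), (0.88, 0.92, 0.90), (0.95, 0.90, 0.85),
]

# Cost for each candidate K; the "elbow" where the curve flattens suggests K.
costs = {k: compute_cost(users, kmeans(users, k)) for k in range(1, 6)}
for k, cost in costs.items():
    print(k, round(cost, 4))
```

The cost generally falls as K grows, so the value itself is not minimized; one looks for the K beyond which further increases bring little improvement. In a Spark job the same loop would train one model per candidate K and read the cost from each model.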

Keywords/Search Tags:

Hadoop, Distribution, Data Crawling, Data Access, User Clustering

Related items

1	Research And Implementation Of User Clustering Algorithm For Telecom Big Data Based On Hadoop
2	Research On The User Electricity Characteristics Based On Big Data
3	Research On Technologies Of Efficient Data Access Based On Hadoop
4	The Study And Implementation Of Efficient And Stable Methods For Data Crawling In Vertical Search Engines
5	User Identification And Interest Analysis Of Internet Access Log Data
6	Performance Monitoring And Analysis On Hadoop-based Data Analysis Platform
7	The Study Of Meteorological Data Acquisition And Data Mining Platform Based On Hadoop
8	The Research And Implementation Of Data Access And Distribution System Based On IoT
9	The Design And Implementation Of USER And LSTG Part Of EBay Hadoop Migration System
10	Design And Implementation Of User Portrait System Based On Bank Client Data Analysis