The Design And Implementation Of We-media Hotspots Mining System Based On Hadoop And R

Posted on:2016-04-28

Degree:Master

Type:Thesis

Country:China

Candidate:R F Zhu

Full Text:PDF

GTID:2308330473957224

Subject:Information security

Abstract/Summary:

As a new form of communication distinct from the mainstream media, We-Media is being accepted by more and more people. In an increasingly personalized society, people place growing emphasis on thinking about things independently. The public tends to form judgments from objective facts and its own reasoning rather than simply accept a single unified voice. Information from We-Media therefore reflects current social concerns and the direction of public opinion more faithfully. In this context, mining hotspot topics from We-Media in a timely manner can support people's work and daily life, for example individual investment decisions and public travel behavior, and can help the government guide public opinion more effectively. Faced with the overwhelming volume of We-Media information on the Internet, traditional data mining approaches cannot collect and process such vast amounts of data. This calls for an efficient, real-time processing approach that can produce positive economic and social value. Hadoop performs efficiently in the distributed storage and processing of massive data; using it can overcome the limitations of traditional data mining and help extract valuable information from the mass of We-Media content. The R language specializes in statistics, computation and graphics, focuses on the analysis of sample data, and is well suited to text classification.

This project is a joint effort between our laboratory and a research institute under the Chengdu Economy and Information Committee. It focuses on the research and implementation of a We-Media information mining system based on Hadoop and the R language. This thesis describes the entire course of the study in detail, covering data collection, storage, mining, analysis and display.
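The MapReduce model that Hadoop uses for distributed processing can be illustrated with a minimal, single-process simulation. This is a sketch, not the Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase` and `run_job` and the sample corpus are hypothetical, and term counting is chosen only as an example job. On a real cluster the mappers and reducers would run in parallel across nodes, and the framework itself would perform the shuffle.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc_id, tokens):
    """Mapper (hypothetical): emit (term, 1) for every token of one document."""
    for term in tokens:
        yield term, 1

def shuffle(pairs):
    """Group intermediate values by key and sort the keys,
    mirroring the shuffle-and-sort step Hadoop performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(term, counts):
    """Reducer (hypothetical): sum the partial counts for one term."""
    return term, sum(counts)

def run_job(corpus):
    """Run map -> shuffle -> reduce sequentially over {doc_id: [tokens]}."""
    intermediate = chain.from_iterable(
        map_phase(doc_id, tokens) for doc_id, tokens in corpus.items()
    )
    grouped = shuffle(intermediate)
    return dict(reduce_phase(term, counts) for term, counts in grouped.items())

# Hypothetical, already-segmented We-Media posts.
corpus = {
    "post1": ["hadoop", "r", "hotspot"],
    "post2": ["hotspot", "travel", "hotspot"],
}
print(run_job(corpus))  # {'hadoop': 1, 'hotspot': 3, 'r': 1, 'travel': 1}
```

Because each mapper only sees its own document and each reducer only sees one key's values, the same job can be split across many machines without changing its logic.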
The main work of this thesis is as follows:
1. Crawling information from We-Media sites with the Nutch crawler, processing the page content with parser tools, and storing the extracted text in XML format.
2. Processing the text: performing word segmentation, feature representation and feature extraction, then weighting the terms with TF-IDF to build the vector space, which lays the foundation for subsequent classification and clustering.
3. Bridging the Java environment and the R language: calling R from Java to compute on the vector space and classify the information, clustering the vectors with the Mahout framework to obtain hot information, and finally presenting the results as charts.
4. Building a J2EE-based hotspot display system with SpringMVC. The system provides hotspot classification, hotspot display, hot trend statistics...
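The TF-IDF weighting in step 2 and the vector comparison that classification and clustering rely on can be sketched as follows. This assumes documents are already word-segmented (segmentation itself is out of scope here) and uses the classic weighting tf(t, d) = count(t, d) / |d| and idf(t) = log(N / df(t)); the thesis does not state which TF-IDF variant it uses, and cosine similarity is an assumed choice of measure. The names `tf_idf_vectors`, `cosine` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build sparse TF-IDF vectors for pre-segmented documents {doc_id: [tokens]}.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), where N is the number of documents
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term at most once per document
    vectors = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        vectors[doc_id] = {
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical segmented posts: two on finance, one on travel.
docs = {
    "a": ["stock", "market", "rise"],
    "b": ["stock", "market", "fall"],
    "c": ["travel", "holiday", "beach"],
}
vecs = tf_idf_vectors(docs)
print(round(cosine(vecs["a"], vecs["b"]), 3), cosine(vecs["a"], vecs["c"]))
```

Posts "a" and "b" share weighted terms and so score above zero, while "a" and "c" share none and score 0.0. A term that occurs in every document gets idf = log(1) = 0 and drops out of the comparison, which is why TF-IDF favors terms that distinguish one topic from another.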

Keywords/Search Tags:

We-Media, data mining, Hadoop, R language

Related items

1	The Research And Implement Of Data Mining Algorithms Based On Hadoop
2	Based On Hadoop Electric Offline Patterns Of Data Mining System Design And Implementation
3	Research And Implementation Of Mining Association Rules For EMU Failure Data Based On Hadoop
4	Research And Implementation Of Integration Of R Language And Hadoop
5	Application Of ETL Component In Distributed Data Mining Engine Based On Hadoop
6	The Research Of Clustering Mining Based On Logistics History Data On The Hadoop
7	Design And Realization Of A Online Data Mining System Based On Hadoop
8	Research And Implementation Of Big Data Analysis And Mining Technology Based On Hadoop In Telecommunications Industry
9	The Application Of Hadoop Based Data Mining In Telecom Customer Analysis
10	Research And Implementation Of Marine Information OLAP And Data Mining System Based On Hadoop