
The Design And Implementation Of We-media Hotspots Mining System Based On Hadoop And R

Posted on: 2016-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: R F Zhu
GTID: 2308330473957224
Subject: Information security

Abstract/Summary:
As a new form of communication distinct from the mainstream media, We-Media is being accepted by more and more people. In an increasingly individualized society, people place growing emphasis on thinking independently. The public is more inclined to form judgments from objective facts and their own reasoning than to accept a single unified voice. Information from We-Media therefore reflects current social concerns and the direction of public opinion more faithfully. In this context, mining hot topics from We-Media in a timely manner can support people's daily production and life, for example individual investment decisions and public travel behavior, and can help the government guide public opinion more effectively.

Faced with the overwhelming volume of We-Media information on the Internet, traditional data mining approaches cannot collect and process such vast amounts of data. An efficient, real-time processing approach is needed to turn this information into economic and social value. Hadoop performs efficiently at distributed storage and processing of massive data; using it overcomes the limitations of traditional data mining and helps extract valuable information from the mass of We-Media content. The R language specializes in statistics, computation, and graphics, focuses on the analysis of sample data, and is well suited to text classification.

This project is a joint effort between our laboratory and a research institute under the Chengdu Economy and Information Committee. It focuses on the research and implementation of a We-Media information mining system based on Hadoop and the R language. This thesis covers data collection, storage, mining, analysis, and display, describing the entire course of the study in detail.
The main work of this thesis is as follows:
1. Crawling information from We-Media sites with the Nutch crawler, processing the page content with parser tools, and storing the extracted text in XML format.
2. Processing the text: performing word segmentation, feature characterization, and feature extraction, then weighting the terms with TF-IDF to build a vector space model, which lays the foundation for subsequent classification and clustering.
3. Bridging the Java environment and the R language: calling R from Java to compute over the vector space and classify the information, then clustering the vectors with the Mahout framework to obtain hot information, and finally presenting the results as charts.
4. Building a J2EE-based hotspot display system with SpringMVC. The system provides hotspot classification, hotspot display, hot-trend statistics...
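Step 2 turns segmented text into a TF-IDF vector space. The abstract does not specify which TF-IDF variant the system uses, so the following is only a minimal sketch under stated assumptions: documents arrive as lists of tokens (Chinese word segmentation is assumed to have happened upstream), term frequency is the raw count divided by document length, and inverse document frequency is the unsmoothed log(N / df). The function names `tf_idf_vectors` and `cosine_similarity` are hypothetical, not taken from the thesis, R, or Mahout. Cosine similarity is included because it is a common way to compare text vectors before clustering; the thesis does not state which distance measure its Mahout clustering used.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build TF-IDF weighted term vectors from pre-segmented documents.

    docs: list of token lists (word segmentation is assumed to be done already).
    Returns (vocabulary, vectors); each vector is aligned with the sorted vocabulary.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vocabulary = sorted(df)
    # Inverse document frequency, using the plain log(N / df) variant (assumption).
    idf = {term: math.log(n_docs / df[term]) for term in vocabulary}

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        length = len(tokens)
        # Term frequency normalised by document length, multiplied by IDF.
        vectors.append([
            (counts[term] / length) * idf[term] if length else 0.0
            for term in vocabulary
        ])
    return vocabulary, vectors


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy, already-segmented documents (hypothetical data).
    docs = [["hadoop", "storage", "hadoop"], ["r", "classification"], ["hadoop", "mining"]]
    vocab, vecs = tf_idf_vectors(docs)
    print(vocab)
    print(cosine_similarity(vecs[0], vecs[2]))
```

With this unsmoothed IDF, a term that occurs in every document receives weight 0, which suppresses ubiquitous words but can also zero out an entire short document; production toolkits often use a smoothed IDF for that reason.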
Keywords/Search Tags: We-Media, data mining, Hadoop, R language