Research And Application Of The Massive Web Data Analysis Based On Hadoop

Posted on: 2015-08-04 | Degree: Master | Type: Thesis
Country: China | Candidate: F H Chen | Full Text: PDF
GTID: 2298330422486475 | Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of Internet technology and society, the volume of network data keeps growing, and the Web has become the world's largest data warehouse. Enterprises and individuals alike face the problem of how to manage massive Web data effectively. Traditional data processing methods have many disadvantages, such as high cost, low reliability, and the difficulty of writing parallel processing programs. Parallel processing on the open-source Hadoop framework can manage massive Web data effectively, reliably, and intelligently.

To improve on the time and space efficiency of traditional single-node analysis and mining of massive Web data, this thesis first reviews the research status and development trends of Hadoop cloud computing platform technology in China and abroad. Building on the open-source Hadoop Distributed File System (HDFS) and the Map/Reduce programming model, it then studies performance indicators for massive Web logs and the Map/Reduce formulation of Web-mining algorithms, designs the architecture of a massive Web data analysis system, and builds a distributed Hadoop environment in which that system is developed. The system integrates data and applications: it connects Eclipse to Hadoop through the Hadoop application programming interface (API) and uses Maven to manage and build the Hadoop project, so that tasks can be shared.

A four-node Hadoop cluster is built on virtual machines. On it, the thesis compares shell-script processing in the traditional system with processing in the proposed system, analyzes the collection of Web log data and its key performance indicators (KPI) on the Hadoop platform, and implements parallel programs based on a collaborative filtering algorithm. Test results show that the system effectively improves the time and space efficiency of massive Web data analysis and mining.
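As a rough illustration of how a Web-log KPI can be phrased as a Map/Reduce job, the sketch below counts successful page views per requested path. It is not the thesis's implementation: it simulates the map, shuffle, and reduce phases in plain Python instead of using the Hadoop API, it assumes the logs are in Common Log Format, and the choice of KPI, the sample lines, and all function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample input, assumed to be in Common Log Format.
SAMPLE_LOG = [
    '10.0.0.1 - - [04/Aug/2015:10:00:01 +0800] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [04/Aug/2015:10:00:02 +0800] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.1 - - [04/Aug/2015:10:00:03 +0800] "GET /about.html HTTP/1.1" 200 512',
    '10.0.0.3 - - [04/Aug/2015:10:00:04 +0800] "GET /missing HTTP/1.1" 404 0',
    'garbage line that is not a log record',
]


def map_page_view(line):
    """Map step: emit (path, 1) for each request with a 2xx status."""
    parts = line.split('"')
    if len(parts) < 3:
        return  # malformed record: emit nothing
    request = parts[1].split()   # e.g. ['GET', '/index.html', 'HTTP/1.1']
    status = parts[2].split()    # e.g. ['200', '1024']
    if len(request) < 2 or not status:
        return
    if status[0].startswith("2"):
        yield request[1], 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_page_view(key, values):
    """Reduce step: sum the counts emitted for one path."""
    return key, sum(values)


def page_views(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = (pair for line in lines for pair in map_page_view(line))
    return dict(reduce_page_view(key, values) for key, values in shuffle(mapped))


if __name__ == "__main__":
    for path, count in sorted(page_views(SAMPLE_LOG).items()):
        print(path, count)
```

On a real cluster the map and reduce functions would run in parallel over HDFS blocks and the framework would handle the grouping; the single-process version only shows how the computation is split into the two steps.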
Keywords/Search Tags: Cloud Platform, Massive Data, Hadoop, Data Analysis, Web