
Forum Log Analysis Based On The Big Data Processing Technology Hadoop

Posted on: 2015-11-15    Degree: Master    Type: Thesis
Country: China    Candidate: S M Xie    Full Text: PDF
GTID: 2298330434951216    Subject: Computer technology
Abstract/Summary:
With the arrival of the 21st century and the rapid development of Internet technology, information is generated ever faster and data volumes have risen sharply; the phrase "information explosion" describes this era of big data. Facing terabytes or even petabytes of data, enterprises are no longer concerned merely with acquiring big data, but with data mining: how to extract valuable information from vast amounts of data. Current enterprise data-processing technology can no longer satisfy the demands of large-scale data processing.

Hadoop, the big-data processing platform born from the open-source community under the Apache foundation, breaks through the bottlenecks of traditional data-processing methods, making the collection, storage, and computation of massive data easier and more efficient. Hadoop is a distributed data storage and processing platform that can be deployed on clusters of inexpensive computers. Built on the HDFS file system and the MapReduce computing framework, it provides distributed storage and computation for massive data: users can take full advantage of the cluster's large storage capacity and high-speed parallel computation to develop distributed applications for high-speed processing of massive data. Because the platform is written in the object-oriented language Java, it offers good portability and scalability. Over time, a number of solid frameworks have grown up around it; enterprises commonly use Flume, ZooKeeper, HBase, Pig, Hive, and Sqoop, which encapsulate common business logic and simplify the use of Hadoop.

This thesis develops an enterprise web-log analysis solution on a Linux-based Hadoop platform. The system is divided into five modules: file upload, data cleaning, data statistics and analysis, data export, and data display.
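To illustrate the map/reduce idea behind the cleaning and counting steps, here is a minimal, Hadoop-free sketch in plain Java: a "map" phase parses each raw log line and emits the requested URL, and a "reduce" phase sums occurrences per URL, yielding a simple page-view (PV) count. The space-separated log format and field positions are illustrative assumptions, not the thesis's actual log schema or the Hadoop MapReduce API.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PageViewSketch {
    // map: split each line and extract the URL field;
    // "cleaning": drop malformed lines with too few fields;
    // reduce: group by URL and count, giving PV per page.
    public static Map<String, Long> pageViews(List<String> logLines) {
        return logLines.stream()
                .map(line -> line.split(" "))
                .filter(f -> f.length >= 2)   // discard malformed records
                .map(f -> f[1])               // field 1 assumed to hold the URL
                .collect(Collectors.groupingBy(u -> u, Collectors.counting()));
    }
}
```

In a real Hadoop job the same logic would be expressed as a `Mapper` emitting `(url, 1)` pairs and a `Reducer` summing them, with HDFS handling the distributed input and output.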
File upload uses the Flume framework, and the core data-cleaning algorithm uses MapReduce. Statistical analysis uses the Hive framework, from which the forum's key indicators can be computed, such as page views (PV), the number of registered users, unique IPs, and the bounce rate, in order to support operators' decision-making. Data export uses the Sqoop framework, which exports the computed indicators from the cluster to a MySQL relational database. Data display uses the ZooKeeper and HBase frameworks, enabling millisecond-level queries over massive data. Finally, the system is packaged as a script file on the Linux system and registered with the Linux scheduler, so that the project runs automatically. After passing its tests, the project can be formally launched into operation.
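The Hive-based indicators described above might be computed with queries along the following lines. This is only a sketch: the table and column names (`cleaned_log`, `ip`, `url`, `session_id`, `log_date`) and the registration URL are assumptions for illustration, not the thesis's actual schema.

```sql
-- Page views (PV): one row per request on the given day
SELECT COUNT(*) AS pv FROM cleaned_log WHERE log_date = '2015-01-01';

-- Unique visitor IPs
SELECT COUNT(DISTINCT ip) AS uip FROM cleaned_log WHERE log_date = '2015-01-01';

-- New registered users: requests to an assumed registration page
SELECT COUNT(*) AS new_users
FROM cleaned_log
WHERE log_date = '2015-01-01' AND url = '/member.php?mod=register';

-- Bounce rate: single-page sessions divided by all sessions
SELECT ROUND(SUM(CASE WHEN n = 1 THEN 1 ELSE 0 END) / COUNT(*), 4) AS bounce_rate
FROM (SELECT session_id, COUNT(*) AS n
      FROM cleaned_log
      WHERE log_date = '2015-01-01'
      GROUP BY session_id) t;
```

Results like these would then be exported to MySQL with Sqoop, as the abstract describes, so that the reporting layer can read them from a relational database.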
Keywords/Search Tags: big data, Hadoop, framework, log analysis