| With the development of the internet, the web has become the largest information system, and the volume of web logs is growing rapidly as a result. Finding valuable information in a large volume of web logs is a problem confronting most enterprises. Storage and computation on a single machine can no longer meet current demand, so adopting distributed storage and computation is inevitable. Hadoop is a widely used distributed storage and computation framework suited to large-scale distributed computing. It is attracting increasing attention and is widely used in advertising computation, log analysis, web search, and data mining. The core technologies of Hadoop are HDFS (Hadoop Distributed File System) and Map/Reduce (a distributed computation framework). In HDFS, a file is split into blocks of the same size, and these blocks are stored on different nodes of the cluster; in this way, a large volume of logs can be stored. Map/Reduce is a distributed programming model for data processing on large-scale clusters. With this programming model, programs that process large numbers of log files can be written conveniently. This paper implements a Hadoop-based data processing system for log storage and computation. Based on an analysis of the problems that arise in enterprise log processing, an open reporting system is built on the two core technologies of Hadoop. The system mainly comprises log collection and storage, a background statistics program, and a front-end user interface. It offers several improvements over earlier processing programs. The configuration data table can be customized, which makes reports more customizable. Engineers only need to maintain the system and no longer have to handle each customer request directly, so the workload is reduced and working efficiency is increased. |
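
The Map/Reduce programming model mentioned above can be illustrated with a minimal single-process sketch in Python. This is not the system described in the paper and it does not use the Hadoop API; it only mimics the map, shuffle, and reduce phases in memory. The log format, the sample lines, and the function names (`map_line`, `shuffle`, `reduce_counts`, `count_hits`) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical log format (an assumption, not taken from the paper):
# "<ip> <timestamp> <url> <status>"
SAMPLE_LOGS = [
    "10.0.0.1 2013-01-01T00:00:01 /index.html 200",
    "10.0.0.2 2013-01-01T00:00:02 /search 200",
    "10.0.0.1 2013-01-01T00:00:03 /index.html 404",
    "malformed line",
]


def map_line(line):
    """Map step: emit (url, 1) for each well-formed log line."""
    fields = line.split()
    if len(fields) == 4:  # skip lines that do not match the assumed format
        yield fields[2], 1


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def count_hits(lines):
    """Run map, shuffle, and reduce in memory and return hits per URL."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(count_hits(SAMPLE_LOGS))
```

In a real cluster the map and reduce functions would run in parallel on the nodes that hold the HDFS blocks, and the framework would perform the grouping step. Here the three phases run sequentially only to show how the data flows between them.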