Design And Implementation Of The Large-scale Web Log Analysis System Based On NoSQL

Posted on: 2014-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: D Zhang
Full Text: PDF
GTID: 2308330464464359
Subject: Software engineering
Abstract/Summary:
With the rapid development of Internet technology, users rely on more and more Web applications. Internet companies face growing challenges: users visit tens of thousands of pages every day, so the volume of Web logs grows rapidly. How to improve quality of service, understand users' needs and preferences, and increase user stickiness has become an urgent problem for the Internet industry. The biggest problem enterprises face is how to extract information that is valuable to them from the large-scale Web log data collected in massive data centers.
Web log analysis arose from the need to use Web log data to improve site design, attract more traffic, improve the user experience, and thereby bring benefits to the enterprise. It collects the logs generated when users access Web pages and puts them through a series of steps: format conversion, filtering, cleaning, and mining. As Web applications are accessed more and more, the volume of Web logs keeps expanding, and traditional data storage methods can no longer meet demand. A single node has limited processing capacity and cannot meet the log analysis demands of Internet businesses, so large-scale Web log analysis systems came into being. Distributed storage of logs combined with large-scale distributed computing has become the inevitable trend in Web log analysis.
This thesis studies the NoSQL database MongoDB and the Hadoop distributed computing architecture, and uses them to design and implement a high-performance, large-scale Web log analysis system. Hadoop consists of two core technologies: the distributed file system HDFS and the distributed programming model MapReduce. The system uses MongoDB to store the log files: sharding divides the logs into data sets of equal size, which are stored on different nodes of the distributed system, and the nodes process the massive Web logs using the MapReduce programming model provided by Hadoop. The process comprises log collection, log processing, log storage, and log analysis, and the system finally presents the results to the user through a Web interface. Compared with previous systems, the NoSQL-based large-scale Web log analysis system collects, processes, stores, and analyzes logs faster and handles logs in a variety of formats efficiently, which both reduces developers' workload and improves the efficiency of the staff who use it.
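The processing flow described above (parse and filter the logs, divide them into equal-size data sets, then map and reduce) can be illustrated with a minimal, single-process Python sketch. It is not the thesis's implementation: it does not call MongoDB or Hadoop, the Common Log Format is an assumed input format that the abstract does not name, and every function name (`parse_line`, `split_into_chunks`, `map_chunk`, `reduce_counts`, `count_page_hits`) is hypothetical.

```python
import re
from collections import defaultdict
from itertools import chain

# Assumption: logs are in Common Log Format; the abstract does not name a format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Format conversion and filtering: return a dict, or None if malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    return record


def split_into_chunks(records, chunk_size):
    """In-memory stand-in for sharding: cut records into chunks of chunk_size.

    The last chunk may be smaller when the total is not a multiple of chunk_size.
    """
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: emit (path, 1) for every successful request in one chunk."""
    return [(r["path"], 1) for r in chunk if r["status"] == 200]


def reduce_counts(pairs):
    """Shuffle and reduce step: sum the emitted counts per path."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_page_hits(lines, chunk_size=2):
    """Run the whole sketch: parse, chunk, map each chunk, then reduce."""
    records = [r for r in map(parse_line, lines) if r is not None]
    chunks = split_into_chunks(records, chunk_size)
    mapped = chain.from_iterable(map_chunk(c) for c in chunks)
    return reduce_counts(mapped)


# Made-up sample data for illustration only.
SAMPLE_LINES = [
    '10.0.0.1 - - [07/Jan/2014:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [07/Jan/2014:10:00:01 +0800] "GET /about.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [07/Jan/2014:10:00:02 +0800] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.3 - - [07/Jan/2014:10:00:03 +0800] "GET /missing HTTP/1.1" 404 0',
    'not a log line',
]

if __name__ == "__main__":
    print(count_page_hits(SAMPLE_LINES))
```

In a real deployment each chunk would live on a different node and the map step would run where the data is stored; here the chunks are only Python lists, which keeps the sketch runnable without a cluster.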
Keywords/Search Tags:Log analysis, Large-scale, Hadoop, MongoDB, NoSQL