
The Design Of Distributed Log Analysis System Based On MongoDB

Posted on: 2016-11-15 | Degree: Master | Type: Thesis
Country: China | Candidate: H G Sun | Full Text: PDF
GTID: 2308330461954794 | Subject: Communication and Information System
Abstract/Summary:
MongoDB is a non-relational (NoSQL) database that is widely used in the current IT industry. Its popularity comes from its rich query language, high performance, and ease of scaling. MongoDB has been widely adopted as the data store for distributed systems that require frequent read operations. This thesis studies a log analysis system for a video website, building on the advantages of MongoDB and the particular requirements of video site log analysis. The results of such analysis can help improve the quality of video content and the service offered to users. They can also reveal user intent and guide adjustments to page structure, which improves the user experience, better meets user needs, and brings more economic value.

To overcome the limitations of traditional single-node, single-server log analysis, this thesis proposes a scheme that can handle the massive log data of a video website. The scheme aims to extract useful data from the logs, discover the user patterns hidden in them, and optimize the structure of the website and its business model. The actual requirements of the log analysis system are determined from the business functions of the video website. Based on these requirements, the functional structure of the system is designed; it is divided into a log collection subsystem, a log analysis subsystem, and a business analysis subsystem.

The log collection subsystem collects the raw log data, preprocesses it, and stores it in MongoDB. The log analysis subsystem extracts the log data from MongoDB and performs cleaning, user identification, session identification, path completion, and related steps, generating intermediate data for the business analysis subsystem. The business analysis subsystem performs a second round of grouping and aggregation on the intermediate log data, classifies it according to business needs, extracts the required information, computes user reach statistics, and finally aggregates the data by day, week, and month. The results are stored in files or in the database.

In the implementation, the log collection and log analysis subsystems take full advantage of the distributed architecture's strength in handling huge amounts of data, and the massive raw logs and preprocessing results are stored in MongoDB. Templates are designed with the Velocity template engine; the documents stored in MongoDB are cleaned, filtered, and converted into objects of type DBObject, which provide the input for MapReduce operations. Building on the MapReduce framework and its efficient parallel processing mechanism, different Map and Reduce functions are written to implement the statistical processing of the data. Finally, the system is tested with respect to stability, data read/write efficiency, and throughput. From a commercial perspective, the system produces statistics on user behavior according to different business needs and provides an effective basis for decision-making.
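The session identification and day-level aggregation steps described above can be illustrated with a small in-memory sketch. This is not the thesis implementation: it uses plain Python instead of MongoDB's MapReduce, and the record fields (`user`, `time`, `video`), the 30-minute session timeout, and all function names are assumptions introduced only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity gap that ends a session (not taken from the thesis).
SESSION_TIMEOUT = timedelta(minutes=30)


def identify_sessions(records, timeout=SESSION_TIMEOUT):
    """Split each user's records into sessions separated by inactivity gaps."""
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec["user"]].append(rec)

    sessions = []
    for recs in by_user.values():
        recs.sort(key=lambda r: r["time"])
        current = [recs[0]]
        for prev, rec in zip(recs, recs[1:]):
            if rec["time"] - prev["time"] > timeout:
                sessions.append(current)  # gap too long: close the session
                current = []
            current.append(rec)
        sessions.append(current)
    return sessions


def map_record(rec):
    """Map step: emit a ((day, video), user) pair for one log record."""
    return (rec["time"].date().isoformat(), rec["video"]), rec["user"]


def reduce_users(users):
    """Reduce step: count the distinct users seen for one key."""
    return len(set(users))


def daily_reach(records):
    """Group mapped pairs by key, then reduce each group to a reach count."""
    grouped = defaultdict(list)
    for rec in records:
        key, user = map_record(rec)
        grouped[key].append(user)
    return {key: reduce_users(users) for key, users in grouped.items()}


# Hypothetical sample records standing in for cleaned log documents.
SAMPLE_LOGS = [
    {"user": "u1", "time": datetime(2016, 11, 14, 10, 0), "video": "v1"},
    {"user": "u1", "time": datetime(2016, 11, 14, 10, 10), "video": "v1"},
    {"user": "u1", "time": datetime(2016, 11, 14, 11, 30), "video": "v2"},
    {"user": "u2", "time": datetime(2016, 11, 14, 10, 5), "video": "v1"},
    {"user": "u2", "time": datetime(2016, 11, 15, 9, 0), "video": "v1"},
]

if __name__ == "__main__":
    print(len(identify_sessions(SAMPLE_LOGS)), "sessions")
    for key, reach in sorted(daily_reach(SAMPLE_LOGS).items()):
        print(key, reach)
```

In this sketch the reduce step sees all values for a key at once. A reduce function run by a distributed MapReduce engine may be applied repeatedly to partial results, so a production version would need to emit mergeable values (for example, sets of user IDs) rather than final counts. Weekly and monthly figures would follow the same pattern with a different key.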
Keywords/Search Tags: log analysis, distributed, MongoDB, MapReduce