
A Design And Implementation Of Distributed Real-time Log Data Storage And Processing System Based On Storm And MongoDB

Posted on: 2016-07-30  Degree: Master  Type: Thesis
Country: China  Candidate: M Y Zeng  Full Text: PDF
GTID: 2308330470967666  Subject: Software engineering
Abstract/Summary:
Large-scale real-time computing is an important part of big data computing. A growing number of computer and Internet systems have adopted large-scale real-time log data computing, applying it to real-time statistics, risk control, recommendation, monitoring, search, personalized services, and other real-time applications. To meet the requirements of storing and processing large-scale real-time log data, this paper designs and implements a large-scale real-time log data storage and processing system based on MongoDB and Storm, and describes the system's requirements, architecture, and implementation in detail. The architecture addresses three problems: collecting distributed, multi-source, heterogeneous log data; wasted storage resources; and load balancing of real-time computation. It also reduces energy consumption and facilitates real-time data analysis. It can collect, publish, and process large-scale distributed log data in real time, and it provides mass storage. The result is a complete, stable, scalable, fault-tolerant, distributed, high-performance, and energy-efficient system for storing and processing large-scale real-time log data.
This paper designs a distributed log collection module based on Flume to collect multi-source heterogeneous log data. The module is distributed, fault-tolerant, high-performance, and easy to scale. Because real-time log data arrives in sudden bursts, this paper also designs a task scheduling module based on an energy-efficient load balancing algorithm, which balances load and saves energy while still meeting performance requirements.
Because recent data is accessed frequently while old log data is rarely accessed, this paper designs an access-heat-based algorithm that stores log data in different cluster zones according to its access heat and applies different replica policies to log data with different access heat.
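The abstract does not give the details of the access-heat algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that heat is an exponentially decayed access count, that there are three zones (hot, warm, cold), and that hotter zones keep more replicas. The names `access_heat`, `choose_zone`, and `ZonePolicy`, the thresholds, the half-life, and the replica counts are all hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZonePolicy:
    """A storage zone and the replica count used inside it (hypothetical values)."""
    name: str
    replicas: int


# Assumed zones: hotter data gets more replicas, colder data fewer.
HOT = ZonePolicy("hot", 3)
WARM = ZonePolicy("warm", 2)
COLD = ZonePolicy("cold", 1)


def access_heat(access_times, now, half_life_s=3600.0):
    """Exponentially decayed access count.

    Each access contributes 1.0 at the moment it happens and halves
    every `half_life_s` seconds, so recent accesses dominate the score.
    Accesses timestamped after `now` are ignored.
    """
    return sum(0.5 ** ((now - t) / half_life_s) for t in access_times if t <= now)


def choose_zone(heat, hot_threshold=5.0, warm_threshold=1.0):
    """Map a heat score to a zone policy using assumed thresholds."""
    if heat >= hot_threshold:
        return HOT
    if heat >= warm_threshold:
        return WARM
    return COLD


if __name__ == "__main__":
    now = 10_000.0
    recent = [now - 60.0 * i for i in range(10)]  # ten accesses in the last ten minutes
    stale = [now - 86_400.0]                      # one access a day ago
    for label, times in (("recent", recent), ("stale", stale)):
        policy = choose_zone(access_heat(times, now))
        print(label, policy.name, policy.replicas)
```

In a sketch like this, a periodic job could recompute the heat of each log segment and migrate it when its zone changes. How the thesis actually measures heat and triggers migration is not stated in the abstract.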
Keywords/Search Tags: log data, real-time computing, Storm