
Design And Implementation Of Massive Data Processing Platform For Firewall Equipment

Posted on: 2018-06-19  Degree: Master  Type: Thesis
Country: China  Candidate: Q Y Liu  Full Text: PDF
GTID: 2348330542472220  Subject: Engineering
Abstract/Summary:
In the present era, the rapid development of the Internet has greatly facilitated people's lives, but the accompanying information security problems have become increasingly prominent. Viruses, threats, and malicious attacks of all kinds emerge endlessly, and the losses they cause are immeasurable. The firewall is one of the most widely used security technologies: it establishes a security barrier between the internal network and the external network, preventing internal users from leaking information to the outside and blocking malicious attacks launched from the external network against the internal network. However, as bandwidth keeps increasing, firewall equipment collects more and more traffic, its business modules generate large amounts of data, and the pressure on the device to process these data keeps growing. As the data volume increases, the user's query efficiency drops, and the demands of data display and analysis become difficult to meet.

To solve this query-performance bottleneck of firewall equipment, this thesis designs and implements a massive data processing platform. The platform consists of three modules: data preprocessing, data storage, and data cache. Query efficiency is improved by reducing the data volume of a single table, adopting a better database and index, and introducing a cache mechanism. The thesis first designs the overall architecture of the platform, then the architecture of each module, and describes and analyzes the whole data processing flow.

The data preprocessing module converts the raw data sent to the kernel into the format required for persistence and reduces the amount of data in a single table, so that queries scan less data. The module first reads log data from a shared buffer and appends it to a ring buffer; the data in the ring buffer are then fed into a hash table for aggregation, and the aggregated records are finally written to a result buffer to await persistence. The module reduces the overall data volume through aggregation, and adjusts the table structure by splitting tables according to business type and time range to reduce the amount of data in each single table.

The data storage module stores the persisted data in a reasonable layout and improves query efficiency through bitmap indexes and column storage. First, an improved scheme for the WAH compression algorithm is proposed to address the problem that WAH's fixed, long encoding segment leads to poor compression of the bitmap index: the improved scheme determines the segment length according to the actual distribution of the data, thereby shrinking the bitmap index and saving storage space on the device, and it has been implemented in Fastbit. Then the encoding method of the Fastbit bitmap is discussed and selected according to the characteristics of the data and of the frequently used query statements. Finally, a data partition structure is designed, and the aggregation of data at different time granularities is realized on top of this structure.
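To make the preprocessing flow above more concrete, the following is a minimal sketch of the aggregation step, in which the records drained from the ring buffer are collapsed through a hash table into far fewer aggregated rows. The record fields, the aggregation key, and the function name are assumptions made for illustration; the abstract does not specify the platform's actual data structures.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical log record as it might arrive from the ring buffer; the real
// platform's fields and aggregation key are not given in the abstract.
struct LogRecord {
    std::string src_ip;
    std::string dst_ip;
    uint16_t    dst_port = 0;
    uint64_t    bytes    = 0;
    uint64_t    packets  = 0;
};

// Collapse raw log entries sharing the same (src_ip, dst_ip, dst_port) key
// into one aggregated row each, summing their counters. The returned vector
// plays the role of the "data result buffer" awaiting persistence.
std::vector<LogRecord> aggregate(const std::vector<LogRecord>& ring_buffer) {
    std::unordered_map<std::string, LogRecord> table;
    for (const LogRecord& rec : ring_buffer) {
        std::string key = rec.src_ip + '|' + rec.dst_ip + '|' +
                          std::to_string(rec.dst_port);
        auto it = table.find(key);
        if (it == table.end()) {
            table.emplace(key, rec);           // first record for this key
        } else {
            it->second.bytes   += rec.bytes;   // merge counters into the
            it->second.packets += rec.packets; // existing aggregated row
        }
    }
    std::vector<LogRecord> result;
    result.reserve(table.size());
    for (const auto& kv : table) result.push_back(kv.second);
    return result;
}
```

The improvement to the WAH compression algorithm is likewise described only at a high level. The sketch below implements plain WAH-style run-length encoding with the segment width exposed as a parameter `w` (classic WAH fixes it at 31 bits for 32-bit words), which is the quantity the improved scheme would tune to the data distribution; it is an illustration of the idea, not the thesis's algorithm or Fastbit's implementation.

```cpp
#include <cstdint>
#include <vector>

// WAH-style encoder with a configurable segment width `w` (1..31 for 32-bit
// words; classic WAH fixes w = 31). Literal words keep bit 31 clear; fill
// words set bit 31, carry the fill value in bit 30, and store the run length
// (number of consecutive all-0 or all-1 segments) in the low 30 bits.
std::vector<uint32_t> wah_encode(const std::vector<bool>& bits, unsigned w) {
    std::vector<uint32_t> out;
    const uint32_t full = (1u << w) - 1;   // an all-ones segment
    uint32_t run_val = 0, run_len = 0;

    auto flush_run = [&]() {
        if (run_len > 0) {
            out.push_back(0x80000000u | (run_val << 30) | run_len);
            run_len = 0;
        }
    };

    for (size_t i = 0; i < bits.size(); i += w) {
        uint32_t seg = 0;
        size_t limit = (i + w <= bits.size()) ? w : bits.size() - i;
        for (size_t j = 0; j < limit; ++j)
            if (bits[i + j]) seg |= (1u << j);

        bool complete = (limit == w);
        if (complete && (seg == 0 || seg == full)) {
            uint32_t v = (seg == full) ? 1u : 0u;
            if (run_len > 0 && v != run_val) flush_run();
            run_val = v;
            ++run_len;                      // extend the current fill run
        } else {
            flush_run();
            out.push_back(seg);             // literal word (bit 31 is 0)
        }
    }
    flush_run();
    return out;
}
```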
The data cache module caches frequently used result sets so that users can obtain data directly from the cache, which reduces hard-disk I/O time and the user's waiting time. The module first combines the existing Redis data types to store the result sets. Then, based on the differences between log data and trend data, the interaction rules between the data cache module and the data storage module, as well as the rules for creating and updating cache entries, are designed in detail. Finally, taking the data types and expiration times of the result sets into account, an LFU cache eviction strategy is implemented over the result sets as a whole.

Finally, the overall performance of the platform and the performance of each module are tested and analyzed. With the above design, the data processing platform significantly reduces the data volume of a single table, so the firewall can store more than one year of data; the time of queries against the database is reduced by more than an order of magnitude, and queries against the cache are another order of magnitude faster than queries against the database; compared with the original WAH compression algorithm, the improved scheme offers almost the same query performance while the space occupied by the indexes decreases by more than 60%; the cache module with the LFU algorithm achieves a high hit rate, and when cache space is insufficient, setting an expiration time yields a higher hit rate, whereas otherwise setting an expiration time is unnecessary.
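The combination of LFU eviction and expiration times in the cache module can be illustrated with the small in-memory sketch below. The real module stores result sets in Redis; this code only demonstrates the policy implied by the abstract, namely that expired entries are removed first and, when space is still insufficient, the least frequently used entry is evicted. The class name, member names, and the fixed capacity are illustrative assumptions.

```cpp
#include <chrono>
#include <cstdint>
#include <string>
#include <unordered_map>

// Minimal in-memory LFU cache with optional per-entry expiration, used here
// only to illustrate the eviction policy; the real module keeps result sets
// in Redis.
class LfuCache {
    using Clock = std::chrono::steady_clock;
    struct Entry {
        std::string       value;
        uint64_t          hits = 0;
        Clock::time_point expires = Clock::time_point::max(); // max() = no TTL
    };
    std::unordered_map<std::string, Entry> entries_;
    size_t capacity_;

    void evict_one() {
        const auto now = Clock::now();
        // Prefer removing an entry whose expiration time has passed.
        for (auto it = entries_.begin(); it != entries_.end(); ++it)
            if (it->second.expires <= now) { entries_.erase(it); return; }
        // Otherwise evict the least frequently used entry.
        auto victim = entries_.begin();
        for (auto it = entries_.begin(); it != entries_.end(); ++it)
            if (it->second.hits < victim->second.hits) victim = it;
        if (victim != entries_.end()) entries_.erase(victim);
    }

public:
    explicit LfuCache(size_t capacity) : capacity_(capacity) {}

    void put(const std::string& key, const std::string& value,
             std::chrono::seconds ttl = std::chrono::seconds::zero()) {
        if (entries_.count(key) == 0 && entries_.size() >= capacity_)
            evict_one();
        Entry e;
        e.value = value;
        if (ttl.count() > 0) e.expires = Clock::now() + ttl;
        entries_[key] = e;
    }

    // Copies the cached value into `out` if the key exists and has not expired.
    bool get(const std::string& key, std::string& out) {
        auto it = entries_.find(key);
        if (it == entries_.end() || it->second.expires <= Clock::now())
            return false;
        ++it->second.hits;
        out = it->second.value;
        return true;
    }
};
```

A caller would, for example, store a serialized result set under a query key with `cache.put(key, payload, std::chrono::seconds(300))` and fall back to the database (and refill the cache) whenever `get` misses.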
Keywords/Search Tags:Firewall, Bitmap index compression algorithm, Fastbit, Redis, Mass Data