Font Size: a A A

Research On Key Technologies In Analysis Processing Of Network Security Monitoring Data Stream

Posted on:2012-09-11Degree:DoctorType:Dissertation
Country:ChinaCandidate:L GanFull Text:PDF
GTID:1228330341451724Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
The continuous development of informationization makes the Internet increasinglysignificant in our lives; while the emergence of various security events during theprocess highly threatens the application&development of the network. As a result of it,network security monitoring is essential to network maintenance, key infrastructureprotection and information system security. And one of the most challenging issuesoccurring in network monitoring and data analysis processing is how to process andquery the real-time massive monitoring data in an efficient manner, thus to providesupport for various follow-up applications.Based on the background of network security monitoring (NSM) and faced withthe above challenges, this dissertation focuses on the cost-efficient solutions to severalbasic problems derived from the difficulty in analyzing the dynamic real-time datastream, such as continuous top-k queries, join, multi-query optimization, big timewindow queries. The main contributions are concluded as follows:1. Continuous top-k queries on NSM data streams. An index of data stream canimprove the performance of queries efficiently. However, a grid index is usually usedfor continuous top-k, in which a lot of free data points (proved to be not top-k results)are included. Directing to prune those points, we propose an index structure, k-maxcalculating region (k-MCR) based on the reverse dominant point set (RDPS) and gridindex. We get k-MCR through calculating RDPS within grid index and pruning the unitsapproximately and quickly in grid, putting forward the updating algorithm of adding inand deleting data points in data stream. Analytical and experimental evidences displaythat k-MCR index approach performs better on both storage of index and efficiency ofqueries.2. Join algorithm for huge dimension tables on NSM data streams. We proposeMDI-NL join algorithm to optimize join between huge dimension tables (such asIPaddress tables, containing 232 tuples) and NSM data streams, which reduces theconsumption of CPU power and memory capacity. Generally speaking, a hugedimension table should be partitioned into small sub-tables by row and each sub-table isloaded into memory in turn, resulting in frequent disk I/O. However, Compressing hugedimension tables into small in-memory tables will improve the efficiency. So, we findthat the join key of a dimension table is so large and redundant for join, which can beused to compress dimension tables losslessly. We divide each dimension table into nsub-dimension tables by column, then compress those sub-dimension tables’s join keyand build an index for each sub-dimension table. In join operation, MDI-NL selectsindices of sub-dimensional tables dynamically according to the projection of query,which makes sure that each choosed sub-dimensional tables has the least join key rebundancy. Theoretical analysis and experimental evidences show that MDI-NL ismore adaptable for join with the huge dimension table in which there are largeredundant join keys. Since dimensional tables are compressed to relatively small andlossless sub-dimensional tables, the cost of scanning in each nest-loop and storage spaceis decreased, and therefore performance of table join is improved.3. Multi-query optimization on NSM data streams. Two types of methods can beadopted for multi-query optimization in data stream: optimization of query plan, andmaterialized intermediate results. It is common used of materialized views or indices inthe existing approaches of materialized intermediate results, which has no compressionon intermediate results. To solve such a problem, we propose compressed stream cubeas the storage structure of materialized intermediate results based on the compressedcube (QC-tree and Dwarf). For further reducing the space, part of the nodes arematerialized in the compressed model, different from the traditional method that all thenodes are materialized in a compressed model. In our model, the compressed nodes aredivided into basic nodes and additive nodes; in which the former ones are those thatmeet the validity of queries, and the latter ones are used to increase the speed inresponse of a specific query. This approach adopts an effective pruning technique tominimize the number of elements in the limited memory space. By this method, lots ofunused nodes in a query are pruned; while some useful materialized nodes are alsoinevitably pruned at the same time, leading to the comparatively decrease of efficiency.Theoretical analysis and experimental evidences (StreamQCtree as an example) indicatethat our approach can discard a great number of nodes under the circumstance ofprocessing fewer queries in a certain amount of time, which can effectively reduce thestorage of stream cube via a little efficiency loss in query.4. Big time window queries on NSM data streams. In the context of limited CPUpower and storage, DSMS (Data Stream Management System) just can deal withqueries within current time window W, while big time window queries which arebeyond the time window W can not be handled. Aiming at this problem, we propose anincremental hybrid storage architecture that combines real-time DSMS and mass storageDBMS (DataBase Management System). To improve the efficiency of DSMS, timedimension is "divide-and-conquer" into small non-overlapping time window blocks, andwe only handle data within the same small block and combine those small blocksqueries into a big time window query. Furthermore, materialized views are used to storethe query results beyond window W in DBMS, and updated by incrementally loadingthe results in expired small blocks from DSMS, free from the inefficient updating,which improves the efficiency in updating and maintenance of materialized views. Itshould be mentioned that this system only can do with aggregation operation in OLAP.This dissertation addresses the problem needed to be solved urgently incost-efficient analysis processing and queries for NSM data streams. A lot of work has been done on the continuous top-k, join, multi-query and big time window query,bringing a breakthrough to the field. This is a promotion of large-scale networkmonitoring data processing on both theoretical study and practical applications.
Keywords/Search Tags:network security monitoring, data stream management system, continuous top-k monitoring, table join, multi-query optimization, big time windowqueries, stream cube, maintaining materialized views
PDF Full Text Request
Related items