
Research On Indexing And Query Method Of Log Big Data

Posted on: 2018-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Ding
GTID: 2428330569985433
Subject: Computer technology
Abstract/Summary:
Logs are important data that reflect a system's running status and its users' behavior. Providing reliable storage and fast, efficient queries to support log analysis is an urgent problem. HBase is closely integrated with the Hadoop software stack and is strong at storing and processing unstructured and semi-structured data, so it is well suited to log applications. However, HBase builds an index only on the primary key, while log queries often involve non-primary-key columns, so HBase's query performance on non-primary-key columns needs to be optimized. The basic idea of the secondary index for log data is to encode the mapping from a non-primary-key value to the address of the original data into the primary key of an index record, so that a query can locate the original data quickly from the non-primary-key value and avoid a slow full table scan. The index is built statically by a MapReduce job. When new log data is added to the table or a region splits, indexing is carried out by an HBase coprocessor to ensure data consistency. To make the secondary index easier to use, the various patterns of log queries are abstracted and modeled, and an easy-to-use query API is designed. The client sends a log query request to a RegionServer, which delegates to a query parser that parses the query conditions and carries out the lookup; the lookup runs in parallel on all RegionServers to speed it up. Because log queries follow an 80/20 distribution, hotspot index entries are cached in memory, and a cumulative heat cache replacement strategy is proposed to further optimize log queries. Experiments on a test platform of four servers, consisting of a comparison of cache replacement strategies and a comparison of query performance, verify the performance of the cumulative heat cache replacement strategy and the improvement that the secondary index system brings to log queries. An analysis of the space overhead after the index is built shows that the cost of the secondary index is low.
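The two central ideas of the abstract, folding the value-to-row mapping into the index record's key and keeping hot index entries in a heat-based cache, can be illustrated with a minimal in-memory sketch. This is not the thesis's implementation: the abstract gives neither the key layout nor the heat formula, so the separator `SEP`, the class names `SecondaryIndex` and `HeatCache`, the "+1 per access" heat rule, and the `decay` factor are all assumptions made for illustration. A sorted Python list stands in for an HBase index table, whose rows are likewise ordered by key.

```python
import bisect

SEP = "\x00"  # assumed separator between the indexed value and the original row key


class SecondaryIndex:
    """Toy stand-in for an index table whose row keys are kept sorted."""

    def __init__(self):
        # Each index row key is: column value + SEP + original row key.
        self._keys = []

    def put(self, value, row_key):
        key = value + SEP + row_key
        pos = bisect.bisect_left(self._keys, key)
        if pos == len(self._keys) or self._keys[pos] != key:
            self._keys.insert(pos, key)

    def lookup(self, value):
        """Prefix scan: return original row keys whose column equals value."""
        prefix = value + SEP
        pos = bisect.bisect_left(self._keys, prefix)
        rows = []
        while pos < len(self._keys) and self._keys[pos].startswith(prefix):
            rows.append(self._keys[pos][len(prefix):])
            pos += 1
        return rows


class HeatCache:
    """Cache that evicts the entry with the lowest accumulated heat.

    Heat rule (an assumption): +1 per access, multiplied by `decay`
    whenever age() is called, so old popularity fades over time.
    """

    def __init__(self, capacity, decay=0.5):
        self.capacity = capacity
        self.decay = decay
        self._data = {}
        self._heat = {}

    def get(self, key):
        if key not in self._data:
            return None
        self._heat[key] += 1.0
        return self._data[key]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            coldest = min(self._heat, key=self._heat.get)
            del self._data[coldest]
            del self._heat[coldest]
        self._data[key] = value
        self._heat[key] = self._heat.get(key, 0.0) + 1.0

    def age(self):
        for key in self._heat:
            self._heat[key] *= self.decay


def query(index, cache, value):
    """Serve a non-primary-key query from the cache, else from the index."""
    rows = cache.get(value)
    if rows is None:
        rows = index.lookup(value)
        cache.put(value, rows)
    return rows
```

Because the index keys are sorted, all entries for one column value are adjacent, so a lookup touches only the matching range instead of scanning every row. Unlike a plain least-recently-used policy, the heat-based cache keeps an entry that has been queried many times even if it was not the most recent access, which suits a skewed (80/20) query distribution.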
Keywords/Search Tags: big data, HBase, secondary index, cache replacement policy, query processing