
Application Research Of Data Analysis Technology Based On NoSQL

Posted on: 2017-11-30; Degree: Master; Type: Thesis
Country: China; Candidate: Y Yang; Full Text: PDF
GTID: 2348330515485794; Subject: Engineering
Abstract/Summary:
Large volumes of data are generated every day, and they contain much valuable information that can be mined and analyzed. Data warehouse systems, for example, hold information whose analysis helps enterprise decision makers make sound decisions. How to obtain this information quickly, improving an enterprise's efficiency while conserving computing resources, has attracted many researchers. Traditional RDBMSs are poorly suited to querying and analyzing big data, mainly because their fixed relational model and inefficient table join operations restrict query performance. Many newer NoSQL databases, such as HyperDex, are better suited to these problems because of their more flexible storage schema and query model. Data warehouses built on HyperDex perform much better than those built on a traditional RDBMS. To optimize query efficiency further, HyperDex provides a region index technique; however, this region index is not very efficient when the number of objects in a region is very large.

This thesis therefore studies data analysis technology based on NoSQL and its application. Building on HyperDex's original region index, it proposes two indexes: the cube multiple objects replicas aggregate region index (CMORARI) and the cube single object replica aggregate region index (CSORARI). Both transform the original star schema cube stored in SQL into a key-value dictionary cube in HyperDex. The two methods are validated using the TPC-H-derived Star Schema Benchmark (SSB), which models a retailer application case, and the results show that HyperDex has strong application value in the big data warehouse field. The main contents are as follows:

1) Building CMORARI: First, all records in the dimension tables are read to obtain each dimension's domain of definition. Then the records in the fact table are iterated over. If a fact record falls within the dimension domains, an aggregate region index is built for each of its dimension attributes and the fact record is associated with each of those attributes; otherwise the fact record is ignored and the next record is read. This repeats until all fact records are processed. Finally, the aggregate region indexes of the dimension attributes and their associated fact records are put into the HyperDex database. Because each fact record is associated with every dimension attribute, the algorithm is called multiple objects replicas. Each region index the algorithm generates is the result of merging multiple original HyperDex region indexes into one. This not only reduces the storage space of the region indexes significantly but also accelerates index parsing and further improves query performance.

2) Building CSORARI: To reduce the storage space occupied by the fact record replicas in the method above, fact records are stored separately. Unlike CMORARI, CSORARI does not associate the fact record with each dimension attribute. Only one replica of each fact record is kept, stored on its own in a single HyperDex hyperspace, hence the name single object replica.

3) Query implementation and performance comparison: Queries are designed and implemented on the CMORARI and CSORARI cubes described above. The program's command-line arguments specify the query conditions; the program parses these arguments, invokes the query API, and returns the query result. Query experiments are used to compare and evaluate performance, and the factors affecting query performance and possible further improvements are discussed. The extra space required by the indexes is also analyzed. The experimental results show that the proposed improved region index based on NoSQL is an effective way to improve query performance.
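The two build procedures and the query step can be illustrated with a minimal Python sketch. It is not the thesis implementation and does not use the HyperDex client API: plain dictionaries stand in for HyperDex spaces, and the record layout (`key`, `id`, `revenue`), the `(dimension, value)` index key, the use of fact ids as CSORARI index entries, and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def dimension_domains(dimension_tables):
    """Read every dimension table and collect its set of valid keys."""
    return {name: {row["key"] for row in rows}
            for name, rows in dimension_tables.items()}


def in_domain(fact, domains):
    """A fact is kept only if each of its dimension values is in the domain."""
    return all(fact[dim] in keys for dim, keys in domains.items())


def build_cmorari(dimension_tables, facts):
    """Multiple-replica variant: one merged index entry per (dimension, value),
    each entry holding its own copy of every associated fact record."""
    domains = dimension_domains(dimension_tables)
    index = defaultdict(list)
    for fact in facts:
        if not in_domain(fact, domains):
            continue  # ignore the record and move on to the next one
        for dim in domains:
            index[(dim, fact[dim])].append(dict(fact))  # one replica per dimension
    return dict(index)


def build_csorari(dimension_tables, facts):
    """Single-replica variant: each fact is stored once in a separate space;
    index entries hold fact ids only (an assumed representation)."""
    domains = dimension_domains(dimension_tables)
    index = defaultdict(list)
    fact_space = {}
    for fact in facts:
        if not in_domain(fact, domains):
            continue
        fact_space[fact["id"]] = fact
        for dim in domains:
            index[(dim, fact[dim])].append(fact["id"])
    return dict(index), fact_space


def query_csorari(index, fact_space, conditions, measure):
    """Intersect the index entries named by the conditions, then sum a measure."""
    id_sets = [set(index.get(item, [])) for item in conditions.items()]
    matching = set.intersection(*id_sets) if id_sets else set(fact_space)
    return sum(fact_space[i][measure] for i in matching)


# Hypothetical toy data: two dimensions and four fact records.
dimension_tables = {
    "customer": [{"key": 1}, {"key": 2}],
    "part": [{"key": 10}, {"key": 20}],
}
facts = [
    {"id": 1, "customer": 1, "part": 10, "revenue": 100},
    {"id": 2, "customer": 1, "part": 20, "revenue": 50},
    {"id": 3, "customer": 2, "part": 10, "revenue": 70},
    {"id": 4, "customer": 9, "part": 10, "revenue": 999},  # customer 9 is outside the domain
]

cmorari = build_cmorari(dimension_tables, facts)
csorari_index, fact_space = build_csorari(dimension_tables, facts)
print(sum(len(replicas) for replicas in cmorari.values()), len(fact_space))
print(query_csorari(csorari_index, fact_space, {"customer": 1, "part": 10}, "revenue"))
```

In this toy model the trade-off described above is visible directly: the CMORARI dictionary holds one copy of each accepted fact per dimension, while the CSORARI fact space holds each accepted fact once and pays for it with an extra lookup at query time.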
Keywords/Search Tags: NoSQL, HyperDex, region index, distributed data warehouse, big data analysis