
HBase-Based Credible Data Warehouse Construction of Business Quarterly Results and OLAP Query Analysis

Posted on: 2018-07-31
Degree: Master
Type: Thesis
Country: China
Candidate: S L Yang
GTID: 2348330536468529
Subject: Computer technology
Abstract/Summary:
As data volumes grow, storage is becoming an increasingly serious bottleneck, and mining useful information from historical data is becoming more and more difficult. Building on data warehouse theory, we propose a model for constructing a cloud data warehouse on a NoSQL database and apply it to storing enterprise quarterly results and answering users' OLAP queries. The model relieves, to a certain degree, the storage bottleneck for enterprise quarterly results while also providing OLAP services and decision support, and we adopt several strategies to avoid privacy disclosure in the cloud data warehouse. Against the background of a big data project of the Hebei Provincial Institute of Scientific and Technical Information, we take the enterprise quarterly results of successive years as the research object and HBase, an open-source cloud database, as the storage platform. We explore a dimension partition strategy for the data, build a cloud data warehouse platform, and verify OLAP queries on it. The main research work is as follows:

(1) A dimension partition strategy. We devise a Pretreatment-Vector-KMeans attribute dimension partition strategy. First, multiple pieces of information about each attribute are collected into an attribute document. Then, following text mining theory, the features of each attribute are extracted, vectorized, and clustered, with the processing run on MapReduce to improve execution efficiency. Finally, each attribute cluster is kept as a dimension, and the feature with the highest weight in the cluster serves as that dimension's mark, which completes the dimension partition efficiently.

(2) A privacy-preserving mechanism based on data partition. Compared with traditional data encryption, data segmentation is cheap, so users' queries can still be answered efficiently. Based on key decomposition theory, we propose a data partition privacy protection mechanism: by choosing suitable segmentation parameters n and m, we calculate the parent block measure and determine the sub-block measures, so that the data are divided into n blocks of which only m are needed to restore the complete data. This improves data reliability. We also integrate an XOR encryption algorithm into the mechanism to hide the true values of the data blocks, which avoids the risk of data privacy disclosure.

(3) A cloud data warehouse construction model. We build a new ETL model for constructing the data warehouse. In the extraction stage, our dimension partition strategy divides the attributes into a set of dimension attributes and a set of value attributes, and we put forward a uniform distribution code, a standard dimension code, an attribute dimension code, a compound rowkey code, and others. In the transformation stage, the data are converted into <code, value> form, and in the loading stage they are loaded evenly into HBase, which serves as a NoSQL cloud data warehouse.

Finally, the dissertation is summarized and directions for further research are pointed out.
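The Pretreatment-Vector-KMeans strategy of item (1) can be illustrated on a single machine. The sketch below assumes each attribute document is already tokenized (the pretreatment step is not shown), weights terms with smoothed TF-IDF, clusters the L2-normalized vectors with plain k-means using deterministic farthest-first seeding, and marks each cluster with the term of highest centroid weight. The function names `tfidf`, `kmeans` and `partition_dimensions`, the seeding rule, and the omission of MapReduce are assumptions for illustration, not the thesis's code.

```python
import math
from collections import Counter


def tfidf(docs):
    """L2-normalized, smoothed TF-IDF vectors (term -> weight); docs must be non-empty."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        v = {t: c / len(doc) * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in v.values()))
        vecs.append({t: w / norm for t, w in v.items()})
    return vecs


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def mean(vecs):
    """Component-wise mean of sparse vectors."""
    total = Counter()
    for v in vecs:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vecs) for t, w in total.items()}


def kmeans(vecs, k, max_iter=50):
    # Hypothetical seeding: first vector, then repeatedly the vector farthest from all seeds.
    cents = [dict(vecs[0])]
    while len(cents) < k:
        cents.append(dict(max(vecs, key=lambda v: min(sq_dist(v, c) for c in cents))))
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: sq_dist(v, cents[c])) for v in vecs]
        new = [mean([v for v, lab in zip(vecs, labels) if lab == c]) if c in labels else cents[c]
               for c in range(k)]
        if new == cents:  # assignments stable: converged
            break
        cents = new
    return labels, cents


def partition_dimensions(attr_docs, k):
    """Cluster attributes into k dimensions, each marked by its highest-weight feature."""
    names = list(attr_docs)
    labels, cents = kmeans(tfidf([attr_docs[a] for a in names]), k)
    dims = {}
    for c, cent in enumerate(cents):
        members = [a for a, lab in zip(names, labels) if lab == c]
        if not members:
            continue
        mark = max(cent, key=cent.get)
        if mark in dims:  # two clusters share a top feature: disambiguate
            mark = f"{mark}#{c}"
        dims[mark] = members
    return dims
```

For instance, with attribute documents `op_revenue: [income, income, revenue]`, `net_profit: [income, income, profit]`, `province: [region, region, province]`, `city: [region, region, city]` and k = 2, the two financial attributes form a dimension marked `income` and the two location attributes one marked `region`. At warehouse scale the TF-IDF counting and the assignment step are the parts one would distribute, which is where item (1) applies MapReduce.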
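Item (2) splits a value into n blocks of which m suffice for recovery and hides block contents with XOR. The abstract does not give the key-decomposition formulas (how the parent and sub-block measures are computed), so the sketch below swaps in a well-known scheme with the same any-m-of-n property: Shamir's (m, n) threshold secret sharing over a prime field, followed by XOR masking of each share with a pad derived from a key by SHA-256. `PRIME`, `split`, `recover`, `xor_mask` and the pad derivation are hypothetical choices, not the thesis's mechanism.

```python
import hashlib
import secrets

PRIME = 2**127 - 1  # Mersenne prime used as the share field (assumption)


def split(secret, n, m):
    """Split an integer 0 <= secret < PRIME into n shares; any m of them recover it."""
    if not (1 <= m <= n and 0 <= secret < PRIME):
        raise ValueError("need 1 <= m <= n and 0 <= secret < PRIME")
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for a in reversed(coeffs):  # Horner evaluation mod PRIME
            y = (y * x + a) % PRIME
        shares.append((x, y))
    return shares


def recover(shares):
    """Lagrange interpolation at x = 0 over GF(PRIME); needs at least m distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def _pad(key, x):
    """128-bit pad derived from the key and the share index (hypothetical scheme)."""
    digest = hashlib.sha256(key + x.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:16], "big")


def xor_mask(share, key):
    """XOR-mask a share's value; applying it twice restores the original share."""
    x, y = share
    return x, y ^ _pad(key, x)
```

With `split(v, n=5, m=3)` any three shares reconstruct `v`, while fewer than m reveal nothing about it, which is what lets the scheme tolerate lost blocks. Because the pad here depends only on the key and the share index, reusing one key across many values would leak XOR differences between them; a real design would mix a per-value nonce into `_pad`.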
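Item (3) names a uniform distribution code, a standard dimension code, an attribute dimension code and a compound rowkey code without defining their layout. One plausible reading, sketched below purely as an assumption, is a salted composite HBase rowkey: a hash-derived bucket prefix (a common way to avoid write hotspots in HBase), then the reporting period taken as the standard dimension code, then codes for the attribute dimensions, then the entity id. The field order, the `|` separator, `BUCKETS`, and the code tables are hypothetical.

```python
import hashlib

BUCKETS = 16  # hypothetical number of pre-split HBase regions


def salt(entity_id):
    """Uniform-distribution code: a stable hash bucket that spreads rows across regions."""
    h = int.from_bytes(hashlib.sha256(entity_id.encode("utf-8")).digest()[:4], "big")
    return f"{h % BUCKETS:02d}"


def rowkey(entity_id, period, dim_codes):
    """Compound rowkey: salt | period code | attribute-dimension codes | entity id."""
    return "|".join([salt(entity_id), period, *dim_codes, entity_id])


def transform(record, dim_attrs, value_attrs, code_tables):
    """Convert one flat record into the <code, value> form: (rowkey, {attribute: value})."""
    # Look up the code assigned to each dimension attribute's value.
    codes = [code_tables[a][record[a]] for a in dim_attrs]
    key = rowkey(record["entity_id"], record["period"], codes)
    # Value attributes become the cells stored under that rowkey.
    return key, {a: record[a] for a in value_attrs}
```

The resulting (rowkey, cells) pair is what an HBase client would then write with a put. Placing the salt first spreads sequential writes evenly across regions, at the cost that a range scan over one period must be issued once per bucket prefix and the results merged.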
Keywords/Search Tags: cloud data warehouse, dimension partition, data partition, ETL, HBase