Multidimensional Data Model For Mining And Analysis Based On Multiple Structure Data Cube

Posted on: 2015-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: X S Zhang
Full Text: PDF
GTID: 2268330428981357
Subject: Software engineering
Abstract/Summary:
With the rapid development of Internet technology, modern society has entered the era of the network society. Huge amounts of data are produced on the Internet every day, most of it generated and stored in semi-structured or unstructured form. How to quickly mine useful information from the mass of text messages on the web, and how to analyze and process it accurately, has become a major problem for organizations and individuals alike. Building a data cube from text data and then analyzing and mining it with OLAP (online analytical processing) operations is one effective way to solve this problem. However, a traditional data cube is constructed from structured data and cannot be built directly from unstructured data, so finding a way to construct a data cube from unstructured data is of real significance. In this paper we propose a method for constructing a text cube, which serves as the basis for mining hot topics from massive collections of short texts.

Our study has two aspects. First, we use distributed clustering to generate the topics of the text data set and thereby extract dimensions from the unstructured data. This improves the accuracy of current text cube analysis and reduces the labor cost of building dimensions for text OLAP. Second, we present a TF-IDF-based method for generating the measure values of the fact table. This allows the standard OLAP aggregation functions to be applied directly to the data cube to measure the topic features of a text set, without defining new methods.

Because the data set in this application is huge and hard to process on a single computer, we design and implement a series of MapReduce-based preprocessing methods that allow the data to be preprocessed in parallel on a distributed cluster.
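To make the second aspect concrete, the following is a minimal sketch of a TF-IDF measure stored in a fact table and rolled up with a plain SUM. It is an illustration under assumptions, not the thesis's implementation: the dimension names (topic, day, term), the toy records, the exact TF-IDF variant, and the helper names `build_fact_table` and `roll_up` are all hypothetical, and the topic labels are simply given here rather than produced by distributed clustering.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy input: (topic, day, text) records. The topic label is
# assumed to be known already; the thesis derives it by clustering.
DOCS = [
    ("sports", "2014-06-01", "world cup final goal"),
    ("sports", "2014-06-02", "cup goal replay"),
    ("tech", "2014-06-01", "new phone launch"),
    ("tech", "2014-06-02", "phone battery review"),
]

DIMENSIONS = ("topic", "day", "term")


def build_fact_table(docs):
    """Return {(topic, day, term): tf-idf weight}, one cell per key."""
    n_docs = len(docs)
    tokenized = [(topic, day, text.split()) for topic, day, text in docs]

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for _, _, terms in tokenized:
        df.update(set(terms))

    facts = defaultdict(float)
    for topic, day, terms in tokenized:
        for term, count in Counter(terms).items():
            tf = count / len(terms)            # relative term frequency
            idf = math.log(n_docs / df[term])  # assumed IDF variant
            facts[(topic, day, term)] += tf * idf
    return dict(facts)


def roll_up(facts, keep):
    """SUM the measure over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    out = defaultdict(float)
    for key, value in facts.items():
        out[tuple(key[i] for i in positions)] += value
    return dict(out)


facts = build_fact_table(DOCS)
# Roll up the "day" dimension: one weight per (topic, term) pair.
by_topic_term = roll_up(facts, keep=("topic", "term"))
# Roll up further to a single weight per topic.
by_topic = roll_up(facts, keep=("topic",))
```

Because the measure is an ordinary number per cell, rolling up a dimension needs nothing beyond the built-in SUM, which is the property the abstract describes.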
Test results show that this text cube construction method is practical and scalable: it can efficiently build a cube model from a large-scale text data set. The cube allows information about popular topics to be mined effectively from the text data set with OLAP operations, greatly improving the efficiency of text data analysis and mining.
Keywords/Search Tags:OLAP, data cube, text data, distributed parallel computing, MapReduce