Multidimensional Data Model For Mining And Analysis Based On Multiple Structure Data Cube

Posted on:2015-02-05

Degree:Master

Type:Thesis

Country:China

Candidate:X S Zhang

Full Text:PDF

GTID:2268330428981357

Subject:Software engineering

Abstract/Summary:

With the rapid development of Internet technology, modern society has entered the era of the networked society. Huge amounts of data are produced on the Internet every day, most of it generated and stored in semi-structured or unstructured form. How to quickly mine useful information from the mass of text messages on the web, and to analyze and process it accurately, has become a major problem for organizations and individuals alike. Building a data cube from text data and then analyzing and mining it with OLAP (online analytical processing) operations is one effective way to address this problem. A traditional data cube, however, is constructed from structured data and cannot be built directly from unstructured data, so finding a way to construct a data cube from unstructured data is of considerable importance. In this thesis we propose a method for constructing a text cube, which serves as the basis for mining hot topics from massive collections of short texts. Our study has two aspects. First, we use distributed clustering to generate the topics of the text data set and thereby extract dimensions from the unstructured data; this improves the accuracy of analysis on the text cube and reduces the manual cost of building dimensions for text OLAP. Second, we present a TF-IDF-based method for generating the measure stored in the fact table; it allows the standard OLAP aggregation functions to be applied directly to the data cube to measure the topic features of a text set, with no need to define new ones. Because the data set in this application is too large to process on a single computer, we also design and implement a series of MapReduce-based preprocessing methods that preprocess the data in parallel on a distributed cluster.
Test results show that this text cube construction method is highly practical and scalable: it can efficiently build a cube model from a large-scale text data set. The resulting cube supports effective mining of popular-topic information from the text data set with OLAP operations, greatly improving the efficiency of text analysis and mining.
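
The idea of a TF-IDF measure that can be aggregated with ordinary OLAP functions can be illustrated with a minimal Python sketch. The abstract does not give the exact weighting formula or the fact-table schema, so everything here is an assumption: the weight is the common tf × ln(N / df) form, the toy documents and the names `DOCS`, `build_fact_table`, `roll_up`, and `top_terms` are hypothetical, and the topic dimension is assigned by hand rather than produced by distributed clustering.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy input: (doc_id, topic, tokens). The topic label stands in
# for the dimension that the thesis derives by clustering.
DOCS = [
    ("d1", "sports", ["match", "goal", "team"]),
    ("d2", "sports", ["team", "coach", "goal", "goal"]),
    ("d3", "finance", ["stock", "market", "team"]),
]


def build_fact_table(docs):
    """Return fact rows (doc_id, topic, term, measure), measure = tf * ln(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for _, _, tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    rows = []
    for doc_id, topic, tokens in docs:
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            rows.append((doc_id, topic, term, tf * idf))
    return rows


def roll_up(rows):
    """Roll up the document level: plain SUM of the measure per (topic, term) cell."""
    cube = defaultdict(float)
    for _, topic, term, measure in rows:
        cube[(topic, term)] += measure
    return dict(cube)


def top_terms(cube, topic, k=2):
    """Slice one topic and return its k highest-weighted terms (ties by term name)."""
    cells = [(term, m) for (t, term), m in cube.items() if t == topic]
    return sorted(cells, key=lambda cell: (-cell[1], cell[0]))[:k]


FACTS = build_fact_table(DOCS)
CUBE = roll_up(FACTS)
```

Calling `top_terms(CUBE, "sports")` slices the cube on one topic and ranks its terms by the summed measure. Because the measure is an ordinary number per fact row, the roll-up needs nothing beyond SUM, which is the property the abstract highlights.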

Keywords/Search Tags:

OLAP, data cube, text data, distributed parallel computing, MapReduce

Related items

1	Research On Parallel And Distributed Processing Technology Of Data Cube In OLAP System
2	Research And Implementation Of Building Data Cube Based On Mapreduce
3	Research On Data Cube Technology Based On MapReduce
4	Research And Implementation Of Distributed Cube Distributed Storage And Construction Algorithm
5	Research And Implementation Of Construction Algorithms For Closed Histogram Cube
6	The Design And Implementation Of Parallel Computing Platform Based On MapReduce
7	Research Of OLAP And Data Mining Technology Based On Water Supply Data Cube Of Quantity And Charge
8	Research And Design Of OLAP Exhibition Tool Based On Component Technology
9	On-Line Analytical Processing (OLAP) & OLAP Application In Commercial Automation
10	Research On Distributed Fast Clustering Algorithm Based On Mapreduce