Hadoop Based Distributed Storage And Algorithm Analysis For Mass Futures Data

Posted on:2013-02-26

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Li

Full Text:PDF

GTID:2218330362460682

Subject:Computer application technology

Abstract/Summary:


Futures trading is growing rapidly and has become an important means of investment. At the same time, the volume of futures-related data increases day by day, so making good use of this data matters more and more. Useful information can be extracted from the raw data with data-mining tools, and traditional approaches can do this. Two new problems now stand in the way. First, the data is very large, reaching terabytes or even petabytes, and storing it is expensive. Second, computation over it takes too long to be practical.

In this thesis, we propose a way to store and process mass data that can be implemented on a cluster of commodity computers. We base our project on Hadoop, an open-source framework developed by Doug Cutting. The framework consists mainly of two technologies, MapReduce and HDFS (Hadoop Distributed File System). With them, our application becomes easier to scale and more fault-tolerant.

Our work consists of two parts: overall design and concrete implementation. First, we propose a framework suited to mass data storage and mining, modeled on the well-known layered (hierarchical) software architecture. Second, we give a simple implementation of each layer of this architecture: web page, web service, data-mining plug-in, and HBase. We describe the workflow of the data-mining plug-in in particular detail.

In the implementation, pages send requests through web services and Ajax. These technologies reduce network overhead and hide differences between systems. On the back end, we use Spring IoC to configure service beans, which makes the code less invasive and makes dependencies among services easy to manage. In the data-mining plug-in, we implement the Parallel FP-Growth algorithm, and we use Maven to manage the code so that it is easier to maintain and reuse. Finally, we choose HBase to store the mass data, since it is designed for large-scale data storage.
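The abstract names Parallel FP-Growth but does not show any of its steps. As a rough illustration only, the sketch below imitates the usual first stage of that algorithm, parallel counting of item frequencies, in map/reduce style using plain Python. It is not the thesis's implementation and does not use the Hadoop API; the function names, the sharding, and the sample transactions (commodity codes standing in for futures records) are all hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_count(transaction):
    """Map phase: emit (item, 1) for each distinct item in one transaction."""
    return [(item, 1) for item in set(transaction)]


def reduce_count(pairs):
    """Reduce phase: sum the counts emitted for each item."""
    totals = defaultdict(int)
    for item, count in pairs:
        totals[item] += count
    return dict(totals)


def frequent_list(shards, min_support):
    """Build the F-list: items meeting min_support, most frequent first."""
    # On a real cluster each shard would be mapped on a different node;
    # here the shards are simply processed one after another.
    emitted = chain.from_iterable(
        map_count(t) for shard in shards for t in shard
    )
    totals = reduce_count(emitted)
    frequent = [(i, c) for i, c in totals.items() if c >= min_support]
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample data: two shards of transactions.
shards = [
    [["cu", "al", "zn"], ["cu", "al"]],
    [["cu", "zn"], ["al", "au"], ["cu"]],
]
f_list = frequent_list(shards, min_support=2)
print(f_list)
```

In the full algorithm, an F-list like this one is then used to group items and to build independent FP-trees per group; those later stages are omitted here.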

Keywords/Search Tags:

Hadoop, Futures, Mass Data, Distributed Storage, Distributed Computing


Related items

1	Design And Implementation Of Distributed Data Storage Based On Hadoop
2	Research Of Mass Data Storage Technology Based On Hadoop Platform
3	Based On The Hadoop Mass File Storage System Analysis And Design
4	Research And Implementation Of Integration Of R Language And Hadoop
5	Design And Analysis Of The Mass Image Storage Model Based On Hadoop
6	Designed Application Of Data-Analysis Based On Hadoop Platform
7	On The Hadoop Based Distributed Storage Techniques And Its Applications In Content Dissemination Design
8	Research Of Distributed Storage Of Massive RDF Data
9	The Design And Implementation Of A Log Analysis System Based On Distributed Computing Platform
10	The Methods And Optimizations For Mass Data P2P Distributed Steady Storage