Hadoop Based Distributed Storage And Algorithm Analysis For Mass Futures Data

Posted on: 2013-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Li
Full Text: PDF
GTID: 2218330362460682
Subject: Computer application technology
Abstract/Summary:
Futures trading is growing quickly and has become an important means of investment. At the same time, the volume of futures-related data increases day by day, so making good use of this data is becoming more and more important. Useful information can be extracted from the raw data with data-mining tools, and traditionally this has been done with conventional methods. Two new conditions now make those methods impractical. First, the volume of data is very large, reaching terabytes or even petabytes, so storing it is expensive. Second, computing over this much data takes too long to be acceptable.

In this thesis, we propose an approach to storing and processing massive data that can be deployed on a cluster of ordinary computers. We base our project on Hadoop, an open-source framework developed by Doug Cutting. The framework consists mainly of two technologies, MapReduce and HDFS (the Hadoop Distributed File System). Building on them makes our application easier to scale and more fault-tolerant. Our work has two parts: overall design and concrete implementation. First, we propose a framework suited to massive data storage and mining, designed with reference to the well-known layered (hierarchical) architecture model. Second, we provide a simple implementation of the layers in this architecture: the web page, the web service, the data-mining plug-in, and HBase. We describe the workflow of the data-mining plug-in in particular detail.

In this system, the pages send requests using web services and Ajax. These technologies reduce network overhead and hide the differences between systems. On the back end, we use Spring IoC to set up the service beans, which reduces the invasiveness of the code and makes the dependencies among services easy to manage. In the data-mining plug-in, we implement the Parallel FP-Growth algorithm, and we use Maven to manage the code so that it is easier to maintain and reuse. Finally, we choose HBase to store the data, since it is intended for storing massive data.
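
To make the MapReduce side of the data-mining plug-in more concrete, the following is a minimal sketch of the kind of counting pass that a Parallel FP-Growth style pipeline usually starts with: counting how many transactions contain each item and keeping only the items that meet a minimum support. It is not the thesis's code. It simulates the map, shuffle, and reduce phases in a single Python process rather than on Hadoop, and the function names, the `SAMPLE` data, and the event labels are all hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_transaction(transaction):
    """Map phase: emit (item, 1) once per distinct item in a transaction."""
    return [(item, 1) for item in set(transaction)]


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(item, values):
    """Reduce phase: sum the partial counts collected for one item."""
    return item, sum(values)


def frequent_items(transactions, min_support):
    """Return {item: support} for items found in at least min_support transactions."""
    emitted = chain.from_iterable(map_transaction(t) for t in transactions)
    counts = dict(reduce_counts(k, v) for k, v in shuffle(emitted).items())
    return {item: n for item, n in counts.items() if n >= min_support}


# Hypothetical sample data: each transaction lists event labels for one trading day.
SAMPLE = [
    ["price_up", "volume_up", "open_interest_up"],
    ["price_up", "volume_up"],
    ["price_down", "volume_up"],
    ["price_up", "open_interest_up"],
]

if __name__ == "__main__":
    print(frequent_items(SAMPLE, min_support=2))
```

Because each transaction is mapped independently and each item is reduced independently, the same logic can in principle be spread across many machines, which is the property the approach above relies on.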
Keywords/Search Tags:Hadoop, Futures, Mass Data, Distributed Storage, Distributed Computing