Clustering For Stock Data Analysis On Hadoop

Posted on: 2019-02-21

Degree: Master

Type: Thesis

Country: China

Candidate: W Y Chen

Full Text: PDF

GTID: 2428330572954082

Subject: Applied Mathematics

Abstract/Summary:

With the advent of the big data era, people have come to realize the importance of data. Data is not only a resource but also a treasure. Among the application fields of big data, financial data analysis is regarded as a promising one. Stock analysis has long been a popular topic in financial analysis, and it draws on multiple knowledge domains. Previously, people relied on fundamental analysis to predict stock movements, examining macroeconomic and microeconomic policies, the development of the industries concerned, investor attitudes, indicators of a company's development, and so on. With the development of big data techniques, predicting stock trends by discovering regularities in massive historical stock data has become an active research subject. This thesis studies large-scale stock data by cluster analysis. The main work is as follows:

Data collection. We collected 800 GB of stock data using a web crawler and TuShare (an open-source Python package), including basic company information and historical stock data (both daily records and records captured at each moment).

Construction of the platform. We set up a Hadoop cluster of 6 machines in our laboratory. One machine is the Master node (NameNode in HDFS, JobTracker in MapReduce), and the others are Slave nodes (DataNode in HDFS, TaskTracker in MapReduce).

Cluster analysis. We implemented two clustering algorithms on MapReduce, K-means and NMF (Non-negative Matrix Factorization), and analyzed the clustering results. The results indicate that stocks in the same cluster show similar trends.
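The abstract does not describe how K-means was mapped onto MapReduce, so the following is only a minimal single-process sketch of the usual formulation, not the thesis's implementation. It assumes each stock is represented as a numeric feature vector; the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`) are hypothetical, and the in-memory dictionary merely stands in for the shuffle phase that Hadoop would perform between the Slave nodes.

```python
from collections import defaultdict


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_map(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)),
                  key=lambda i: squared_distance(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(cluster_id, values):
    """Reduce step: average all points assigned to one centroid."""
    total = [0.0] * len(values[0][0])
    count = 0
    for point, n in values:
        total = [t + x for t, x in zip(total, point)]
        count += n
    return cluster_id, tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """One map -> shuffle -> reduce round, simulated in a single process."""
    groups = defaultdict(list)  # assumption: stands in for the shuffle phase
    for point in points:
        key, value = kmeans_map(point, centroids)
        groups[key].append(value)
    updated = list(centroids)  # a centroid with no points keeps its position
    for key, values in groups.items():
        _, updated[key] = kmeans_reduce(key, values)
    return updated


def kmeans(points, centroids, max_iter=20, tol=1e-12):
    """Repeat rounds until the centroids stop moving or max_iter is reached."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        converged = all(squared_distance(a, b) <= tol
                        for a, b in zip(updated, centroids))
        centroids = updated
        if converged:
            break
    return centroids


# Toy data (made up for illustration, not taken from the thesis).
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
toy_result = kmeans(toy_points, [(0.0, 0.0), (10.0, 10.0)])
```

On a real cluster, each round would typically be submitted as a separate MapReduce job, with the current centroids distributed to every mapper; the sketch keeps only the per-record map logic and the per-cluster reduce logic.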

Keywords/Search Tags:

Big data, Web crawler, Hadoop, Cluster analysis, NMF

Related items

1	Research On Big Data Text Analysis Based On Hadoop Architecture
2	Design And Implementation Of A Distributed Web Crawler System Based On Hadoop
3	Research On Optimization Of Hadoop Distributed Web Crawler System
4	Research And Implementation Of Data Mining And Analysis On Online Entertainment Platforms Based On Hadoop
5	Investigation On Web Crawler Technology Based On Hadoop Platform
6	Design And Implementation Of Online Education Big Data Analysis Platform
7	Large-scale Bilingual Parallel Corpus Collection System Based On Hadoop
8	Research And Implementation Of Distributed Web Crawler Based On Hadoop
9	The Sports Monitoring And Managing System Based On Hadoop Cluster
10	Research And Improvement Of Job Scheduling Algorithm Based On Hadoop Cluster