A Comparative Study On Statistical Analysis Techniques For Big Data

Posted on: 2015-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Zhang
GTID: 2308330461460722
Subject: Control Science and Engineering
Abstract/Summary:
Advances in cloud computing, the Internet of Things and social networking are expanding the types and scale of data in human society at an unprecedented pace; the Big Data era has arrived. Big Data is not only another disruptive technological revolution in the IT industry but also a tremendous driving force for the economy and society. "Big Data" refers to data whose scale, rate of generation and processing difficulty exceed the capabilities of current conventional storage, management and analysis technologies. Its main features are volume, velocity, variety and value. The development of Big Data passes through three stages: a passive stage, an active stage and an automatic stage. Analyzing Big Data requires data analysis tools such as data mining, machine learning and statistical analysis.

The financial industry, including the securities industry, has a consistently high demand for large-scale data analysis, because huge commercial value is hidden in the massive data it stores. For example, stock holding degree (ownership concentration) is considered an important factor in stock price volatility: it is generally believed that a stock's price rises when ownership concentrates and falls when ownership disperses. However, research on stock holding degree, both in China and abroad, remains very rare. This thesis mainly uses an internal data set of daily stock holding degree from a securities company to compare the efficiency, advantages, disadvantages and scope of application of several data analysis methods.
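The hypothesis that price rises as ownership concentrates can be read as a claim about the sign of a regression slope. As a minimal sketch, assuming a single explanatory variable, the code below fits ordinary least squares (the first of the four techniques the thesis compares) and reports R^2. The function names `ols_fit` and `r_squared` and the toy numbers are hypothetical and are not taken from the thesis's securities-company data set; treating R^2 as a stand-in for the thesis's "degree of interpretation" is also an assumption.

```python
from typing import Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x is constant, so the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x: Sequence[float], y: Sequence[float], intercept: float, slope: float) -> float:
    """Share of the variance of y explained by the fitted line (assumed proxy for 'interpretation')."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y is constant, so R^2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy data: holding degree in percent vs. closing price.
    holding = [10, 20, 30, 40, 50]
    price = [10.0, 10.5, 11.0, 11.5, 12.0]
    a, b = ols_fit(holding, price)
    # A positive slope would be consistent with "concentration raises price".
    print(f"intercept={a:.2f}, slope={b:.3f}, R^2={r_squared(holding, price, a, b):.2f}")
```

Because the toy points lie exactly on a line, the fit recovers a slope of 0.05 and an R^2 of 1; on real holding-degree data the slope may be negative, which the thesis reports happens depending on the shareholding structure.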
In addition, the thesis explores the relationship between stock holding degree and stock price, and confirms some conclusions of previous studies.

Four big data analysis techniques are used in this thesis. First, classical linear regression based on least squares, whose theory is very mature. Second, regression based on a support vector machine (SVM) with a linear kernel, whose mathematical model is linear, like that of linear regression. Third, regression based on an SVM with a radial basis function (RBF) kernel, whose mathematical model is nonlinear and whose fit has higher explanatory power. Fourth, the maximal information coefficient (MIC), a new statistic based on the principle of maximum entropy and designed specifically for big data analysis; it takes into account the generality and equitability required of big data analysis techniques.

The innovations of this thesis are: first, it analyzes ownership concentration, which has rarely been studied; second, the data are preprocessed to avoid the weaknesses of MIC, by filtering out irrelevant relations and compressing the data; third, it proposes a new approach that combines MIC with the SVM.

The conclusions of this thesis are: first, stock holding degree does have a very important impact on stock prices; second, the relationship between stock holding degree and stock price is not always positive and is sometimes negative, depending mainly on the shareholding structure; third, the SVM is not suitable for simple linear regression analysis; fourth, SVM regression with the RBF kernel yields results with the highest explanatory power, but its parameters still need to be optimized; fifth, MIC captures all types of relationships and is highly robust, but it is limited in dimensionality and cannot filter out irrelevant relations. Finally, parameter optimization of the SVM and algorithmic improvement of MIC are directions for further research.
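To illustrate why an MIC-style statistic can detect relationships that a straight-line fit misses, the sketch below computes a simplified score: for every grid size kx by ky with kx * ky < n^0.6 (a commonly used grid-size bound for MIC), it splits both variables into equal-frequency bins, computes mutual information, divides by log2(min(kx, ky)) and keeps the maximum. The exact MIC searches over all grid partitions for the one that maximizes mutual information, whereas this sketch fixes equal-frequency bins, so it can underestimate MIC. The names `approx_mic` and `screen_features` and the threshold values are hypothetical; `screen_features` only gestures at the kind of MIC-based pre-filtering before SVM regression that the thesis proposes, not its actual procedure.

```python
import math
import random
from bisect import bisect_left
from collections import Counter
from typing import Dict, List, Sequence


def equal_freq_bins(values: Sequence[float], k: int) -> List[int]:
    """Assign each value to one of k equal-frequency bins by rank; tied values share a bin."""
    ordered = sorted(values)
    n = len(values)
    return [bisect_left(ordered, v) * k // n for v in values]


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Mutual information (in bits) between two sequences of discrete bin labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (c / n) * math.log2(c * n / (count_a[i] * count_b[j]))
        for (i, j), c in joint.items()
    )


def approx_mic(x: Sequence[float], y: Sequence[float], alpha: float = 0.6) -> float:
    """Simplified MIC: max normalized MI over equal-frequency grids with kx * ky < n ** alpha.

    With fewer than 11 points no 2x2 grid fits the budget and the score is 0.0.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    budget = n ** alpha
    best = 0.0
    for kx in range(2, int(budget) + 1):
        for ky in range(2, int(budget) + 1):
            if kx * ky >= budget:
                break  # a larger ky only makes the grid bigger
            mi = mutual_information(equal_freq_bins(x, kx), equal_freq_bins(y, ky))
            best = max(best, mi / math.log2(min(kx, ky)))
    return best


def screen_features(features: Dict[str, Sequence[float]], target: Sequence[float],
                    threshold: float = 0.3) -> List[str]:
    """Hypothetical pre-filter: keep features scoring >= threshold, strongest first."""
    scores = {name: approx_mic(col, target) for name, col in features.items()}
    kept = [name for name, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda name: -scores[name])


if __name__ == "__main__":
    rng = random.Random(0)
    x = list(range(100))
    parabola = [(v - 50) ** 2 for v in x]   # nonlinear, non-monotonic dependence on x
    noise = [rng.random() for _ in x]       # generated independently of x
    print(approx_mic(x, parabola), approx_mic(x, noise))
    print(screen_features({"parabola": parabola, "noise": noise}, x))
```

A least-squares line through the parabola would have a slope near zero, yet the score above stays high because it measures dependence of any shape; this mirrors the thesis's observation that MIC covers all types of relationships. Note that a screening step like `screen_features` still compares one variable at a time, which reflects the dimensional limitation the thesis mentions.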
Keywords/Search Tags: Big Data, Big Data Analysis, Stock Holding Degree, Linear Regression, Support Vector Machine, Maximum Information Coefficient