| Network public opinion about stocks takes the form of stock commentary, most of which is massive, unstructured, and real-time. About 90 percent of the data is text. Investors refine the fragmentary information posted on BBS forums to optimize their own investment decisions, which ultimately feeds directly into stock market forecasts. This paper therefore focuses on four tasks: acquiring public opinion data with web crawling, solving the mass data storage problem with big data technology, extracting the valuable soft information in network public opinion with Chinese text mining, and predicting the stock price over a future period with a regression prediction model. The subjects investigated are the text data on Shanghai 180 Index stocks from Oriental Wealth and the corresponding stock prices from the Wind database. A Hadoop platform is built to solve the storage problem caused by the explosive growth of data. On Hadoop, Hive is set up to clean the stock commentary text, and the R language is used to analyze the sentiment tendency of the text and to build and visualize the models. The prediction results help in understanding the relationship between network public opinion and stock price. The specific work of this paper is as follows: 1. Construction of a network public opinion index based on machine learning. The text data are preprocessed and quantified, then combined with stock market data to build classification models based on Naive Bayes, the K-nearest neighbor algorithm, and Support Vector Machine, respectively. Evaluating the performance of each classification model shows that the Support Vector Machine is the best-performing classifier. Construction of a network public opinion index based on a sentiment dictionary. The general Chinese sentiment dictionary resources from data hall are sorted to obtain the basic sentiment dictionaries and auxiliary sentiment dictionaries. The sentiment dictionary for the financial field is then sorted further to obtain a sub-dictionary. The SO-PMI algorithm is used to calculate sentiment values for words that do not match the seed words, yielding a new sentiment dictionary. The two methods are analyzed and compared, and the results show that the sentiment dictionary method is better suited to measuring the public opinion index in the economic domain. 2. The synchronous, leading, and lagged correlations between Internet public opinion and stock market return, closing price, and volume are analyzed with the Spearman rank correlation coefficient. The correlation with network public opinion is found to approach its peak for the lagged closing price and its minimum for the leading closing price. The Granger causality between network public opinion and stock price is analyzed, along with the impact of network public opinion on stock price and its contribution. The results show that the stock price can be predicted from online public opinion. 3. The differences in the influence of network public opinion on stock price during periods of stability, large fluctuation, and slow growth are studied by constructing a vector regression model. The results demonstrate that Internet public opinion has different predictive effects on stock prices at different stages. In addition, the predicted closing price is very close to its true value. For the entire stock market, short-term forecasts outperform long-term forecasts. |
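
The SO-PMI step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (the abstract states that the analysis was done in R, and gives no code); Python is used here only for illustration. The function name `so_pmi`, the post-level co-occurrence window, the additive `smoothing` term, and the toy English tokens are all assumptions. The idea is that a word that does not match any seed word receives a score equal to its summed pointwise mutual information with positive seeds minus its summed pointwise mutual information with negative seeds.

```python
import math
from collections import Counter
from itertools import combinations


def so_pmi(docs, pos_seeds, neg_seeds, smoothing=1.0):
    """Return {word: SO-PMI score} for every non-seed word in docs.

    docs is a list of token lists (posts that are already segmented).
    Co-occurrence is counted per post; this window is an assumption.
    `smoothing` is added to pair counts so that log(0) never occurs.
    """
    n_docs = len(docs)
    word_df = Counter()  # number of posts containing each word
    pair_df = Counter()  # number of posts containing both words of a pair
    for tokens in docs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))  # pairs are stored sorted

    def pmi(a, b):
        pair = (a, b) if a < b else (b, a)
        p_joint = (pair_df[pair] + smoothing) / n_docs
        p_a = word_df[a] / n_docs
        p_b = word_df[b] / n_docs
        return math.log2(p_joint / (p_a * p_b))

    # Seeds that never occur in the corpus are ignored.
    pos = [s for s in pos_seeds if s in word_df]
    neg = [s for s in neg_seeds if s in word_df]
    seeds = set(pos) | set(neg)
    return {
        w: sum(pmi(w, p) for p in pos) - sum(pmi(w, n) for n in neg)
        for w in word_df
        if w not in seeds
    }


# Toy corpus (hypothetical tokens standing in for segmented Chinese posts).
posts = [
    ["price", "rise", "good_news"],
    ["price", "fall", "bad_news"],
    ["rise", "good_news", "buy"],
    ["fall", "bad_news", "sell"],
]
scores = so_pmi(posts, pos_seeds={"rise"}, neg_seeds={"fall"})
```

In this toy corpus, `good_news` co-occurs only with the positive seed and gets a positive score, `bad_news` gets a negative one, and `price` co-occurs equally with both seeds and scores zero. Words whose score exceeds a chosen threshold in either direction could then be added to the new sentiment dictionary; the threshold itself is not specified in the abstract.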