| In recent years, with the rapid development of 5G technology and the wide use of Internet social media, more and more users express their views and opinions online. Mining useful information from massive volumes of online text has become an important way of obtaining information. In the financial field, many investors discuss stock prices and make predictions on online financial platforms, and other investors take these comments as references when trading. The overall sentiment of investors therefore affects stock market movements to some extent. This thesis takes this phenomenon as its starting point and the Shanghai Stock Exchange 50 Index from 2011 to 2022 as its research object. Comments and some financial news texts are collected from the stock bar of the Oriental Fortune website with a web crawler. A pre-trained MacBERT model is used to classify the text data and extract the investor sentiment information it contains. On this basis, a text sentiment index is constructed as the direct investor sentiment feature. To test the effectiveness of this index, seven other sentiment indicators are selected and reduced in dimension through principal component analysis to construct the indirect investor sentiment feature. The two features together constitute the investor sentiment indicator system of this thesis. In terms of price prediction, the stock market is affected by many factors and is difficult to predict. More and more scholars therefore apply machine learning to stock price prediction, and the development of deep learning has further promoted this trend. Building hybrid prediction models that combine the strengths of different deep learning models has become a direction of development for deep learning applications in finance. Combining the strengths of CNN in feature extraction and of LSTM in time series processing, this thesis builds a CNN-LSTM hybrid model to predict the closing price of the Shanghai Stock Exchange 50 Index, studies the prediction effect of different time windows, adds the direct and indirect investor sentiment features separately to test their influence on the prediction effect of the model, and compares the results with those of other prediction models. The empirical results show that the MacBERT model, applied to text sentiment classification in the financial field, achieves an accuracy of more than 80%, which is higher than that of the other machine learning models. For stock index price prediction, the hybrid model with a time window of 7 trading days has the best prediction effect. After the investor sentiment features are added, the prediction error of each model is smaller than that of the benchmark model. The prediction error of the CNN-LSTM hybrid model constructed in this thesis is significantly lower than that of the single models and the other neural network models. Moreover, the model that integrates the direct investor sentiment feature fits better than the model that integrates the indirect investor sentiment feature. These comparative experiments show that the text sentiment index constructed in this thesis provides more effective investor sentiment information, and that the CNN-LSTM network structure extracts data features well and captures the volatility of stock index prices, so that it can make accurate predictions. |
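
The time-window setup mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each sample consists of 7 consecutive trading days of features (here the closing price plus one sentiment value) and that the target is the closing price of the following trading day; the abstract does not give the exact feature layout. The function name `make_windows` and the toy data are hypothetical, and the CNN-LSTM model itself is not reproduced here.

```python
import numpy as np


def make_windows(features, target, window=7):
    """Slice a daily series into overlapping windows.

    X[i] holds `window` consecutive rows of `features`;
    y[i] is the value of `target` on the day right after that window.
    The layout is an assumption for illustration only.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    if features.ndim == 1:
        # Treat a single series as one feature column.
        features = features[:, None]
    n_samples = len(features) - window
    if n_samples <= 0:
        raise ValueError("series is shorter than the window")
    X = np.stack([features[i:i + window] for i in range(n_samples)])
    y = target[window:window + n_samples]
    return X, y


# Toy data (hypothetical): 10 trading days of closing prices and sentiment values.
close = np.arange(10, dtype=float)
sentiment = np.linspace(-1.0, 1.0, 10)

X, y = make_windows(np.column_stack([close, sentiment]), close, window=7)
print(X.shape)  # (samples, window, features)
print(y)        # next-day closing price for each window
```

An array of shape (samples, window, features) is the usual input layout for one-dimensional convolution and recurrent layers, so windows built this way could be passed to a CNN-LSTM style network; changing `window` would allow different time windows to be compared.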