| The development of quantitative investment technology means that stock analysis no longer relies solely on professional investment knowledge. Traditionally, investors use quantitative models to predict stock prices directly and derive an optimal investment portfolio as a reference. However, traditional quantitative investment methods based on factor analysis of technical market time-series data have four obvious shortcomings. First, traditional factor extraction relies too heavily on fixed, manual feature engineering, which may gradually fail as market trends change. Second, when news data are fused, only the article title or a sentiment factor is used, resulting in information loss. Third, multi-modal stock prediction methods that leverage both text and company graph information are rarely explored, which leaves much fundamental data containing professional analysis and company relationship information underutilized. Finally, most quantitative models focus only on profitability and ignore the importance of risk control in investment. To address these problems, this thesis first leverages Stock BERT to obtain textual representations of news text and of summaries extracted by LDA, and proposes an extractive news-enhanced stock recommendation model. To integrate heterogeneous information from market data, fund manager position data, institutional research reports, financial news, and company industry data, this thesis further proposes a hybrid attention stock recommendation model based on multi-source heterogeneous data, the Multi-source Stock Prediction Network (MSPN). In addition, to help quantitative researchers verify the effectiveness of different quantitative strategies, this thesis presents an open-source quantitative platform for China’s A-share market, which provides a complete pipeline of data collection and processing, model definition, and backtesting analysis. Extensive experiments demonstrate the effectiveness of the
heterogeneous data collected and the strong performance of the proposed model. In summary, the core contributions of this thesis are as follows: (1) This thesis open-sources Astock, a complete quantitative research platform for the Chinese stock market. The platform has three components: data, model, and backtest. In the data module, a large dataset is constructed that spans four years, covers more than 4,000 stocks, contains up to ten million records in total, and comprises five kinds of heterogeneous data. It fills the gap left by the lack of recognized high-quality datasets for China’s stock market. In the model module, several commonly used quantitative models are included, and the code is separated into structure, training, and parameter tuning, which ensures high cohesion and low coupling. Finally, the stock backtesting module lets users choose investment methods and investment cycles independently, and its rich financial indicators and technical metrics help users evaluate the effectiveness of strategies from different perspectives. (2) This thesis proposes an extractive news-text-enhanced stock recommendation model. To make full use of the rich information in news content, this work first uses Lattice-LSTM to assist named entity recognition, extracting and aligning the company names mentioned in the news, and uses the LDA generative model to extract topic text from the main paragraphs of the news content. Then, a whole-word masking task is used to train Stock BERT, a pre-trained model for finance-related downstream tasks, to obtain textual representations of the text summaries and news texts. Finally, in the prediction stage, the stock price is predicted by combining these representations with backward-adjusted market trading sequence data. (3) This thesis proposes a hybrid attention stock recommendation model based on multi-source heterogeneous data. To make full use of the multi-source advantages of fundamental analysis, this thesis first proposes
a heterogeneous financial data aggregation framework, which obtains a quarterly fund position matrix through matrix decomposition, uses Stock BERT for news text representation, applies a joint attenuation function to investment advice reports, and applies price adjustment to the market trading data. After aggregation, market-capitalization and industry neutralization is applied to extract factors automatically. Then, a company relationship graph is constructed to aggregate the stock information of related companies, and on this basis an inter-attention source prediction network is further proposed to dynamically infer the importance of each input factor under different market characteristics. Finally, the model automatically constructs an investment portfolio with both high return and strong risk control, according to the predicted stock prices. In summary, the open-source Astock quantitative platform simplifies and unifies the quantitative research workflow, and provides a comprehensive stock dataset with rich data sources and effective information, a set of commonly used quantitative models, and a lightweight stock backtesting system. The extractive news-text-enhanced stock recommendation model successfully captures effective information from news content. The final MSPN model integrates multiple heterogeneous data sources, automatically extracts factors, dynamically aggregates and updates them according to company relationships, and outperforms the baseline model in both profitability and risk control. |
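
The neutralization step mentioned above (removing market value and industry effects from a factor) is commonly implemented as a cross-sectional regression whose residual becomes the neutralized factor. The following is only a minimal sketch under that assumption, not the thesis's own implementation: the function name `neutralize`, its arguments, and the use of log market capitalization with one dummy variable per industry are all illustrative choices that the abstract does not specify.

```python
import numpy as np

def neutralize(factor, market_cap, industry):
    """Regress a factor on log market cap and industry dummies; return the residual.

    Hypothetical helper for illustration. All inputs are 1-D sequences of equal
    length, one entry per stock on a single trading day.
    """
    factor = np.asarray(factor, dtype=float)
    log_cap = np.log(np.asarray(market_cap, dtype=float))
    industry = np.asarray(industry)

    # One dummy column per industry. Together the dummies span the intercept,
    # so no separate constant column is added.
    labels = np.unique(industry)
    dummies = (industry[:, None] == labels[None, :]).astype(float)

    # Ordinary least squares via lstsq; the residual is the part of the factor
    # that size and industry membership do not explain.
    X = np.column_stack([log_cap, dummies])
    beta, *_ = np.linalg.lstsq(X, factor, rcond=None)
    return factor - X @ beta
```

By construction, the residual sums to zero within each industry and is uncorrelated with log market capitalization in the cross-section, so a model trained on it cannot simply reload on size or sector exposure.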