In recent years, advances in modern science and technology have driven rapid growth in the world economy. Financial markets, as an important part of that growth, play an increasingly important role in the development of the world economy. The futures market in particular, as a form of market organization based on physical commodities, is of great significance in stabilizing the spot market, hedging price risk, and increasing market liquidity. Accurate analysis and prediction of the futures market therefore has significant socio-economic value. The futures market produces massive heterogeneous data from multiple sources, such as transaction data, news commentary, and social media. These data can be broadly divided into structured numerical data and unstructured text data. These characteristics pose new challenges for mining the underlying regularities of the futures market and for predicting price changes, and research on them has important academic and practical value. With the continuous growth of computer hardware and computing power, machine learning and deep learning have in recent years achieved remarkable results in many fields, such as data analysis, computer vision, natural language processing, and speech recognition. These results include the accurate representation and fitting of data that is massive, time-series (such as transaction data at 500 ms per bar), multi-source (such as data from exchanges, social media, and news media), and heterogeneous (such as numerical, image, text, and other forms). This provides a strong technical basis for analyzing and forecasting futures markets with multi-source heterogeneous data. Taking into account the characteristics of the futures market and existing related work, this paper proposes solutions for the multi-source heterogeneity and high noise of the data in futures prediction tasks and verifies the effectiveness of the methods on an open data set.

The main research work of this paper is as follows:

1. This
paper proposes a forecasting framework based on text analysis methods and an improved hidden Markov model to analyze the multi-source heterogeneous data in the futures market. The futures market contains massive amounts of heterogeneous data from multiple sources, which can generally be divided into structured numerical data and unstructured text data. Structured numerical data consists mainly of time-series price information and derived technical indicators, which chiefly reflect the range of price fluctuations of futures commodities. Unstructured text data usually consists of news events related to the futures market, user comments on social media, and similar material, and chiefly reflects the macro factors that affect futures price changes. These heterogeneous data from different sources reflect different aspects of the futures market and affect it to different degrees. The key issue in futures price prediction is therefore how to mine the factors that affect the price of the target futures from this massive multi-source heterogeneous information. This paper takes palm oil futures as an example. First, a relationship map centered on the target commodity is constructed based on the concept of related underlying commodities, which helps mine the raw data using background knowledge of the palm oil industry. Next, feature vectors encoding the polarity of the multi-source data toward the futures are constructed using an extended sentiment dictionary. Finally, a hidden Markov model based on the Gaussian mixture model is used to represent the multi-source features in a unified feature space and to mine the relationship between the multi-source features and futures price fluctuations. To evaluate the effectiveness of the framework, we crawled data related to palm oil futures from September 2016 to September 2017. The experimental results show that the proposed method reaches a prediction accuracy of 64.15% and
an F1 value of 0.7642 in predicting the rise and fall of futures prices from multi-source data, a significant improvement over the baseline model.

2. This paper proposes a multi-level analysis framework that combines the attention mechanism with the Long Short-Term Memory (LSTM) network to handle the multi-scale and high-noise characteristics of futures market data. The massive heterogeneous and unstructured data in the futures market contain a great deal of noise, and this noise propagates through the network structure during feature construction. To solve this problem, this paper proposes a feature extraction framework with four levels: sentence, trading day, time segment, and time scale. At each level, local features are fused using the attention mechanism and the LSTM network. The resulting low-noise global features are then used to predict the price of the target futures. Experiments show that the predictions obtained by this method improve significantly on the baseline model.
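
The level-by-level fusion in the second contribution can be illustrated with a minimal sketch. It covers only the attention step, applied at two of the four levels (sentence and trading day), and it leaves out the LSTM entirely. The function names (`softmax`, `attention_pool`, `fuse_levels`) and the fixed query vectors are assumptions made for illustration; in a trained model the queries would be learned parameters, and nothing here is taken from the paper's actual implementation.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: Sequence[Sequence[float]],
                   query: Sequence[float]) -> List[float]:
    """Fuse several local feature vectors into one vector.

    Each vector is scored by its dot product with `query` (a hypothetical
    stand-in for a learned parameter); the result is the softmax-weighted
    sum, so low-scoring (noisier) vectors contribute less.
    """
    scores = [sum(f * q for f, q in zip(vec, query)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * vec[d] for w, vec in zip(weights, features))
            for d in range(dim)]


def fuse_levels(days: Sequence[Sequence[Sequence[float]]],
                sentence_query: Sequence[float],
                day_query: Sequence[float]) -> List[float]:
    """Two-level fusion: sentences -> trading day -> one global vector."""
    day_vectors = [attention_pool(sentences, sentence_query)
                   for sentences in days]
    return attention_pool(day_vectors, day_query)


if __name__ == "__main__":
    # Two trading days; each sentence is a toy 2-dimensional feature vector.
    days = [
        [[1.0, 0.0], [0.0, 1.0]],  # day 1: two sentences
        [[2.0, 2.0]],              # day 2: one sentence
    ]
    print(fuse_levels(days, sentence_query=[1.0, 0.0], day_query=[0.5, 0.5]))
```

With an all-zero query every vector receives the same weight, so the pooling reduces to a plain average; a query aligned with one vector shifts almost all of the weight onto it. Extending the sketch to the time-segment and time-scale levels would repeat the same pooling step once more per level.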