From the inception of the quantitative investment concept in the 20th century to its maturity today, quantitative investment models have gained increasing acceptance in the market. Over these decades, a variety of quantitative strategies have been adopted, such as market-neutral strategies, CTA (Commodity Trading Advisor) strategies and index enhancement. Among market-neutral strategies, multi-factor stock selection occupies an important position, and investors often seek to outperform a benchmark index by building multi-factor stock selection models. After decades of market testing, these models are favored by quantitative investors for their stability and reliability. In a multi-factor strategy, the quality of the selected factors is closely tied to the final stock selection performance. This paper therefore constructs a genetic programming factor mining model to uncover implicit, long-term effective factors in the market.

First, Tushare, a domestic open-source financial data platform, is used to obtain daily volume and price data for all constituents of the CSI 300 index in the A-share market on every trading day from January 4, 2012 to December 31, 2019. The stock pool is updated in step with changes to the index constituents, giving about 560,000 records in total. The forward-adjusted basic volume and price indicators are then used to construct the initial factor pool. Next, the factors in the initial pool are evolved and filtered through the genetic programming process using gplearn, a genetic programming library for Python: RankIC is constructed as the fitness measure, the relevant function parameters are set, and the factors with higher fitness are retained in the population. The top-ranked factor expressions are then analyzed individually through linear regression and hierarchical backtesting, with the resulting factors further processed by winsorization (removal of extreme values), standardization and neutralization, in order to verify their validity and determine the direction of each factor.

The test results show that the mean RankIC of every factor is negative regardless of the holding period, indicating that the factors' exposure vectors are negatively correlated with returns over the next holding period; that is, the smaller the factor value, the higher the subsequent return. Moreover, the absolute values of both RankIC and the t-value are large for every factor, and the hierarchical backtests show a clear separation of cumulative returns across groups, which together provide strong evidence for the validity of the factors mined by the genetic programming model.

Finally, this paper combines these factors with fundamental factors commonly used in the industry to build three-factor, five-factor and multi-factor stock selection models using the factor rank-value scoring method, and backtests them over 2020-2021 with weekly rebalancing. To better reflect the real market, transaction fees and slippage are also included. The transaction fees consist of the brokerage commission and the stamp duty. In China's securities market, brokerage commissions are currently charged on both buys and sells, with a default commission rate of 0.03% and a minimum of 5 yuan per trade. The stamp duty is one-sided: it is levied on sellers but not on buyers, at a rate of 0.1%. Because the actual execution price after an order is placed deviates somewhat from the expected price, practitioners usually introduce slippage in backtests to simulate real execution as closely as possible; in this paper, slippage is set to 0.1% of the current price. The backtest results show that, compared with a portfolio based only on traditional fundamental factors, the multi-factor stock selection model that incorporates the factors mined by the genetic programming model can reduce investment risk and deliver a certain degree of excess return.
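RankIC, used in this work both as the genetic programming fitness measure and as the main single-factor test statistic, is the Spearman rank correlation between factor exposures on one date and stock returns over the following holding period. The following is a minimal sketch in plain Python, assuming each cross-section is given as two equal-length lists (factor values and next-period returns). The function names are illustrative and are not taken from the paper.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_ic(factor, next_returns):
    """RankIC of one cross-section: correlation of the ranks of factor exposures
    at date t with the ranks of returns over the next holding period."""
    return pearson(average_ranks(factor), average_ranks(next_returns))


def mean_rank_ic(cross_sections):
    """Average RankIC over a list of (factor, next_returns) cross-sections."""
    return mean([rank_ic(f, r) for f, r in cross_sections])
```

A negative mean RankIC, as reported for the mined factors, means that stocks with smaller factor values tended to earn higher returns in the next period. For instance, a cross-section whose factor ranks are exactly the reverse of its return ranks has a RankIC of -1.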
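The factor search in this paper is run with gplearn, which accepts custom fitness metrics through `gplearn.fitness.make_fitness`. The toy program below does not use gplearn. It is a self-contained illustration of the same evolutionary loop under simplifying assumptions:

- expression trees built from three hypothetical fields (`close`, `open`, `volume`) and four arithmetic operators;
- subtree mutation only, with no crossover;
- tournament selection with elitism;
- a fitness equal to the absolute mean RankIC across dates. Taking the absolute value is an assumption that lets negatively correlated factors score highly.

`evolve` returns the best tree found, its fitness, and the best-so-far fitness after each generation.

```python
import math
import random
from statistics import mean

# Hypothetical terminal set: per-stock input fields available on each date.
FIELDS = ["close", "open", "volume"]
# Binary operators; division is protected so a near-zero denominator returns 1.0.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}
MAX_DEPTH = 4


def random_tree(rng, max_depth):
    """Grow a random tree: either a field name or a tuple (op, left, right)."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(FIELDS)
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, max_depth - 1), random_tree(rng, max_depth - 1))


def depth(tree):
    return 0 if isinstance(tree, str) else 1 + max(depth(tree[1]), depth(tree[2]))


def evaluate(tree, row):
    if isinstance(tree, str):
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))


def mutate(rng, tree):
    """Subtree mutation: walk down a random path and replace the node reached."""
    if isinstance(tree, str) or rng.random() < 0.3:
        return random_tree(rng, 2)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(rng, left), right)
    return (op, left, mutate(rng, right))


def average_ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_ic(factor, next_returns):
    rx, ry = average_ranks(factor), average_ranks(next_returns)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var) if var > 0 else 0.0  # a constant factor scores 0


def fitness(tree, cross_sections):
    """Absolute mean RankIC across dates; non-finite factor values score 0."""
    ics = []
    for rows, next_returns in cross_sections:
        values = [evaluate(tree, row) for row in rows]
        if not all(math.isfinite(v) for v in values):
            return 0.0
        ics.append(rank_ic(values, next_returns))
    return abs(mean(ics))


def evolve(cross_sections, pop_size=30, generations=10, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    best, best_fit, history = None, -1.0, []
    for _ in range(generations):
        scored = [(fitness(t, cross_sections), t) for t in population]
        for fit, tree in scored:
            if fit > best_fit:
                best, best_fit = tree, fit
        history.append(best_fit)
        children = [best]  # elitism: the best tree so far survives unchanged
        while len(children) < pop_size:
            # Tournament selection: mutate the fittest of three random trees.
            parent = max(rng.sample(scored, 3), key=lambda s: s[0])[1]
            child = mutate(rng, parent)
            children.append(child if depth(child) <= MAX_DEPTH else random_tree(rng, 2))
        population = children
    return best, best_fit, history
```

Here `cross_sections` is a list of `(rows, next_returns)` pairs, where each row is a dict mapping field names to one stock's values on that date. In gplearn itself, crossover, hoist and point mutation, and a parsimony penalty are also available through the estimator's parameters.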
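The factor preprocessing steps (winsorization, standardization and neutralization) can be sketched as three small cross-sectional functions applied in that order. The following settings are assumptions for illustration, since the paper's exact choices are not stated in this summary:

- clipping at the median plus or minus 5 MAD (median absolute deviation);
- a z-score that uses the population standard deviation;
- neutralization against log market capitalization only, with no industry dummies.

```python
import math
from statistics import mean, median, pstdev


def winsorize_mad(values, n=5.0):
    """Clip values to median +/- n * MAD (median absolute deviation)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    lo, hi = med - n * mad, med + n * mad
    return [min(max(v, lo), hi) for v in values]


def standardize(values):
    """Z-score: subtract the cross-sectional mean, divide by the standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def neutralize(values, exposure):
    """Residuals of an OLS regression of the factor on one control variable."""
    mx, my = mean(exposure), mean(values)
    beta = sum((x - mx) * (y - my) for x, y in zip(exposure, values)) / sum(
        (x - mx) ** 2 for x in exposure
    )
    return [y - my - beta * (x - mx) for x, y in zip(exposure, values)]


def preprocess(raw_factor, market_caps):
    """Winsorize, standardize, then neutralize against log market cap."""
    clipped = winsorize_mad(raw_factor)
    z = standardize(clipped)
    return neutralize(z, [math.log(c) for c in market_caps])
```

For example, `winsorize_mad([1, 2, 3, 4, 100])` has median 3 and MAD 1, so the outlier 100 is clipped to 8.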
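In one simple version of the linear-regression test, each period's next returns are regressed on the factor exposure by ordinary least squares. The slope is that period's factor return, and its t-value measures significance. The sketch below assumes unweighted OLS with an intercept and a single regressor. The paper's regression specification (weights, control variables) is not detailed in this summary.

```python
import math


def regression_t_value(factor, next_returns):
    """OLS of next-period returns on factor exposure, r_i = a + b * f_i + e_i.
    Returns the slope b (the period's factor return) and its t-statistic."""
    n = len(factor)
    mf = sum(factor) / n
    mr = sum(next_returns) / n
    sxx = sum((f - mf) ** 2 for f in factor)
    sxy = sum((f - mf) * (r - mr) for f, r in zip(factor, next_returns))
    b = sxy / sxx
    a = mr - b * mf
    # Residual variance uses n - 2 degrees of freedom (intercept and slope).
    sse = sum((r - a - b * f) ** 2 for f, r in zip(factor, next_returns))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return b, b / se_b
```

With exposures `[1, 2, 3, 4, 5]` and returns `[2, 1, 4, 3, 6]`, the fitted slope is 1.0 with a standard error of 0.4, giving a t-value of 2.5. Repeating this every period yields a series of t-values whose absolute mean can be compared across factors.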
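Hierarchical (layered) backtesting proceeds in three steps:

1. Each period, sort the stocks by factor value.
2. Split them into equal-sized groups.
3. Compound each group's equal-weighted return over time.

A clear, monotonic spread between the groups' cumulative returns is the stratification effect described above. The minimal sketch below assumes five groups, equal weighting within each group, and at least as many stocks as groups in every period.

```python
from statistics import mean


def layer_returns(factor, next_returns, n_groups=5):
    """Sort stocks by factor value, split them into n_groups layers of (nearly)
    equal size, and return each layer's equal-weighted next-period return,
    ordered from the lowest-factor layer to the highest."""
    order = sorted(range(len(factor)), key=lambda i: factor[i])
    size = len(order)
    groups = []
    for g in range(n_groups):
        idx = order[g * size // n_groups:(g + 1) * size // n_groups]
        groups.append(mean([next_returns[i] for i in idx]))
    return groups


def cumulative_layer_returns(periods, n_groups=5):
    """Compound each layer's per-period return over a list of (factor, next_returns)."""
    wealth = [1.0] * n_groups
    for factor, next_returns in periods:
        for g, r in enumerate(layer_returns(factor, next_returns, n_groups)):
            wealth[g] *= 1.0 + r
    return [w - 1.0 for w in wealth]
```

For a factor with negative RankIC, such as those reported here, the lowest-factor layer should accumulate the highest return and the highest-factor layer the lowest.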
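The factor rank-value scoring method can be sketched in three steps:

1. Rank all stocks on each factor. For a factor whose direction is negative, flip its sign first, so that a higher rank always means more attractive.
2. Add up each stock's ranks across factors.
3. Hold the stocks with the highest total scores.

Equal weighting of the factors, the `directions` convention (+1 for larger-is-better, -1 for smaller-is-better) and the function name `rank_score` are assumptions of this sketch.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_score(factor_table, directions, top_n):
    """factor_table[f][i] is factor f's value for stock i. Returns the indices of
    the top_n stocks by summed rank, plus every stock's total score."""
    n_stocks = len(factor_table[0])
    scores = [0.0] * n_stocks
    for values, d in zip(factor_table, directions):
        # Multiplying by the direction makes a higher rank always better.
        for i, r in enumerate(average_ranks([d * v for v in values])):
            scores[i] += r
    ranked = sorted(range(n_stocks), key=lambda i: scores[i], reverse=True)
    return ranked[:top_n], scores
```

For example, a mined factor with negative RankIC would enter with direction -1, while a fundamental factor where larger values are preferred would enter with +1. At each weekly rebalance the selected indices would form the new portfolio.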
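The cost settings above translate directly into a per-order fee calculation. In this sketch, slippage moves the fill price by the full 0.1% against the trader on every order, so buys fill higher and sells fill lower. Backtesting platforms differ on this convention, so it is an assumption here, as is the function name `execute`.

```python
COMMISSION_RATE = 0.0003  # 0.03%, charged on both buys and sells
MIN_COMMISSION = 5.0      # minimum commission per order, in yuan
STAMP_DUTY_RATE = 0.001   # 0.1%, charged on sells only
SLIPPAGE = 0.001          # 0.1% of the current price, assumed fully adverse per order


def execute(side, price, shares):
    """Return (fill_price, gross_amount, total_fees) for a simulated order."""
    if side == "buy":
        fill = price * (1 + SLIPPAGE)
    elif side == "sell":
        fill = price * (1 - SLIPPAGE)
    else:
        raise ValueError("side must be 'buy' or 'sell'")
    amount = fill * shares
    commission = max(amount * COMMISSION_RATE, MIN_COMMISSION)
    stamp_duty = amount * STAMP_DUTY_RATE if side == "sell" else 0.0
    return fill, amount, commission + stamp_duty
```

For example, buying 1,000 shares at 10.00 yuan fills at about 10.01 yuan and pays only the 5-yuan minimum commission. Selling the same quantity at 10.00 yuan fills at about 9.99 yuan and pays 5 yuan of commission plus about 9.99 yuan of stamp duty.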