| As content marketing grows in importance, the Taobao app has begun recommending e-commerce content in the recommendation column of its home page. Like any recommendation system in its initial stage, e-commerce content recommendation faces the cold-start problem. Taobao has accumulated many years of user behavior logs, and using this information about users' behavior on products can effectively alleviate the user cold-start problem in the initial stage of content recommendation. The two most important stages of a recommendation system are recall and ranking. In the recall stage, the resources a user is likely to be interested in are selected from a large resource pool as the candidate set for the ranking stage. The recall stage determines the upper limit of accuracy for the ranking stage and must complete within milliseconds. To address the cold-start problem Taobao faces in the initial stage of content recommendation, this paper therefore carries out the following research on recall-stage algorithms. This paper proposes a user model that encodes a user's behavior on products as a vector. It integrates the text information of the user's click and search sequences on products, introduces the sequential and temporal information of the user's behavior sequence, and maps the user's behavior information from the commodity space to the text semantic space. A user interest vocabulary is constructed from Taobao search records. Based on this vocabulary, a multi-label interest classification task is used to evaluate the user model. Half a month of behavior logs from 100 million Taobao users is extracted as experimental data, and experiments are designed to verify the validity of the user model. A content model based on a bidirectional Transformer encoder (BERT) is used to extract content vectors. Using content shared by Taobao Daren (influencers) as experimental samples, and using the search index data of Taobao's experience to label each piece of shared content with interests, the length of the text sequence input to the content model is 
determined experimentally, and a comparison with other methods verifies the encoding effectiveness of the BERT-based content model. Drawing on the idea of deep semantic matching models, this paper proposes a model that matches users' behavior on products with content. To address data sparsity and inconsistent vector spaces in training the matching model, a pre-training method based on joint training with shared logic-layer parameters is proposed. An appropriate negative-sample construction strategy is selected by analyzing the differences among random sampling, click-popularity sampling, and the positive-sample distribution. Finally, using Taobao user behavior data as experimental samples, multiple groups of experiments are designed to verify that the matching model can effectively solve the cold-start problem a recommendation system faces when it launches a new business. In summary, the proposed vectorized recall model maps user behavior information from the commodity space to the text semantic space, matches users with e-commerce content, and can effectively solve the cold-start problem. |
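
The vectorized recall step summarized above can be illustrated with a minimal sketch. It assumes that the user model and the content model have already produced vectors in the same text semantic space, and that candidates are scored by cosine similarity, as is common in deep semantic matching models; the abstract does not state the similarity function used. The names `cosine`, `recall_top_k`, and the toy vectors are hypothetical and serve only as illustration.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recall_top_k(
    user_vec: Vector, content_vecs: Dict[str, Vector], k: int
) -> List[Tuple[str, float]]:
    """Score every content vector against the user vector and keep the k best.

    Assumption: both vectors live in the same semantic space, so their
    similarity is meaningful. This brute-force scan is for illustration only.
    """
    scored = [(cid, cosine(user_vec, vec)) for cid, vec in content_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data (hypothetical): a user vector and three content vectors.
user_vec = [0.9, 0.1, 0.0]
content_vecs = {
    "post_outdoor": [1.0, 0.0, 0.0],
    "post_makeup": [0.0, 1.0, 0.0],
    "post_snacks": [0.0, 0.0, 1.0],
}

candidates = recall_top_k(user_vec, content_vecs, k=2)
for content_id, score in candidates:
    print(content_id, round(score, 3))
```

Because the user vector is derived from behavior on products rather than from past interactions with content, a sketch like this can score content for a user who has never clicked on any content, which is the cold-start setting described above. A brute-force scan would not meet a millisecond budget over a large pool, so a real system would presumably rely on an approximate nearest-neighbor index instead.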