Analysis And Research Of User Portrait Construction Algorithm Based On Behavior Data

Posted on: 2021-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: H N Wang
Full Text: PDF
GTID: 2428330611470415
Subject: Engineering
Abstract/Summary:
As Internet technology has advanced rapidly, daily life has become increasingly inseparable from the Internet, and the volume of Internet data has exploded. Tools such as search engines, Weibo, and WeChat play an increasingly important role in everyday life. Every day, many users search for information on search engine platforms and leave behind behavior data that implicitly contains their demographic attributes, habits, hobbies, and other information. Converting this attribute information into user tags provides the data basis for constructing user portraits. The problem addressed here is how to mine user tags from behavioral data efficiently and accurately. This thesis uses algorithmic models to predict user attribute information; specifically, it predicts a user's age, gender, and education level from the user's historical query records in a search engine. The main research contents are as follows:

(1) Based on the characteristics of user behavior data in search engines, a variety of knowledge representation methods are analyzed, users' wording habits and topic information are compared, and the associations between words are examined. According to the characteristics of user query words, two query-document vector methods are proposed on the basis of the Doc2Vec model: one based on the distributed memory model (dm-qdv) and one based on the distributed bag-of-words model (dbow-qdv). The two improved training methods improve the accuracy of classification predictions on user query documents.

(2) A user portrait algorithm based on the Stacking strategy and XGBoost is proposed to predict demographic attribute labels. Exploiting the correlation between user attributes, the models for the prediction tasks are trained with cross-validation to improve prediction performance. In the first-level model, different base models extract features from the user's query words. In the second-level model, the Stacking ensemble learning strategy further fuses these features to produce the final prediction of demographic attribute labels. Experiments verify the effectiveness of the proposed model across multiple demographic attribute prediction tasks.

(3) The algorithm is improved on the basis of the ensemble learning framework to strengthen the model's generalization ability. The overall architecture is divided into an ensemble learning model and a semantic encoding model. The ensemble learning model uses a multi-layer model to perform the prediction task. The semantic encoding model uses the BERT model to encode text, extract deep semantic information, and predict multi-dimensional demographic attribute labels through softmax. Finally, the results of the two models are combined by voting to obtain the final classification. The experimental results show that the proposed model can better complete the task of predicting multi-dimensional demographic attributes.
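
The final step in (3), voting over the outputs of the two models, could be sketched as below. This is only a minimal illustration and not the thesis's implementation: the abstract does not say whether the vote is hard or soft, so the sketch assumes soft voting (a weighted average of per-class probabilities). The function name `soft_vote`, the `weight_a` parameter, and the label names are all hypothetical.

```python
from typing import Dict, Tuple


def soft_vote(
    prob_a: Dict[str, float],
    prob_b: Dict[str, float],
    weight_a: float = 0.5,
) -> Tuple[str, Dict[str, float]]:
    """Blend two per-class probability maps and pick the top label.

    Assumption: soft voting with a single mixing weight; the source
    only states that the two models' results are voted on.
    """
    if set(prob_a) != set(prob_b):
        raise ValueError("both models must score the same set of labels")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")

    # Weighted average of the two probability distributions.
    blended = {
        label: weight_a * prob_a[label] + (1.0 - weight_a) * prob_b[label]
        for label in prob_a
    }
    # The predicted label is the one with the highest blended probability.
    best_label = max(blended, key=blended.get)
    return best_label, blended


# Hypothetical outputs for one user on one attribute task.
ensemble_probs = {"bucket_1": 0.2, "bucket_2": 0.5, "bucket_3": 0.3}
semantic_probs = {"bucket_1": 0.1, "bucket_2": 0.3, "bucket_3": 0.6}

label, blended = soft_vote(ensemble_probs, semantic_probs)
print(label, blended)
```

With equal weights, the semantic model's stronger confidence in `bucket_3` outweighs the other model's preference for `bucket_2`. In practice the weight would presumably be tuned on held-out data, and each attribute (age, gender, education level) would be voted on separately.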
Keywords/Search Tags:User portrait, knowledge representation, Stacking, XGBoost, BERT