Research On The Knowledge Representation And Model Ensemble In User Portrait Construction

Posted on: 2018-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: H C Li
GTID: 2348330536460921
Subject: Computer software theory
Abstract/Summary:
With the rapid development of Internet technology, and of search engines such as Google and Baidu in particular, Internet data has grown explosively. Data is the oil of our era, and using this massive data efficiently and systematically is increasingly important. Every day, search engine use leaves behind a large volume of records such as historical query words. These records provide a rich basis for analyzing users' demographic attributes and interests and for building detailed, complete user portraits. Making full use of behavior records to abstract a panoramic view of user attributes can be seen as the foundation for enterprise big data applications. In 2016, the big data contest "Sogou User Portrait Mining", held by the China Computer Federation, provided one month of query words together with the users' demographic attribute labels (gender, age, and education) as training data.

For the users' historical query words, we systematically compared and analyzed several knowledge representation methods: the Bag of Ngrams method reflects differences in users' language habits, Topic Word Embedding was used to extract topic information from the query words, and Doc2Vec was used to capture the semantic associations among a user's query words. In addition, we adapted the Doc2Vec model specifically to user query words, proposing two algorithms, Query Document Vector: Distributed Bag Of Words (qdv-dbow) and Query Document Vector: A Distributed Memory model (qdv-dm), which further improve the quality of the knowledge representation of the query words.

For the user portrait construction task, we present a two-level ensemble framework for predicting multidimensional demographic attribute labels (gender, age, and education). (1) In the first-level single-task models, we combine Trigram features with traditional machine learning models to capture differences in users' wording habits, and combine the Doc2Vec knowledge representation with a neural network model to extract semantic association information from user queries. (2) In the first-level multi-task models, we use the Very Deep Convolutional Neural Network model to extract context-related information at character granularity, and the FastText neural network model to characterize users' query information at word granularity. (3) In the second-level ensemble model, we use the XGBTree model and the Stacking multi-model fusion method to extract the association information among the attribute labels of the user portrait, further improving the generalization ability and prediction accuracy of the model. The proposed two-level ensemble framework won the championship in the big data contest "Sogou User Portrait Mining".
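The two-level idea can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `NgramCentroid` class, the `out_of_fold_features` helper, the toy queries, and the labels "A" and "B" are all hypothetical stand-ins, and a simple n-gram overlap scorer replaces the real first-level models. The sketch only shows how first-level models can produce out-of-fold class probabilities that are concatenated into the input of a second-level learner (XGBTree in the abstract).

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Bag of character n-grams for one query string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


class NgramCentroid:
    """Toy first-level model (assumption): scores a text by its
    n-gram overlap with the pooled n-grams of each class."""

    def __init__(self, n):
        self.n = n
        self.profiles = {}

    def fit(self, texts, labels):
        self.profiles = {}
        for text, label in zip(texts, labels):
            profile = self.profiles.setdefault(label, Counter())
            profile.update(char_ngrams(text, self.n))
        return self

    def predict_proba(self, text):
        grams = char_ngrams(text, self.n)
        # Add-one smoothing keeps the result a valid distribution
        # even when the text shares no n-gram with any class.
        scores = {
            label: 1 + sum(min(count, profile[g]) for g, count in grams.items())
            for label, profile in self.profiles.items()
        }
        total = sum(scores.values())
        return {label: s / total for label, s in scores.items()}


def out_of_fold_features(make_model, texts, labels, classes, k=2):
    """Class probabilities for every sample, each predicted by a
    model that was trained without that sample."""
    feats = [None] * len(texts)
    for fold in range(k):
        train = [i for i in range(len(texts)) if i % k != fold]
        held = [i for i in range(len(texts)) if i % k == fold]
        model = make_model().fit([texts[i] for i in train],
                                 [labels[i] for i in train])
        for i in held:
            proba = model.predict_proba(texts[i])
            feats[i] = [proba.get(c, 0.0) for c in classes]
    return feats


# Hypothetical toy data: four queries with two made-up labels.
texts = ["cheap flights", "flight deals", "piano lessons", "piano chords"]
labels = ["A", "A", "B", "B"]
classes = ["A", "B"]

# Two first-level models (bigram and trigram) feed one meta-feature row each.
bigram_feats = out_of_fold_features(lambda: NgramCentroid(2), texts, labels, classes)
trigram_feats = out_of_fold_features(lambda: NgramCentroid(3), texts, labels, classes)
meta_features = [b + t for b, t in zip(bigram_feats, trigram_feats)]
```

Each row of `meta_features` holds the class probabilities from both first-level models for one user, and a second-level learner would be trained on these rows. Using out-of-fold predictions is a common design choice in stacking because it keeps the second-level learner from seeing predictions made on data the first-level models were fit to.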
Keywords/Search Tags: User Portrait, Knowledge Representation, Model Ensemble, Tag Prediction, Deep Learning