Design And Implementation Of User Portrait System Based On Spark

Posted on: 2022-07-11, Degree: Master, Type: Thesis
Country: China, Candidate: Y Hou, Full Text: PDF
GTID: 2518306722472884, Subject: Master of Engineering
Abstract/Summary:
With the rapid development of the Internet, the amount of information in the world is growing explosively, and Internet enterprises face massive volumes of data every day. How to extract, process, analyze and use this seemingly disordered and redundant data so as to maximize its value, build user portraits from it, and thereby support precise recommendation or advertising delivery, has become a problem that enterprises cannot ignore. How well an enterprise handles user portraits greatly affects its revenue, as well as its room for development in this field. This thesis studies the design and implementation of a user portrait system based on the Spark framework. The system not only provides data support for advertising delivery and precise recommendation systems, but also helps enterprises make operational decisions by visualizing user portrait data. The system is divided into three modules: data preprocessing and storage, user portrait building, and data visualization. The core is the user portrait building module, which uses the Spark framework to compute user portrait tags in parallel over massive user data. The tags fall into three types (statistical tags, matching tags and mining tags), and computing them in parallel addresses the performance bottleneck that traditional data processing methods hit on massive data. The thesis focuses on the design and implementation of mining tags. To improve the computing performance of mining tags, it studies a parallelization of the Softmax algorithm built on Spark's parallel computing capability, which handles multi-class classification on massive data and improves the computing performance of user churn risk tags. It also studies a parallel naive Bayes classification algorithm weighted by TF-IDF and mutual information, which improves on the traditional naive Bayes algorithm through weighting, and applies it to the Spark-based parallel computation of user comment category tags. Experiments on the Spark platform train the mining tag models and run prediction in a distributed, data-parallel manner, and evaluate the effectiveness and execution efficiency of the models. Finally, the data visualization module is designed and implemented with the Spring, Spring MVC, MyBatis and ECharts frameworks, and presents the user portrait data visually.
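
The data-parallel idea behind the Softmax training described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation and does not use Spark: plain Python lists stand in for data partitions, and every name here (partition_gradient, merge, train, the toy data, the learning rate) is a hypothetical choice made for illustration. The sketch only assumes the standard scheme in which each partition computes a local gradient sum, the sums are added together, and the model is updated once per pass.

```python
import math
from functools import reduce

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def partition_gradient(partition, weights):
    """'Map' step: gradient sum and example count for one data partition.

    weights[k][j] is the weight of feature j for class k.
    """
    n_classes, n_features = len(weights), len(weights[0])
    grad = [[0.0] * n_features for _ in range(n_classes)]
    for x, label in partition:
        probs = softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])
        for k in range(n_classes):
            err = probs[k] - (1.0 if k == label else 0.0)
            for j in range(n_features):
                grad[k][j] += err * x[j]
    return grad, len(partition)

def merge(a, b):
    """'Reduce' step: add two (gradient, count) pairs element-wise."""
    (ga, na), (gb, nb) = a, b
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(ga, gb)], na + nb

def train(partitions, n_classes, n_features, lr=0.5, epochs=200):
    """Batch gradient descent; lr and epochs are illustrative values."""
    weights = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        # On a cluster, this map would run in parallel, one task per partition.
        grad, n = reduce(merge, (partition_gradient(p, weights) for p in partitions))
        weights = [[w - lr * g / n for w, g in zip(wr, gr)]
                   for wr, gr in zip(weights, grad)]
    return weights

def predict(weights, x):
    """Return the index of the highest-scoring class for feature vector x."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy data (made up): (features with a trailing bias term, class label),
# split into two partitions.
partitions = [
    [([0.0, 0.0, 1.0], 0), ([0.2, 0.1, 1.0], 0), ([1.0, 0.0, 1.0], 1)],
    [([0.9, 0.1, 1.0], 1), ([0.0, 1.0, 1.0], 2), ([0.1, 0.9, 1.0], 2)],
]
weights = train(partitions, n_classes=3, n_features=3)
```

Because the gradient of the Softmax loss is a sum over examples, merging the per-partition sums gives the same result as a single pass over all the data, which is what makes this kind of training straightforward to distribute.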
Keywords/Search Tags: Big Data, User Portrait, Tags, Distributed, Parallelization