Collaborative Recommendation Of Multi-Source Data Based On Locality-Sensitive Hashing

Posted on: 2024-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: K X Yang
Full Text: PDF
GTID: 2568307070951819
Subject: Electronic information
Abstract/Summary:
With the rapid development of the Internet, the number of network users continues to grow and intelligent devices have penetrated all aspects of life, so personalized advertising recommendation scenarios have become increasingly diverse. It is therefore necessary to fully mine data from different sources to achieve full-link data connectivity, accumulate user data assets, mine that data in depth with big data technology, and develop effective personalized recommendation models that deliver marketing tailored to each individual user. In this paper, based on the personalized recommendation scenario, we use a locality-sensitive hashing algorithm to fuse multi-source data and a user-based collaborative filtering algorithm to make recommendations for similar users. This effectively alleviates the data sparsity and cold-start problems of the recommendation model and optimizes personalized recommendation scenarios. The model creates a common feature dataset with samples from the advertising and payment sides and optimizes the locality-sensitive hashing algorithm, with the ultimate goal of using the data dimensions of the payment-side scenario to expand the portrait of the advertising-side population and make similar recommendations. The three main contributions of this paper are as follows:

· A user feature dataset based on advertising scenarios and payment scenarios is created. The data of the advertising scenario and the payment scenario are combined with external data and normalized, and common features are extracted for the users in the two scenarios along a geographic location dimension, a time dimension, and an interest dimension. The geographic location dimension converts latitude and longitude with the Geohash algorithm and uses the number of times a user is active as the weight when generating features. The time dimension extracts user features from three perspectives: time period, weekday, and month. The interest dimension forms crowd features from the business meaning associated with each user. The features of these three dimensions are then fused. This standardized dataset supports the study of multi-source data fusion in recommendation scenarios.

· An improved recommendation model based on the locality-sensitive hashing algorithm is proposed. Locality-sensitive hashing is prone to data skew when computing similar users at large scale, and hash buckets that contain too many or too few samples reduce the accuracy of the recommendation results. In this paper, we introduce a second iteration over the user feature hash values, carried out by reducing the dimensionality of the high-dimensional random features used in locality-sensitive hashing. To address buckets with too little data, adjacent hash buckets are merged according to the positional distance between them. In addition, because the number of valuable features influences the similarity calculation, a correction coefficient is introduced into the user similarity calculation. Experiments show that these optimizations effectively improve the locality-sensitive hashing algorithm and the efficiency of matching similar users.

· Comparative experiments are designed to demonstrate the effectiveness of the improved recommendation model proposed in this paper. To compare the models more intuitively, separate experiments are designed for the computational efficiency and the accuracy of the recommendation model. For computational efficiency, the experiments verify whether the optimized similarity calculation improves matching efficiency by comparing the matching time before and after the algorithm optimization. For accuracy, an experimental group and a control group are compared on real orders placed by online users: the experimental group receives the recommendation results of the improved locality-sensitive hashing algorithm, and the control group receives the recommendation results of the model based on the ordinary locality-sensitive hashing algorithm. Whether users clicked on the advertisement serves as the target for verifying whether the accuracy of the recommendation model improved.
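The bucketing and small-bucket merging ideas described in the abstract can be illustrated with a minimal sketch. It assumes random-hyperplane (sign) hashing as the locality-sensitive hash family and the Hamming distance between bucket keys as the "positional distance"; the abstract confirms neither choice. All function names and the `min_size` parameter are hypothetical, and the sketch does not reproduce the thesis's second hashing iteration or its similarity correction coefficient.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplanes in dim dimensions (assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """Pack the sign of each projection into an integer bucket key."""
    key = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        key = (key << 1) | (1 if dot >= 0 else 0)
    return key


def build_buckets(users, planes):
    """Group user ids by the signature of their feature vector."""
    buckets = defaultdict(list)
    for user_id, vec in users.items():
        buckets[signature(vec, planes)].append(user_id)
    return dict(buckets)


def hamming(a, b):
    """Number of differing bits between two non-negative bucket keys."""
    return bin(a ^ b).count("1")


def merge_small_buckets(buckets, min_size):
    """Fold each undersized bucket into the nearest remaining bucket.

    "Nearest" is the smallest Hamming distance between bucket keys, with
    ties broken by the smaller key. The input mapping is not modified.
    """
    merged = {k: list(v) for k, v in buckets.items()}
    # Visit the smallest buckets first so they are absorbed before larger ones.
    for key in sorted(buckets, key=lambda k: len(buckets[k])):
        if key not in merged or len(merged[key]) >= min_size or len(merged) == 1:
            continue
        nearest = min(
            (k for k in merged if k != key),
            key=lambda k: (hamming(k, key), k),
        )
        merged[nearest].extend(merged.pop(key))
    return merged
```

In this sketch, candidate similar users for user-based collaborative filtering would be drawn from within a single merged bucket, so that no user is left with a candidate pool smaller than `min_size` unless only one bucket remains.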
Keywords/Search Tags: Locality-sensitive hashing, Recommendation system, Multi-source data, Collaborative filtering