| The Internet has accumulated a large volume of tags and textual content about books, which provides a new data source for book recommendation systems. This paper focuses on how to build a personalized book recommendation system that uses this text data to improve recommendation performance. The work comprises three parts: semantics-based extraction of preference features, recommendation algorithms, and a parallel implementation on the Spark platform. First, the background and motivation of the project are introduced, and the key problems that greatly affect the performance of recommendation systems are summarized to clarify the scope of the paper. Second, related techniques such as semantic analysis and recommendation algorithms are reviewed and their advantages and disadvantages are discussed, which lays the foundation for the subsequent research. Next, the paper studies how to extract preference features based on text semantics. An attenuation function is applied to characterize the time context of tag preference, so that the preference reflects the user’s current interests more accurately. An algorithm for calculating tag similarity based on word2vec and co-occurrence is proposed. Its key modules, the word segmentation system and the Jaccard metric, are tailored to our specific scenario to improve accuracy. Subsequently, the PIC algorithm is adopted to cluster the tags, which addresses the problem of sparse tags and models semantic preferences based on tags. LDA is also introduced to extract the topic distributions of abstracts, which improves the semantic preference when tags are scarce. The paper then discusses an extended collaborative filtering algorithm based on semantic preferences, and an experiment is designed to validate the performance of the system. The experiment, conducted on a dataset crawled from the Douban website, shows that the features based on text semantics reflect users’ preferences and perform well in recommendation, especially in precision and diversity. Finally, we design a parallel implementation of the recommendation system on the Spark platform. With the help of MLlib, the word2vec, LDA, and clustering algorithms are implemented easily. The extended collaborative filtering consists of UserCF and ItemCF, both of which are designed and implemented in detail in the paper. Measurements show that the algorithms achieve good speedup. |