
The Research Of Algorithm About Social Network Recommendation Service Based On Hadoop

Posted on: 2014-01-19 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Ren | Full Text: PDF
GTID: 2248330395498022 | Subject: Communication and Information System
Abstract/Summary:
With the rapid development of information and network technology, the Internet has entered the Web 2.0 era. The Web 2.0 Internet is more intelligent, personalized, and social, and it is influencing and changing the way people live. One of the most typical examples is SNS (Social Networking Services).

Because a social network has a huge user base and its users frequently update their weibo (microblog posts), it produces large amounts of user data every day. How to find useful information in this user data, and how to provide users with personalized recommendation services, have become key directions for social networks. However, the data generated by social networks always forms large-scale data sets, and processing this mass of data is one of the more severe challenges.

Hadoop is an open-source framework modeled on Google's cloud computing platform and is widely applied in industry and academia. It is used for distributed processing of large amounts of data and offers high efficiency, high reliability, high scalability, low cost, and many other advantages.

To process huge amounts of data in a scalable way, implementing the social networking service recommendation algorithm on a distributed platform is a good choice. Given Hadoop's inherent capacity for mass data storage and processing, it can effectively address the difficulties of safe storage and efficient processing while guaranteeing the reliability, effectiveness, and security of the data. In this thesis, we propose building a social networking service recommendation system on the Hadoop cloud platform.

The system is divided into four parts: a data acquisition module, a data preprocessing module, a data storage module, and a service recommendation module. In the data acquisition module, we use the Sina Weibo API to obtain user data. In the data preprocessing module, FudanNLP is used to perform Chinese word segmentation. In the data storage module, we build HBase tables to store the Sina Weibo data and use the HBase API to operate on the tables. In the service recommendation module, we implement a distributed TF-IDF algorithm in the MapReduce model. This algorithm calculates the importance of each word in a user's weibo and extracts keywords from it. From the extracted keywords, the system can infer the user's interests and recommend relevant content to the user.

To verify the accuracy and validity of the distributed TF-IDF algorithm in this thesis, we repeatedly compare the keywords it extracts with the keywords extracted by the TextRank algorithm. The results show that the keywords extracted by the two algorithms are very close, and that the results become closer as the number of keywords increases. This indicates that the distributed TF-IDF algorithm implemented on MapReduce is accurate and effective. At the same time, because the distributed TF-IDF algorithm accounts for the discriminating power of keywords, it performs better than the TextRank algorithm. In addition, a comparison of response times with the TextRank algorithm shows that the distributed TF-IDF algorithm scales well.

The recommendation system proposed in this thesis, based on the Hadoop cloud platform, has reference value for data mining applications on cloud platforms and exploratory significance for implementing recommendation systems on cloud platforms.
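
The keyword extraction step can be illustrated with a minimal sketch. The abstract does not give the thesis's exact formula or job layout, so this assumes the textbook definitions (term frequency as count divided by document length, inverse document frequency as log(N/df)), treats each user's weibo text as one document, and assumes the text is already segmented into tokens. The function names (`map_term_counts`, `reduce_sum`, `tf_idf`, `top_keywords`) are hypothetical, and the map and reduce phases are simulated in a single Python process rather than run on Hadoop.

```python
import math
from collections import defaultdict


def map_term_counts(doc_id, tokens):
    """Map phase: emit ((term, doc_id), 1) for every token in a document."""
    for term in tokens:
        yield (term, doc_id), 1


def reduce_sum(pairs):
    """Reduce phase: sum the values that share the same key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def tf_idf(docs):
    """Compute TF-IDF scores as three chained map/reduce passes.

    docs maps a document id (here, a user) to a list of segmented tokens.
    Returns a dict mapping (term, doc_id) to its score.
    Assumed formulas: tf = count / doc length, idf = log(N / df).
    """
    # Pass 1: occurrences of each term in each document.
    counts = reduce_sum(
        pair
        for doc_id, tokens in docs.items()
        for pair in map_term_counts(doc_id, tokens)
    )
    # Pass 2: total number of tokens in each document.
    doc_len = reduce_sum((doc_id, n) for (term, doc_id), n in counts.items())
    # Pass 3: number of documents that contain each term.
    doc_freq = reduce_sum((term, 1) for (term, doc_id) in counts)

    n_docs = len(docs)
    scores = {}
    for (term, doc_id), n in counts.items():
        tf = n / doc_len[doc_id]
        idf = math.log(n_docs / doc_freq[term])
        scores[(term, doc_id)] = tf * idf
    return scores


def top_keywords(scores, doc_id, k):
    """Return the k highest-scoring terms for one document (ties by term)."""
    ranked = sorted(
        ((score, term) for (term, d), score in scores.items() if d == doc_id),
        key=lambda item: (-item[0], item[1]),
    )
    return [term for score, term in ranked[:k]]


# Toy input: two users, tokens already segmented (illustrative data only).
docs = {
    "u1": ["hadoop", "hbase", "hadoop", "weather"],
    "u2": ["weather", "music", "music"],
}
scores = tf_idf(docs)
print(top_keywords(scores, "u1", 2))
```

In this toy input, "weather" appears in both documents, so its IDF is log(2/2) = 0 and it never ranks as a keyword. This illustrates how the IDF factor suppresses words that do not distinguish one user from another. On a real cluster each pass would be a separate MapReduce job keyed the same way.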
Keywords/Search Tags: Cloud Computing, Social Networking Services, Service Recommendation, Hadoop, HDFS, MapReduce, HBase, TF-IDF