Research On Service Recommender System And Its Key Approaches Over Big Data

Posted on: 2015-10-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R Hu
Full Text: PDF
GTID: 1318330518991341
Subject: Computer applications
Abstract/Summary:
With the continuous improvement of Web service standards and the maturing of software platforms, Web services have become important computing resources and software assets. More and more Web services are developed and published on the Internet, providing strong support for enterprise application integration. However, while enjoying the convenience of Web services, users also struggle to find the ones they need. This gives rise to the problem of service overload.

A service recommender system helps users find services that they may be interested in by creating binary relations between users and services. As a result, it has become an effective way to overcome service overload. The recommendation algorithm is the core of a recommender system and determines the system's category and performance. Among the many recommendation algorithms, collaborative filtering is one of the most successful. Its basic idea is that the rating an active user would give to an item he or she has not yet experienced approximates the ratings that his or her neighbors have given to that item.

With the increasing number of users and services, service recommender systems face new challenges. Firstly, service-related information, such as service descriptions, service usage and feedback, accumulates into service big data. Such huge, multi-type and dynamic data are not suited to storage in a traditional relational database. Secondly, since collaborative filtering must search the whole user space for the nearest neighbors, it is difficult to produce recommendations in real time, and the response time of the recommender system suffers. Thirdly, most users rate only a small fraction of the services, which results in a serious sparsity problem.
This sparsity may reduce the accuracy of recommendation.

In view of these challenges, the main contributions of this paper are as follows:

1) A framework for a service recommender system is proposed in this paper. The framework consists of three layers: a big data collection layer, a big data storage layer and a service recommendation layer. The big data collection layer is responsible for collecting service big data, including users' queries, services' ratings, users' logs and services' descriptions. The big data storage layer is responsible for storing and managing service big data. The service recommendation layer is responsible for extracting the corresponding data from the big data storage layer and applying a collaborative filtering method based on service clustering and user filtering to implement service recommendation. A service big data storage model named ServiceTable, which is based on BigTable, is applied in the big data storage layer. By exploiting the advantages of BigTable, such as high performance, high scalability and support for multi-type data, the problem of service big data storage is solved.

2) A service clustering method based on service characteristic similarity is proposed in this paper. The method has four steps. Firstly, service tags and service functionalities are extracted from ServiceTable as the characteristic items of each service. Secondly, service tags are stemmed using the Porter stemmer to unify word forms. Thirdly, the tag similarity and the functionality similarity are each calculated using the Jaccard similarity coefficient, and their weighted sum is taken as the characteristic similarity between services. Lastly, services are clustered using an agglomerative hierarchical clustering algorithm, so that services with higher characteristic similarities are grouped into the same cluster.
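The service clustering steps can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: it assumes the tags have already been stemmed, and the equal tag/functionality weighting, the average-linkage rule and the `min_similarity` stopping threshold are illustrative choices that the abstract does not specify. The function and field names (`cluster_services`, `"tags"`, `"functions"`) are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity coefficient of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def characteristic_similarity(s1, s2, tag_weight=0.5):
    """Weighted sum of tag similarity and functionality similarity.

    tag_weight is an assumed parameter; the abstract gives no value.
    """
    tag_sim = jaccard(s1["tags"], s2["tags"])
    func_sim = jaccard(s1["functions"], s2["functions"])
    return tag_weight * tag_sim + (1.0 - tag_weight) * func_sim


def cluster_services(services, min_similarity=0.3, tag_weight=0.5):
    """Agglomerative clustering: repeatedly merge the two most similar clusters.

    Uses average linkage and stops once the best pair falls below
    min_similarity (both are assumptions made for this sketch).
    """
    clusters = [[name] for name in services]

    def linkage(c1, c2):
        sims = [
            characteristic_similarity(services[a], services[b], tag_weight)
            for a in c1
            for b in c2
        ]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[i], clusters[j]) < min_similarity:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters


# Hypothetical services with pre-stemmed tags.
services = {
    "weather_a": {"tags": {"weather", "forecast"}, "functions": {"get_forecast"}},
    "weather_b": {"tags": {"weather", "forecast", "temperatur"}, "functions": {"get_forecast"}},
    "pay": {"tags": {"payment", "card"}, "functions": {"charge"}},
}
print(cluster_services(services))
```

This brute-force version recomputes every pairwise linkage on each merge, which is acceptable for a handful of services but would need a cached similarity matrix or a library implementation at big data scale.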
After clustering, during collaborative filtering the nearest neighbors of the active user are sought only among the users who have used services in the cluster that the target service belongs to. The number of services in that cluster is usually much smaller than the number of services available in the recommender system before clustering. Therefore, this method reduces sparsity and improves real-time performance.

3) A user clustering method based on user interest similarity is proposed in this paper. The method has four steps. Firstly, users' logs are extracted from ServiceTable, and service queries, service names and service operations are parsed from the logs. Secondly, the service queries, service names and service operations in each user's log are preprocessed by removing stop words, expanding abbreviations and stemming. The preprocessed words are put into the bag of words of the corresponding user. Thirdly, the weight of each word is computed using the TF-IDF method. The N words with the highest weights are chosen as interest words and form the user's interest vector, which is updated over time using an exponential decay formula. Lastly, users' interest similarities are computed using the cosine similarity formula, and users with higher interest similarities are grouped into clusters using a parallel K-means algorithm. Then, during collaborative filtering, the nearest neighbors of the active user are sought only among the users in the active user's own cluster. Because the nearest neighbors need not be sought among all users, real-time efficiency is enhanced.

4) A user filtering method based on user clustering is proposed in this paper. The method has three steps. Firstly, the log of each user in the cluster that the active user belongs to is extracted from ServiceTable.
The time and address at which the user invoked the target service are parsed from the log. Secondly, the time at which the active user needs the target service and the times at which the other users invoked it are mapped to periods by a period mapping function. Likewise, the address at which the active user needs the target service and the addresses at which the other users invoked it are mapped to regions by a region mapping function. Thirdly, if the period and region in which a user invoked the target service equal the period and region in which the active user needs it, that user is considered to have a context consistent with the active user's. The context of each user in the active user's cluster is compared with the context of the active user. A user whose context is consistent is put into the context-consistent user set of the active user; otherwise, the user is filtered out. Using the context-consistent user set as the user space in collaborative filtering reduces sparsity and improves the real-time efficiency of service recommendation.

5) A collaborative filtering method based on service clustering and user filtering is proposed in this paper. The method has three steps. Firstly, the rating similarity between the active user and every user in the context-consistent user set is computed using the Pearson correlation coefficient. Secondly, a similarity threshold is assigned. A user whose rating similarity with the active user is higher than the threshold is chosen as a nearest neighbor of the active user and put into the nearest neighbor set. Lastly, the weighted average of the ratings that the users in the nearest neighbor set gave to the target service is calculated as the predicted rating of the target service.
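The three collaborative filtering steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the dissertation's method in detail: the Pearson coefficient is computed over co-rated services only, the neighbor weights are taken to be the similarities themselves, and the threshold value of 0.5 is arbitrary. The names `pearson`, `predict_rating` and `candidates` are hypothetical; `candidates` stands in for the context-consistent user set.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the services both users have rated.

    Each argument maps service name -> rating. Returns 0.0 when the
    correlation is undefined (fewer than two co-rated services, or no variance).
    """
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[s] for s in common) / len(common)
    mean_b = sum(ratings_b[s] for s in common) / len(common)
    num = sum((ratings_a[s] - mean_a) * (ratings_b[s] - mean_b) for s in common)
    den = sqrt(sum((ratings_a[s] - mean_a) ** 2 for s in common)) * sqrt(
        sum((ratings_b[s] - mean_b) ** 2 for s in common)
    )
    return num / den if den else 0.0


def predict_rating(active, candidates, target, threshold=0.5):
    """Predict the active user's rating of `target`.

    candidates maps user -> ratings dict (assumed to be the
    context-consistent user set). threshold is assumed to be positive,
    so every neighbor weight is positive. Returns None if no neighbor qualifies.
    """
    neighbors = []
    for ratings in candidates.values():
        if target not in ratings:
            continue  # this user cannot contribute a rating for the target
        sim = pearson(active, ratings)
        if sim > threshold:
            neighbors.append((sim, ratings[target]))
    if not neighbors:
        return None
    total_weight = sum(sim for sim, _ in neighbors)
    return sum(sim * rating for sim, rating in neighbors) / total_weight


# Hypothetical ratings on a 1-5 scale.
active_user = {"s1": 5, "s2": 3, "s3": 1}
candidate_users = {
    "u1": {"s1": 4, "s2": 3, "s3": 2, "target": 4},
    "u2": {"s1": 1, "s2": 3, "s3": 5, "target": 1},
}
print(predict_rating(active_user, candidate_users, "target"))
```

In this example `u2` rates in the opposite direction to the active user, so its negative correlation falls below the threshold and only `u1` contributes to the prediction.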
Commonly, the number of services in the cluster that the target service belongs to is smaller than the total number of services, and the number of users in the context-consistent user set is smaller than the total number of users. Therefore, this method suffers less from sparsity and is more efficient than traditional collaborative filtering.

Finally, the effectiveness of the methods proposed above is analyzed and verified in this paper.
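As an illustration of the user interest vectors in contribution 3, the following Python sketch builds top-N TF-IDF vectors, applies an exponential decay update and compares users by cosine similarity. The abstract does not give the exact formulas, so the TF-IDF variant (relative term frequency times log inverse user frequency), the form of the decay update and the `decay_rate` parameter are assumptions. The K-means step is omitted, and all function names are hypothetical.

```python
import math
from collections import Counter


def interest_vectors(bags, top_n=3):
    """Build a top-N TF-IDF interest vector for each user.

    bags maps user -> list of preprocessed words (that user's bag of words).
    """
    n_users = len(bags)
    # Number of users whose bag contains each word.
    user_freq = Counter(word for words in bags.values() for word in set(words))
    vectors = {}
    for user, words in bags.items():
        counts = Counter(words)
        weights = {
            w: (c / len(words)) * math.log(n_users / user_freq[w])
            for w, c in counts.items()
        }
        top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
        vectors[user] = dict(top)
    return vectors


def decay_update(old_vector, new_vector, elapsed, decay_rate=0.1):
    """Fade old interest weights by exp(-decay_rate * elapsed), then add new ones.

    This update rule is an assumed form of the exponential decay formula.
    """
    factor = math.exp(-decay_rate * elapsed)
    words = set(old_vector) | set(new_vector)
    return {
        w: factor * old_vector.get(w, 0.0) + new_vector.get(w, 0.0) for w in words
    }


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as word -> weight dicts."""
    dot = sum(u[w] * v[w] for w in set(u) & set(v))
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical bags of words parsed from three users' logs.
bags = {
    "u1": ["weather", "forecast", "weather"],
    "u2": ["weather", "forecast"],
    "u3": ["payment", "card"],
}
vectors = interest_vectors(bags)
print(cosine_similarity(vectors["u1"], vectors["u2"]))
print(cosine_similarity(vectors["u1"], vectors["u3"]))
```

The resulting similarities (or the vectors themselves) would then feed a clustering step such as K-means, so that neighbor search is restricted to users with similar interests.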
Keywords/Search Tags:Big Data, service recommendation, clustering, MapReduce, BigTable, context, collaborative filtering