
Research On Key Problems And Technology In Personal Information Recommendation

Posted on: 2015-12-26  Degree: Doctor  Type: Dissertation
Country: China  Candidate: F J Yin
GTID: 1108330509460957  Subject: Management Science and Engineering
Abstract/Summary:
The development of Internet technology and the spread of information online have made the volume of available information grow rapidly, so people seeking information face the difficulties of information overload. How to help Internet users reach the information they want more efficiently has become a hot interdisciplinary topic spanning information science, computer science, network science and other fields. Thanks to the efforts of many researchers, two families of techniques for obtaining information efficiently have emerged in recent years: information retrieval and information filtering. The former, typified by the search engine, interacts with the user to obtain a description of the target information in the form of a few keywords. The latter recommends information the user may be interested in by collecting user behavior data, analyzing the user's potential interests and filtering out irrelevant information. Comparing the two, search requires users to supply keywords that describe their needs as clearly as possible, and identical keywords cannot distinguish between users with different habits, so all such users receive the same results. Recommendation, by contrast, does not require users to state their specific needs directly. Their preferences and tendencies can be inferred from their historical behavior data, so users obtain more precise information while describing less. Therefore, search is a good choice when a user is looking for specific information, but when there is no clear demand its drawbacks begin to appear and the user may not get a satisfying result.
In this situation, recommendation technology is the appropriate tool, and it promises good results for users whose behavior history is sufficient.
Recommendation has been developed for nearly two decades and has been applied successfully in many areas, especially electronic commerce, where it has brought a substantial increase in sales volume. In academic research, recommendation techniques have attracted the attention of a large number of researchers. While classical methods such as collaborative filtering are still being studied extensively, new methods such as the bipartite network based approach are constantly being proposed and further enrich the field. As research deepens and the application environment continues to change, recommendation technology faces many problems and challenges, the most important of which are the data sparsity problem and the large-scale data processing problem. The data sparsity problem arises in collaborative filtering based recommendation: as the numbers of users and items grow vastly, a relatively small amount of rating data leaves the user-item matrix sparse, which decreases the accuracy of the recommendation method. The large-scale data processing problem is that, as the numbers of users and items increase in practical applications, the growing pressure for real-time recommendation calls for more efficient algorithms, or for other means of improving implementation efficiency, so that recommendation algorithms gain in both data processing capacity and processing speed.
To address these major challenges faced by recommendation technology, this paper studies the following four problems.
First, we study the data sparsity problem in rating prediction based on collaborative filtering.
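As a hedged illustration of the data sparsity problem in collaborative filtering, the sketch below measures how empty a small user-item matrix is and shows how few co-rated items can make a user similarity unreliable. The toy ratings, the function names and the choice of cosine similarity are assumptions made for illustration only; they are not taken from the dissertation.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not data from the dissertation.
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i4": 2},
    "u3": {"i2": 2, "i3": 5, "i5": 1},
}

def sparsity(ratings):
    """Fraction of empty cells in the user-item matrix."""
    items = {i for r in ratings.values() for i in r}
    observed = sum(len(r) for r in ratings.values())
    return 1.0 - observed / (len(ratings) * len(items))

def cosine_on_corated(a, b):
    """Cosine similarity over co-rated items; returns (similarity, overlap size)."""
    common = set(a) & set(b)
    if not common:
        return 0.0, 0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b), len(common)

if __name__ == "__main__":
    print(round(sparsity(ratings), 3))
    print(cosine_on_corated(ratings["u1"], ratings["u2"]))
    print(cosine_on_corated(ratings["u2"], ratings["u3"]))
```

In this toy data, u1 and u2 share a single rated item, so their cosine similarity is a perfect 1.0 even though it rests on one data point, while u2 and u3 share nothing and no similarity can be computed at all. Both effects become more common as the matrix grows sparser.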
Rating prediction is one of the main research topics in personalized information recommendation: it analyzes users' previous ratings to predict the scores they would give to items they have not yet rated. Data sparsity affects collaborative filtering mainly in two steps, the similarity calculation between users and the generation of predictions. Sparsity reduces the number of items that two users have both rated, which lowers the credibility of the similarity computed between them. It also means that the ratings of the nearest neighbors cannot be guaranteed to be complete, and predictions made from an incomplete set of reference ratings cannot be expected to be highly accurate. Therefore, this paper proposes a new method that selects neighbors using an absolute similarity metric between users (or items) and applies cross-dimensional recovery to improve the completeness of the reference rating set. Experimental results demonstrate that the proposed algorithm both reduces the impact of data sparsity and improves recommendation accuracy.
Second, we study data sparsity in top-n recommendation based on the bipartite network. Top-n recommendation is the other basic problem of personalized information recommendation; its purpose is to provide each user with a list of n recommended items. The bipartite network approach is a relatively new method that adapts better to sparse data and may achieve higher accuracy. However, dividing user interest on the basis of ratings considers only the items a user likes, so data utilization is low and the information about the items the user dislikes is ignored.
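For orientation, here is a minimal sketch of the standard mass-diffusion (resource allocation) baseline commonly used for top-n recommendation on a user-item bipartite network. It uses only the "liked" links, which is exactly the limitation noted above; it is not the negative-interest-aware method proposed in the dissertation. The function name and the toy data are hypothetical.

```python
from collections import defaultdict

def mass_diffusion_top_n(user_items, target, n):
    """Rank unseen items for `target` by two-step resource diffusion."""
    # Build the item -> users side of the bipartite network.
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    # Step 1: each item liked by the target holds one unit of resource
    # and splits it equally among the users connected to it.
    user_resource = defaultdict(float)
    for item in user_items[target]:
        for user in item_users[item]:
            user_resource[user] += 1.0 / len(item_users[item])

    # Step 2: each user splits the received resource equally among their items.
    item_resource = defaultdict(float)
    for user, resource in user_resource.items():
        for item in user_items[user]:
            item_resource[item] += resource / len(user_items[user])

    # Recommend the highest-scoring items the target has not collected yet.
    candidates = [(score, item) for item, score in item_resource.items()
                  if item not in user_items[target]]
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in candidates[:n]]

# Hypothetical toy network: user -> set of liked items.
likes = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"b", "c", "d"}}

if __name__ == "__main__":
    print(mass_diffusion_top_n(likes, "u1", 2))
```

Because every link carries the same weight, this baseline cannot express how strongly a user likes an item or that a user dislikes one, which is the gap a rating-sensitive, negative-interest-aware model would aim to close.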
User ratings reflect differences in interest, so interest should be represented not only by its presence or absence but also, more finely, by its intensity during the initialization and transfer of interest resources. This paper presents a new bipartite network method that takes into account which items users do not like. By analyzing both the items a user is interested in and those the user is not, we establish a user interest model that is aware of negative interest. In addition, we adopt rating-sensitive methods for interest resource initialization and resource transfer to reflect differences in the degree of user interest. Experimental results show that the proposed method improves recommendation performance significantly.
Third, we study a rating prediction algorithm based on the bipartite network. For data sets whose node degree distribution is uneven, we propose a bipartite network recommendation algorithm for rating prediction. In the algorithm, the temperature differences between users are first unbiased using each user's average rating and then conducted through the network, and a biased temperature recovery step yields the predicted rating. Because it requires neither similarity calculation nor the selection of a fixed number of users (or items) as neighbors, the bipartite network method can better mitigate the impact of sparse data. The proposed algorithm is based mainly on the heat conduction process and conducts the temperature differences between users. The final value of a target node is set to the mean of the differences from all nodes connected to it, with each value weighted according to its particular temperature difference in order to balance its influence. Finally, a temperature recovery process produces the predicted temperature of the item node, which is the predicted rating.
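The three stages described above (unbias, conduct, recover) can be sketched loosely as follows. This is a simplified reading, not the dissertation's algorithm: the abstract does not give the weighting scheme, so uniform weights are assumed here, and the function name and toy ratings are hypothetical.

```python
def predict_rating(ratings, user, item):
    """Predict `user`'s rating of `item` from mean-centred neighbour ratings."""
    # Each user's average rating serves as that user's baseline "temperature".
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items()}

    # Unbias: express every other rater's score for the item as a
    # difference from that rater's own average.
    diffs = [r[item] - means[u]
             for u, r in ratings.items() if u != user and item in r]
    if not diffs:
        return means[user]  # nothing to conduct; fall back to the user's mean

    # Conduct: average the differences arriving at the item node.
    # Uniform weights are an assumption; the source weights each value
    # by its temperature difference but does not specify how.
    conducted = sum(diffs) / len(diffs)

    # Recover: add the target user's own average back to obtain a rating.
    return means[user] + conducted

# Hypothetical toy ratings: user -> {item: rating}.
toy = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 4, "b": 2, "c": 3},
    "u3": {"b": 4, "c": 5},
}

if __name__ == "__main__":
    print(predict_rating(toy, "u1", "c"))
```

Note that no user-user similarity is computed and no fixed-size neighbourhood is selected: every user linked to the item contributes. A practical version would also clip the result to the rating scale.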
The experimental results show that, on specific types of data sets, the proposed algorithm performs better than the collaborative filtering method and is more efficient than the standard heat conduction methods.
Fourth, we study MapReduce-based large-scale data processing for the rating prediction and top-n recommendation algorithms. In practical applications, personalized information recommendation faces an ever-increasing amount of data, which places higher demands on recommendation algorithms. Some studies focus on simplifying the algorithm, for example through dimensionality reduction in matrix factorization, but such methods still struggle to overcome their inherent limits. In this paper, the three proposed recommendation algorithms are analyzed, and the bipartite network based top-n recommendation and rating prediction algorithms are then designed and implemented for parallel computing with the MapReduce framework. To improve efficiency and reduce running time, the computation over large-scale data is distributed across multiple computing nodes that process it simultaneously. The advantage of this approach is that, as the amount of data grows, the system can continue to scale as long as enough computing nodes are provided.
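To make the map/shuffle/reduce decomposition concrete, the following sketch simulates a single MapReduce-style job in plain Python. It computes each item's degree and average rating, a typical preprocessing step for bipartite network methods. It does not use any real Hadoop API, and the function names and records are hypothetical; in a real deployment the mapper and reducer would run on separate computing nodes.

```python
from collections import defaultdict

def mapper(record):
    """Map one (user, item, rating) record to an (item, (rating, 1)) pair."""
    _user, item, rating = record
    yield item, (rating, 1)

def reducer(item, values):
    """Reduce all pairs for one item to (item, (degree, mean rating))."""
    total = sum(rating for rating, _ in values)
    count = sum(one for _, one in values)
    return item, (count, total / count)

def run_job(records):
    """Simulate map, shuffle (group by key) and reduce in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # the shuffle phase groups by item
    return dict(reducer(key, values) for key, values in groups.items())

# Hypothetical rating log: (user, item, rating).
records = [
    ("u1", "a", 5), ("u2", "a", 4),
    ("u2", "b", 2), ("u3", "b", 4),
    ("u3", "c", 5),
]

if __name__ == "__main__":
    print(run_job(records))
```

Because each mapper call sees one record and each reducer call sees one key, the work can be split across nodes without shared state, which is what allows capacity to grow by adding nodes.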
Keywords/Search Tags: personalized information recommendation, rating prediction, top-n recommendation, collaborative filtering, bipartite network, data sparsity, massive data processing problem, MapReduce, parallel computing