| With the rapid development of computer network and storage technology, information on the Internet is growing explosively. The era of Big Data has arrived. For users, finding useful information in massive data is like looking for a needle in the ocean. How to mine the most valuable information from this mass of data and deliver it to users has become a hot topic in both academia and industry. In recent years, recommender systems have emerged worldwide as an intelligent, personalized information service technology, and have been applied in e-commerce, video entertainment, social networks and many other areas. Personalized recommendation has become a very important research direction for large companies and research institutions. After many years of development, recommendation techniques have branched into collaborative filtering recommendation, content-based recommendation, hybrid recommendation and others. Among these, collaborative filtering is the most mature and the most popular. However, the collaborative filtering algorithm has its own problems, such as data sparsity and poor scalability. In the context of Big Data in particular, these problems are greatly magnified and have become the bottleneck of its development. This paper studies in depth the causes of the scalability problem of the traditional collaborative filtering algorithm and, taking into account the practical application environment of recommender systems, discusses the algorithm's similarity calculation. Focusing on the preprocessing of input data, it proposes an improvement to the user-based collaborative filtering algorithm. The improved algorithm uses a hierarchical inverted index structure based on the "Bag-of-Words" model to filter valid data, and proposes a "soft-assignment" strategy to compensate for the errors introduced by data filtering. As for the implementation of the algorithm, cloud computing technology offers a new approach to the scalability problem.
In the context of Big Data, implementing the algorithm in parallel is the best choice. This paper analyzes the operating procedure of the Hadoop cloud computing platform and the programming model of the MapReduce distributed framework, and designs a MapReduce-based parallel implementation of the improved collaborative filtering algorithm. The proposed method was evaluated on the Hadoop platform using both a real data set and a simulated data set. The results demonstrate that, compared with the traditional collaborative filtering algorithm, the improved method can efficiently solve the scalability problem while maintaining recommendation accuracy. |
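
The general idea of using an inverted index to filter input data before similarity calculation in user-based collaborative filtering can be illustrated with a minimal single-machine sketch. This is not the paper's hierarchical "Bag-of-Words" index, its "soft-assignment" strategy, or its MapReduce implementation, none of which are specified in the abstract. The function names, the flat item-to-users index, the choice of cosine similarity, and the toy ratings are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def build_inverted_index(ratings):
    """Map each item to the set of users who rated it (flat index, an assumption)."""
    index = defaultdict(set)
    for user, items in ratings.items():
        for item in items:
            index[item].add(user)
    return index


def candidate_neighbors(user, ratings, index):
    """Return only the users who share at least one rated item with `user`."""
    candidates = set()
    for item in ratings[user]:
        candidates |= index[item]
    candidates.discard(user)
    return candidates


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def top_neighbors(user, ratings, index, k=2):
    """Score only the filtered candidates instead of every other user."""
    candidates = candidate_neighbors(user, ratings, index)
    scored = [(cosine(ratings[user], ratings[c]), c) for c in candidates]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(c, s) for s, c in scored[:k]]


# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"i1": 5, "i2": 3},
    "bob": {"i1": 4, "i3": 2},
    "carol": {"i4": 5},
    "dave": {"i1": 5, "i2": 3},
}
index = build_inverted_index(ratings)
neighbors = top_neighbors("alice", ratings, index)
```

In this sketch, users with no rated item in common with the target user are never compared, which is where the saving over an all-pairs similarity computation comes from.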