
Research And Design Of Literature-Related Discovery System Based On MapReduce

Posted on: 2017-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: Q D Duan
GTID: 2308330485978975
Subject: Communication and Information System
Abstract/Summary:
With the development of the Internet, data in every field is growing exponentially and affects our lives in multivariate, polymorphic, and interconnected forms. Big data has become a sign of the times. In academia, a large body of literature is published every year, and the network of relations among publications is becoming increasingly complex. Traditional literature-related discovery usually searches for relevant information in literature databases hosted in database management systems, for example by keyword matching for topic retrieval. However, there has been little research on the correlations among publications or on what users actually care about, so the results suffer from weak association between publications, unclear themes, and low-quality literature. These defects become more pronounced as data volumes grow. The rapid development of big data technology provides effective means of solving these problems.
Common big data technologies today include distributed computing, distributed storage, and data warehousing. Apache Hadoop, an open source platform, is one of the most popular tools for big data analysis; its basic components are the parallel computing framework MapReduce and the Hadoop Distributed File System (HDFS). This thesis uses MapReduce to analyze academic datasets and proposes a MapReduce-based literature-related discovery system that helps users find the literature they need and expect quickly and efficiently.
First, a literature-related discovery method based on MapReduce is proposed, combining distributed computing with data mining algorithms.
On large-scale literature data, the activity level of each publication is computed in a distributed way, and a parallel FP-Growth algorithm is implemented to mine frequent itemsets in the data, so that potential relations between publications can be discovered. The performance of the parallel algorithms is then evaluated.
Next, the structure and functions of the distributed literature-related discovery system are designed. To support personalized recommendation, users' historical search logs are analyzed to mine their preferences, so that users receive the high-quality literature they expect.
This thesis applies distributed computing to large academic datasets, discovering potential relationships and combining theory with application, which gives distributed computing a new application in academia. Personalized recommendation based on the analysis of user preferences also fits the "people-oriented" idea and has practical significance.
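To illustrate how frequent itemsets can be counted in a map/shuffle/reduce style, here is a minimal single-process sketch. It is a simplified stand-in, not the parallel FP-Growth algorithm the thesis implements: it only counts the support of single items and item pairs. The function names, the toy `papers` records (each a list of keywords), and the `min_support` threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(record):
    """Map step: emit (itemset, 1) for every item and item pair in one record."""
    items = sorted(set(record))  # deduplicate and fix an order so pairs are canonical
    for item in items:
        yield (item,), 1
    for pair in combinations(items, 2):
        yield pair, 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts for one itemset."""
    return key, sum(values)


def frequent_itemsets(records, min_support):
    """Run map, shuffle, reduce in-process and keep itemsets meeting min_support."""
    emitted = (kv for record in records for kv in map_phase(record))
    grouped = shuffle(emitted)
    counts = dict(reduce_phase(k, v) for k, v in grouped.items())
    return {k: c for k, c in counts.items() if c >= min_support}


# Hypothetical toy data: keywords attached to four papers.
papers = [
    ["mapreduce", "hadoop", "fp-growth"],
    ["mapreduce", "hadoop"],
    ["hadoop", "recommendation"],
    ["mapreduce", "fp-growth"],
]
result = frequent_itemsets(papers, min_support=2)
```

On a real cluster the map and reduce functions would run on separate nodes and the shuffle would be handled by the framework; FP-Growth additionally avoids enumerating candidate pairs by building a compact prefix tree, which this sketch does not attempt.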
Keywords/Search Tags: literature-related discovery, distributed computing, MapReduce, parallel FP-Growth, recommendation