Research on "Expert Robot" Based on Big Data Processing Technology

Posted on: 2017-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: X D Wei
GTID: 2308330503479772
Subject: Computer Science and Technology
Abstract/Summary:
Academic papers, journal articles, patent databases, online media, and social networking platforms (WeChat, microblogs, blogs, and forums) contain a large amount of latent knowledge. This thesis refers to these professional sources collectively as the "expert field". The expert field holds a great deal of "expert data", such as experts' research results, academic viewpoints, current activities, and latest comments. Organizing and using this expert data sensibly yields more value than the raw data itself. Analyzing and mining these data resources is the core of the "expert robot" research. The expert robot extracts and analyzes in depth the useful information contained in massive data and converts it into understandable, usable knowledge resources. This thesis studies the expert robot on the basis of big data and serves as a representative case of big data research applied to the expert field. It examines the expert data emerging on the internet from three aspects: data processing speed, data association mining, and data application quality. The specific research work includes the following.

First, the Hadoop platform is analyzed in detail, with a focus on the MapReduce distributed programming model and the HDFS distributed file system. The working principle of a full-text search engine and its index-building process are also introduced. Because expert data is growing rapidly and building an index on a single machine is neither efficient nor secure, this thesis proposes parallel index construction based on MapReduce.

Second, the working principle of the PageRank algorithm, which scores Web pages, is described, together with the process of solving for the PR values by the power method. To improve the quality of academic search results in the expert field, this thesis proposes the E-PeopleRank scoring algorithm, which is suited to the expert field.
E-PeopleRank measures the relative weight of each expert in the search results through its ranking. It compensates for the original algorithm's disregard for topic relevance and builds a mapping between the Web page associations of the original algorithm and the expert associations of the new one. During iterative computation, the amount of calculation on the expert adjacency matrix shows an exponential growth trend. The new algorithm is therefore brought into the MapReduce programming architecture, and the task arrangement in each step is described in detail.

Third, to realize personalized search, the collaborative filtering algorithm is described in detail. Because the original algorithm ignores how user interest varies over time, a collaborative filtering recommendation algorithm that incorporates changes in user interest is proposed. The improved algorithm introduces an interest bias function into the original model, and the specific improvement process is described in detail.

Fourth, a Hadoop cluster is built to provide the running environment for MapReduce-based parallel index construction and the improved algorithms, and the feasibility of the proposed ideas is verified.

Finally, the research work of this thesis is summarized, and possible future research issues are discussed.
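
As an illustration of the power method mentioned in the abstract, the following is a minimal sketch of standard PageRank, not of E-PeopleRank, whose details the abstract does not give. The function name `pagerank`, the damping factor of 0.85, the convergence tolerance, and the three-node example graph are all assumptions made for this sketch.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration (illustrative sketch).

    links maps each node to the list of nodes it links to.
    The damping factor 0.85 is a conventional choice, assumed here.
    """
    # Collect every node that appears as a source or as a target.
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution

    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}

        # Each node passes an equal share of its rank along every out-link.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share

        # Stop once the L1 change between iterations is negligible.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical three-node graph; in an expert setting the nodes
# could stand for experts rather than Web pages.
example = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
for node, score in sorted(pagerank(example).items()):
    print(node, round(score, 4))
```

Each iteration is a sum of per-link contributions grouped by target node, which is why this kind of computation lends itself to a map step (emit shares per link) followed by a reduce step (sum shares per node).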
Keywords/Search Tags: Expert Data Mining, Distributed Index Construction, PageRank Algorithm, Collaborative Filtering Algorithm