Bibliometric Analysis And Name Disambiguation Research Based On Knowledge Clustering

Posted on: 2017-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Li
GTID: 2348330488997133
Subject: Software engineering
Abstract/Summary:
Research in the field of knowledge analysis faces two problems. First, literature knowledge bases are large and their resources are only coarsely classified, so learners who retrieve and read the literature easily become disoriented and overloaded. Second, author name disambiguation is an important and complex research topic. Different authors frequently share the same name, and this lowers the quality of literature retrieval and investigation results, which lengthens the overall cycle of scientific research. To address these two problems, this thesis proposes two mechanisms: a knowledge clustering and knowledge statistics mechanism based on MapReduce, and an author name disambiguation mechanism based on the fusion of multiple features. The contents are summarized in the following three points.

(1) This thesis proposes a knowledge clustering and knowledge statistics mechanism based on MapReduce. It contains two algorithms: the MapReduce-based Co-occurrence Matrix building algorithm (MR-CoMatrix) and the MapReduce-based knowledge Statistics algorithm (MR-Statistics). MR-CoMatrix is used to build the knowledge clustering tree, and MR-Statistics is used to compute statistics on knowledge properties.

(2) This thesis proposes an author name disambiguation mechanism based on the fusion of multiple features, which consists of three steps. First, a single feature similarity detection algorithm (SFSD) is proposed to compute the similarity of a single feature between two papers and to obtain the threshold value. Then, an SFSD-based preliminary disambiguation algorithm (SFSDD) is proposed. Finally, an author name disambiguation algorithm based on the fusion of multiple features (NDFMF) is proposed to disambiguate author names.

(3) This thesis designs and builds a knowledge analysis system. It first describes the architecture of the system in detail. It then presents the design goals and target users of the system. Finally, it describes the concrete implementation of the system, module by module.
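The abstract does not give the internals of MR-CoMatrix or MR-Statistics. The sketch below is a rough illustration only. It assumes the common "pairs" pattern for MapReduce co-occurrence counting: each document (here, a list of keywords) is mapped to ((keyword_a, keyword_b), 1) records, the records are grouped by key, and the reducer sums them. The map, shuffle, and reduce phases are simulated in a single Python process. All function names are hypothetical and are not taken from the thesis. Treating "knowledge statistics" as keyword document frequency is also an assumption.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]


def map_cooccurrence(keywords: List[str]) -> Iterator[Tuple[Pair, int]]:
    """Map phase (hypothetical): emit ((a, b), 1) once per unordered keyword pair in one document."""
    unique = sorted(set(keywords))  # dedupe and fix pair order so (a, b) and (b, a) share a key
    for a, b in combinations(unique, 2):
        yield (a, b), 1


def shuffle(records: Iterable[Tuple[Hashable, int]]) -> Dict[Hashable, List[int]]:
    """Shuffle phase: group intermediate values by key (Hadoop does this step itself)."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return groups


def reduce_sum(values: List[int]) -> int:
    """Reduce phase: add up the partial counts for one key."""
    return sum(values)


def build_cooccurrence(corpus: List[List[str]]) -> Dict[Pair, int]:
    """Sparse co-occurrence matrix: (keyword_a, keyword_b) -> number of documents containing both."""
    mapped = (rec for doc in corpus for rec in map_cooccurrence(doc))
    return {key: reduce_sum(vals) for key, vals in shuffle(mapped).items()}


def keyword_statistics(corpus: List[List[str]]) -> Dict[str, int]:
    """Same map/shuffle/reduce pattern keyed by a single keyword: its document frequency."""
    mapped = ((kw, 1) for doc in corpus for kw in set(doc))
    return {key: reduce_sum(vals) for key, vals in shuffle(mapped).items()}


if __name__ == "__main__":
    corpus = [
        ["mapreduce", "clustering", "cooccurrence"],
        ["clustering", "cooccurrence"],
        ["mapreduce", "clustering"],
    ]
    print(build_cooccurrence(corpus))
    print(keyword_statistics(corpus))
```

In a real Hadoop job, the framework performs the shuffle, and a combiner could pre-sum counts on each mapper to reduce network traffic. A sparse pair-count matrix like this one is the kind of input a clustering step could consume, for example to build a clustering tree. That step is not shown here.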
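The abstract names SFSD, SFSDD, and NDFMF but does not define their features, similarity measures, or fusion rule. The sketch below is therefore only an illustration under stated assumptions:

- Each paper is described by hypothetical feature sets (coauthors, venue, title keywords).
- Single-feature similarity is Jaccard similarity.
- Features are fused by a weighted average with hand-picked weights.
- Papers whose fused score reaches a fixed threshold are linked, and each connected component (found with union-find) is treated as one distinct author.

In the thesis the threshold is obtained through SFSD. Here it is simply a parameter.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List, Set


@dataclass
class Paper:
    """One publication under an ambiguous author name (hypothetical schema)."""
    pid: str
    features: Dict[str, Set[str]] = field(default_factory=dict)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Single-feature similarity: |A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fused_similarity(p: Paper, q: Paper, weights: Dict[str, float]) -> float:
    """Assumed fusion rule: weighted average of per-feature Jaccard scores (weights must be positive)."""
    total = sum(weights.values())
    score = sum(
        w * jaccard(p.features.get(f, set()), q.features.get(f, set()))
        for f, w in weights.items()
    )
    return score / total


def disambiguate(papers: List[Paper], weights: Dict[str, float], threshold: float) -> List[Set[str]]:
    """Link papers whose fused similarity >= threshold; each connected component is one author."""
    parent = {p.pid: p.pid for p in papers}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for p, q in combinations(papers, 2):
        if fused_similarity(p, q, weights) >= threshold:
            parent[find(p.pid)] = find(q.pid)  # union the two components

    groups: Dict[str, Set[str]] = {}
    for p in papers:
        groups.setdefault(find(p.pid), set()).add(p.pid)
    return list(groups.values())


if __name__ == "__main__":
    papers = [
        Paper("p1", {"coauthors": {"Zhang", "Wang"}, "venue": {"ICDE"}, "keywords": {"clustering", "mapreduce"}}),
        Paper("p2", {"coauthors": {"Zhang"}, "venue": {"ICDE"}, "keywords": {"clustering"}}),
        Paper("p3", {"coauthors": {"Chen"}, "venue": {"ACL"}, "keywords": {"parsing"}}),
    ]
    weights = {"coauthors": 0.5, "venue": 0.2, "keywords": 0.3}  # hypothetical weights
    print(disambiguate(papers, weights, threshold=0.4))
```

The pairwise loop is quadratic in the number of papers that share one name. This is usually tolerable because disambiguation runs within one name block at a time. Transitive linking through union-find can over-merge when a single spurious link bridges two real authors. A stricter threshold or an agglomerative clustering step are common alternatives.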
Keywords/Search Tags: knowledge clustering, co-occurrence matrix, MapReduce, name disambiguation, fusion of multiple features