Bibliometric Analysis And Name Disambiguation Research Based On Knowledge Clustering

Posted on:2017-01-29

Degree:Master

Type:Thesis

Country:China

Candidate:Y P Li

Full Text:PDF

GTID:2348330488997133

Subject:Software engineering

Abstract/Summary:

Two problems persist in the field of knowledge analysis. First, the large scale and coarse classification granularity of resources in literature knowledge bases leave learners disoriented and overloaded when they retrieve and read literature. Second, author name disambiguation is an important and complex research topic: because different authors frequently share the same name, the quality of literature retrieval and survey results declines, which lengthens the whole cycle of scientific research. To address these two problems, this thesis proposes a mechanism for knowledge clustering and knowledge statistics based on MapReduce, and a mechanism for author name disambiguation based on the fusion of multiple features. The contents are summarized in the following three points.

(1) This thesis proposes a mechanism for knowledge clustering and knowledge statistics based on MapReduce, which contains two algorithms: a MapReduce-based co-occurrence matrix building algorithm (MR-Co Matrix) and a MapReduce-based knowledge statistics algorithm (MR-Statistics). MR-Co Matrix is used to build the knowledge clustering tree, and MR-Statistics is used to compute statistics of knowledge properties.

(2) This thesis proposes a mechanism for author name disambiguation based on the fusion of multiple features, which consists of three steps. First, a single feature similarity detection algorithm (SFSD) is proposed to compute the degree of similarity between two features of a publication and to obtain the threshold value. Then, a preliminary disambiguation algorithm based on SFSD (SFSDD) is proposed. Finally, an author name disambiguation algorithm based on the fusion of multiple features (NDFMF) is proposed to disambiguate author names.

(3) This thesis designs and builds a knowledge analysis system. It first introduces the architecture of the system in detail, then the design goals and target users of the system, and finally the concrete implementation of the system, module by module.
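
The abstract does not give the details of MR-Co Matrix, so the following is only a minimal sketch of how a keyword co-occurrence matrix is commonly built in the map/reduce style mentioned in point (1). It simulates the map, shuffle, and reduce phases in plain Python rather than on a cluster. The function names, the choice of keywords as the co-occurring items, and the sample documents are all assumptions made for illustration and are not taken from the thesis.

```python
from collections import defaultdict
from itertools import combinations


def map_cooccurrence(keywords):
    """Map step (hypothetical): emit ((a, b), 1) for every unordered
    pair of distinct keywords that appear in one document."""
    unique = sorted(set(keywords))  # sorting gives each pair a canonical order
    for a, b in combinations(unique, 2):
        yield (a, b), 1


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_cooccurrence(key, values):
    """Reduce step (hypothetical): sum the counts for one keyword pair."""
    return key, sum(values)


def build_cooccurrence_matrix(documents):
    """Run map, shuffle, and reduce in sequence and return a sparse
    matrix as a dict mapping (keyword_a, keyword_b) to a count."""
    emitted = (pair for doc in documents for pair in map_cooccurrence(doc))
    grouped = shuffle(emitted)
    return dict(reduce_cooccurrence(k, v) for k, v in grouped.items())


# Made-up sample input: one keyword list per document.
docs = [
    ["clustering", "mapreduce", "co-occurrence"],
    ["clustering", "name disambiguation"],
    ["mapreduce", "clustering"],
]
matrix = build_cooccurrence_matrix(docs)
```

In this sketch the matrix is stored sparsely, with one entry per keyword pair that actually co-occurs. A clustering step such as the knowledge clustering tree mentioned above could then treat the counts as similarity weights, although how the thesis does this is not stated in the abstract.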

Keywords/Search Tags:

knowledge clustering, co-occurrence matrix, MapReduce, name disambiguation, fusion of multiple features

Related items

1	Design And Implementation Of Image Retrieval System Based On Multi-feature Fusion
2	The Research On Fuzzy Clustering Algorithms With Noise Immunity And Its Application
3	Research On Saliency Detection Based On Multiple Features Fusion And Low Rank Representation
4	Image Classification Based On HSV And Texture Features
5	Research On Scholar Disambiguation Based On Heterogeneous Information Networks And Fine-Grained Features
6	The Research And Application Of Name Disambiguation Algorithm Based On Multi-level Clustering
7	Based On Semi-supervised Clustering Diagram Experts Disambiguation
8	Research On Name Disambiguation Method For Author Retrieval Of Sci-tech Literature
9	Technology Study On The Retrieval And Recognition Of Seawater Pearl Based On The Integration Of Multiple Features
10	Study On Several Basic Issues Of Color Image Processing