Research And Application On Disambiguating Authors

Posted on:2021-01-23

Degree:Master

Type:Thesis

Country:China

Candidate:N Li

GTID:2428330620468179

Subject:Software engineering

Abstract/Summary:

The steady rise of informatization has accelerated the construction of digital libraries, which greatly facilitates people's study and work. However, the rapid growth of digital libraries has also brought the problem of data fragmentation, which results in low data quality and poor data availability. Author ambiguity, in which different authors share the same name, is one of the typical problems in digital libraries, and it seriously degrades both their content quality and their service experience. Author disambiguation aims to distinguish the different authors who share a name and to identify the papers each of them published. Because the data are massive, of low quality, and interdependent, author disambiguation is a challenging task. Current mainstream methods may perform suboptimally because their features have limited expressive power and because they introduce low-quality relationships, so there is considerable room to improve disambiguation performance. This thesis achieves better author disambiguation performance by improving the expressive power of features and by reducing the negative impact of low-quality relationships between authors. The main contributions are as follows:

(1) Disambiguating authors based on the fusion of multi-type features. To overcome the limited expressive power of features and the low-quality relationships introduced by collaborators who have not themselves been disambiguated, we propose CMFAD, an author disambiguation method that integrates both implicit and explicit features. First, CMFAD trains a classifier that combines multiple types of features to predict the probability that two papers belong to the same author. The feature set contains both implicit features, which are learned by models to capture the semantics of paper titles and collaborative relationships, and explicit features, which are extracted manually. CMFAD then applies a probabilistic reasoning mechanism to resolve conflicts among the pairwise classification results.

(2) Disambiguating authors in an incremental and unsupervised manner. Because current mainstream methods capture low-quality relationships and ignore higher-order collaborative relationships, we treat author disambiguation as the reconstruction of a collaboration network and propose IUAD, an incremental, two-stage, unsupervised author disambiguation method. In the first stage, IUAD analyzes the effect of frequent collaborative relations and mines them to build a stable collaboration network, making full use of higher-order collaborative information. In the second stage, IUAD uses a probabilistic generative model based on the exponential family of distributions to integrate collaboration network topology, research interests, and research communities, which substantially improves recall. In addition, IUAD disambiguates newly published papers incrementally, without retraining the model.

(3) Optimizing author disambiguation with labeled data. To further reduce the time IUAD spends constructing the global collaboration network, we propose IIUAD, which optimizes IUAD by introducing some labeled data. IIUAD makes full use of high-precision rules and labeled data to prune candidate pairs more efficiently, further improving the efficiency of the method.
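The abstract says CMFAD predicts, for each pair of papers, the probability that they share an author, and then resolves conflicts among these pairwise results (for example, A matches B and B matches C, but A does not match C). The thesis's own probabilistic reasoning mechanism is not described here, so the sketch below swaps in a plain, well-known alternative: greedy average-linkage clustering over the pairwise probabilities. The function name `resolve_conflicts`, the `threshold` parameter, and the rule that missing pairs count as probability 0.0 are all assumptions made for illustration.

```python
from itertools import combinations


def resolve_conflicts(papers, prob, threshold=0.5):
    """Greedy average-linkage clustering over pairwise same-author probabilities.

    Hypothetical stand-in for CMFAD's conflict resolution, not the thesis method.
    papers: list of paper ids that share one ambiguous author name.
    prob: dict mapping frozenset({p, q}) -> P(p and q have the same author),
          e.g. produced by a pairwise classifier. Missing pairs count as 0.0.
    """
    clusters = [{p} for p in papers]

    def avg_link(a, b):
        # Mean pairwise probability between every paper in a and every paper in b.
        scores = [prob.get(frozenset((x, y)), 0.0) for x in a for y in b]
        return sum(scores) / len(scores)

    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average probability.
        (i, j), best = max(
            (((i, j), avg_link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best <= threshold:
            break  # no remaining merge is more likely than not
        clusters[i] |= clusters[j]
        del clusters[j]  # safe: combinations guarantees i < j
    return clusters
```

For instance, `resolve_conflicts(["A", "B", "C"], {frozenset(("A", "B")): 0.9, frozenset(("B", "C")): 0.8, frozenset(("A", "C")): 0.05})` first merges A and B, then declines to add C because its average probability against {A, B} is only 0.425. The conflicting B–C link is therefore overruled by the weak A–C link instead of being chained transitively.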
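The first stage of IUAD mines frequent collaborative relations to build a stable collaboration network. The abstract does not define "frequent" precisely, so the following is only a sketch under an explicit assumption: a set of `min_size` coauthors that appears together on at least `min_support` papers bearing the ambiguous name is treated as a frequent, higher-order relation, and all papers containing it are linked. Connected components, tracked with union-find, then form high-precision seed clusters. Requiring a shared coauthor *set* instead of a single shared coauthor is meant to avoid the low-quality links that one undisambiguated collaborator can introduce. The names `stable_clusters`, `min_size`, and `min_support` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def stable_clusters(papers, min_size=2, min_support=2):
    """Link papers that share a frequent coauthor set (hypothetical sketch).

    papers: dict paper_id -> set of coauthor names (excluding the ambiguous name).
    Returns a list of paper-id sets, sorted for deterministic output.
    """
    # For every coauthor set of size min_size, record which papers contain it.
    support = defaultdict(set)
    for pid, coauthors in papers.items():
        for combo in combinations(sorted(coauthors), min_size):
            support[combo].add(pid)

    # Union-find over paper ids, with path halving.
    parent = {pid: pid for pid in papers}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Only coauthor sets meeting the support threshold create links.
    for pids in support.values():
        if len(pids) >= min_support:
            first, *rest = sorted(pids)
            for other in rest:
                parent[find(other)] = find(first)

    groups = defaultdict(set)
    for pid in papers:
        groups[find(pid)].add(pid)
    return sorted(groups.values(), key=sorted)
```

With `{"p1": {"X", "Y", "Z"}, "p2": {"X", "Y"}, "p3": {"X", "W"}, "p4": {"Y", "Z"}}`, papers p1, p2, and p4 are joined through the frequent pairs (X, Y) and (Y, Z). Paper p3 stays separate even though it shares coauthor X, because one shared collaborator is not enough under this rule. Leftover clusters like p3 are what a second, recall-oriented stage would then need to attach.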

Keywords/Search Tags:

Author Disambiguation, Implicit Features, Explicit Features, Collaboration Network, Probabilistic Generative Model

Related items

1	Research On User Social Relationship Prediction Based On Location
2	Research On Disambiguation Of Same Authors In Academic Collaboration Network
3	A Study On Methods Of Author Name Disambiguation In Academic Literature
4	Hybrid Movie Recommendation Based On Deep Neural Networks And Neural Collaborative Filtering
5	The Research On Academic Paper Author Name Disambiguation
6	Research On Author Disambiguation In Scientific Literature
7	Research On Author Organization Features Of Malware On Windows
8	Research On Author Name Disambiguation Method Based On Network Representation Learning
9	A Research Of Frame Disambiguation Based On SVM And CRF Model
10	Research On Scholar Disambiguation Based On Heterogeneous Information Networks And Fine-Grained Features