Author name ambiguity is common in scientific and technological literature. It arises because many different authors share the same name, or because one author's name can be written in multiple forms. In a literature system, this ambiguity causes the publications of one author to be wrongly merged with those of other authors who share the name, which in turn affects the classification, statistical analysis, and knowledge mining of the literature. Author name disambiguation has therefore become a crucial task in the construction of document databases. Recent research on author name disambiguation algorithms often uses the attributes of a paper as its feature representation, but ignores deep features and latent relationships. This paper therefore focuses on how to use new feature extraction methods and model fusion techniques to obtain better feature representations of papers, and so disambiguate ambiguous author names more effectively. First, existing methods obtain the feature representation of a document by constructing a heterogeneous network over the literature, but semantic information is often ignored when such heterogeneous relation networks are built. To address this problem, this paper proposes an author name disambiguation method based on network embedding and semantic features. A heterogeneous network is constructed to obtain the relational features among documents, and a semantic feature extraction module is added so that relational and semantic features can be integrated into a more effective representation. A density-based clustering algorithm then clusters these features and assigns each ambiguous document to the correct author cluster. Second, to address the insufficient feature extraction of existing author name disambiguation algorithms, this paper proposes a multi-feature fusion method based on a capsule network, which uses both supervised and unsupervised learning to learn the feature representation of documents. For semantic feature extraction, a new method is introduced that combines OAG-BERT with a capsule network to mine semantic features at a deeper level. Through model fusion, multiple features of each document are weighted and fused to obtain richer and more effective document features and to make full use of the available document information. The same density-based clustering algorithm is then used to disambiguate the documents. Finally, the proposed models are verified experimentally on real data sets (AMiner and CiteSeerX). The results show that the proposed models outperform the baseline methods on the author name disambiguation task.
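The fuse-then-cluster step described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy two-dimensional relational and semantic vectors, the equal fusion weights, the hypothetical helper names `fuse` and `dbscan`, and the thresholds `eps` and `min_pts`. The abstract does not name its density-based algorithm, so DBSCAN is used here as a stand-in, and the real features would come from the network embedding and from OAG-BERT with a capsule network rather than being written by hand.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(relational, semantic, w_rel=0.5, w_sem=0.5):
    """Weighted concatenation of two L2-normalized feature vectors.

    The weights are illustrative assumptions, not values from the paper.
    """
    rel = [w_rel * x for x in l2_normalize(relational)]
    sem = [w_sem * x for x in l2_normalize(semantic)]
    return rel + sem


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Includes the point itself, as in the usual DBSCAN definition.
        return [j for j in range(len(points))
                if euclidean(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels


# Toy documents that all carry the same ambiguous author name.
# Each entry is (relational features, semantic features); values are made up.
papers = [
    ([1.0, 0.0], [0.9, 0.1]),
    ([0.9, 0.1], [1.0, 0.0]),
    ([0.0, 1.0], [0.1, 0.9]),
    ([0.1, 0.9], [0.0, 1.0]),
    ([-1.0, 0.0], [0.0, -1.0]),
]

fused = [fuse(rel, sem) for rel, sem in papers]
labels = dbscan(fused, eps=0.3, min_pts=2)
print(labels)
```

In this toy setting the first two documents and the next two form two separate author clusters, and the last document is left as noise. A density-based method suits the task because the number of real authors behind a name is not known in advance, so the number of clusters cannot be fixed beforehand.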