Research On Incremental Thesis Homonym Disambiguation Method Based On Pre-Training Model And Decision Tree

Posted on:2023-12-29

Degree:Master

Type:Thesis

Country:China

Candidate:J Z Zheng

GTID:2568306848962209

Subject:Computer Science and Technology

Abstract/Summary:

With the development of information technology, the scale, storage, and acquisition of information have changed dramatically, and many academic search engines have emerged. These search engines have become the main way for scholars to obtain information about papers. Although they are very convenient, papers written by different authors who share the same name are still sometimes attributed to the wrong author, which reduces the accuracy of retrieving papers by author name. In recent years many researchers have studied author name disambiguation, but problems remain, such as underuse of the information contained in papers and neglect of newly added papers. This thesis studies author name disambiguation from two directions, incremental disambiguation and fuller use of paper information, with the aim of exploiting that information and handling newly added papers. The main work is as follows.

First, to address the underuse of paper information, this thesis proposes a feature extraction method that combines the XLNet pre-trained model with manually defined rules. Manually defined features capture information from fields such as the author name and institution, XLNet extracts information from fields such as the title and abstract, and XGBoost then uses the extracted features to predict the author to whom each paper should belong. Comparative experiments on the constructed dataset show that the proposed framework outperforms the baseline methods in incremental disambiguation.

Second, because incremental disambiguation cannot assign every paper, this thesis proposes a cold-start disambiguation method based on agglomerative hierarchical clustering. The method runs after incremental disambiguation as a post-processing step: it performs agglomerative clustering on the papers that were not successfully assigned, then uses incremental disambiguation to add papers to the main cluster, which is taken as a new author. Comparative experiments on the constructed dataset show that the proposed cold-start framework improves the final incremental disambiguation results.

Finally, this thesis combines the AMiner dataset with DBLP to construct a new dataset for its experiments. The experimental results demonstrate the feasibility of the proposed incremental disambiguation algorithm.
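The two-stage pipeline described in the abstract (incremental assignment, then cold-start clustering of leftovers) can be illustrated with a minimal, standard-library-only Python sketch. It is not the thesis's implementation and makes several loud assumptions: a bag-of-words cosine stands in for XLNet text embeddings, a fixed hand-weighted linear score (`WEIGHTS`) stands in for the trained XGBoost model, and the feature set, thresholds, toy data, and the names `Paper`, `Author`, `assign`, `sim`, `agglomerate`, and `cold_start` are all hypothetical. The cold-start step uses average-linkage agglomerative clustering and reads "main cluster" as the largest cluster.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Paper:
    pid: str
    coauthors: frozenset  # co-author names, excluding the ambiguous name itself
    org: str
    venue: str
    title: str

@dataclass
class Author:
    aid: str
    papers: list = field(default_factory=list)

def bow(text):
    """Bag-of-words vector: a crude stand-in for an XLNet text embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def features(paper, author):
    """Hypothetical rule-based features of a paper against an author profile."""
    known = set().union(*(p.coauthors for p in author.papers))
    co = len(paper.coauthors & known) / len(paper.coauthors) if paper.coauthors else 0.0
    org = float(paper.org in {p.org for p in author.papers})
    venue = float(paper.venue in {p.venue for p in author.papers})
    profile = sum((bow(p.title) for p in author.papers), Counter())
    return [co, org, venue, cosine(bow(paper.title), profile)]

# Hand-picked weights standing in for a trained XGBoost model (hypothetical).
WEIGHTS = [0.4, 0.2, 0.1, 0.3]
def score(paper, author):
    return sum(w * f for w, f in zip(WEIGHTS, features(paper, author)))

def assign(paper, authors, threshold=0.5):
    """Incremental step: attach the paper to the best-scoring author, else return None."""
    best = max(authors, key=lambda a: score(paper, a), default=None)
    if best is not None and score(paper, best) >= threshold:
        best.papers.append(paper)
        return best.aid
    return None

def sim(p, q):
    """Symmetric paper-paper similarity built from the same scoring function."""
    return (score(p, Author("_", [q])) + score(q, Author("_", [p]))) / 2

def agglomerate(papers, merge_threshold=0.5):
    """Average-linkage agglomerative clustering; stops when no pair is similar enough."""
    clusters = [[p] for p in papers]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(sim(p, q) for p in clusters[i] for q in clusters[j])
                avg = total / (len(clusters[i]) * len(clusters[j]))
                if avg > best:
                    best, pair = avg, (i, j)
        if best < merge_threshold:
            break
        i, j = pair
        clusters[i] += clusters.pop(j)  # j > i, so index i is unaffected
    return clusters

def cold_start(unassigned, authors, new_id):
    """Promote the largest cluster of leftovers to a new author, then retry the rest."""
    if not unassigned:
        return []
    clusters = agglomerate(unassigned)
    main = max(clusters, key=len)
    authors.append(Author(new_id, list(main)))
    rest = [p for c in clusters if c is not main for p in c]
    return [p for p in rest if assign(p, authors) is None]

authors = [Author("A1", [
    Paper("p1", frozenset({"bob", "carol"}), "Tsinghua", "KDD", "graph neural networks for citation"),
    Paper("p2", frozenset({"bob"}), "Tsinghua", "WWW", "graph embedding for academic search"),
])]
new_papers = [
    Paper("n1", frozenset({"bob", "dave"}), "Tsinghua", "KDD", "graph networks for author search"),
    Paper("n2", frozenset({"erin", "frank"}), "MIT", "ACL", "neural machine translation"),
    Paper("n3", frozenset({"erin"}), "MIT", "ACL", "low resource machine translation"),
    Paper("n4", frozenset({"zoe"}), "ETH", "CVPR", "image segmentation"),
]
unassigned = [p for p in new_papers if assign(p, authors) is None]
leftover = cold_start(unassigned, authors, "A2")
print([(a.aid, [p.pid for p in a.papers]) for a in authors])
# [('A1', ['p1', 'p2', 'n1']), ('A2', ['n2', 'n3'])]
print([p.pid for p in leftover])  # ['n4']
```

In this toy run, `n1` shares a co-author, institution, and venue with `A1` and is assigned incrementally; `n2` and `n3` form the main cold-start cluster and become the new author `A2`; `n4` stays unassigned. In a real system the weights would be learned from labelled paper-author pairs (for example with XGBoost's `XGBClassifier`) rather than fixed by hand, and the text feature would come from a pre-trained encoder.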

Keywords/Search Tags:

Author Name Disambiguation, Incremental Disambiguation, XLNet, XGBoost, Clustering

Related items

1	The Research On Academic Paper Author Name Disambiguation
2	Research On Author Disambiguation In Scientific Literature
3	The Research Of Chinese Author Name Disambiguation Based On Hierarchical Clustering
4	Graph Neural Network Based Author Name Disambiguation
5	Research On Author Name Disambiguation Algorithm Of Scientific And Technological Papers
6	A Study On Methods Of Author Name Disambiguation In Academic Literature
7	Research On Author Name Disambiguation In The Literature Database
8	Research On Name Disambiguation Method For Author Retrieval Of Sci-tech Literature
9	Design And Implementation Of Author Name Disambiguation System Based On Two Step Clustering
10	Author Name Disambiguation Based Rule And Graph Model