Research On Big Scholarly Data-based Paper Author Name Disambiguation And Literature Recommendation Method

Posted on: 2023-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Hu
GTID: 2568307103994599
Subject: Software engineering
Abstract/Summary:
The rapid development of science and technology has improved people's lives, thanks largely to researchers' tireless exploration of future technologies. However, as the number of researchers keeps growing, the volume of scientific literature published each year is exploding, and this ever-growing body of content is commonly referred to as Big Scholarly Data. Existing scholarly information retrieval websites can archive the literature and provide access to it, but the traditional techniques these websites rely on are beginning to struggle at this scale. Name disambiguation and literature recommendation are two key technical components of such websites: the former ensures that each publication is archived under the correct scholar's name, and the latter can greatly reduce the time researchers spend searching for literature.

In the practical setting of academic information retrieval websites, existing name disambiguation methods still have the following issues: (1) most methods perform clustering-based disambiguation, which overwrites previous disambiguation results; (2) incremental disambiguation methods retain previous results but do not make full use of the available features. Similarly, most content-based recommendation methods produce a narrow range of recommendations whose research content is too similar to inspire researchers. Against this technical background, this thesis carries out the following studies.

(1) A multi-dimensional feature fusion-based incremental method for paper author name disambiguation is proposed to address the shortcomings of existing methods, namely the difficulty of retaining disambiguation results and the inadequate use of features. Based on the method's principle and the task requirements, the solution process is divided into two stages: name matching and paper archiving. In the name matching stage, the method applies a two-step name matching rule that maximizes the accuracy of the candidate authors recalled for each paper to be disambiguated while maintaining the recall rate. In the paper archiving stage, unlike traditional methods that extract features along a single dimension, the method incorporates temporal information into feature extraction, uses data mining algorithms to extract seven categories of features between the paper to be disambiguated and each candidate author, and finally applies a Blending ensemble to further improve model performance. Experiments show that name matching in the first stage achieves a final recall of 99.72%, and paper archiving in the second stage achieves a weighted F1 score of 0.954, nearly 3% higher than other existing models, which verifies the superiority of the proposed method.

(2) To address the narrow recommendation range and limited effectiveness of existing literature recommendation methods, a literature recommendation method based on an Entity Interaction Knowledge Graph is proposed. Following the method's principle, the various interaction behaviors between users and items are first defined as entity interaction attributes, and the recommendation process is then divided into three steps according to the task requirements. In the first step, entity interaction fact triples containing user-item interaction behaviors are extracted from the dataset; in the second step, these triples are embedded into vectors of a specified dimension using a translation embedding algorithm; in the third step, the items a user may be interested in are predicted by a link prediction method based on the similarity of user behaviors. Experimental results on two datasets show that, with HR (Hit Ratio) as the evaluation metric, the method improves on the compared methods by up to 6.6% and 2.4%, and by 5.7% and 0.6% on average, on the two datasets respectively; with NDCG (Normalized Discounted Cumulative Gain) as the metric, the improvements are up to 3.6% and 2.5%, and 3.5% and 2.0% on average. These results verify the effectiveness of the proposed method for literature recommendation.
Keywords/Search Tags: Big Scholarly Data, Name Disambiguation, Literature Recommendation
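The abstract does not spell out the two-step name matching rule. Below is a minimal sketch, assuming step one is an order-insensitive exact match on normalized name tokens and step two, used only when step one finds nothing, is a looser surname-plus-initials match that trades some precision for recall. The normalization choices, the function names (`tokens`, `exact_key`, `initial_keys`, `match_candidates`) and the sample author records are all hypothetical.

```python
import re
import unicodedata


def tokens(name: str) -> list[str]:
    """Lowercase, strip accents, join hyphenated parts, split on other punctuation."""
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c)).lower()
    name = re.sub(r"[-']", "", name)  # "Zhi-Yuan" -> "zhiyuan"
    return re.sub(r"[^a-z]+", " ", name).split()


def exact_key(name: str) -> str:
    """Order-insensitive key, so "Hu Zhiyuan" and "Zhiyuan Hu" collide."""
    return " ".join(sorted(tokens(name)))


def initial_keys(name: str) -> set[str]:
    """Surname plus initials of the remaining tokens, trying both name orders."""
    t = tokens(name)
    if len(t) < 2:
        return {" ".join(t)}
    surname_last = t[-1] + "|" + "".join(x[0] for x in t[:-1])
    surname_first = t[0] + "|" + "".join(x[0] for x in t[1:])
    return {surname_last, surname_first}


def match_candidates(paper_name: str, authors: dict[str, str]) -> list[str]:
    """Hypothetical two-step rule: exact match first, relaxed match as fallback."""
    key = exact_key(paper_name)
    step1 = [aid for aid, name in authors.items() if exact_key(name) == key]
    if step1:
        return step1
    keys = initial_keys(paper_name)
    return [aid for aid, name in authors.items() if initial_keys(name) & keys]


authors = {"a1": "Zhiyuan Hu", "a2": "Hu Zhiyuan", "a3": "Zhen Hu", "a4": "Yan Li"}
print(match_candidates("Zhi-Yuan Hu", authors))  # ['a1', 'a2']
print(match_candidates("Z. Hu", authors))        # ['a1', 'a2', 'a3']
```

The looser fallback deliberately returns "Zhen Hu" as a candidate for "Z. Hu"; under this design, telling such candidates apart is left to the paper archiving stage.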
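For the Blending step of the paper archiving stage, here is a minimal sketch of the general technique. It assumes the base models have already been trained on one split and have produced match probabilities on a held-out split. A logistic-regression meta-model, fitted by plain batch gradient descent, then learns how much to trust each base model. The abstract names neither the base models nor the meta-learner, so `fit_blender`, `blend_predict` and the toy probabilities are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_blender(holdout_preds, labels, lr=0.5, epochs=3000):
    """Fit a logistic-regression meta-model on base-model outputs.

    holdout_preds[i][m] is base model m's match probability for held-out sample i;
    labels[i] is 1 if the paper belongs to the candidate author, else 0.
    """
    n, n_models = len(labels), len(holdout_preds[0])
    w, b = [0.0] * n_models, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_models, 0.0
        for x, y in zip(holdout_preds, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for m in range(n_models):
                grad_w[m] += err * x[m]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def blend_predict(w, b, base_preds):
    """Combine base-model probabilities with the fitted meta-model weights."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in base_preds]


# Toy held-out outputs of three hypothetical base models (one row per paper-author pair).
holdout = [[0.9, 0.6, 0.8], [0.8, 0.3, 0.7], [0.2, 0.7, 0.3],
           [0.1, 0.4, 0.2], [0.7, 0.2, 0.9], [0.3, 0.8, 0.1]]
labels = [1, 1, 0, 0, 1, 0]
w, b = fit_blender(holdout, labels)
print([round(p, 2) for p in blend_predict(w, b, holdout)])
```

In the toy data the second base model is anti-correlated with the labels, so the meta-model is expected to give it a negative weight. In practice the fitted blender is applied to base-model outputs on unseen pairs, not to the held-out rows it was fitted on.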
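The weighted F1 score reported for paper archiving averages the per-class F1 scores, weighting each class by its support (its number of true instances). The following small standard-library version is written to agree with scikit-learn's `f1_score(..., average="weighted")`. The labels are made up for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        score += f1 * n_true / total
    return score


# Hypothetical archiving decisions: 1 = paper assigned to the candidate author.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(weighted_f1(y_true, y_pred))  # ≈ 0.75 (class 1: F1 = 2/3, class 0: F1 = 0.8)
```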
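For step one of the recommendation method, the sketch below turns a raw user-item interaction log into deduplicated (user, behavior, item) fact triples and integer ids. The behavior names (`view`, `download`, `cite`), the log format and the function names are hypothetical. The abstract states only that interaction behaviors are modeled as entity interaction attributes.

```python
# Hypothetical interaction log rows: (user_id, behavior, item_id).
LOG = [
    ("u1", "view", "p1"),
    ("u1", "download", "p1"),
    ("u1", "view", "p2"),
    ("u1", "hover", "p3"),     # behavior not modeled, dropped
    ("u2", "view", "p2"),
    ("u2", "view", "p2"),      # repeated event, deduplicated
    ("u2", "cite", "p3"),
]


def extract_triples(log, behaviors=("view", "download", "cite")):
    """Keep modeled behaviors; return unique (user, behavior, item) facts in log order."""
    seen, triples = set(), []
    for user, behavior, item in log:
        fact = (user, behavior, item)
        if behavior in behaviors and fact not in seen:
            seen.add(fact)
            triples.append(fact)
    return triples


def index_triples(triples):
    """Map entity and relation names to contiguous integer ids for embedding tables."""
    entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    relations = sorted({r for _, r, _ in triples})
    ent_id = {e: i for i, e in enumerate(entities)}
    rel_id = {r: i for i, r in enumerate(relations)}
    ids = [(ent_id[h], rel_id[r], ent_id[t]) for h, r, t in triples]
    return ids, ent_id, rel_id


triples = extract_triples(LOG)
ids, ent_id, rel_id = index_triples(triples)
print(triples)  # 5 facts: the 'hover' row and the duplicate 'view' row are gone
print(ids)      # [(3, 2, 0), (3, 1, 0), (3, 2, 1), (4, 2, 1), (4, 0, 2)]
```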
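For step two, the abstract says only that a translation embedding algorithm is used. TransE is the best-known model of this kind. It learns vectors such that h + r ≈ t for true facts, training with a margin ranking loss against corrupted triples. The following is a minimal pure-Python sketch of that technique, not necessarily the thesis's exact variant. The dimension, margin, learning rate, epoch count and toy ids are assumptions.

```python
import math
import random


def l2(v):
    return math.sqrt(sum(x * x for x in v))


def unit(v):
    n = l2(v) or 1.0
    return [x / n for x in v]


def transe_score(E, R, h, r, t):
    """Negative L2 distance ||h + r - t||: higher means a more plausible fact."""
    return -l2([a + b - c for a, b, c in zip(E[h], R[r], E[t])])


def train_transe(triples, n_ent, n_rel, dim=16, margin=1.0, lr=0.01, epochs=200, seed=0):
    """Plain TransE: SGD on the margin ranking loss with tail corruption."""
    rng = random.Random(seed)
    bound = 6 / math.sqrt(dim)
    E = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_ent)]
    R = [unit([rng.uniform(-bound, bound) for _ in range(dim)]) for _ in range(n_rel)]
    known = set(triples)
    for _ in range(epochs):
        E = [unit(e) for e in E]  # keep entity vectors on the unit sphere
        for h, r, t in rng.sample(triples, len(triples)):
            negatives = [e for e in range(n_ent) if (h, r, e) not in known]
            if not negatives:
                continue
            t_neg = rng.choice(negatives)
            pos = [a + b - c for a, b, c in zip(E[h], R[r], E[t])]
            neg = [a + b - c for a, b, c in zip(E[h], R[r], E[t_neg])]
            d_pos, d_neg = l2(pos), l2(neg)
            if margin + d_pos - d_neg <= 0:
                continue  # this pair already satisfies the margin
            g_pos = [x / (d_pos or 1.0) for x in pos]  # d||pos|| / d(h + r - t)
            g_neg = [x / (d_neg or 1.0) for x in neg]
            for i in range(dim):
                E[h][i] -= lr * (g_pos[i] - g_neg[i])
                R[r][i] -= lr * (g_pos[i] - g_neg[i])
                E[t][i] += lr * g_pos[i]
                E[t_neg][i] -= lr * g_neg[i]
    return E, R


# Hypothetical indexed facts (user, behavior, paper): users 3-4, papers 0-2, 3 behaviors.
facts = [(3, 2, 0), (3, 1, 0), (3, 2, 1), (4, 2, 1), (4, 0, 2)]
E, R = train_transe(facts, n_ent=5, n_rel=3, dim=8)
print(transe_score(E, R, 3, 2, 0), transe_score(E, R, 3, 2, 2))
```

After training, the first score (a known fact) is expected, though not guaranteed on such tiny data, to be higher than the second (an unobserved pair).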
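For step three, the abstract describes link prediction "based on the similarity of user behaviors" without further detail. As a stand-in, the sketch below uses standard TransE tail prediction. For a user u and a behavior r, it ranks the items the user has not yet seen by the distance between u + r and each item vector. The two-dimensional embeddings are hand-picked so the ranking is easy to check. They are not learned values, and `recommend` is a hypothetical helper.

```python
import math


def recommend(user, behavior, E, R, items, seen, k=2):
    """Rank unseen items by TransE distance ||user + behavior - item|| (smaller is better)."""
    target = [u + r for u, r in zip(E[user], R[behavior])]
    candidates = [i for i in items if i not in seen]
    return sorted(candidates, key=lambda i: math.dist(target, E[i]))[:k]


# Hand-picked 2-D vectors for illustration only; in practice they come from training.
E = {"u1": [0.0, 0.0], "p1": [1.0, 0.0], "p2": [0.9, 0.2], "p3": [0.0, 1.0], "p4": [1.1, -0.1]}
R = {"view": [1.0, 0.0]}
print(recommend("u1", "view", E, R, ["p1", "p2", "p3", "p4"], seen={"p1"}))  # ['p4', 'p2']
```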
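The abstract does not state the cutoff K or the evaluation protocol. The sketch assumes the common leave-one-out setup, in which each user has one held-out item and a ranked recommendation list. Under that setup, HR@K and NDCG@K can be computed as follows. With a single relevant item the ideal DCG is 1, so NDCG reduces to 1 / log2(rank + 2) for a 0-based rank. The rankings and targets are made up.

```python
import math


def hit_ratio_at_k(ranked, target, k):
    """1 if the held-out item appears in the top-k list, else 0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked, target, k):
    """Single relevant item: NDCG = 1 / log2(rank + 2) with a 0-based rank, else 0."""
    top = ranked[:k]
    return 1.0 / math.log2(top.index(target) + 2) if target in top else 0.0


def evaluate(rankings, targets, k=10):
    """Average HR@k and NDCG@k over users (leave-one-out protocol assumed)."""
    users = list(targets)
    hr = sum(hit_ratio_at_k(rankings[u], targets[u], k) for u in users) / len(users)
    ndcg = sum(ndcg_at_k(rankings[u], targets[u], k) for u in users) / len(users)
    return hr, ndcg


rankings = {"u1": ["p4", "p2", "p3"], "u2": ["p1", "p3", "p2"], "u3": ["p2", "p1", "p4"]}
targets = {"u1": "p4", "u2": "p3", "u3": "p3"}
print(evaluate(rankings, targets, k=2))  # HR = 2/3; NDCG = (1 + 1/log2(3)) / 3 ≈ 0.544
```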