
Design And Implementation Of Key Modules In Paper Author Name Search System

Posted on: 2022-01-28  Degree: Master  Type: Thesis
Country: China  Candidate: H Xu  Full Text: PDF
GTID: 2518306605470164  Subject: Master of Engineering
Abstract/Summary:
In academic paper search, searching by author name is a common method. However, because many people share the same name, a search for a name returns the academic papers of every person with that name, and users must filter out the information they need by themselves. This phenomenon is called name ambiguity. Name ambiguity lowers search quality and makes it hard to quickly collect all the research results of a particular scholar. Name disambiguation has therefore long been a hot topic for scholars at home and abroad.

This paper comes from an actual company project. The company needs to provide doctor insights to customers, including the number of papers each doctor has in the PubMed literature database. It is therefore necessary to disambiguate author names in PubMed, that is, to determine which doctor in China a given paper author corresponds to. To this end, this paper proposes a paper author retrieval system with name disambiguation, so that staff can retrieve the information they need by author name. The main work of this paper includes the following three aspects:

(1) Acquisition and processing of the data required by the system. There are two kinds of data: doctor data and paper author data. This paper uses the Scrapy framework to acquire multi-source doctor data and fuses it to construct a complete and unique doctor ontology. The author's personal information is then extracted from PubMed paper records, mainly the author's affiliation, department, zoning information, and personal e-mail address. A series of problems is solved along the way, including filtering out authors with English names, matching Chinese and English affiliation names, and standardization.

(2) Disambiguation of author names. A two-stage name disambiguation method is proposed that combines entity link disambiguation with author clustering. Author names are disambiguated by matching each author to the doctor ontology. First, candidates are filtered from the doctor knowledge base according to the author's personal attributes, and a department similarity model trained with word2vec scores the candidates to complete entity link disambiguation. Then, using co-author information, the authors who were not disambiguated in the first stage are clustered together with the doctors of the same name, so that more authors are disambiguated.

(3) Design and implementation of the author name retrieval system. Elasticsearch is first used to build the system index, which separates the search function from the rest of the business logic. The Flask framework is then used to implement the main functions of the retrieval system, including search term preprocessing, user authentication, and data modification for authenticated users, to make the system simpler for staff to use.

According to feedback, the system helps users easily obtain the number of papers published by doctors and saves staff a lot of time. The data modification interface also lets staff manually correct the program's disambiguation errors, and the resulting closed loop of "program disambiguation, manual refinement" meets business needs. Test results show that the disambiguation algorithm implemented in this paper reaches an accuracy rate of 79.6% and a recall rate of 83.6%, and all modules of the system have passed testing. Some shortcomings remain. For example, the name disambiguation process does not handle the case where the author's name is abbreviated in pinyin. This is a direction for further research.
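
The two-stage method described in aspect (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation, and everything in it is an assumption made for illustration: the `Doctor` and `Author` records, the function names, the thresholds, and the sample data are hypothetical. A tiny hand-written vector table (`TOY_VECTORS`) stands in for a trained word2vec model. Stage 2 is simplified to a greedy assignment by shared co-authors rather than a full clustering algorithm.

```python
from dataclasses import dataclass, field
from math import sqrt

# Hypothetical stand-in for trained word2vec vectors (toy values, not real data).
TOY_VECTORS = {
    "cardiology": (0.9, 0.1, 0.0),
    "cardiovascular": (0.8, 0.2, 0.1),
    "medicine": (0.3, 0.3, 0.3),
    "neurology": (0.0, 0.9, 0.2),
}


@dataclass
class Doctor:  # one entry of the (hypothetical) doctor ontology
    doctor_id: str
    name: str
    hospital: str
    department: str
    coauthors: set = field(default_factory=set)


@dataclass
class Author:  # one author mention extracted from a paper record
    name: str
    hospital: str
    department: str
    coauthors: frozenset = frozenset()


def embed(text):
    """Average the vectors of known tokens; None if no token is known."""
    vecs = [TOY_VECTORS[t] for t in text.lower().split() if t in TOY_VECTORS]
    if not vecs:
        return None
    return tuple(sum(column) / len(vecs) for column in zip(*vecs))


def department_similarity(a, b):
    """Cosine similarity of the two department embeddings (0.0 if unknown)."""
    va, vb = embed(a), embed(b)
    if va is None or vb is None:
        return 0.0
    dot = sum(x * y for x, y in zip(va, vb))
    norm = sqrt(sum(x * x for x in va)) * sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def link_entity(author, doctors, threshold=0.7):
    """Stage 1: filter candidates by name and hospital, score by department."""
    candidates = [d for d in doctors
                  if d.name == author.name and d.hospital == author.hospital]
    best = max(candidates, default=None,
               key=lambda d: department_similarity(author.department, d.department))
    if best and department_similarity(author.department, best.department) >= threshold:
        return best
    return None


def link_by_coauthors(author, doctors, min_shared=1):
    """Stage 2 (simplified): pick the same-name doctor sharing most co-authors."""
    same_name = [d for d in doctors if d.name == author.name]
    best = max(same_name, default=None,
               key=lambda d: len(d.coauthors & author.coauthors))
    if best and len(best.coauthors & author.coauthors) >= min_shared:
        return best
    return None


def disambiguate(authors, doctors):
    """Return one doctor_id (or None) per author, in input order."""
    links = [link_entity(a, doctors) for a in authors]
    # Co-authors of stage-1 matches become evidence for stage 2.
    for author, doctor in zip(authors, links):
        if doctor:
            doctor.coauthors |= author.coauthors
    for i, author in enumerate(authors):
        if links[i] is None:
            links[i] = link_by_coauthors(author, doctors)
            if links[i]:
                links[i].coauthors |= author.coauthors
    return [d.doctor_id if d else None for d in links]


# Made-up sample data for illustration only.
doctors = [
    Doctor("D1", "Wei Zhang", "Hospital A", "cardiology"),
    Doctor("D2", "Wei Zhang", "Hospital B", "neurology"),
]
authors = [
    Author("Wei Zhang", "Hospital A", "cardiovascular medicine", frozenset({"Li Na"})),
    Author("Wei Zhang", "", "", frozenset({"Li Na", "Chen Bo"})),
    Author("Wei Zhang", "", "", frozenset({"Wang Fang"})),
]
links = disambiguate(authors, doctors)
```

In this sample, the first author is linked in stage 1 through hospital and department, the second has no usable attributes and is linked in stage 2 through the shared co-author, and the third stays unresolved. Unresolved cases like the third are what a manual correction step would handle.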
Keywords/Search Tags:name disambiguation, co-author, information extraction, Chinese-English matching