
Research On Key Technologies For Building Chinese Biomedical Expert Database Based On Large-Scale Literature Mining

Posted on: 2021-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: A Y Tang
Full Text: PDF
GTID: 2480306548993989
Subject: Software engineering
Abstract/Summary:
Biomedicine is an important field that bears directly on the quality of medical diagnosis and on human health. Biomedical literature, as a key information carrier for the field, records the research and development of biomedical technology. In recent years, the number of documents in the biomedical field has grown exponentially. New scholars who wish to enter the field must collect, read, and analyze large amounts of data to identify outstanding scholars and research groups. Existing online digital libraries support document retrieval based on author-related information (name, institution, email address) and literature information (abstract, title), but they often encounter different authors who share the same name, which easily causes ambiguity. Author Name Disambiguation (AND) is a prerequisite for constructing a domain expert graph. It refers to grouping the documents published under the same name according to the real-world author who wrote them. Grouping the literature of same-name authors quickly and accurately improves the data management efficiency of online literature libraries and the accuracy of the data that users obtain. The main goal of this thesis is to improve the accuracy and speed of author name disambiguation on large-scale biomedical literature, and to build a database of experts in the field of biomedicine.

1. A disambiguation method for same-name authors based on literature mining. By mining information such as the author's name, institution, co-authors, and the abstract of each document, we model the features of pairs of documents by same-name authors. The modeling process considers multiple representations of document features and additionally introduces domain information from the biomedical field as auxiliary features.

2. A parallel acceleration method for author disambiguation on large-scale literature. At the scale of the biomedical literature, author name disambiguation is computationally expensive and time-consuming. To improve computational efficiency, this thesis proposes a parallel acceleration method based on same-name collections and a parallel acceleration method based on the pairwise document similarity matrix. In practice, a hybrid acceleration strategy combining the two methods is used.

3. Building a database of Chinese experts in the field of biomedicine and implementing a system for discovering well-known experts. Using the author name disambiguation results, a field is generated for each of an author's publications, and the five highest-scoring fields are taken as the author's fields. Combining these with the author's basic information (institution, email, co-authors, etc.), we build a domain expert database and, finally, an expert discovery system.

The effectiveness of the author name disambiguation model is verified on a public dataset. In the parallel acceleration part, the hybrid parallel strategy maintains parallel efficiency and effectively reduces computing time.
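The abstract does not give the model or the parallel scheme in detail, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes records are first blocked by author name (the "same-name collection"), that each pair of documents in a block is scored from co-author overlap and institution match, and that pairs above a threshold are merged with union-find. The weights, the threshold, the field names, and the toy records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_score(doc_a, doc_b):
    """Similarity of two documents that share an author name.

    The 0.6 / 0.4 weights are hypothetical, hand-set values for illustration.
    """
    coauthors = jaccard(doc_a["coauthors"], doc_b["coauthors"])
    same_inst = 1.0 if doc_a["institution"] == doc_b["institution"] else 0.0
    return 0.6 * coauthors + 0.4 * same_inst


def disambiguate_block(docs, threshold=0.5):
    """Group the documents of one same-name block with union-find."""
    parent = list(range(len(docs)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair whose score reaches the (assumed) threshold.
    for i, j in combinations(range(len(docs)), 2):
        if pair_score(docs[i], docs[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, doc in enumerate(docs):
        groups[find(i)].append(doc["id"])
    return sorted(sorted(ids) for ids in groups.values())


def disambiguate(records):
    """Block records by author name, then disambiguate each block on its own."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec["name"]].append(rec)
    return {name: disambiguate_block(docs) for name, docs in blocks.items()}


# Made-up toy records, not taken from any dataset.
records = [
    {"id": "p1", "name": "Wei Zhang", "institution": "Peking University",
     "coauthors": ["L Chen", "H Wang"]},
    {"id": "p2", "name": "Wei Zhang", "institution": "Peking University",
     "coauthors": ["L Chen"]},
    {"id": "p3", "name": "Wei Zhang", "institution": "Fudan University",
     "coauthors": ["Y Liu"]},
]

result = disambiguate(records)
```

Because each same-name block is processed independently, the per-block calls are a natural unit of parallel work; for example, they could be distributed with `multiprocessing.Pool.map`. Very large blocks would still need the pairwise loop itself to be split, which is presumably the role of the similarity-matrix strategy the abstract mentions.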
Keywords/Search Tags: Biomedical Literature, Author Name Disambiguation, Parallel Computing, Expert Database Building