Research On Key Technologies For Building Chinese Biomedical Expert Database Based On Large-Scale Literature Mining

Posted on:2021-02-28

Degree:Master

Type:Thesis

Country:China

Candidate:A Y Tang

Full Text:PDF

GTID:2480306548993989

Subject:Software engineering

Abstract/Summary:

Biomedicine is an important field that bears on the quality of medical diagnosis and on human health. Biomedical literature, as a key information carrier for the field, records the research and development of biomedical technology. In recent years, the number of publications in the biomedical field has grown exponentially. New scholars who want to enter the field need to collect, read, tabulate, and analyze large amounts of material to find outstanding scholars and research groups. Existing online digital literature libraries support retrieval by author-related information (name, institution, email address) and by literature information (abstract, title), but different authors often share the same name, which easily causes ambiguity.

Author Name Disambiguation (AND) is a prerequisite for constructing a domain expert graph. It refers to grouping the documents published under the same name according to the real-world author who wrote them. Grouping the literature of same-name authors quickly and accurately can improve the data management efficiency of online literature libraries and the accuracy of the data that users obtain. The main goal of this thesis is to improve the accuracy and speed of author name disambiguation on large-scale biomedical literature, and to build a database of experts in the field of biomedicine. The work consists of three parts.

1. Disambiguation method for same-name authors based on literature mining. By mining information such as the author’s name, institution, co-authors, and the abstract of each document, we model the features of pairs of documents published under the same author name. In the modeling process, this thesis considers multiple representations of document features and additionally introduces domain information from the biomedical field as auxiliary features.

2. Parallel acceleration method for author disambiguation on large-scale literature. Because the biomedical literature is so large, author name disambiguation is computationally expensive and time-consuming. To improve computational efficiency, this thesis proposes a parallel acceleration method based on same-name collections and a parallel acceleration method based on the pairwise document similarity matrix. In practice, a hybrid acceleration strategy combining the two methods is used.

3. Building a database of Chinese experts in the field of biomedicine and implementing a system for discovering well-known experts. Using the author name disambiguation results, a field is generated for each of the author’s publications, and the five highest-scoring fields are taken as the author’s fields. Combining these with the author’s basic information (institution, email, co-authors, etc.), we build a domain expert database and, finally, an expert discovery system.

To measure the effectiveness of the author name disambiguation model, it is validated on a public dataset. In the parallel acceleration part, the hybrid parallel strategy is used to maintain parallel efficiency and effectively reduce the computing time.
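
The pairwise idea in point 1 of the abstract can be illustrated with a minimal sketch. It is not the thesis’s model: the abstract does not specify the features, their weights, or the grouping step, so the Jaccard features, the `weights` and `threshold` values, the sample records, and the union-find grouping below are all assumptions chosen only to show how documents sharing one author name might be scored in pairs and then grouped.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two token collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pair_similarity(doc_a, doc_b, weights):
    """Weighted sum of simple per-field overlaps (hypothetical feature set)."""
    feats = {
        "coauthors": jaccard(doc_a["coauthors"], doc_b["coauthors"]),
        "institution": jaccard(doc_a["institution"].lower().split(),
                               doc_b["institution"].lower().split()),
        "abstract": jaccard(doc_a["abstract"].lower().split(),
                            doc_b["abstract"].lower().split()),
    }
    return sum(weights[name] * value for name, value in feats.items())


def disambiguate(docs, weights, threshold):
    """Group documents of one shared author name into presumed real authors.

    Pairs scoring at or above `threshold` are linked, and linked documents
    are merged with union-find (an assumed, deliberately simple grouping).
    """
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        if pair_similarity(docs[i], docs[j], weights) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up records that all carry the same ambiguous author name.
docs = [
    {"coauthors": ["Li Wei", "Chen Jun"], "institution": "Peking University",
     "abstract": "gene expression in liver cancer"},
    {"coauthors": ["Li Wei"], "institution": "Peking University",
     "abstract": "liver cancer gene therapy"},
    {"coauthors": ["Zhang Min"], "institution": "Fudan University",
     "abstract": "protein folding simulation"},
]
weights = {"coauthors": 0.4, "institution": 0.3, "abstract": 0.3}  # assumed
groups = disambiguate(docs, weights, threshold=0.5)                # assumed
print(groups)
```

Because each same-name collection is processed independently in this sketch, a call such as `disambiguate` could in principle be dispatched per name to separate workers, which is one way to read the collection-level parallelism mentioned in point 2.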

Keywords/Search Tags:

Biomedical Literature, Author Name Disambiguation, Parallel Computing, Expert Database Building

Related items

1	Research On Biomedical Word Sense Disambiguation Based On Graph Attention Network
2	Extraction Of Biological Entity Relation Based On Literature Mining And Its Application
3	Research On Biomedical Word Sense Disambiguation Based On Attention Neural Network Model
4	Research On Extracting Causal Relationships From Biomedical Literature
5	Research On Supporting Author Identification In The Field Of Biology Based On Author Contribution Statements
6	Marine Environmental Numerical Prediction Data Processing Method Based On The Construction Of Synergistic And Parallel Computing
7	Research On Methods Of Topology Parallel Checking Of Cadastral Database Based On CUDA
8	Research On Efficient Parallel Computing For Ground Water Flow Simulation
9	Research On Pre-training Language Model For Biomedical Literature Mining
10	Construction from biomedical literature, analysis and visualization of mammalian regulatory intracellular networks