
Research On Transfer Learning-oriented Distant Supervision Relationship Extraction For Plant Phenotypes

Posted on: 2024-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: S Jiang
GTID: 2530307064497064
Subject: Engineering
Abstract/Summary:
Phenomics is a systematic scientific method for studying the phenotypes of organisms. It uses high-throughput technologies such as genomics, proteomics, and metabolomics to comprehensively analyze the phenotype changes caused by the interaction between genotype and environmental factors, in order to reveal the relationship between phenotype and genotype and their role in biological evolution, development, growth, and adaptability mechanisms. Plant phenomics is the study of the physical and biochemical traits of plants at the whole-plant level. Its ultimate goal is to understand the complex relationships between genotype, phenotype, and environment, and to use this knowledge to improve crop productivity, sustainability, and adaptability to environmental pressures such as drought, pests, and climate change.

Currently, a major challenge in phenomics research is how to handle and interpret the large amount of data generated in order to understand the physiological and ecological characteristics of organisms. Large-scale phenotype data must be analyzed and mined with statistical and machine learning methods, but these methods need to adapt better to the characteristics of phenotype data in order to interpret and predict those characteristics accurately. Relation extraction, an important natural language processing technology, can automatically extract relationships between phenotype features and genes from a large body of scientific literature, which helps to quickly discover the connections between phenotype features and genes. However, the lack of large-scale, high-quality labeled datasets in the phenotype field poses challenges for applying traditional supervised relation extraction methods.

Distant supervision relation extraction, a semi-supervised learning method, uses information from a known knowledge base to train relation extraction models, thus reducing the burden of manually annotating training data. This method can automatically generate a large amount of training data, effectively alleviating the lack of training data in the plant phenotyping field. However, distant supervision still depends on a domain knowledge graph as its supervision source. The focus of this thesis is how to achieve relation extraction in the plant phenotyping field without using any labeled data. To address this issue, we propose combining transfer learning with distant supervision relation extraction, using data resources from the biomedical field that are highly relevant to plant phenotyping to train the relation extraction model for the plant phenotyping domain.

The contributions of this thesis are as follows:

(1) A domain-adaptive distant supervision relation extraction model is proposed. Through adversarial training, the source domain distant supervision data and the target domain unsupervised data are mapped to the same feature space, achieving knowledge transfer from the biomedical field to the plant phenotype field. By using transfer learning to relax the requirement for a knowledge base, this model addresses the lack of relevant supervision sources in the task domain, which to some extent expands the application scenarios of distant supervision methods.

(2) A data denoising strategy for distant supervision methods in transfer learning scenarios is proposed. To align the source domain distant supervision data and the target domain unsupervised data in the feature space, a multi-instance method is used to denoise the distant supervision data. For the unsupervised data, multiple sentences containing the same entity pair are aggregated based on the similarity between instances to highlight the significant relationships between entities, which helps the model align the two sets of data in the feature space.

(3) A multi-source domain-adaptive distant supervision relation extraction model is proposed. We combine multiple source domain adaptation techniques and extend the model to the multi-source scenario through multi-way adversarial training. This addresses the case where source domain data in transfer learning come from multiple different domains, and achieves distant supervision relation extraction in multi-source transfer learning scenarios.

In summary, this thesis provides solutions to the lack of training data and supervision sources in the plant phenotype field through single-source and multi-source transfer learning methods. Experimental results show that our proposed methods outperform the corresponding baseline methods.
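
As a rough illustration of the distant supervision step that the abstract builds on, the sketch below aligns a toy knowledge base with raw sentences and groups the matches into per-entity-pair bags, the unit that multi-instance denoising typically operates on. It is not the thesis's implementation: the function name `distant_label`, the substring matching rule, and the entities `geneA`, `geneB` and their relations are all hypothetical placeholders chosen for the example.

```python
from collections import defaultdict


def distant_label(sentences, kb):
    """Group sentences into bags keyed by (head, tail, relation).

    Assumption for this sketch: a sentence is matched to a knowledge
    base pair when it contains both entity names as substrings
    (case-insensitive). Every matched sentence inherits the relation
    stored in the knowledge base, whether or not the sentence actually
    expresses it, which is where label noise comes from.
    """
    bags = defaultdict(list)
    for sentence in sentences:
        lowered = sentence.lower()
        for (head, tail), relation in kb.items():
            if head.lower() in lowered and tail.lower() in lowered:
                bags[(head, tail, relation)].append(sentence)
    return dict(bags)


# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
toy_kb = {
    ("geneA", "leaf width"): "regulates",
    ("geneB", "root length"): "regulates",
}

# Hypothetical toy corpus. The second sentence mentions both entities
# but does not express "regulates", so it becomes a noisy instance.
toy_sentences = [
    "geneA controls leaf width in seedlings.",
    "Leaf width was unchanged in geneA mutants.",
    "geneB is expressed in roots.",
]

toy_bags = distant_label(toy_sentences, toy_kb)
```

Here `toy_bags` holds a single bag for the pair (`geneA`, `leaf width`) containing two sentences, one of which is mislabeled. Mislabeled instances of this kind are what bag-level denoising methods aim to down-weight.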
Keywords/Search Tags:Distant Supervision, Relation Extraction, Plant Phenotype, Transfer Learning