A Study On The Fusion Of Multidimensional Data In Tumor-Related Research And Its Applications

Posted on: 2024-05-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D J Leng
GTID: 1524307094976489
Subject: Bioinformatics

Abstract/Summary:
Tumors pose a serious threat to human health and remain a major challenge for medical research organizations and the public health field worldwide. With rapid technological development, many precision instruments in the biomedical field have been improved, and high-throughput sequencing technology has made a qualitative leap. These technologies have driven the generation of multi-omics data and multi-attribute drug data. Multi-dimensional data fusion methods based on artificial intelligence algorithms can help scientists study complex cancer pathogenesis comprehensively, explore the interrelationships between related biomolecules and their functions, and address anti-tumor drug response prediction in biomedical research with a low-cost, time-saving approach. However, multi-dimensional data are complex, high-dimensional, and heterogeneous, and extracting valuable information from them and using it to solve biomedical problems has become a major challenge. In this study, we investigate fusion and characterization techniques for multi-omics and multi-attribute drug data, and we establish a benchmark for and evaluate current representative multi-omics data fusion methods based on deep learning. We also propose two interpretable machine learning methods to support drug response prediction research. These methods can facilitate drug response prediction studies, such as sensitivity prediction and synergy prediction for anti-tumor drugs, and help clinicians develop better personalized therapy plans for patients.

In the second section of this study, we comprehensively survey and summarize deep learning-based tumor multi-omics data fusion methods from recent years and group them into six major categories. We also establish a benchmark evaluation method. We select sixteen representative models from these six categories and divide them into supervised and unsupervised models, using the supervised algorithms for classification tasks and the unsupervised algorithms for clustering tasks. The data fusion ability, feature representation, and interpretability of the models are evaluated on three different types of multi-omics datasets using their respective evaluation metrics, and finally we introduce a unified score to evaluate these methods. Our benchmark study gives bioinformaticians guidance on selecting multi-omics data fusion techniques and methods.

Building on the second section, we extend and apply the multi-omics data fusion techniques in the third and fourth sections of this study. First, multi-attribute drug characteristic data, such as molecular fingerprints and physicochemical properties of anti-tumor drugs, are added to the multi-dimensional data, and we construct a dataset that contains both multi-omics data and multi-attribute drug data. We then use artificial intelligence methods to conduct anti-tumor drug response prediction studies on this dataset to provide guidance for personalized therapy.

In the third section of this study, we propose a model named Cascade Synergistic Deep Forest (CSDF), based on the deep forest algorithm framework, for anti-tumor drug sensitivity prediction. The aim is to reduce the treatment burden on patients and to provide clinicians with more specific and effective personalized therapy plans. CSDF is a multi-attribute drug data fusion algorithm. Within the deep forest framework, it uses a dual-data cascade collaborative structure that constructs augmented features and adjusts training samples in real time, supplemented with weighted decisions. The augmented features strengthen the learning of drug data by the learners in each layer. Adjusting the training samples in real time according to the data distribution and training errors improves the model's recognition accuracy for small but significant sample classes (synergistic classes). In addition, the model is an ensemble with decision trees as the base unit, which gives it a good basis for interpretability.

In the fourth section of this study, we build a model named Genetic Programming-based Feature Establishment algorithm (GPFE) for anti-tumor drug synergy prediction. The aim is to improve treatment efficacy, reduce drug resistance and toxic side effects, and provide a basis for personalized therapy. GPFE is a multi-dimensional drug data fusion algorithm. We apply binary operators to construct drug feature pairs and design a two-layer feature selection algorithm to extract high-value features. We then construct integration rules that take into account both model performance and model variability. The feature construction method associates the feature data of two drugs so that the model is better adapted to the multi-drug synergy prediction problem. In the two-layer feature selection structure, the first layer removes redundant features and enhances the generalization ability of the model, while the second layer uses genetic programming to extract important features from the different data modalities, achieving collaborative inference over multiple multi-dimensional data sources in the prediction task while keeping the feature extraction process interpretable. The integration rules designed in this study strengthen the ensemble and further improve model accuracy by improving base learner performance and enriching base learner diversity.
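As a rough illustration of the feature construction step described for GPFE, the sketch below combines the feature vectors of two drugs with element-wise binary operators. It is not the dissertation's implementation: the abstract does not say which operators are used, so the operator set, the function name build_pair_features, and the example vectors are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical operator set: the abstract only says "binary operators",
# so these five commutative operators are an assumption for illustration.
BINARY_OPS = {
    "add": np.add,
    "mul": np.multiply,
    "absdiff": lambda a, b: np.abs(a - b),
    "max": np.maximum,
    "min": np.minimum,
}


def build_pair_features(drug_a, drug_b):
    """Combine two equal-length drug feature vectors into one pair vector.

    Each operator is applied element-wise and the results are concatenated,
    giving len(BINARY_OPS) * n_features values per drug pair.
    """
    a = np.asarray(drug_a, dtype=float)
    b = np.asarray(drug_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both drugs must have the same number of features")
    return np.concatenate([op(a, b) for op in BINARY_OPS.values()])


# Made-up feature vectors (for example, three physicochemical descriptors).
drug_a = np.array([1.0, 0.0, 2.5])
drug_b = np.array([0.0, 1.0, 0.5])
features = build_pair_features(drug_a, drug_b)
```

Because every operator chosen here is commutative, the pair vector is the same whichever drug is listed first, which is one plausible way to make a model insensitive to drug order in a combination. The constructed features would then go through a feature selection stage such as the two-layer procedure described above.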
Keywords/Search Tags:Tumor, Multi-dimensional data, Personalized therapy, Drug response prediction, Deep forest, Genetic programming