
Application Of Data Mining In Protein Post-translational Modification, Disease Diagnosis And Prognosis

Posted on: 2018-01-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Hou
GTID: 1314330515975746
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Biomedicine now generates vast amounts of transcriptomic, proteomic, and omics-based clinical data. How to extract novel and useful information from these massive data sets to reveal biomedical mechanisms has become a major research focus. Although data mining is widely used across bioinformatics, many challenges and opportunities remain. In this study, data mining was applied to post-translational modification, disease diagnosis, and disease prognosis, using protein, microRNA, and clinical data.

We collected as much lysine acetylation data as possible and performed homology reduction at both the protein and peptide levels. Several biological characteristics of acetylated sites were investigated, including amino acid physicochemical properties (AAPP), position-specific symbol composition (PSSC), and the transition probability matrix (TPM). On this basis, a novel lysine acetylation prediction system named LAceP was constructed. Compared with other methods, LAceP achieved higher accuracy and stability, and it is more widely applicable because it can predict and analyze acetylated sites in various organisms. To make it convenient for biologists, LAceP was released as an open, free web server where users can submit sequences online for quick and easy predictive analysis. LAceP provides a novel analytical method for protein acetylation that helps researchers understand the mechanisms of protein interactions.

High-throughput sequencing provides new research ideas for the diagnosis of certain diseases. Based on high-throughput sequencing of microRNAs, we proposed a novel diagnostic method for HBV-related diseases built on a two-layer logistic regression model. A total of nine effective plasma microRNA biomarkers were selected through sample collection, data processing, model selection, feature selection, and model optimization to distinguish HBV-related chronic hepatitis samples, cirrhosis samples, and healthy controls. The first layer uses three microRNAs to separate HBV-related disease samples from healthy controls. The second layer then uses eight microRNAs to divide the HBV-related disease samples into cirrhosis and chronic hepatitis. Tests on two independent cohorts showed that the model is highly accurate and robust. Functional analysis of the selected microRNAs and their target genes confirmed that they are significantly associated with HBV-related diseases and related functional pathways.

Alongside disease diagnosis, disease prognosis is one of the most discussed topics in biomedicine. In addition to the patient's physical fitness, the factors affecting prognosis include treatment, disease condition, social life, and other factors. Patients aged 50 or younger with stage I endometrioid adenocarcinoma (EEAC) were identified in the Surveillance, Epidemiology and End Results program database over the last decade. Propensity score matching and other statistical methods were used for data mining, and the effects of ovarian preservation versus oophorectomy on the prognosis of young patients were analyzed retrospectively. The results showed that patients with ovarian preservation tended to be significantly younger at diagnosis, were more likely to be diagnosed at an earlier stage, had better-differentiated tumor tissue and smaller tumors, and were less likely to undergo radiation and lymphadenectomy. After propensity score matching, the differences in all characteristics between the ovarian preservation and oophorectomy groups were no longer significant, and potential confounders between the two groups were reduced. At the same time, much noise was removed by the randomization process of propensity score matching. Multivariate statistical analysis of the data after noise reduction showed no significant difference in overall or cancer-specific survival between ovarian preservation and oophorectomy. Ovarian preservation is therefore safe for young women with stage I EEAC, and these patients can consider a more conservative treatment to maintain a normal quality of life, provided that the treatment outcome is ensured. This finding offers some guidance for disease diagnosis and treatment.

Overall, this study used data mining to analyze protein, microRNA, and clinical medical data. A novel lysine acetylation prediction system, LAceP, with high accuracy and good stability was put forward, and its web server is of practical use. Based on the two-layer model, nine microRNA markers were used to diagnose HBV-related liver diseases with high accuracy and robustness; the model clearly distinguishes HBV-related chronic hepatitis from cirrhosis and has clinical application value. In addition, the conclusion reached through propensity score matching, that ovarian preservation is safe for young patients with stage I endometrioid adenocarcinoma, is clinically significant.
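
The two-layer decision structure described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the microRNA names (`miR_a` to `miR_d`), the coefficients, the biases, and the 0.5 threshold are hypothetical placeholders, not values from the dissertation, and the layers use three and two features for brevity where the abstract reports three and eight microRNAs. The sketch shows only how two already fitted logistic models could be chained, not how they were trained.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_prob(features, weights, bias):
    """Positive-class probability from a fitted logistic regression model."""
    z = bias + sum(weights[name] * features[name] for name in weights)
    return sigmoid(z)


# Hypothetical coefficients (placeholders, NOT values from the dissertation).
# Layer 1: HBV-related disease vs. healthy control.
LAYER1 = {"weights": {"miR_a": 1.2, "miR_b": -0.8, "miR_c": 0.5}, "bias": -0.3}
# Layer 2: cirrhosis vs. chronic hepatitis, applied only to disease samples.
LAYER2 = {"weights": {"miR_c": 0.9, "miR_d": -1.1}, "bias": 0.2}


def classify(sample, threshold=0.5):
    """Chain the two layers: screen for disease first, then subtype it."""
    p_disease = logistic_prob(sample, LAYER1["weights"], LAYER1["bias"])
    if p_disease < threshold:
        return "healthy control"
    p_cirrhosis = logistic_prob(sample, LAYER2["weights"], LAYER2["bias"])
    return "cirrhosis" if p_cirrhosis >= threshold else "chronic hepatitis"


if __name__ == "__main__":
    # Made-up expression values for one sample, for demonstration only.
    sample = {"miR_a": 2.0, "miR_b": 0.0, "miR_c": 0.0, "miR_d": 2.0}
    print(classify(sample))
```

In this arrangement a feature may appear in both layers (here `miR_c`), which is consistent with nine distinct markers being shared between a three-marker layer and an eight-marker layer.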
Keywords/Search Tags: Data mining, Acetylation, HBV-related diseases, microRNAs, Endometrioid adenocarcinoma