| Background and Objective: Multiple myeloma (MM) is a highly malignant hematological tumor with a high recurrence rate. Current research indicates that abnormally expressed long non-coding RNAs (lncRNAs) have carcinogenic and/or tumor-suppressive effects in the occurrence and development of tumors, which gives them potential as novel independent biomarkers for tumor diagnosis and prognosis. However, the significance of expression-profile-based lncRNA signatures for predicting the prognosis of patients with MM has not been studied. The aim of this study was to identify prognosis-specific lncRNAs from GEO myeloma microarray data, providing potential prognostic markers for patients with myeloma. Gene expression data for myeloma patients were obtained from the GSE4581 and GSE57317 datasets in the Gene Expression Omnibus (GEO) database. Predictive models were constructed and validated using Cox regression analysis, Kaplan-Meier analysis, and receiver operating characteristic (ROC) analysis, and single-sample gene set enrichment analysis (ssGSEA) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were used to predict the function of the specific lncRNAs. Analysis of the data identified key genes, which were used to construct a risk scoring system for myeloma prognosis that stratified patients in the training set into high-risk and low-risk groups with different survival rates. Results were validated in the test set, the entire data set, the external validation set, and the myeloma subtypes. This study provides evidence that the key genes in the model can serve as markers to predict the prognosis of myeloma and as potential targets for clinical diagnosis, and cell function studies further indicated that these genes may be involved in the progression of myeloma. This study offers a new perspective for the clinical diagnosis and treatment of MM, as well as a new direction for analyzing the potential mechanisms underlying the occurrence and development of myeloma. Methods: 1. Download
the gene expression data sets (GSE4581 and GSE57317) and corresponding probe sequences for highly purified bone marrow plasma cells from patients with myeloma from the GEO database, download the latest lncRNA reference sequences and GTF file from the GENCODE database, and combine these files to extract the probe expression data and gene expression data as well as the sample follow-up information. GSE4581 was randomly divided into a training set and an internal verification set in a 1:1 ratio, containing 127 and 128 samples, respectively. GSE57317, containing 55 samples, was used as the external verification set. 2. Univariate Cox proportional hazards regression analysis was performed on the re-annotated lncRNA expression levels and survival data of the training set samples using the coxph function of the R package survival (p < 0.01 as the threshold), and the lncRNAs most significantly associated with prognosis were selected after ranking. 3. rbsurv analysis was performed on 75% of the training set samples, selected at random, with the maximum number of genes set to 30 and three-fold cross-validation; this was repeated 1000 times, the dimension-reduction results of all runs were pooled, the number of times each probe was selected across the 1000 runs was counted, and the standard deviation of each lncRNA probe was calculated, in order to select lncRNAs whose standard deviation was greater than the median standard deviation of all probes and whose selection frequency was greater than 300. 4. Kaplan-Meier survival curve analysis was performed on the selected key lncRNAs to identify those related to patient survival, multivariate Cox survival analysis was performed on the lncRNAs with a KM curve p < 0.05, and the lncRNA combination with the lowest AIC value was retained as the final model. 5. The risk score of each sample was calculated from its expression levels, followed by plotting the risk score distribution of the
samples, and calculating how the expression of the selected key lncRNAs changed as the risk value increased. ROC analysis of the prognostic classification by risk score was performed using the R package timeROC, the classification performance for one-year, three-year, and five-year prognosis was analyzed, and the risk score was standardized by zero-mean (z-score) normalization. Samples with a risk score greater than zero after z-score normalization were assigned to the high-risk group and samples with a risk score less than zero to the low-risk group, and KM curves were drawn. The same analysis was performed on the verification set samples and on all samples of the training and verification sets combined, using the same model and the same coefficients as for the training set. 6. The score of each training set sample on different functions was calculated using the R package GSVA, the correlation between these functions and the risk score was calculated, the biological functions with a correlation greater than 0.3 were selected, and the risk score distribution of the samples was plotted. 7. The 13 immune scores of the above key lncRNAs were calculated according to the method described in PMID: 28428277, and differences in the immune scores between the high-risk and low-risk samples of the training set were tested for significance. 8. The samples were divided into seven subtypes according to the clinical information, and the predictive performance of the risk score of the above key lncRNAs was analyzed in the different subtypes. 9. Using the same model and the same coefficients as for the training set, the risk scores of all samples in the external data set were calculated from the sample expression levels, and the risk score distribution of the samples was plotted; z-score normalization was performed on the risk score. Samples with a risk score greater than zero after z-score normalization were assigned to the high-risk group and samples with a risk score less than zero to the low-risk group, and a KM curve was
drawn. 10. Two published risk models, a 16-gene signature (PMID: 31612041) and a 6-gene signature (PMID: 31357080), were selected for comparison with the risk model constructed in this study, and models were rebuilt from their corresponding genes following the procedure of this study. Multivariate Cox regression analysis was used to recalculate the risk scores of the training set samples and to evaluate the ROC of the two models. The samples were divided into high-risk and low-risk groups according to the optimal threshold, and the difference in overall survival (OS) between the two groups was calculated. 11. The expression of the genes in the above model was measured by qRT-PCR in bone marrow samples from patients with multiple myeloma and from healthy controls; the expression of these genes was then altered in multiple myeloma cells, and CCK-8 and plate colony formation assays were conducted to analyze the effect on cell proliferation. Results: 1. The microarray data (GSE4581 and GSE57317) downloaded from the GEO database were re-annotated to 4094 lncRNAs after comparison with the latest lncRNA reference sequences and GTF file from the GENCODE database. 2. Univariate Cox proportional hazards regression analysis of the 4094 re-annotated lncRNA expression levels and survival data of the training set samples, using the coxph function of the R package survival (p < 0.01 as the threshold), yielded 72 lncRNAs significantly associated with prognosis, and the 20 most significant lncRNAs were selected for subsequent analysis. 3. rbsurv analysis was performed on 75% of the training set samples, selected at random, with the maximum number of genes set to 30, 1000 repetitions, and three-fold cross-validation. The selection frequency of most probes was around 10%. After the standard deviation of these lncRNA probes was calculated, 11 lncRNAs with a standard deviation greater than the median standard
deviation of all probes and a frequency greater than 300 were selected. 4. From the eleven lncRNAs derived from the rbsurv analysis, eight lncRNAs related to patient survival were obtained by Kaplan-Meier survival curve screening, and seven lncRNAs for constructing the prognostic model were further determined through Cox survival analysis. 5. After the risk score of each sample was calculated from the gene expression levels in the microarray samples, the risk score distribution was plotted: high expression was associated with high risk for C5orf17, AC092718.2, AC108002.2, AL033530.1, AL589765.7, and TSPOAP1-AS1, and with low risk for MIR194-2HG. ROC analysis of the prognostic classification showed that the constructed model had a high area under the curve (AUC), with a five-year AUC of 0.71. After z-score normalization of the risk score, the samples were divided into 53 high-risk and 74 low-risk samples and KM curves were drawn; the difference between the two groups was highly significant. Applying the same model and parameters to the verification set and to all samples of the training and verification sets combined gave essentially consistent results. 6. Single-sample GSEA showed that most biological functions were negatively correlated with the sample risk score and a few were positively correlated, and 18 KEGG pathways with a correlation greater than 0.3 were selected. 7. Of the 13 immune scores calculated for the above key lncRNAs, only IF_I and Cytolytic differed significantly between the high-risk and low-risk groups. Further analysis of the immune scores in the high-risk and low-risk samples of the training set showed a significant negative correlation between IF_I and the risk score. 8. There were significant differences between the risk scores of the 7-lncRNA prognostic model and the
prognosis of the seven subtypes: the PR subtype had the worst prognosis, the CD1, CD2, HY, and LB subtypes had similarly good prognoses, and the MF and MS subtypes were intermediate. 9. Analysis of the external data set using the same model and the same coefficients as for the training set showed that the OS of samples with a high risk score was significantly shorter than that of samples with a low score, and ROC curve analysis showed a five-year AUC of 0.76. After z-score normalization of the risk score, the samples were divided into high-risk and low-risk groups and KM curves were drawn, giving results consistent with the training set. 10. ROC and KM curves comparing the 7-lncRNA prognostic model with the two published risk models, the 16-gene signature (PMID: 31612041) and the 6-gene signature (PMID: 31357080), showed a 3-year AUC of 0.83 (p < 0.0001); the OS analysis of the 6-gene signature showed a 1-year AUC of 0.71 but no significant prognostic difference (p = 0.211). 11. qRT-PCR showed that MIR194-2HG expression was low and C5orf17 expression was high in bone marrow samples from patients with multiple myeloma. CCK-8 and plate colony formation experiments showed that cell proliferation was significantly affected after down-regulation of C5orf17 and MIR194-2HG in multiple myeloma cells. Conclusion: In summary, our results show that the 7-lncRNA prognostic risk model predicts the prognosis of myeloma well, and the in vitro experiments further indicate that these seven lncRNAs may be involved in the progression of myeloma. |
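
The risk-score stratification described in Methods steps 5 and 9 can be sketched as follows. This is a minimal illustration in Python rather than the R workflow the study used, and it assumes the risk score is the usual Cox linear predictor (sum of coefficient times expression). The coefficient values and the expression matrix are hypothetical placeholders, not values from the study; only the signs follow the reported direction of effect (six risk lncRNAs and one protective lncRNA, MIR194-2HG). Assigning a score of exactly zero to the low-risk group is also an assumption, since the text only specifies "greater than" and "less than" zero.

```python
import numpy as np

def risk_scores(expr, coefs):
    """Cox linear predictor: sum over lncRNAs of coefficient * expression."""
    return np.asarray(expr, dtype=float) @ np.asarray(coefs, dtype=float)

def zscore(x):
    """Zero-mean, unit-variance normalization within one cohort."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def stratify(z):
    """High risk if the z-scored risk score is > 0, otherwise low risk.
    Scores of exactly 0 go to the low-risk group (an assumption)."""
    return np.where(np.asarray(z) > 0, "high", "low")

# Hypothetical stand-in data: 127 training samples x 7 lncRNAs.
rng = np.random.default_rng(0)
expr = rng.normal(size=(127, 7))

# Hypothetical coefficients (NOT from the study). Column order:
# C5orf17, AC092718.2, AC108002.2, AL033530.1, AL589765.7,
# TSPOAP1-AS1, MIR194-2HG (negative: high expression, low risk).
coefs = np.array([0.4, 0.3, 0.2, 0.5, 0.1, 0.3, -0.6])

scores = risk_scores(expr, coefs)
z = zscore(scores)
groups = stratify(z)
print((groups == "high").sum(), "high-risk;", (groups == "low").sum(), "low-risk")
```

For a verification or external cohort, the same `coefs` would be reused unchanged and the z-score recomputed on that cohort's own risk scores, mirroring the "same model and the same coefficients" wording of the Methods.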