Research And System Implementation On Diagnosis Model For Schizophrenia Using SNP Data

Posted on: 2020-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhang
GTID: 2404330596991445
Subject: Software engineering
Abstract/Summary:
Schizophrenia is a chronic genetic disease that places a heavy burden on society because of its high incidence and long course of illness, and its pathogenesis, which is not fully understood, remains a major challenge for the entire medical field. Although genome-wide association studies (GWAS) based on single nucleotide polymorphisms (SNPs) have yielded significant results in the diagnosis of schizophrenia, they are hampered by long study cycles and a dependence on large numbers of samples. With the advent of the big data era and the rapid development of data mining technology, researchers can use machine learning or deep learning to mine disease pathogenesis and design diagnostic models from large amounts of data. This thesis takes schizophrenia as its main research object and studies SNP selection methods and diagnostic models. First, data clustering and feature selection are carried out with an improved fuzzy clustering algorithm. Then, the proposed deep learning model is used for SNP classification. Finally, an intelligent diagnostic prototype system for schizophrenia is designed and implemented. The specific work is as follows:

(1) There are many SNP sites, but most of them do not reflect the pathogenic mechanism, and the redundant features cause the "curse of dimensionality", which seriously degrades the later diagnosis model. To address this problem, a new clustering method based on fuzzy clustering is proposed and applied to SNP selection. On the one hand, an SNP weight factor is introduced into the loss function of the fuzzy C-means algorithm, because existing SNP clustering algorithms fail to consider differences in the importance of SNP sites. On the other hand, a key-SNP neighborhood regularization term is proposed and introduced into the loss function of fuzzy clustering to account for the relationship between a highly important SNP and the other SNPs in its neighborhood. The experimental results show that the proposed clustering method converges better than the compared methods, and that SNP subsets constructed with the proposed algorithm perform considerably better than those of other methods in classification experiments using multiple classifiers. Among the classifiers, the support vector machine performs best in classification accuracy, with an average increase of 5.83% over the second-best selection method, MRMR, and the decision tree performs best in F1 score, with an average increase of 5.51%.

(2) SNP data sequences are very long, and existing classification methods or models ignore the spatial distance and other information inside SNP sequences; both increase model complexity and reduce classification performance. To solve this problem, a new classification model, Bi-SNP, is proposed. The model is based on a bi-stream design. On the one hand, a "sliding window sampling" method is adopted to reconstruct several shorter sub-sequences from the raw long sequences, and an attention-based LSTM model is then used for feature learning on each sub-sequence. On the other hand, a new data transformation method is proposed that turns each sample into an SNP-Chromosome mapping matrix, on which a CNN model performs local feature learning. The features learned by the two branches are integrated and passed to an LSTM model for further learning, and a random forest classifier makes the final prediction. The experimental results show that the Bi-SNP model with the attention mechanism has clear advantages over the other models in the comparison. Compared with the best-performing of these, Bi-Stream-CNN, it achieves an average increase of 3.25% in classification accuracy and 4.36% in F1 score.

(3) On the basis of the above research, this thesis also completes the design and implementation of an intelligent diagnostic prototype system for schizophrenia based on SNP data.
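
The "sliding window sampling" step in part (2) can be illustrated with a minimal sketch. The abstract does not state the window length, the stride, or how genotypes are encoded, so the `window` and `stride` parameters, the 0/1/2 genotype coding, and the function name `sliding_windows` are all assumptions made for illustration, not the thesis's actual settings.

```python
from typing import List, Sequence


def sliding_windows(seq: Sequence[int], window: int, stride: int) -> List[List[int]]:
    """Split one long SNP sequence into shorter, possibly overlapping sub-sequences.

    Hypothetical helper: window and stride are illustrative parameters,
    not values taken from the thesis.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    # Start positions 0, stride, 2*stride, ... for as long as a full window fits.
    return [list(seq[start:start + window])
            for start in range(0, len(seq) - window + 1, stride)]


# Assumed encoding: each SNP site is coded 0, 1, or 2.
sample = [0, 1, 2, 0, 0, 1, 2, 2, 1, 0]
subsequences = sliding_windows(sample, window=4, stride=3)
for sub in subsequences:
    print(sub)
```

With a stride smaller than the window, neighboring sub-sequences overlap, so each one keeps some context from its neighbor. Sites at the tail that do not fill a whole window are dropped in this sketch; a real pipeline might pad the sequence instead.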
Keywords/Search Tags:Schizophrenia, Single nucleotide polymorphism, Random forest, Support vector machine, CNN, LSTM, Deep learning