Research Of Quality Control Methods For SNPs Based On Clustering Algorithm

Posted on:2014-11-04

Degree:Master

Type:Thesis

Country:China

Candidate:Y L Sun

Full Text:PDF

GTID:2250330425483701

Subject:Computer Science and Technology

Abstract/Summary:

Single nucleotide polymorphisms (SNPs), the third generation of genetic markers, are widely used in biological research. Genome-wide association studies (GWAS) use SNPs as genetic markers in case-control studies to detect and locate genes associated with complex diseases, providing evidence for disease diagnosis, individualized treatment, drug development, and other applications. SNP data quality is therefore a key factor in GWAS. In practice, SNP data are prone to errors caused by hardware or software problems during the experiment, so a quality control step is necessary.

The main goal of this thesis is to find effective SNP quality control methods for GWAS. SNP data quality is measured by three basic parameters: genotype call rate, minor allele frequency (MAF), and Hardy-Weinberg equilibrium (HWE). The current quality control approach is a "supervised" expert filter in which the threshold for each parameter is set manually. To address this problem, the quality metrics are redefined with more stringent criteria, and two new quality control methods based on clustering algorithms are proposed.

(1) A quality control method based on a weighted fuzzy kernel clustering algorithm. An SNP dataset has several attributes, and these attributes affect the normal-SNP cluster and the noise-SNP cluster to different degrees. The weighted fuzzy kernel clustering algorithm separates normal SNPs from noise SNPs by computing this imbalance among attributes. Compared with other clustering methods, it is especially suitable for high-dimensional, non-spherical datasets. Experimental results show that this method performs well.

(2) A quality control method based on the shared nearest neighbor (SNN) clustering algorithm. Because SNP datasets are high-dimensional, the filtering is done in two steps. First, principal component analysis (PCA) reduces the data dimension and maps the SNPs onto a two-dimensional plane. Second, SNN clustering is run on this plane. SNN can find clusters of different sizes, shapes, and densities in noisy data and detects noise SNPs automatically. Experimental results demonstrate the effectiveness of this method.
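The manual, threshold-based expert filter that the clustering methods are meant to improve on can be sketched as follows. This is a minimal illustration, not the thesis's code. It assumes genotypes are coded as 0/1/2 copies of one allele with `None` for a failed call, and it tests HWE with a 1-degree-of-freedom chi-square goodness-of-fit test (exact tests are usually preferred when the minor allele is rare). The function names `snp_qc_metrics` and `passes_qc` are hypothetical, and the default cut-offs are placeholders rather than the more stringent thresholds the thesis derives.

```python
import math
from typing import Dict, Optional, Sequence


def snp_qc_metrics(genotypes: Sequence[Optional[int]]) -> Dict[str, float]:
    """Call rate, minor allele frequency and HWE p-value for one SNP.

    Assumed coding: each genotype is the count (0, 1 or 2) of one chosen
    allele; None marks a failed genotype call.
    """
    total = len(genotypes)
    called = [g for g in genotypes if g is not None]
    n = len(called)
    call_rate = n / total if total else 0.0
    if n == 0:
        return {"call_rate": call_rate, "maf": 0.0, "hwe_p": 1.0}
    observed = (called.count(0), called.count(1), called.count(2))
    # Frequency of the counted allele, folded to the minor allele
    p = (observed[1] + 2 * observed[2]) / (2 * n)
    q = 1.0 - p
    maf = min(p, q)
    if maf == 0.0:
        # Monomorphic SNP: the HWE test is undefined, so report p = 1
        return {"call_rate": call_rate, "maf": 0.0, "hwe_p": 1.0}
    # Genotype counts expected under Hardy-Weinberg equilibrium
    expected = (n * q * q, 2 * n * p * q, n * p * p)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 degree of freedom
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
    return {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p}


def passes_qc(genotypes: Sequence[Optional[int]], min_call_rate: float = 0.95,
              min_maf: float = 0.01, min_hwe_p: float = 1e-6) -> bool:
    """Manual threshold filter; the default cut-offs are illustrative placeholders."""
    m = snp_qc_metrics(genotypes)
    return (m["call_rate"] >= min_call_rate
            and m["maf"] >= min_maf
            and m["hwe_p"] >= min_hwe_p)
```

For example, a SNP with genotype counts 25/50/25 passes every check, while a SNP with 50 homozygotes of each kind and no heterozygotes fails the HWE check.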
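The abstract does not specify the kernel or the rule used to compute attribute weights, so the sketch below is only one plausible reading of weighted fuzzy kernel clustering: kernel fuzzy c-means with a Gaussian kernel in which each attribute carries a fixed, caller-supplied weight. Memberships come from the kernel-induced distance 2(1 - K(x, v)), and each center is updated as a membership- and kernel-weighted mean. The names `weighted_kernel` and `weighted_kernel_fcm`, the farthest-point initialization, and the default parameters are assumptions for illustration, not the thesis's algorithm.

```python
import math
from typing import List, Sequence


def weighted_kernel(x: Sequence[float], v: Sequence[float],
                    weights: Sequence[float], sigma: float) -> float:
    """Gaussian kernel with per-attribute weights: exp(-sum_j w_j (x_j - v_j)^2 / sigma^2)."""
    s = sum(w * (a - b) ** 2 for a, b, w in zip(x, v, weights))
    return math.exp(-s / sigma ** 2)


def weighted_kernel_fcm(data, n_clusters, weights, sigma=1.0, m=2.0,
                        max_iter=100, tol=1e-6):
    """Kernel fuzzy c-means with fixed (assumed) attribute weights."""
    # Deterministic farthest-point initialization of the cluster centers
    centers = [list(data[0])]
    while len(centers) < n_clusters:
        far = max(data, key=lambda x: min(
            sum(w * (a - b) ** 2 for a, b, w in zip(x, c, weights)) for c in centers))
        centers.append(list(far))
    u: List[List[float]] = []
    for _ in range(max_iter):
        # Membership update from the kernel-induced squared distance 2 (1 - K)
        u = []
        for x in data:
            d2 = [max(2.0 * (1.0 - weighted_kernel(x, c, weights, sigma)), 1e-12)
                  for c in centers]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            u.append([val / total for val in inv])
        # Center update: membership- and kernel-weighted mean of the points
        new_centers = []
        for i, c in enumerate(centers):
            coef = [u[k][i] ** m * weighted_kernel(x, c, weights, sigma)
                    for k, x in enumerate(data)]
            denom = sum(coef)
            if denom == 0.0:  # center far from every point: keep it unchanged
                new_centers.append(c)
                continue
            new_centers.append([sum(cf * x[j] for cf, x in zip(coef, data)) / denom
                                for j in range(len(c))])
        shift = max(abs(a - b) for old, new in zip(centers, new_centers)
                    for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Each row of `data` would be one SNP's attribute vector; a hard assignment follows from taking, for each SNP, the cluster with the largest membership. Raising a weight makes differences in that attribute count more in the kernel distance.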
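The two-step filter of method (2) can be sketched as below, assuming each SNP is a row of numeric attributes. PCA is computed with power iteration and deflation so the example needs only the standard library. The clustering step follows a common density-based SNN formulation: similarity is the number of shared neighbors between mutual k-nearest neighbors, core points are those with enough strong links, and points not strongly linked to any core point are labeled noise (-1). The functions `pca_2d` and `snn_cluster` and the parameters `k`, `eps` and `min_pts` are hypothetical names chosen for illustration; the thesis's exact SNN variant and settings are not given in the abstract.

```python
import math
import random
from typing import List, Sequence


def pca_2d(data: Sequence[Sequence[float]], iters: int = 200) -> List[List[float]]:
    """Project each row onto the top two principal components (power iteration)."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    x = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in x) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(0)
    comps = []
    for _ in range(2):
        v = [rng.random() + 0.1 for _ in range(d)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm < 1e-12:  # no variance left in any direction
                v, lam = [0.0] * d, 0.0
                break
            v, lam = [t / norm for t in w], norm
        comps.append(v)
        # Deflate so the next pass finds the next component
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in comps] for r in x]


def snn_cluster(points, k: int = 5, eps: int = 2, min_pts: int = 3) -> List[int]:
    """Density-based SNN clustering; returns one label per point, -1 = noise."""
    n = len(points)
    knn = []
    for i, p in enumerate(points):
        order = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        knn.append({j for _, j in order[:k]})

    def sim(i: int, j: int) -> int:
        # Shared-neighbor count, defined only for mutual k-nearest neighbors
        return len(knn[i] & knn[j]) if j in knn[i] and i in knn[j] else 0

    # A point is core if enough of its neighbors are strongly linked to it
    core = [sum(1 for j in knn[i] if sim(i, j) >= eps) >= min_pts for i in range(n)]
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:  # flood-fill through strongly linked core points
            a = stack.pop()
            for b in knn[a]:
                if core[b] and labels[b] == -1 and sim(a, b) >= eps:
                    labels[b] = next_label
                    stack.append(b)
        next_label += 1
    # Non-core points join their most similar core neighbor, otherwise stay noise
    for i in range(n):
        if core[i]:
            continue
        links = [(sim(i, j), j) for j in knn[i] if core[j]]
        if links:
            s, j = max(links)
            if s >= eps:
                labels[i] = labels[j]
    return labels
```

A typical call would be `snn_cluster(pca_2d(snp_rows))`, where SNPs labeled -1 are the candidates for removal. Because SNN similarity depends on neighbor overlap rather than raw distance, a sparse cluster and a dense cluster can both be recovered with the same `eps`.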

Keywords/Search Tags:

Single nucleotide polymorphism, Genome-wide association study, Quality control, Clustering, Principal component analysis

Related items

1	Genome-Wide Interaction Study Of Single Nucleotide Polymorphisms
2	Research On Genome-wide Single Nucleotide Polymorphisms’ Interaction Detection Methods
3	An Application Of Tabu Table Based Negative Feedback Ant Colony Optimization Algorithm In Genome-wide Association Analysis
4	An Optimal Principal Component Regression For Genomic Control In Genome-wide Association Analysis
5	The Research On Epistasis Detection Algorithm In Genome-wide Association Study
6	Genome Wide Association Study Based On Convolutional Neural Networks
7	Association Studies On Missing Heritability Of Complex Phenotypes
8	A Multi-objective Ant Colony Optimization Algorithm For Genome-wide Association Studies
9	Screening Of Children’s Height-related Genes And Analysis Of Their Mode Of Action
10	Research On Genome-Wide High-Order Epistasis Identification Methods