| Genome-wide association studies (GWAS) are widely used to identify genetic variants associated with disease risk. Because single-locus association analysis has known disadvantages, many approaches have been proposed to test multiple single nucleotide polymorphisms (SNPs) in a region simultaneously. SNP-set-based analysis methods are relatively common and include the kernel function method, principal component analysis (PCA), the partial least squares method and supervised principal component analysis. However, each of these dimension reduction methods has its own advantages and limitations. Given the loss of power caused by low minor allele frequencies (MAF), we extended PCA and propose a new method called weighted PCA (wPCA). To investigate the advantages of weighting, we compared wPCA, two logistic kernel machine (LKM) based tests and PCA on SNP sets. We applied these methods in two analyses of simulated data sets and to two SNP sets extracted from a real GWAS dataset of non-small cell lung cancer in a Han Chinese population. The main contents are as follows: (1) Simulation datasets with different linkage disequilibrium (LD) and MAF structures were generated by a program we wrote ourselves. These simulations were used to demonstrate the validity of the weighted analysis method in identifying main effects. (2) We generated simulated data based on the phased haplotypes of CHB samples from the website of the International HapMap Project (HapMap Data Rel 24 Phase II, Nov08, on NCBI B36 assembly, dbSNP b126). 
These simulations were again used to demonstrate the validity of the weighted analysis method in identifying main effects. (3) We also applied the weighted methods to a real GWAS dataset of non-small cell lung cancer in a Han Chinese population. The main results of the study are as follows: (1) When the MAF of the causal SNP is low, weighted PCA (wPCA) and the weighted IBS (identity-by-state) kernel (wIBS) are more powerful than PCA and the IBS kernel across different LD structures and different numbers of causal SNPs, and wPCA is the most powerful method. (2) Application of these methods to the real GWAS dataset indicates that wPCA and wIBS are more powerful than their unweighted counterparts, and wPCA is again the most powerful method. |
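
The abstract does not state how the weights in wPCA are defined, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each SNP is up-weighted by 1/sqrt(MAF(1 - MAF)), a common choice for emphasizing low-frequency variants, before ordinary PCA is applied to the SNP set. The function names `maf_weights` and `weighted_pcs`, the simulated genotypes, and the allele frequencies are all hypothetical.

```python
import numpy as np


def maf_weights(genotypes):
    """Per-SNP weights from estimated minor allele frequency.

    ASSUMPTION: the weight 1/sqrt(MAF * (1 - MAF)) is an illustrative
    choice; the source does not specify the weighting scheme.
    `genotypes` is an (individuals x SNPs) array of 0/1/2 allele counts.
    """
    allele_freq = genotypes.mean(axis=0) / 2.0
    maf = np.minimum(allele_freq, 1.0 - allele_freq)
    maf = np.clip(maf, 1e-6, 0.5)  # guard against monomorphic SNPs
    return 1.0 / np.sqrt(maf * (1.0 - maf))


def weighted_pcs(genotypes, n_components=2):
    """Return PC scores of the weighted, column-centered SNP set."""
    weights = maf_weights(genotypes)
    centered = genotypes - genotypes.mean(axis=0)
    scaled = centered * weights  # rare SNPs receive larger weights
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, weights, explained[:n_components]


# Hypothetical SNP set: 200 individuals, 4 SNPs with assumed allele frequencies.
rng = np.random.default_rng(0)
assumed_maf = np.array([0.05, 0.10, 0.30, 0.45])
genotypes = rng.binomial(2, assumed_maf, size=(200, 4)).astype(float)

scores, weights, explained = weighted_pcs(genotypes, n_components=2)
```

In an SNP-set test, the leading columns of `scores` would then typically serve as covariates in a logistic regression on case-control status; that step is omitted here because the abstract gives no details on it.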