| According to the central dogma of biology, eukaryotic gene expression mainly comprises transcription, splicing, and translation. In splicing, introns are removed from the pre-mRNA and exons are joined together to generate mature mRNA; it is the pivotal step between transcription and translation. Through alternative splicing, a single gene can generate multiple distinct mature mRNAs, which can encode different proteins. Over 95% of human genes undergo alternative splicing. Among the different types of alternative splicing events, the cassette exon is the most common: an exon that is included in some mature mRNAs and skipped in others. Identifying cassette exons can therefore help in understanding the regulatory mechanism of gene splicing. In this study, we examine the genomic features that define cassette exons: we collect and analyze the genomic features of constitutive and cassette exons, construct a classifier based on the statistically significant features, and apply this classifier to identify cassette exons in order to explore the regulatory mechanism of gene splicing. Firstly, previous researchers studied exons in different animals by analyzing genomic characteristics such as exon length, GT content, regulatory elements, and splice site signal strength. In this paper, to study exons further, we extract five types of human genomic features for constitutive and cassette exons: sequence length, nucleotide composition, strength of splice sites, distribution of splicing regulatory elements, and evolutionary conservation. In total, 107 statistically significant genomic features were obtained. Secondly, a suitable classification algorithm must be chosen for these extracted features in order to achieve the best classification results. 
We therefore applied Support Vector Machine, Decision Tree, Multi-Layer Perceptron, Neural Network, and Naive Bayes methods to construct classifiers for cassette exons based on the 107 statistically significant genomic features. Comparing the performance of the different classification methods shows that the support vector machine model outperforms the others, with over 90% precision. Finally, using these extracted features, we applied the connection between gene structure and disease to explain some disease mutations, such as those that cause SMA and IGHD Ⅱ, in order to understand the classification model from a biological angle. This thesis proposes extracting the distinguishing characteristics of exons as a way to study them. Our results show that these genomic features can be helpful for studying different types of gene splicing events. |
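
As an illustration of the two simplest feature types named above, sequence length and nucleotide composition, the following minimal sketch computes them for a single exon sequence. It is not the thesis's actual pipeline: the function name `exon_features`, the feature names, and the toy sequence are assumptions made for this example, and the remaining feature types (splice site strength, regulatory element distribution, conservation) are not covered.

```python
from collections import Counter


def exon_features(seq: str) -> dict:
    """Return length and nucleotide-composition features for one exon.

    Hypothetical helper for illustration; feature names are assumptions.
    """
    seq = seq.upper()
    length = len(seq)
    counts = Counter(seq)

    features = {"length": length}
    # Fraction of each base; guard against an empty sequence.
    for base in "ACGT":
        features[f"frac_{base}"] = counts[base] / length if length else 0.0
    # GC content is the combined fraction of G and C.
    features["gc_content"] = features["frac_G"] + features["frac_C"]
    return features


# Toy sequence, not real exon data.
example = exon_features("ATGGCGTACGTTAGC")
print(example)
```

Feature dictionaries like this one, computed separately for constitutive and cassette exons, could then be tested for statistical significance and passed to a classifier.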