Research On Prediction Methods For The Genotyping Of Cervical Cancer Human Papillomavirus

Posted on: 2016-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: C K Kuang
Full Text: PDF
GTID: 2284330467482254
Subject: Biology
Abstract/Summary:
Cervical cancer is one of the leading causes of cancer morbidity and mortality among women worldwide. Approximately 500,000 new cases of cervical cancer are diagnosed each year, with 280,000 deaths, making it the second most common cancer in women. Moreover, cervical cancer increasingly presents in younger women. Several basic and clinical studies have found that HPV infection is the primary factor inducing cervical cancer. To date, more than 200 types of HPV have been identified. They can be classified into high-risk and low-risk HPV types according to their toxicity or the strength of their pathogenicity to the human body. Among women with low-grade cervical lesions, those infected with high-risk HPV types are at greater risk than those infected with low-risk HPV types. HPV type therefore plays an important role in assessing and guiding cervical cancer treatment, and it has become an active research topic at home and abroad. In this study, we focus on prediction methods for high-risk HPV types. The main contents can be summarized as follows:

1. We reviewed methods for extracting sequence information and the classification algorithms used in models for predicting cervical cancer HPV types, introducing in detail several widely used methods for extracting protein sequence information and several classical classification algorithms. We also discussed their advantages, disadvantages, and applications. This provides a theoretical basis for future study.

2. We proposed a prediction method for high-risk types of cervical cancer human papillomavirus using a ‘protein sequence space’ model. With the help of an amino acid mutation matrix and set methodology, we defined and constructed the ‘protein sequence space’. Homologous information of HPV protein sequences was extracted using a word statistical model of the ‘protein sequence space’. Using a support vector machine as the predictor, we constructed a prediction model for high-risk types of cervical cancer human papillomavirus. Using prediction accuracy and F1-score, we further examined the influence of different mutation matrices and word lengths on the prediction model. The experimental results show that the prediction model achieved its best performance on the E6 dataset with the p40 mutation matrix, where the prediction accuracy is 95.59% and the F1-score is 90.91%. In addition, the prediction of four ‘unknown’ HPV types also verifies the effectiveness of the proposed prediction model.

3. We proposed a prediction method for high-risk types of cervical cancer human papillomavirus using a position-specific model. Given a DNA sequence, we first designed position matrices for a particular base and measured the degree of randomness of the nucleotide distribution in a specific local area with the help of Shannon entropy. We found that the distributions of the four nucleotides in the first position to the right of nucleotide C and to the left of nucleotide G are extremely uneven, and that most of the conserved patterns in high-risk HPV sequences are similar to those in low-risk HPV sequences. Based on a Markov model, we constructed a position-specific statistical model to describe the local dynamic distribution of all bases in specific positions around a given base, and illustrated its application to the prediction of cervical cancer human papillomavirus genotypes. We evaluated the proposed method on three classes of HPV: Alpha, Beta and Gamma. The results indicate that the proposed prediction model has a strong ability to distinguish different classes of cervical cancer human papillomavirus. Its overall prediction accuracy is up to 97.18%, and the accuracies for the Alpha, Beta and Gamma classes are 98.41%, 100% and 91.89%, respectively.
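
The entropy measurement in the position-specific analysis above can be illustrated with a minimal Python sketch. It counts which nucleotides occur at a fixed offset from every occurrence of a chosen base and computes the Shannon entropy of that distribution. The function names, the toy sequences, and the base-2 logarithm are assumptions made for illustration. They are not taken from the thesis, whose actual position matrices and Markov model are not specified in this abstract.

```python
from collections import Counter
from math import log2


def neighbor_counts(seq, base, offset=1):
    """Count the nucleotides found `offset` positions away from each `base`.

    offset=+1 looks at the first position to the right of `base`,
    offset=-1 at the first position to the left.
    (Hypothetical helper, not the thesis's implementation.)
    """
    seq = seq.upper()
    counts = Counter()
    for i, b in enumerate(seq):
        j = i + offset
        # Skip neighbors that fall outside the sequence or are not A/C/G/T.
        if b == base and 0 <= j < len(seq) and seq[j] in "ACGT":
            counts[seq[j]] += 1
    return counts


def shannon_entropy(counts):
    """Shannon entropy in bits of a count distribution (0.0 if empty)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * log2(c / total) for c in counts.values() if c > 0)


if __name__ == "__main__":
    # Toy sequences, not HPV data.
    even = "CACGCTCC"    # each of A, G, T, C follows a C exactly once
    uneven = "CACACACA"  # every C is followed by A
    print(shannon_entropy(neighbor_counts(even, "C", offset=1)))
    print(shannon_entropy(neighbor_counts(uneven, "C", offset=1)))
```

With four nucleotides the entropy ranges from 0 bits (one nucleotide always occupies the position) to 2 bits (all four are equally likely), so a low value corresponds to the "extremely uneven" distributions the abstract reports next to C and G.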
Keywords/Search Tags: Human papilloma virus, ‘Sequence space’ model, Markov, Position-specific model, Support vector machine