The Research Of Alu Sequence In Human Genome

Posted on: 2012-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: L Z Wei
GTID: 2120330335472602
Subject: Biophysics
Abstract/Summary:
Alu repeats are short interspersed elements (SINEs) in primate genomes, with about 1,000,000 copies. These DNA sequences share an AG/CT cleavage site for the restriction enzyme Alu I, hence the name Alu repeats. A typical Alu is 282 nt long and consists of two similar but distinct monomers. Alu elements are present in the introns of almost all known human genes. Alu repeats have become an important research topic because of their ubiquity and their biological functions.

Recent research indicates that Alu repeats are probably related to the gene regulation network and may regulate the cooperative expression of interspersed elements. Information encoded in DNA is translated into amino acid sequences by triplet codons. What kind of code is used to translate the information hidden in Alu elements? To address this question, we studied the reading frame of exonized Alu elements with the Non-uniform index HI. Compared with the results for exon and intron sequences, we found that Alu repeats may have an eight-nucleotide reading frame, or may use octamers as a coding pattern. This result indicates that Alu repeats may participate in the regulation of gene expression. In addition, we analyzed other short interspersed elements and found that the eight-nucleotide reading frame of the Alu sequence is probably unique. Further, calculating the probability of each of the four bases at each site revealed no special rules. This is consistent with the conclusions obtained from statistical analysis of coding regions, which showed that the usage frequency of codons in DNA follows a non-uniform distribution. We also found that the base composition of the Alu sequence is asymmetric, with a G+C content higher than the A+T content.

Then, based on the asymmetry of the base distribution and the conserved structural characteristics of Alu elements, the increment of diversity (ID) method was applied to the identification and analysis of Alu sequences. Using the frequencies of 1-mers, 2-mers, 3-mers, 4-mers, and 8-mers, Alu sequences were distinguished from negative sequences extracted from introns and exons. Under three-fold cross-validation, most sensitivities, specificities, and overall accuracies exceeded 99%, 96%, and 90%, respectively. The best accuracy was obtained at k = 4. The results demonstrate the conservation and correlation of base composition in Alu sequences.

The above methods were then used to predict Alu sequences on chromosome 1 of the human genome. The sensitivity was above 97.41%, the specificity was above 77.18%, and the correlation coefficient was above 0.35. The results indicate that the increment of diversity can be used to predict Alu sequences.
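The k-mer based increment of diversity comparison described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the commonly used definitions D(X) = N log N − Σ nᵢ log nᵢ for the diversity of a count vector and ID(X, Y) = D(X+Y) − D(X) − D(Y) for the increment of diversity, and it assigns a query to the reference class with the smallest ID. The function names, the toy reference sequences, and the nearest-class decision rule are all assumptions made for the example.

```python
from collections import Counter
from math import log


def kmer_counts(seq, k):
    """Count overlapping k-mers in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def diversity(counts):
    """Diversity measure D(X) = N log N - sum(n_i log n_i) (assumed form)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * log(total) - sum(n * log(n) for n in counts.values() if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); small values mean similar composition."""
    return diversity(x + y) - diversity(x) - diversity(y)


def classify(seq, references, k):
    """Return the label of the reference count vector with the smallest ID.

    `references` maps a label to k-mer counts pooled from training sequences.
    This nearest-class rule is an assumption for illustration only.
    """
    query = kmer_counts(seq, k)
    return min(references, key=lambda label: increment_of_diversity(query, references[label]))


if __name__ == "__main__":
    # Toy reference sequences, invented for the example (not real Alu data).
    refs = {
        "alu": kmer_counts("GGCCGGGCGC", 2),
        "other": kmer_counts("ATATATTTAA", 2),
    }
    print(classify("GGCCGGGCGC", refs, 2))
```

In practice the reference counts would be pooled from many training sequences per class, and k would be chosen by cross-validation, as the abstract reports doing for k = 1, 2, 3, 4, and 8.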
Keywords/Search Tags:Alu sequence, Non-uniform index, reading frame, increment of diversity