
Study Of Gene Sequence Based On Rough Set Theory

Posted on: 2009-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Wu
GTID: 2178360245483216
Subject: Computer application technology
Abstract/Summary:
Rough Set Theory (RST) is a mathematical method for handling uncertain, incomplete, and imprecise data, and it is widely applied in data analysis and processing. Bioinformatics is an interdisciplinary field combining mathematics, computer science, and biology. It studies biological information with a variety of methods, analyzes the meaning of biological data, and applies the research results in practice. Bioinformatics is one of the most active interdisciplinary research areas worldwide. Based on RST, this thesis mainly studies promoter sequence data and gene expression data in bioinformatics.

Applying RST to promoter sequence research is a novel approach. Promoter identification research has moved from biological experiments to computational identification. A promoter is an important sequence segment that directs the transcription of a gene. Promoters can occur at several locations in a DNA sequence, so it is difficult to locate their exact positions. Applying statistics and RST to promoter research helps with locating these uncertain positions and with identifying and predicting promoters in DNA sequences.

RST is also well suited to the study of gene expression data and the generation of decision rules. Based on RST and information theory, this thesis proposes a heuristic algorithm for gene analysis and selection. With this algorithm, disease-related genes can be selected from large gene sets and redundant information can be reduced. In theory, the heuristic algorithm retains the relevant genes from the gene database and generates minimal decision rule sets. The resulting rules can be provided to experts for building decision rule sets.

A simulation is carried out on the Leukemia datasets, medical datasets widely used to study gene sequences. The simulation results further illustrate the studies in this thesis.
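The abstract does not give the details of the heuristic algorithm, so the following is only a minimal sketch of one common way to combine rough sets with information theory for gene selection: greedily add the gene that most reduces the conditional entropy of the class label, then read decision rules off the resulting equivalence classes. The function names, the gene names `g1` to `g3`, the class labels `A` and `B`, and the toy data are all hypothetical, and the thesis's own algorithm may differ.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, attrs, decision):
    """H(decision | attrs) over the equivalence classes induced by attrs."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    n = len(rows)
    h = 0.0
    for labels in blocks.values():
        size = len(labels)
        for count in Counter(labels).values():
            p = count / size
            h -= (size / n) * p * log2(p)
    return h


def greedy_reduct(rows, candidates, decision):
    """Greedy selection (an assumed heuristic, not the thesis's exact method):
    add the attribute that lowers the conditional entropy the most, until the
    subset is as informative as the full candidate set."""
    selected = []
    target = conditional_entropy(rows, candidates, decision)
    current = conditional_entropy(rows, selected, decision)
    while current > target + 1e-12 and len(selected) < len(candidates):
        best = min(
            (a for a in candidates if a not in selected),
            key=lambda a: conditional_entropy(rows, selected + [a], decision),
        )
        selected.append(best)
        current = conditional_entropy(rows, selected, decision)
    return selected


def decision_rules(rows, attrs, decision):
    """Certain rules: equivalence classes whose objects share one decision."""
    blocks = defaultdict(set)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].add(row[decision])
    return {k: next(iter(v)) for k, v in blocks.items() if len(v) == 1}


# Hypothetical toy table: discretized expression levels and a class label.
rows = [
    {"g1": "high", "g2": "low", "g3": "low", "cls": "A"},
    {"g1": "high", "g2": "high", "g3": "low", "cls": "A"},
    {"g1": "low", "g2": "low", "g3": "high", "cls": "B"},
    {"g1": "low", "g2": "high", "g3": "low", "cls": "B"},
]

selected = greedy_reduct(rows, ["g1", "g2", "g3"], "cls")
rules = decision_rules(rows, selected, "cls")
print(selected)
print(rules)
```

In this toy table the single gene `g1` already determines the class, so the sketch keeps only that gene and treats `g2` and `g3` as redundant. Real expression data is continuous and would need to be discretized before a table like this can be built.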
Keywords/Search Tags: rough set, bioinformatics, promoter, rough entrop