
The Application Of Association Rule Mining In Viral Genetic Data Analysis

Posted on: 2015-09-08
Degree: Master
Type: Thesis
Country: China
Candidate: T Wang
GTID: 2298330434465327
Subject: Computer application technology
Abstract/Summary:
The H7N9 influenza virus outbreak in Zhejiang, Shanghai, Jiangsu and Anhui in February 2013 caused widespread concern. Influenza A virus (also called type A influenza virus) circulates mainly in animals, but once it mutates it can cause a worldwide influenza pandemic. Given the recent emergence of the new H7N9 virus, using effective data analysis tools to analyze and process large amounts of biological sequence data has become a major challenge in influenza research. Combining data mining with bioinformatics is one of the most suitable and effective ways to meet this challenge.

Building on an analysis of previous research, this thesis constructs an influenza virus database centered on the H1N1 and H3N2 subtypes. Following the principles of relational database design (conceptual design and logical design), it presents the corresponding E-R diagrams and relational table designs. The data come mainly from influenza A virus gene sequences in the GenBank database. The required sequence records are first retrieved with the Entrez search tool and then saved in XML format, which facilitates the integration of heterogeneous data from different databases; template-driven mapping is used to exchange and map data between the database and the XML documents. This work results in a local secondary database that stores viral gene sequences in GenBank format.

The thesis also focuses on association rule mining, introduced mainly through the Apriori algorithm, and applies it to biological data. After analyzing a critical flaw of the Apriori algorithm, it proposes an improved algorithm for mining frequent itemsets in biological sequences.
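For readers unfamiliar with Apriori, the following is a minimal Python sketch of the classic level-wise algorithm (join, prune, count). It is not the thesis's own implementation. It assumes that transactions are sets of hashable items and that `min_support` is an absolute count; the function name `apriori` and the toy transactions are illustrative only. The count step rescans every transaction once per itemset size, which is the repeated-scan cost commonly cited as Apriori's main weakness.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Classic level-wise Apriori (illustrative sketch).

    Returns a dict mapping each frequent itemset (a frozenset) to its
    support count. min_support is an absolute count, not a fraction.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: keep a candidate only if every (k-1)-subset
                # is itself frequent (the Apriori property).
                if len(union) == k and all(
                    frozenset(sub) in current
                    for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Count step: one full pass over the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1

    return frequent


# Toy data (hypothetical items, not influenza records).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
result = apriori(transactions, min_support=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With `min_support=3`, the five toy transactions yield the three single items (count 4) and the three pairs (count 3); `{a, b, c}` occurs in only two transactions and is dropped at the third level.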
The main idea is to evaluate candidate patterns with several support measures rather than a single one: local support, distribution support and general support. These three measures capture, respectively, how frequently a sequence pattern occurs within a given sequence, in how many sequences of a specified set it appears, and how frequently it occurs across the entire sequence set. This frequent sequence pattern mining method is better suited to mining repeated and conserved sequences and, compared with the Apriori algorithm, is more specialized for biological sequence analysis.
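The abstract defines the three support measures only informally, so the following Python sketch fixes them under explicit assumptions rather than reproducing the thesis's formulas. A pattern is taken to be a contiguous substring (motif) of a nucleotide or amino-acid string, and occurrences may overlap. Local support is assumed to be the pattern's occurrence count in one sequence divided by the number of positions where it could start. Distribution support is assumed to be the fraction of sequences that contain the pattern at least once. General support is assumed to be the total occurrence count over the whole set divided by the total number of start positions. All function names, these normalizations, and the combined rule in `is_frequent` are hypothetical.

```python
def count_occurrences(pattern: str, sequence: str) -> int:
    """Count occurrences of pattern in sequence, allowing overlaps."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = sequence.find(pattern)
    while start != -1:
        count += 1
        # Restart one position later so overlapping hits are counted.
        start = sequence.find(pattern, start + 1)
    return count


def start_positions(pattern: str, sequence: str) -> int:
    """Number of positions where pattern could start in sequence."""
    return max(len(sequence) - len(pattern) + 1, 0)


def local_support(pattern: str, sequence: str) -> float:
    """Assumed definition: occurrence density within ONE sequence."""
    n = start_positions(pattern, sequence)
    return count_occurrences(pattern, sequence) / n if n else 0.0


def distribution_support(pattern: str, sequences: list[str]) -> float:
    """Assumed definition: fraction of sequences containing the pattern."""
    if not sequences:
        return 0.0
    return sum(1 for s in sequences if pattern in s) / len(sequences)


def general_support(pattern: str, sequences: list[str]) -> float:
    """Assumed definition: occurrence density over the WHOLE sequence set."""
    hits = sum(count_occurrences(pattern, s) for s in sequences)
    n = sum(start_positions(pattern, s) for s in sequences)
    return hits / n if n else 0.0


def is_frequent(pattern, sequences, min_local, min_dist, min_general):
    """Hypothetical rule: pass the distribution and general thresholds, and
    reach min_local in at least one sequence."""
    return (
        distribution_support(pattern, sequences) >= min_dist
        and general_support(pattern, sequences) >= min_general
        and any(local_support(pattern, s) >= min_local for s in sequences)
    )


sequences = ["ATATAT", "GATTC", "CCCC"]
print(local_support("AT", "ATATAT"))          # 3 hits / 5 start positions
print(distribution_support("AT", sequences))  # 2 of 3 sequences
print(general_support("AT", sequences))       # 4 hits / 12 start positions
```

Requiring several supports at once is one plausible way to favor patterns that are both conserved across sequences (high distribution support) and repeated within a sequence (high local support), which matches the goal the abstract states; the thesis may combine the measures differently.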
Keywords/Search Tags: bioinformatics, data mining, Apriori algorithm, frequent itemsets