The Application Of Association Rule Mining In Viral Genetic Data Analysis

Posted on:2015-09-08

Degree:Master

Type:Thesis

Country:China

Candidate:T Wang

Full Text:PDF

GTID:2298330434465327

Subject:Computer application technology

Abstract/Summary:

The H7N9 influenza virus outbreak in Zhejiang, Shanghai, Jiangsu, and Anhui in February 2013 caused widespread concern. Influenza A virus, also known as type A influenza virus, is mainly present in animals; once it mutates, it can cause a worldwide influenza pandemic. Since the recent H7N9 epidemic, using effective data analysis tools to analyze and process the large amount of biological sequence data has become a major challenge in influenza research. Combining data mining with bioinformatics is one of the most suitable and effective approaches.

Building on an analysis of previous work, this thesis constructs an influenza virus database centered on the H1N1 and H3N2 viruses. Following the principles of relational database design (conceptual design and logical design), it puts forward the corresponding E-R diagrams and relational table design. The data come mainly from influenza A virus gene sequences in the GenBank database. The relevant sequence data are first retrieved with the database search tool Entrez, and the retrieved data are then saved in XML format, which facilitates the integration of heterogeneous data from different databases; template-driven mapping is used to exchange and map data between the database and the XML documents. On this basis, a local secondary database of viral gene sequences, stored in GenBank format, is finally built.

This thesis also focuses on association rule mining, which is introduced mainly through the Apriori algorithm as an example. The mining of association rules here is based on biological information. After discussing the fatal flaw of the Apriori algorithm, the thesis proposes an improved frequent item set algorithm based on biological sequences. The main idea is to use several support measures in the algorithm model, mainly local support, distribution support, and general support. These three measures correspond, respectively, to how frequently a sequence pattern occurs within a given sequence, how many sequences in a specified sequence set contain the pattern, and how frequently the pattern occurs across the entire sequence set. This frequent-pattern mining method adapts better to mining repeated and conserved sequences and, compared with the Apriori algorithm, is more specialized for this domain.
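The abstract does not give formal definitions of the three support measures, so the following is only a minimal sketch under stated assumptions, not the thesis's actual algorithm. It assumes that a pattern is a contiguous substring, that local support counts (possibly overlapping) occurrences in one sequence, that distribution support counts the sequences containing the pattern at least once, and that general support is the total occurrence count over the whole set. The function names and the toy sequences are hypothetical.

```python
from typing import Sequence


def local_support(pattern: str, sequence: str) -> int:
    """Assumed definition: number of (possibly overlapping)
    occurrences of `pattern` inside a single sequence."""
    if not pattern:
        return 0
    count = 0
    start = sequence.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are counted.
        start = sequence.find(pattern, start + 1)
    return count


def distribution_support(pattern: str, sequences: Sequence[str]) -> int:
    """Assumed definition: number of sequences in the set that
    contain `pattern` at least once."""
    return sum(1 for s in sequences if local_support(pattern, s) > 0)


def general_support(pattern: str, sequences: Sequence[str]) -> int:
    """Assumed definition: total occurrences of `pattern`
    over the entire sequence set."""
    return sum(local_support(pattern, s) for s in sequences)


# Toy nucleotide strings, made up for illustration only.
toy_sequences = ["ATGATGCC", "CCATGA", "GGGTTT"]

if __name__ == "__main__":
    for s in toy_sequences:
        print(s, local_support("ATG", s))
    print("distribution:", distribution_support("ATG", toy_sequences))
    print("general:", general_support("ATG", toy_sequences))
```

Under these assumptions, a pattern that repeats many times inside one sequence has a high local support but a low distribution support, whereas a conserved pattern shared by most sequences has a high distribution support; keeping the measures separate is what lets a miner distinguish repeats from conserved regions.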

Keywords/Search Tags:

bioinformatics, data mining, Apriori algorithms, frequent item sets

Related items

1	Search Of Algorithms For Mining Maximum Frequent Item-sets
2	Research On Correlative Algorithms Of Association Rule Mining
3	Research On Mining Algorithms Of Maximal Frequent Item Sets
4	A Frequent Item Sets Mining Algorithm With Constraint
5	Based On The Maximum Frequent Set Data Mining Association Rules Algorithm
6	Improvement Of Frequent 1-Item Set Generation Method And Experimental Study
7	Mining Of Maximal Frequent Item Sets Based On AFOPT
8	Research Of Closed Frequent Item Sets Mining On Distributed Environment
9	Research And Improvement The Algorithm Of Mining Frequent Item Sets In Text Association Analysis
10	Research And Application Of Association Rule Mining Algorithm In The Data Mining