| With the development of the Internet, humanity has entered the era of big data, in which vast amounts of data are produced. Traditional data analysis methods and techniques struggle to process data at this scale, so data mining technology is applied. Association rule mining is an important branch of data mining. It has been widely used in various fields, especially bioinformatics, a new interdisciplinary subject that has become one of the areas richest in opportunities and challenges for association rule mining. Most current algorithms for mining association rules from transaction data are deterministic and enumerative. When they are applied to dense data sets whose item space contains hundreds of items, the computation becomes difficult to handle. In this paper, we use a random sampling process based on Gibbs sampling to randomly draw rule antecedents for a given rule consequent from the item space, and we find the most important association rules of the original transaction data set in the reduced transaction data set generated by the samples. First, a measure of the importance of association rules is selected, and the algorithm based on Gibbs sampling is given. Then, two data sets are generated by simulation, one with a smaller and one with a larger item space, and the most important association rules are mined with both the proposed method and the Apriori algorithm. Finally, the proposed Gibbs sampling method is used to analyze a splice site data set of DNA sequences, to find which fields of the gene sequence, and which bases at those fields, are significantly associated with splice sites of the EI class and the IE class. The simulation experiments show that the random sampling method based on Gibbs sampling can reduce the item space, and that the most important association rules can be found in the reduced data set with limiting probability 1. In the empirical part, it is found that splice sites of the EI class are mainly associated with base G at the thirty-first field, base T at the thirty-second field, and base G at the thirty-fifth field, while splice sites of the IE class are mainly associated with base T at the twenty-first field, base A at the twenty-ninth field, and base G at the thirtieth field. |
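
The sampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not name the importance measure, so rule confidence is assumed here, and the target distribution over antecedents is assumed to be proportional to exp(xi * g(A)), where g is the importance measure and `xi` is a hypothetical tuning parameter. The function names `confidence` and `gibbs_antecedents` and the toy transactions are likewise illustrative.

```python
import math
import random
from collections import Counter


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent (0.0 if the antecedent never occurs)."""
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        return 0.0
    return sum(consequent in t for t in covered) / len(covered)


def gibbs_antecedents(transactions, items, consequent, n_sweeps=200, xi=10.0, seed=0):
    """Sample antecedent item sets for a fixed consequent with a Gibbs sampler.

    Assumption: the target is pi(A) proportional to exp(xi * confidence(A -> consequent)).
    Each sweep resamples the inclusion of every item given the rest of the antecedent.
    """
    rng = random.Random(seed)
    current = set()      # items currently in the antecedent
    visits = Counter()   # how often each antecedent is the state after a sweep
    for _ in range(n_sweeps):
        for item in items:
            g1 = confidence(frozenset(current | {item}), consequent, transactions)
            g0 = confidence(frozenset(current - {item}), consequent, transactions)
            # Full conditional probability of including the item under the assumed target.
            p_include = 1.0 / (1.0 + math.exp(-xi * (g1 - g0)))
            if rng.random() < p_include:
                current.add(item)
            else:
                current.discard(item)
        visits[frozenset(current)] += 1
    return visits


# Toy transaction data (made up for illustration); "y" plays the role of the consequent.
transactions = [
    frozenset({"a", "b", "y"}),
    frozenset({"a", "b", "y"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"c", "y"}),
    frozenset({"a", "b", "c", "y"}),
]
visits = gibbs_antecedents(transactions, ["a", "b", "c"], "y")
for antecedent, count in visits.most_common(3):
    print(sorted(antecedent), count)
```

Under these assumptions, antecedents with high importance are visited most often, so the frequently sampled item sets define a much smaller item space in which the top rules can then be searched, for example with Apriori.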