
Research On Sequential Pattern Mining Across Multiple Sequences With Wildcards

Posted on: 2014-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: X W Ma
GTID: 2268330401988763
Subject: Computer software and theory
Abstract/Summary:
In recent years, data mining has attracted increasing attention in the information industry. Large volumes of data exist in real-world application databases, and there is an urgent need to turn them into useful information and knowledge. Frequent pattern mining is an important branch of data mining; its main task is to discover the patterns that occur frequently in a data set.

In the real world, events seldom happen independently; they are associated with one another, as in DNA and protein sequences. Studying the structure of sequences can therefore help us mine significant patterns hidden in a single data stream.

In this dissertation, we study mining sequential patterns across multiple sequences with gap constraints. Although many sequential pattern mining algorithms support wildcards, they are restricted by the size of the alphabet, so their time and space consumption is heavy. This dissertation introduces the one-off condition, under which each character of a sequence can be used by at most one counted occurrence of a pattern. The condition substantially reduces time and space costs and also increases the flexibility of patterns. Our main contributions are as follows:

(1) We discuss and study frequent pattern mining and sequential pattern mining with gap constraints.
(2) We propose an algorithm named M-OneOffMine, which satisfies the one-off condition and the Apriori property. Experiments on DNA sequences show that, on multiple-sequence sets, our algorithm has better time performance than the state-of-the-art MCPaS algorithm. Experiments on a single-sequence set show that it also has lower time and space overheads than the MPP algorithm.
(3) We design a web system for frequent pattern mining with wildcards. The system presents several classical algorithms and provides a platform for research and academic exchange.
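The one-off condition and the Apriori-style pruning named in contribution (2) can be illustrated with a short Python sketch. This is not the dissertation's M-OneOffMine algorithm. It is a minimal sketch that rests on these assumptions:

- A single gap range [min_gap, max_gap] applies between every pair of consecutive pattern characters.
- Support is the number of one-off occurrences summed over all sequences.
- Occurrences are found greedily from left to right.

The function names `one_off_count` and `mine` are hypothetical. Greedy matching always produces a valid set of position-disjoint occurrences. It is therefore a lower bound on the maximum one-off count, not necessarily the exact value.

```python
from typing import Dict, List, Sequence


def one_off_count(seq: str, pattern: str, min_gap: int, max_gap: int) -> int:
    """Greedily count position-disjoint (one-off) occurrences of `pattern`.

    Consecutive pattern characters must be separated by between `min_gap`
    and `max_gap` wildcard positions (assumption: one gap range for all
    pairs). Each position of `seq` is used by at most one counted
    occurrence. Greedy leftmost matching gives a lower bound on the
    maximum one-off count, not necessarily the maximum itself.
    """
    used = [False] * len(seq)

    def extend(pos: int, k: int, path: List[int]) -> bool:
        # pattern[k - 1] is matched at `pos`; try to match pattern[k:].
        if k == len(pattern):
            return True
        lo = pos + min_gap + 1
        hi = min(pos + max_gap + 1, len(seq) - 1)
        for nxt in range(lo, hi + 1):
            if not used[nxt] and seq[nxt] == pattern[k]:
                path.append(nxt)
                if extend(nxt, k + 1, path):
                    return True
                path.pop()  # backtrack and try the next candidate position
        return False

    count = 0
    for start in range(len(seq)):
        if used[start] or seq[start] != pattern[0]:
            continue
        path = [start]
        if extend(start, 1, path):
            for p in path:  # one-off: reserve every position of this occurrence
                used[p] = True
            count += 1
    return count


def mine(seqs: Sequence[str], alphabet: str, min_gap: int, max_gap: int,
         min_sup: int, max_len: int = 10) -> Dict[str, int]:
    """Level-wise, Apriori-style mining across several sequences.

    Assumption: support = one-off count summed over all sequences.
    A length-(k+1) candidate is built only by joining two frequent
    length-k patterns that overlap on k-1 characters, so a candidate
    whose prefix or suffix is infrequent is never counted.
    """
    def support(p: str) -> int:
        return sum(one_off_count(s, p, min_gap, max_gap) for s in seqs)

    level = {c: support(c) for c in alphabet}
    level = {p: s for p, s in level.items() if s >= min_sup}
    frequent: Dict[str, int] = {}
    while level:
        frequent.update(level)
        if len(next(iter(level))) >= max_len:
            break
        candidates = {a + b[-1] for a in level for b in level if a[1:] == b[:-1]}
        level = {}
        for cand in sorted(candidates):
            sup = support(cand)
            if sup >= min_sup:
                level[cand] = sup
    return frequent
```

For example, `one_off_count("AAAA", "AA", 0, 0)` returns 2 (positions 0,1 and 2,3). Without the one-off condition there would be three overlapping occurrences. On the toy sequences `ACGT` and `AGGT` with gap range [0, 1] and minimum support 2, `mine` keeps A, G, T, AG, GT and AGT.

The exact maximum one-off count is anti-monotone. Deleting the last (or first) character from each of a set of position-disjoint occurrences of a pattern leaves position-disjoint occurrences of its prefix (or suffix). This is why candidates can be generated only from frequent shorter patterns. Because the greedy counter only approximates that maximum, the pruning in this sketch is illustrative rather than exact.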
Keywords/Search Tags: frequent pattern mining, multiple sequences, wildcards, gap constraints, one-off condition