The Research And Application On Sequential Pattern Mining

Posted on: 2012-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: L L Yin
GTID: 2218330338970698
Subject: Computer software and theory
Abstract/Summary:
What is data mining? Taken literally, it means mining data: the process of effectively extracting useful knowledge from databases or other information repositories. It is currently a very active research field, and the discovery of sequential patterns is one of its important research topics.

Since sequential pattern mining was first proposed, it has been a focus of research because it can be applied in many fields. As research on sequential pattern algorithms has progressed, many improved and relatively mature algorithms have appeared, but most of them mine the entire database. This produces candidate sequences that are useless or of no interest to users, so support-based mining consumes a great deal of time and space. For example, when analyzing customer behavior, there is no need to compare products purchased in January with products purchased in December. How to incorporate time constraints into sequential pattern mining has therefore become an important research direction. This thesis briefly introduces constraint-based sequential pattern mining, analyzes the relevant time constraints, and then proposes a fast and effective method for generating candidate sequences under a time limit. The method quickly locates the sequences to be joined and avoids unnecessary scanning and checking, thereby speeding up candidate sequence generation.

Sequential pattern mining also has important applications in biological research. With the development of medical technology, the genetic sequences of many species have been determined, so biological sequence databases around the world keep growing. If rules can be extracted from this huge amount of data, we can summarize the genetic characteristics of species and find genes that can lead to disease, which is useful for disease prevention and treatment.

Because of the particular nature of biological sequences, methods that use a single support threshold cannot fully meet the goals of biological sequence mining. This thesis therefore proposes a biological sequence pattern mining method based on multiple supports. The method exploits the level structure of a tree, prunes the tree according to two properties, and finally finds the frequent patterns that meet the minimum support. Experiments show that the algorithm reduces time and space complexity.
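The abstract does not spell out the thesis's candidate-generation method, so the following is only an illustration of what a time constraint does to support counting, not the author's algorithm. It is a minimal sketch that assumes each customer sequence is a list of (timestamp, item) pairs and that the constraint is a maximum gap between consecutive matched items; the names `occurs_within_gap`, `support_count`, and `max_gap` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed representation: one event is (timestamp, item), e.g. (month, product).
Event = Tuple[int, str]


def occurs_within_gap(events: Sequence[Event],
                      pattern: Sequence[str],
                      max_gap: int) -> bool:
    """Return True if `pattern` occurs in `events` in time order, with
    consecutive matched events strictly increasing in time and at most
    `max_gap` time units apart."""
    if not pattern:
        return True
    # Timestamps at which the current prefix of the pattern can end.
    ends: List[int] = [t for t, item in events if item == pattern[0]]
    for wanted in pattern[1:]:
        # Keep every later event for `wanted` that is close enough to
        # at least one possible end of the previous prefix.
        ends = [t for t, item in events
                if item == wanted
                and any(0 < t - prev <= max_gap for prev in ends)]
        if not ends:
            return False
    return bool(ends)


def support_count(database: Sequence[Sequence[Event]],
                  pattern: Sequence[str],
                  max_gap: int) -> int:
    """Number of sequences in `database` containing `pattern`
    under the maximum-gap constraint."""
    return sum(occurs_within_gap(seq, pattern, max_gap) for seq in database)


# Made-up example data: timestamps are months.
example_db = [
    [(1, "a"), (2, "b"), (12, "c")],
    [(1, "a"), (11, "b")],
    [(1, "a"), (3, "a"), (5, "b")],
]

if __name__ == "__main__":
    print(support_count(example_db, ["a", "b"], max_gap=3))   # prints 2
    print(support_count(example_db, ["a", "b"], max_gap=10))  # prints 3
```

With a gap of 3 months the second sequence no longer supports the pattern, because its two purchases are 10 months apart. Tracking every possible end time of the prefix, rather than only the first match, is what lets the third sequence match through its second "a".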
Keywords/Search Tags: data mining, sequential patterns, time limit, biological sequence pattern, candidate-sequences