Construction Of Interval Databases And Its Applications In Knowledge Discovery

Posted on: 2006-11-16  Degree: Master  Type: Thesis
Country: China  Candidate: Y F Yin  Full Text: PDF
GTID: 2168360155971502  Subject: Computer software and theory
Abstract/Summary:
Association rule mining is one of the important topics in data mining. It is the process of identifying strong interactions among itemsets in databases. For example, mining a supermarket's transaction database can discover associations (customer behaviors) among different commodities, such as bread and milk, coffee and sugar, or toothpaste and toothbrush. While these are common sense, association rule mining can also find many other interesting interactions, such as 'beer and diaper'. This has led to deeper research on the development and wide application of association rule mining. For example, it can help with stock control, sales promotion, and customer behavior analysis in supermarkets. With the development of the supermarket and commodity industries, binding sales, namely selling bound commodities, have rapidly become popular and are now an important means of gaining profit. Association rule mining assists in binding sales. Building on this, a novel kind of association rule of the form A→[B, C], referred to as an interval rule, is proposed and studied in this thesis. Using interval values to represent binding commodities has several advantages. First, an interval contains more information than a single value: a single value offers only itself, while an interval offers a distribution of values, i.e., any number in the interval can be taken. Second, an interval is more expressive than a mean, that is, the entropy of an interval is larger than the entropy of a mean. Third, an interval database makes it possible to discover which commodities are fit to be bound, which is of much greater practical importance.
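As a rough illustration of what a "strong interaction" between itemsets can mean, the sketch below computes the standard support and confidence measures for a rule such as bread → milk. The abstract does not say which measures the thesis uses, so the choice of measures, the function names, and the toy transactions are all assumptions made for illustration.

```python
from typing import FrozenSet, List

# Hypothetical toy transactions; these are not data from the thesis.
transactions: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "sugar"}),
    frozenset({"coffee", "sugar"}),
    frozenset({"bread", "coffee"}),
]


def support(itemset: FrozenSet[str], db: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not db:
        return 0.0
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(antecedent: FrozenSet[str],
               consequent: FrozenSet[str],
               db: List[FrozenSet[str]]) -> float:
    """Estimated P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(antecedent, db)
    if base == 0.0:
        return 0.0
    return support(antecedent | consequent, db) / base


bread = frozenset({"bread"})
milk = frozenset({"milk"})
rule_support = support(bread | milk, transactions)
rule_confidence = confidence(bread, milk, transactions)
```

A rule is usually called strong when both values exceed user-chosen thresholds; the thresholds themselves are application-specific.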
Based on research into interval clustering algorithms, this thesis proposes to combine two fields of a traditional database into one new field, using one of them as the 'left field' of the new field (the left boundary of the interval) and the other as the 'right field' (the right boundary of the interval). The result is the interval database. The thesis further studies algorithms for strong association rules (affinity rules) and gives the function formula for strong association rules. On the basis of these function values, a complete interval allocation lattice system is constructed, and a property satisfied by that system, i.e. A∧C=B∧C and A∨C=B∨C ⇒ A=B, is used to bind commodities. The essence of interval association rule mining is finding the binding commodities, i.e., deciding which commodities should be bound. The main contributions of this thesis fall into four parts: (1) The limitations of traditional data mining, i.e. pattern missing, are studied from the viewpoints of physics, mathematics, and biology. (2) To address the pattern missing problem, a new data structure, the interval database, is designed. (3) The concept of interval association rule mining is proposed and studied further in terms of the real meaning of the interval. (4) An algorithm for mining interval rules is designed, simulated, and tested. Some suggestions for improvement in future work are also given.
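The construction described above can be sketched as follows: two numeric fields of each record are merged into a single interval-valued field. The abstract does not define the lattice operations of the "interval allocation lattice", so this sketch assumes a componentwise ordering on (left, right) pairs, with meet and join taken as componentwise minimum and maximum. Under that assumption the intervals form a distributive lattice, in which the cancellation property A∧C=B∧C and A∨C=B∨C ⇒ A=B holds. The field names, the `Interval` class, and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Interval:
    left: float
    right: float

    def __post_init__(self) -> None:
        if self.left > self.right:
            raise ValueError("left boundary must not exceed right boundary")


def to_interval_rows(rows: Iterable[Dict], left_field: str,
                     right_field: str, new_field: str) -> List[Dict]:
    """Merge two numeric fields of each row into one interval-valued field."""
    out = []
    for row in rows:
        merged = {k: v for k, v in row.items()
                  if k not in (left_field, right_field)}
        merged[new_field] = Interval(row[left_field], row[right_field])
        out.append(merged)
    return out


# Assumed lattice operations (componentwise min / max), not taken from the thesis.
def meet(a: Interval, b: Interval) -> Interval:
    return Interval(min(a.left, b.left), min(a.right, b.right))


def join(a: Interval, b: Interval) -> Interval:
    return Interval(max(a.left, b.left), max(a.right, b.right))


def cancellation_holds(intervals: List[Interval]) -> bool:
    """Brute-force check: A∧C = B∧C and A∨C = B∨C must imply A = B."""
    for a, b, c in product(intervals, repeat=3):
        if meet(a, c) == meet(b, c) and join(a, c) == join(b, c) and a != b:
            return False
    return True


# Hypothetical records with a lower and an upper price per commodity.
rows = [
    {"item": "bread", "min_price": 2.0, "max_price": 3.5},
    {"item": "milk", "min_price": 1.0, "max_price": 1.8},
]
interval_db = to_interval_rows(rows, "min_price", "max_price", "price")

# All intervals with integer endpoints in 0..3, used for the exhaustive check.
sample = [Interval(l, r) for l in range(4) for r in range(l, 4)]
property_ok = cancellation_holds(sample)
```

Note that the more familiar choice of intersection as meet and convex hull as join does not satisfy the cancellation property in general, which is why the componentwise ordering is assumed here.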
Keywords/Search Tags: interval, association rule, knowledge discovery, interval clustering, binding commodities