Generating constrained association rules from semistructured data

Posted on: 2000-03-19
Degree: Ph.D.
Type: Dissertation
University: Northwestern University
Candidate: Singh, Lisa Oberoi
GTID: 1468390014460847
Subject: Computer Science
Abstract/Summary:
With the explosion of the World Wide Web (WWW) and the advent of digital libraries, demand has grown for knowledge discovery in databases (KDD) techniques that identify hidden knowledge in semi-structured data. The goal of this dissertation is to develop a framework and design algorithms that accurately and efficiently generate association rules from concepts and structured data values extracted from semi-structured documents. The dissertation makes three major contributions.

The first contribution is the design of a system architecture that supports data mining of semi-structured data, including Web documents. The architecture incorporates a rule generator, a concept library, and an information discovery module. Its advantages include maintaining a compact representation of document sets, distinguishing general from specialized concepts and relationships, updating data for more dynamic domains, and operating on top of existing databases.

The second contribution is the development of partially and fully constrained association rule algorithms that discover rules using subsets of unstructured concepts and structured attribute values. The user prespecifies a set of constraints (concepts and/or attribute values) that guide the data mining procedure. To date, most association rule algorithms traverse all of the transactions or documents many times to generate large itemsets. Because semi-structured data is typically sparse and its transactions are long, this dissertation proposes algorithms that employ an inverse transaction model to take advantage of these data characteristics. Performance results show that when the data is sparse, these algorithms are faster than Apriori-based algorithms on traditional transaction data, especially on databases with long transactions. Further, because we maintain concept relationship information in the concept library, we can also efficiently generate rule sets involving concepts that have a semantically strong relationship to the initial set of constraints.

The last contribution is the development of a prototype Inductive Rule Identification System (IRIS) for semi-structured data. Once the user specifies the structured-value and concept constraints, IRIS retrieves the data matching those constraints from the concept library and the database, and then generates fully constrained association rules.
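The idea of mining constrained rules over an inverse transaction model can be illustrated with a minimal Python sketch. It assumes that the inverse transaction model means indexing each item (a concept or a structured attribute value) by the set of documents that contain it, so that the support of an itemset is the size of an intersection of document-id sets rather than the result of a full pass over all documents. It also assumes, for illustration only, that a constrained rule is one whose itemset contains every user-specified constraint plus at most `max_extra` other items. The names `build_inverted_index`, `support`, and `constrained_rules` are hypothetical; this is not the dissertation's algorithm or the IRIS implementation.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(docs):
    """Map each item (concept or attribute value) to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, items in docs.items():
        for item in items:
            index[item].add(doc_id)
    return index


def support(index, itemset):
    """Count documents containing every item by intersecting their document-id sets."""
    postings = [index.get(item, set()) for item in itemset]
    return len(set.intersection(*postings)) if postings else 0


def constrained_rules(index, constraints, min_support, min_confidence, max_extra=1):
    """Hypothetical sketch: rules over itemsets that contain ALL constraint items
    plus at most max_extra additional items. min_support is a document count."""
    constraints = frozenset(constraints)
    if not constraints:
        raise ValueError("at least one constraint is required")
    # Documents matching every constraint; only these can support a constrained itemset.
    base_docs = set.intersection(*(index.get(c, set()) for c in constraints))
    if len(base_docs) < min_support:
        return []
    # Keep only items that co-occur with the constraints in enough documents.
    extras = sorted(
        item for item, docs in index.items()
        if item not in constraints and len(docs & base_docs) >= min_support
    )
    rules = []
    for k in range(max_extra + 1):
        for added in combinations(extras, k):
            itemset = constraints | frozenset(added)
            sup = support(index, itemset)
            if sup == 0 or sup < min_support:
                continue
            # Every non-empty proper subset of the itemset is a candidate antecedent.
            for r in range(1, len(itemset)):
                for antecedent in combinations(sorted(itemset), r):
                    conf = sup / support(index, antecedent)
                    if conf >= min_confidence:
                        consequent = itemset - frozenset(antecedent)
                        rules.append((frozenset(antecedent), consequent, sup, conf))
    return rules


if __name__ == "__main__":
    # Toy document set: each document is a set of concepts and attribute values.
    docs = {
        1: {"merger", "bank", "year=1999"},
        2: {"merger", "bank", "layoffs"},
        3: {"merger", "telecom", "year=1999"},
        4: {"bank", "layoffs"},
    }
    index = build_inverted_index(docs)
    for ante, cons, sup, conf in constrained_rules(index, {"merger"}, 2, 0.6):
        print(sorted(ante), "->", sorted(cons), "support", sup, "confidence", round(conf, 2))
```

Restricting candidate items to those that co-occur with the constraints in at least `min_support` documents keeps all counting inside the constraint-matching documents, which is where sparse data helps: most items never become candidates at all.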
Keywords/Search Tags: Data, Association rules, Constrained association, Concept library, Structured, Constraints