Multi-relational data mining using vertical database technology

Posted on: 2005-11-02
Degree: Ph.D.
Type: Dissertation
University: North Dakota State University
Candidate: Ding, Qiang
Full Text: PDF
GTID: 1458390008979344
Subject: Computer Science
Abstract/Summary:
Data-mining algorithms look for patterns in data. Most of them search a single relation or table, while most real-world databases store information in multiple tables [HK01]. Multi-relational data mining (MRDM) has attracted growing interest in recent years and is expected to remain important. MRDM approaches have been applied successfully to problems in a variety of areas, for example bioinformatics. In this dissertation, we use vertical database and data-mining predicate-tree (P-tree) technology to solve the scalability problem that MRDM faces in standard horizontal databases. We develop methods to generate data-mining-ready vertical materialized views that facilitate fast vertical mining, and we store these views in P-tree format. P-trees are lossless, compressed representations of the original data that record count information in order to facilitate efficient data mining. Our vertical approach to MRDM offers four advantages. First, we can generate vertical materialized views directly by replication or Boolean operations. Second, because the data are fully vertically partitioned (down to the bit-position level), we read only the bit positions that are needed, so I/O is minimized. Third, we encode attribute values as bit vectors in a highly compressed manner. Finally, data mining can be formulated as highly parallelizable logical operations that allow fast implementations, and indexes are eliminated entirely.

Note: P-tree technology is patent pending at North Dakota State University.
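To make the idea of partitioning down to the bit-position level more concrete, the following is a minimal sketch, not the dissertation's P-tree implementation. It stores each bit position of a column as a flat, uncompressed bit vector (a Python integer used as a bitmask) and answers a count query with logical operations only. The function names, the sample columns `ages` and `grade`, and the use of flat bitmasks in place of compressed trees are all assumptions made for illustration.

```python
def bit_slices(values, width):
    """Split a column of non-negative ints into `width` vertical bit vectors.

    slices[j] is a Python int used as a bitmask: bit i of slices[j] is set
    when row i has a 1 in bit position j of its value.
    """
    slices = [0] * width
    for row, v in enumerate(values):
        for j in range(width):
            if (v >> j) & 1:
                slices[j] |= 1 << row
    return slices


def rows_equal(slices, value, n_rows):
    """Bitmask of rows whose attribute equals `value`, using only AND/NOT."""
    all_rows = (1 << n_rows) - 1
    mask = all_rows
    for j, s in enumerate(slices):
        # Keep rows whose bit j matches bit j of the target value.
        mask &= s if (value >> j) & 1 else (all_rows & ~s)
    return mask


def count(mask):
    """Number of rows selected by a bitmask."""
    return bin(mask).count("1")


# Hypothetical sample data: two columns over the same six rows.
ages = [3, 5, 3, 7, 5, 3]
grade = [1, 0, 1, 1, 0, 0]

age_slices = bit_slices(ages, 3)
grade_slices = bit_slices(grade, 1)

# Rows with age == 3 AND grade == 1, computed without scanning whole rows.
both = rows_equal(age_slices, 3, len(ages)) & rows_equal(grade_slices, 1, len(grade))
print(count(both))
```

In this sketch a query touches only the bit vectors of the attributes it mentions, which is the sense in which a vertical layout reads only what is needed. A P-tree would additionally compress each bit vector and store counts, which this flat representation does not attempt.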
Keywords/Search Tags: Data, Mining, Vertical, MRDM