Data mining of large relational databases

Posted on:2003-10-02

Degree:Ph.D

Type:Dissertation

University:University of California, Los Angeles

Candidate:Giuffrida, Giovanni

GTID:1468390011489278

Subject:Computer Science

Abstract/Summary:

Knowledge Discovery from Databases and Data Mining (KDD/DM) is a young multidisciplinary area that combines experience from, among others, statistics, machine learning, databases, and data visualization. KDD/DM has grown at a breakneck pace in recent years, driven by the needs of an industry that, over the past decades, has accumulated tremendous amounts of data and now lacks the capability to gather relevant information from it effectively and efficiently. The relational model has amply shown its strength in structuring and retrieving data when the type of information we are looking for is well known. So, while a question like “How much did my customers spend on product X in region Y?” is a straightforward task for a relational database, the same is not true for the question “What are the reasons for the strong sales of product X in region Y?” The industry has largely recognized the value of a system able to “answer” the second type of question; the new wave of KDD/DM applications addresses this issue.

KDD/DM is mostly rooted in the machine learning discipline and, consequently, has inherited many legacies that do not necessarily fit the domain of large databases. Also, KDD/DM grew in a rather uncoordinated way, fueled by fast-growing commercial interest and notable successes in the research community. Even though many tools and algorithms can now be found in both commercial and research environments, no real standards have yet been proposed. For instance, there is no standard way of structuring the database and no standard language for data mining. We believe that the integration of data mining and databases has a lot to offer in tackling these issues.

We address these issues by setting the following three objectives for this dissertation. One, prove that efficient and effective data mining can be achieved on top of a standard DBMS. Two, introduce a couple of general heuristics that help reduce the search space when mining large datasets. Three, promote the integration of data mining and statistics by presenting two applications, one that combines data mining and statistics and one that compares them.
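
As an illustration of the first kind of question mentioned in the abstract, the following minimal sketch shows how “How much did my customers spend on product X in region Y?” reduces to a single aggregate query. It is not taken from the dissertation: the `sales` table, its columns, the sample rows, and the helper `total_spent` are all hypothetical, and SQLite (via Python's standard `sqlite3` module) merely stands in for any relational DBMS.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer TEXT, product TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("alice", "X", "Y", 120.0),
        ("bob", "X", "Y", 80.0),
        ("carol", "X", "Z", 50.0),
        ("alice", "W", "Y", 30.0),
    ],
)


def total_spent(conn, product, region):
    """Return the summed sales amount for one product in one region.

    COALESCE turns the NULL that SUM yields on an empty match into 0.0.
    """
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0.0) FROM sales "
        "WHERE product = ? AND region = ?",
        (product, region),
    ).fetchone()
    return row[0]


print(total_spent(conn, "X", "Y"))
```

The second kind of question (“What are the reasons for the strong sales...?”) has no comparable one-line query, because the attributes that might explain the result are not known in advance; that gap is what the abstract attributes to KDD/DM applications.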

Keywords/Search Tags:

Data mining, KDD/DM, Statistics, Large, Relational

Related items

1	Research On Algorithm For Relational Data Classification Based On Background Knowledge
2	The Theory And Application Of Multi-Relational Data Mining
3	The Research Of Multi-Relational Data Mining Technology And Its Realization In Tax Assessment
4	A Method Of Multi-Relational Data Classification Of Continuous Attributes
5	Research On Application Of Statistics Based Data Mining In Customer Relationship Management
6	Research On Multi-relational Association Rule Mining
7	Research On Some Problems Of Statistical Relational Learning
8	Storing Large Scale Semi-structured And Unstructured Data On RDBMS
9	The Research Of Multi-Relational Association Rules Algorithms In Data Mining