
Data mining of large relational databases

Posted on: 2003-10-02
Degree: Ph.D.
Type: Dissertation
University: University of California, Los Angeles
Candidate: Giuffrida, Giovanni
Full Text: PDF
GTID: 1468390011489278
Subject: Computer Science
Abstract/Summary:
Knowledge Discovery from Databases and Data Mining (KDD/DM) is a young multidisciplinary area that combines experience from, among others, statistics, machine learning, databases, and data visualization. KDD/DM has grown at a breakneck pace in recent years, driven by the needs of an industry that, over the past decades, has accumulated tremendous amounts of data and now lacks the capability to extract relevant information from it effectively and efficiently. The relational model has largely shown its strength in structuring and retrieving data when the type of information we are looking for is well known. So, while a question like “How much did my customers spend on product X in region Y?” is a straightforward task for a relational database, the same is not true for the question “What are the reasons for the strong sales of product X in region Y?” The industry has largely recognized the value of a system able to “answer” the second type of question; the new wave of KDD/DM applications addresses this issue.

KDD/DM is mostly rooted in the machine learning discipline and has consequently inherited many legacies that do not necessarily fit the domain of large databases. KDD/DM has also grown in a rather uncoordinated way, fueled by fast-growing commercial interests and successes in the research community. Even though many tools and algorithms can now be found in both commercial and research environments, no real standards have yet been proposed. For instance, there is no standard way of structuring the database and no standard language for data mining. We believe that the integration of data mining and databases has a lot to offer in tackling these issues.

We address these issues by setting the following three objectives for this dissertation. One, prove that efficient and effective data mining can be achieved on top of a standard DBMS. Two, introduce a couple of general heuristics that help reduce the search space when mining large datasets. Three, promote the integration of data mining and statistics by presenting two applications, one that combines data mining and statistics and one that compares them.
Keywords/Search Tags: Data mining, KDD/DM, Statistics, Large, Relational