Data mining techniques for structured and semistructured data

Posted on:2001-12-27

Degree:Ph.D

Type:Thesis

University:Stanford University

Candidate:Nestorov, Svetlozar Evtimov

GTID:2468390014458288

Subject:Computer Science

Abstract/Summary:

Data mining is the application of sophisticated analysis to large amounts of data in order to discover new knowledge in the form of patterns, trends, and associations. With the advent of the World Wide Web, the amount of data stored and accessible electronically has grown tremendously, and knowledge discovery (data mining) from this data has become very important to the business and scientific-research communities alike.

This doctoral thesis introduces Query Flocks, a general framework over relational data that enables the declarative formulation, systematic optimization, and efficient processing of a large class of mining queries. In Query Flocks, each mining problem is expressed as a datalog query with parameters and a filter condition. In the optimization phase, a query flock is transformed into a sequence of simpler queries that can be executed efficiently. As a proof of concept, Query Flocks have been integrated with a conventional database system, and the thesis reports on the architectural issues and performance results.

While the Query-Flock framework is well suited to relational data, it is of limited use for semistructured data, i.e., nested data with implicit and/or irregular structure, such as web pages. The lack of an explicit, fixed schema makes semistructured data easy to generate or extract but hard to browse and query. This thesis presents methods for discovering structure in semistructured data that alleviate this problem. The discovered structure can vary in precision and complexity. The thesis introduces an algorithm for deriving a schema-by-example and an algorithm for extracting an approximate schema in the form of a datalog program.
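
To make "a datalog query with parameters and a filter condition" concrete, the following is a minimal sketch built on the market-basket flock commonly used to motivate query flocks. That example is an assumption here, and the thesis's own examples may differ. The flock asks for pairs of item parameters ($1, $2) that occur together in at least a given number of baskets. The sketch evaluates it directly, then again after an a-priori-style rewrite that first runs a simpler one-parameter flock to prune items. This illustrates the kind of "sequence of simpler queries" the optimization phase produces, but it is not the thesis's actual optimizer. The relation `baskets`, the function names, and the data are hypothetical, and this is plain in-memory Python rather than the database-integrated implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical flock, written in datalog style:
#   QUERY:  answer(B) :- baskets(B, $1) AND baskets(B, $2) AND $1 < $2
#   FILTER: COUNT(answer.B) >= min_support
# `baskets` is a list of (basket_id, item) tuples.


def naive_flock(baskets, min_support):
    """Evaluate the two-parameter flock directly: count, for every
    ($1, $2) with $1 < $2, the distinct baskets containing both items."""
    items_by_basket = {}
    for b, item in baskets:
        items_by_basket.setdefault(b, set()).add(item)
    counts = Counter()
    for items in items_by_basket.values():
        # sorted() + combinations() enforces the $1 < $2 condition
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_support}


def frequent_items(baskets, min_support):
    """Simpler one-parameter flock:
    answer(B) :- baskets(B, $1), FILTER COUNT(answer.B) >= min_support."""
    counts = Counter(item for _, item in set(baskets))  # distinct baskets
    return {item for item, n in counts.items() if n >= min_support}


def optimized_flock(baskets, min_support):
    # Step 1: run the simpler query first.
    ok = frequent_items(baskets, min_support)
    # Step 2: a basket containing both $1 and $2 contains each of them,
    # so a pair can pass the filter only if both items passed step 1.
    pruned = [(b, item) for b, item in baskets if item in ok]
    return naive_flock(pruned, min_support)


if __name__ == "__main__":
    data = [(1, "beer"), (1, "diapers"), (1, "milk"),
            (2, "beer"), (2, "diapers"),
            (3, "milk"), (3, "bread"),
            (4, "beer"), (4, "diapers"), (4, "bread")]
    print(optimized_flock(data, 2))  # {('beer', 'diapers')}
```

Both evaluations return the same answer. The rewrite pays off when many items fail the single-item filter, because the expensive pair enumeration then runs over a smaller relation.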

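The idea of discovering structure and expressing it as a datalog program can be illustrated with a simplified sketch. It assumes the semistructured data is an edge-labeled graph given as hypothetical (source, label, target) triples, and it treats any node without outgoing edges as an atomic value. It then groups objects whose outgoing labels lead to the same types, refining the grouping until it is stable (a standard partition-refinement technique). Each resulting type is printed as one datalog-style rule. This computes only an exact typing based on outgoing edges. The thesis's approximate-schema algorithm, which trades precision for a smaller schema, is not reproduced here. The predicate names `link`, `atomic`, and `typeN` are illustrative assumptions.

```python
def perfect_typing(edges):
    """Assign a type id to every complex object (any node with outgoing
    edges) so that objects share a type iff their outgoing
    (label, target type) sets match. Other nodes are 'atomic'."""
    out = {}
    for src, label, dst in edges:
        out.setdefault(src, []).append((label, dst))
    objects = sorted(out)
    type_of = {o: 0 for o in objects}  # start: one type for all objects

    def t(node):
        return type_of.get(node, "atomic")

    while True:
        new_ids, new_type = {}, {}
        for o in objects:
            # Keeping the old type in the signature means blocks only split.
            sig = (type_of[o], frozenset((lbl, t(d)) for lbl, d in out[o]))
            new_type[o] = new_ids.setdefault(sig, len(new_ids))
        if len(new_ids) == len(set(type_of.values())):
            return type_of  # no block split: the partition is stable
        type_of = new_type


def to_datalog(edges, type_of):
    """Emit one rule per type, read off any representative object."""
    out = {}
    for src, label, dst in edges:
        out.setdefault(src, set()).add((label, type_of.get(dst, "atomic")))
    rules = {}
    for o in sorted(out):
        k = type_of[o]
        if k in rules:
            continue
        body = []
        for i, (label, tt) in enumerate(sorted(out[o], key=str)):
            pred = "atomic" if tt == "atomic" else f"type{tt}"
            body.append(f"link(X, Y{i}, {label}), {pred}(Y{i})")
        rules[k] = f"type{k}(X) :- " + ", ".join(body) + "."
    return [rules[k] for k in sorted(rules)]


if __name__ == "__main__":
    edges = [("p1", "name", "Ann"), ("p1", "works_at", "c1"),
             ("p2", "name", "Bob"), ("p2", "works_at", "c1"),
             ("c1", "name", "Acme"), ("p3", "name", "Cy")]
    for rule in to_datalog(edges, perfect_typing(edges)):
        print(rule)
```

On this data, `c1` and `p3` fall into the same type because each has only a `name` edge to an atomic value. This shows why an exact typing can be both too fine and too coarse for browsing, and why the thesis also considers approximate schemas.
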
Keywords/Search Tags:

Data, Mining

Related items

1	Applications Of Data Mining For The Competitive Intelligence System In The Enterprise
2	Based On Data Mining, Web Mining System
3	Study On Several Typical Data Mining Methods And Their Applications
4	Research On Technologies And Application Of Data Mining For PLM
5	Web-based Data Mining Technology
6	Research On The Technology Of Web Log Mining
7	Web-Based Data Mining Technology Research And Application
8	Research And Application Of Algorithm In Data Mining Based On Oracle Data Mining API
9	Data Mining Applications, Decision Support Systems In Auto Sales
10	Multi-Users Online Visual Data Mining System