
Discovery And Reuse For Data Mining Workflow

Posted on: 2009-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: X A Han
Full Text: PDF
GTID: 2178360272486739
Subject: Computer application technology
Abstract/Summary:
Because of the diversity of mining patterns, the non-trivial nature of the process, and the complexity of the algorithms in data mining, constructing a new workflow from scratch is always time-consuming and often requires the participation of domain specialists and algorithm designers. A workflow is not only a process for discovering knowledge; it also embodies common solutions to certain types of problems. Reusing existing workflows can therefore significantly reduce the time needed to build a new workflow and improve its quality. In this thesis, we implement a system for the discovery and reuse of data mining workflows using an ontology-based approach. Our main work includes:

1. Considering users' requirements and the characteristics of data mining workflows, we describe a workflow at four levels: 1) natural language description, 2) declarative description as an atomic service, 3) declarative description as a composite service, and 4) procedural description.

2. Based on these descriptions, we create a workflow ontology in which we define classes, relations, and axioms and use them to organize the workflow resources. The ontology is implemented in OWL.

3. We propose an architecture for searching workflows that consists of four modules: GUI, query preprocessing, keyword query, and semantic query. In the semantic query module, we carry out semantic discovery using SPARQL. We then introduce inference into the workflow ontology, where the TBox is used to maintain the ontology and the ABox is used to answer users' queries. In the ABox, we implement three kinds of inference: 1) vertical inference, 2) horizontal inference, and 3) new-relationship inference. Through these inferences, the system can support complex searches and return precise results.

4. We implement a prototype system with three layers: Storage Layer, Management Layer, and GUI Layer.
We also provide a set of standardized terms to narrow the query words and thereby improve query performance. The system currently supports the discovery, editing, and running stages of the workflow lifecycle, but it does not yet support workflow publishing, and it does not yet support keyword search. These two issues are left for future work.
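
The abstract does not define the three kinds of inference. As a rough illustration only, the sketch below assumes that "vertical inference" means reasoning along the class hierarchy, so that a query for a general workflow class also returns workflows registered under its subclasses. The class names, workflow names, and the functions `ancestors` and `find_workflows` are hypothetical and do not come from the thesis. Plain Python dictionaries stand in for the OWL ontology and the SPARQL query layer.

```python
# Hypothetical class hierarchy (schema-level knowledge, i.e. the TBox role).
# Each entry maps a class to its direct superclass.
SUBCLASS_OF = {
    "DecisionTreeWorkflow": "ClassificationWorkflow",
    "NaiveBayesWorkflow": "ClassificationWorkflow",
    "ClassificationWorkflow": "DataMiningWorkflow",
    "ClusteringWorkflow": "DataMiningWorkflow",
}

# Hypothetical workflow instances (instance-level knowledge, i.e. the ABox role).
# Each entry maps a stored workflow to the class it was registered under.
INSTANCE_OF = {
    "wf_credit_scoring": "DecisionTreeWorkflow",
    "wf_spam_filter": "NaiveBayesWorkflow",
    "wf_customer_segments": "ClusteringWorkflow",
}


def ancestors(cls):
    """Return cls followed by all of its superclasses, walking up the hierarchy."""
    chain = []
    while cls is not None and cls not in chain:  # the membership check guards against cycles
        chain.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return chain


def find_workflows(query_class):
    """Return workflows whose class is query_class or any subclass of it."""
    return sorted(
        wf for wf, cls in INSTANCE_OF.items() if query_class in ancestors(cls)
    )


if __name__ == "__main__":
    # A query for the general class also finds workflows stored under subclasses.
    print(find_workflows("ClassificationWorkflow"))
```

Without the upward walk in `ancestors`, a query for `ClassificationWorkflow` would match nothing, because no workflow in this toy data is registered directly under that class. In an OWL-based system the same effect would normally come from a reasoner or from the query language, not from hand-written code.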
Keywords/Search Tags: data mining, workflow, discovery, reuse, ontology, inference