Study Of Service-oriented Data Mining Key Techniques

Posted on: 2007-12-23; Degree: Doctor; Type: Dissertation
Country: China; Candidate: Y H Li
GTID: 1118360242961889; Subject: Computer application technology
Abstract/Summary:
To solve the problem of large-scale distributed data mining (DM), an architecture is needed that facilitates data resource integration and offers high-quality data mining services (DMS) with stronger security and privacy protection. Service Oriented Architecture, ontology and WEB services provide strong technical support for DM system development. Based on the features of distributed DM and following a user-centered idea, a Service-Oriented Data Mining Architecture (SODMA) is proposed that applies Service Oriented Architecture, ontology, WEB services and related technologies. SODMA packages DM algorithms as WEB services; realizes privacy-protecting data integration using a data integration ontology and a privacy protection ontology; helps multi-level users in different domains dynamically select appropriate, high-quality DMS with the help of a user-centered DM ontology and a DM quality evaluation ontology; and offers usable, high-performance, high-quality and secure DMS in a distributed heterogeneous environment.

Heterogeneous data integration is the first key step of data preprocessing for distributed heterogeneous DM. The data integration model needs to address heterogeneity, integrity, privilege control and scale restriction. Based on an analysis of the characteristics of data warehouse, middleware and ontology-based data integration, and drawing on existing work in semantic data integration and privacy-protecting DM, a data integration model based on agents and ontology is proposed that can effectively implement privacy-protecting semantic data integration.
In the model, a privacy protection policy ontology is defined; privacy protection policy integration uses a single-ontology approach; data integration adopts a hybrid ontology approach combining global-as-view and local-as-view; and schema obfuscation and role obfuscation help improve privacy protection.

DMS is a complicated, intensive application involving data, computation and knowledge, and it requires professional domain knowledge to use. Existing "system-centered" DM solutions often focus heavily on algorithms and systems engineering challenges without first thoroughly exploring how end users will employ the new DM technology, which makes the systems hard to operate and use. Some systems help users select proper, high-quality DMS with a DM ontology, but such a DM ontology only enumerates DM algorithms and cannot ensure high quality of service (QoS). Based on existing DM technologies and systems and following the user-centered idea, a user-centered data mining ontology is presented that not only offers abundant data mining algorithms for different functions and different types of data, but also provides multiple data mining application solutions for different application domains. It helps multi-level users in different domains select their DMS easily. The implementation of the ontology in the WEB ontology language (OWL) is also discussed.

The DM algorithms and domain application solutions are the instances of the user-centered data mining ontology and the core of the user application. An example solution for money laundering is introduced, including the hierarchy of the domain, some DM algorithms, and the mapping between application solutions and the algorithms they use. There are three main application solutions: identification of suspicious money laundering trades, trade network analysis, and money laundering mode mining. A visual link analysis is proposed to make trade network analysis interactive and visual.
A Link Discovery Algorithm Based on Graph Entropy is put forward to identify critical nodes in the complex networks of money laundering crime. An improved Frequent Subgraph Discovery Algorithm Based on the Apriori Idea is presented to efficiently mine frequent subgraphs from simple graphs; it can be used for structural analysis of trade networks and for mining new money laundering modes.

Users can define a DM task using the domain data integration ontology and the DM ontology. The next step is to select a proper DMS, but most users lack the professional knowledge to do so. A service selection mechanism is therefore needed to help users select high-quality DMS from the standpoint of usability. A more comprehensive DMS quality evaluation ontology (OntDMQ) is proposed by synthesizing WEB service QoS, the unique characteristics of DM, subjective factors such as user feedback, and the dynamic characteristics of services. The evaluation method for QoS is discussed. A QoS-based dynamic service selection method is presented in which the user defines the QoS constraints of the DMS by referencing OntQE and adjusts the factor fff, and the system selects the most appropriate DMS among the services fitting the user requirements by computing a composite quality value.

Building on the above achievements, a prototype system for privacy-protecting data integration and DMS selection has been completed in the foreign exchange money laundering domain. The characteristics of the system are summarized.
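
The QoS-based selection step described above (filter the services that satisfy the user's constraints, then rank them by a composite quality value) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the scoring formula, so a normalized weighted sum is assumed here, and the attribute names, service names, weights and values are all hypothetical rather than taken from the dissertation.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """A candidate data mining service (hypothetical structure)."""
    name: str
    qos: dict  # attribute -> score in [0, 1], higher is better (assumed)


def composite_quality(service, weights):
    """Assumed composite value: weighted sum normalized by total weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(w * service.qos.get(attr, 0.0) for attr, w in weights.items())
    return score / total


def select_service(services, constraints, weights):
    """Keep services meeting every minimum constraint, return the best one."""
    feasible = [
        s for s in services
        if all(s.qos.get(attr, 0.0) >= minimum
               for attr, minimum in constraints.items())
    ]
    if not feasible:
        return None  # no service fits the user requirements
    return max(feasible, key=lambda s: composite_quality(s, weights))


# Hypothetical candidates and user preferences.
candidates = [
    Service("dms-a", {"accuracy": 0.90, "availability": 0.60, "user_feedback": 0.8}),
    Service("dms-b", {"accuracy": 0.80, "availability": 0.95, "user_feedback": 0.7}),
    Service("dms-c", {"accuracy": 0.95, "availability": 0.40, "user_feedback": 0.9}),
]
constraints = {"availability": 0.5}
weights = {"accuracy": 0.5, "availability": 0.3, "user_feedback": 0.2}

best = select_service(candidates, constraints, weights)
```

In this sketch, changing the weights plays the role of the user-adjustable factor: shifting weight toward accuracy changes which feasible service ranks first, while the constraints still exclude services that fall below the stated minimums.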
Keywords/Search Tags: Data mining, Service Oriented Architecture, Ontology, Semantic data integration, Privacy protection, Quality evaluation, Service selection