
Research And Development Of Service-Oriented Distributed Data Mining In Grid

Posted on: 2010-11-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z Zhao
Full Text: PDF
GTID: 2178360275951455
Subject: Computer application technology
Abstract/Summary:
Today, data is accumulating at an almost unimaginable rate, in gigabytes, driven by the needs of digital management and the requirements of post-industrial processes. Data mining technology has already been widely applied in many fields, such as diagnosis, marketing and sales, image screening, and load forecasting, but the traditional data mining mode is not well suited to explosively growing data and highly complex computing models. Distributed and parallel technologies should therefore be used, and the traditional mode should be replaced by a new processing mode.

Grid computing and Web Service technology are among the most promising advances for data mining research. Grid technology can integrate geographically dispersed, heterogeneous resources into a super-computing platform, and it enables a high degree of sharing for a variety of resources, including computing, storage, algorithm, and information resources. Web Service, as a new Web application mode and distributed computing model, is gradually becoming a mainstream technology for interoperation between heterogeneous and distributed systems. Applying Web Service technology to data mining on the Grid is therefore a new idea. Sharing resources through the publishing, discovery, and management of Web Services can compensate for the dynamic and distributed nature of Grid resources.

This dissertation is an initial exploration of how to apply Grid and Web Service technology to data mining. Our work includes the following:

(1) The paper presents a new service-oriented data mining pipeline based on workflow and designs a new service-oriented distributed data mining architecture that lets users carry out the entire execution of data mining tasks interactively.

(2) A portal for service-oriented distributed data mining is implemented using the Web Service publishing and discovery mechanisms. A private UDDI registry, which provides service storage for providers and basic lookup for requesters, is set up to store and manage the service resources. A new quality of service (QoS) model is put forward to measure user satisfaction in service discovery.

(3) A series of generic, extensible data mining tools based on the Weka library is developed, including a data selector, data transformation, an algorithm selector, algorithm parameter configuration, an attribute selector, and result presentation. These tools let users take part interactively in the whole data mining process and, at the same time, combine dynamically with the data mining services to construct a data mining pipeline.

(4) As Grid and Web Service technology are integrated, more and more resources can be provided as Web Services. The paper designs and implements a service-oriented distributed data mining system that can import, compose, and invoke data mining services from PDDM, from the underlying distributed problem-solving environment Triana, and from the open-source data mining library Weka. Furthermore, users can flexibly construct data mining pipelines in a workflow-compliant manner to carry out interactive, distributed, and parallel data mining.

(5) The dissertation validates the SODDM platform and uses WAT to analyze PDDM performance quantitatively in terms of the number of concurrent users. In addition, a virtual Grid experiment environment is built on the campus network to evaluate the usability and generality of DDMWS, and a demonstration of DDMWS executing classification, clustering, and association rule mining is presented. The results show that all kinds of distributed and parallel data mining tasks can be solved effectively.

In light of our results and of related research in China and abroad, the main features of our work are as follows:

(1) A set of QoS parameters and a computation model are defined according to the characteristics of the data mining domain. The resulting QoS formula helps ensure that requesters obtain data mining services that meet their needs.

(2) A private UDDI registry is set up to manage and coordinate the data mining services and to implement the publishing and discovery of Web Services. For service discovery, it can find the data mining services that satisfy a given QoS level.

(3) A new workflow-based data mining pipeline is proposed, and a series of data mining tools is developed with Weka. This toolkit can be composed dynamically with the data mining services within DDMWS to construct a data mining pipeline and execute data mining tasks.
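
The QoS-based service discovery described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the thesis's actual QoS parameters or formula, so everything here is an assumption made for illustration: the two attributes (response time and availability), the weighted-sum score, the default weights, the in-memory list standing in for the private UDDI registry, and the service names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    """A registry record. The QoS fields are hypothetical, not from the thesis."""
    name: str
    task: str                 # e.g. "classification" or "clustering"
    response_time_ms: float   # lower is better
    availability: float       # fraction in [0, 1], higher is better


def qos_score(entry, max_response_time_ms, w_time=0.5, w_avail=0.5):
    """Assumed weighted-sum score in [0, 1]; not the formula from the thesis."""
    # Normalize response time so that faster services get a larger term.
    time_term = 1.0 - min(entry.response_time_ms / max_response_time_ms, 1.0)
    return w_time * time_term + w_avail * entry.availability


def discover(registry, task, min_score, max_response_time_ms=1000.0):
    """Return names of services for `task` scoring at least `min_score`, best first."""
    scored = [
        (qos_score(e, max_response_time_ms), e)
        for e in registry
        if e.task == task
    ]
    matches = [(s, e) for s, e in scored if s >= min_score]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [e.name for _, e in matches]


# An in-memory stand-in for the registry; service names are invented.
registry = [
    ServiceEntry("J48Service", "classification", 200.0, 0.99),
    ServiceEntry("SlowTreeService", "classification", 900.0, 0.80),
    ServiceEntry("KMeansService", "clustering", 300.0, 0.95),
]

if __name__ == "__main__":
    print(discover(registry, "classification", min_score=0.6))
```

Raising `min_score` narrows the result to services with better assumed QoS, which mirrors the idea of returning only services that satisfy a given QoS level. A real deployment would query a UDDI registry rather than a Python list.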
Keywords/Search Tags:Web Service, Grid Technology, Distributed Data Mining, Weka, Triana