Research And Design Of Distributed Clustering Algorithm Based On SOA

Posted on: 2010-05-15  Degree: Master  Type: Thesis
Country: China  Candidate: J H Xie  Full Text: PDF
GTID: 2178360275453374  Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, it has become easy to obtain large amounts of information over the network. However, as that information grows larger in scale and more complex, extracting the useful part becomes increasingly difficult. Data mining provides a means of obtaining potentially valuable information from huge volumes of data. Clustering, a widely used data mining method, is applied in many fields, such as data analysis, pattern recognition, and graphics and image processing.

With the fast development of storage technology, large-scale datasets are now commonly stored in a distributed way, and distributed clustering algorithms are usually applied to cluster them. Such algorithms can perform cluster analysis on distributed datasets efficiently, and they are currently an active topic in cluster analysis research. SOA (service-oriented architecture) provides a new architecture for distributed cluster analysis. This thesis therefore applies SOA to the cluster analysis of distributed datasets. The main work of the thesis is as follows:

(1) The thesis introduces the background, related work, research purpose, and significance of SOA-based distributed clustering algorithms, and then presents the basic technologies of SOA and distributed data mining.

(2) The thesis analyzes DBDC to study the specific clustering procedure of distributed clustering algorithms, which consists mainly of local data mining and global data mining. Local clustering is the basis of the whole algorithm, because its results directly influence the final results. Local data mining comprises three processes: local DBSCAN, choosing representatives, and updating the local clustering. SDBDC is a scalable version of DBDC. To address the defects of DBDC, SDBDC improves both local clustering and global clustering, but SDBDC itself has some drawbacks in efficiency. Combining the advantages of DBDC and SDBDC, this thesis therefore improves the process of choosing representatives, with the aim of increasing efficiency while preserving the quality of the algorithm.

(3) To realize the distributed clustering algorithm, the thesis uses SOA and Web Services technologies to design the algorithm as web services, and proposes an SOA-based distributed clustering web services model. The model contains two groups of services: local clustering services and global clustering services. The local clustering services include a local DBSCAN service, a representative selection service, and a local clustering update service. The global clustering services mainly consist of a global DBSCAN service.

(4) Following the SOA-based distributed clustering web services model, the thesis first uses Weka to design the distributed clustering algorithm, then encapsulates it as web services and deploys them with Axis, and finally composes these services into a workflow using Triana to carry out distributed clustering analysis tasks.

The distinguishing features of this research are as follows:

① By analyzing the advantages of DBDC and SDBDC, the thesis improves the representative selection process of local clustering and proposes a new scalable variant of DBDC.

② By combining SOA with distributed clustering algorithm technologies, the thesis proposes an SOA-based distributed clustering web services model. A prototype system based on this model is implemented and then tested with Triana. The results show that using the system to cluster large-scale distributed datasets is feasible and effective.
Keywords/Search Tags: distributed data mining, distributed clustering algorithm, SOA, Web Services, DBDC, SDBDC
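
As a rough illustration of the DBDC-style pipeline summarized in item (2) of the abstract (local DBSCAN, choosing representatives, global DBSCAN over the representatives, and updating the local clustering), the following Python sketch may help. It is not the thesis's implementation, which according to the abstract is built with Weka, Axis, and Triana, and it does not reproduce the improved representative selection the thesis proposes. All function names, the greedy representative selection, the choice of a global radius of 2 * eps, the use of min_pts = 1 in the global step, and the sample data are assumptions made for this example.

```python
from math import dist  # Python 3.8+


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns (labels, core_flags); label -1 means noise."""
    labels = [None] * len(points)
    core = [False] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i], core[i] = cluster, True
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            nb = region_query(points, j, eps)
            if len(nb) >= min_pts:
                core[j] = True
                queue.extend(nb)
    return labels, core


def choose_representatives(points, core, eps):
    """Assumed greedy rule: keep a core point only if no kept one is within eps."""
    reps = []
    for i, p in enumerate(points):
        if core[i] and all(dist(p, points[r]) > eps for r in reps):
            reps.append(i)
    return reps


def relabel(points, reps, rep_labels, reach):
    """Update step: take the global label of the nearest representative."""
    if not reps:
        return [-1] * len(points)
    out = []
    for p in points:
        d, label = min((dist(p, r), l) for r, l in zip(reps, rep_labels))
        out.append(label if d <= reach else -1)
    return out


def run_dbdc(sites, eps, min_pts):
    """Local clustering per site, global clustering of representatives, update."""
    reps = []
    for pts in sites:  # local step, one iteration per site
        _, core = dbscan(pts, eps, min_pts)
        reps += [pts[i] for i in choose_representatives(pts, core, eps)]
    # Global step: min_pts=1 so an isolated representative keeps its own cluster.
    rep_labels, _ = dbscan(reps, 2 * eps, 1)
    return [relabel(pts, reps, rep_labels, eps) for pts in sites]


# Hypothetical data: two sites whose first clusters join up in the global step.
site_a = [(0, 0), (0.5, 0), (1, 0), (1.5, 0)]
site_b = [(2, 0), (2.5, 0), (10, 10), (10.5, 10), (50, 50)]
result = run_dbdc([site_a, site_b], eps=0.6, min_pts=2)
```

In this sketch only the representatives travel from the sites to the global step, which is the property that makes the local/global split attractive for distributed data. In the service model described in item (3), each of the functions above would roughly correspond to one web service, although that mapping is an interpretation and not something the abstract spells out.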