Research On Parallel Clustering Algorithm Based On "90-10" Rule

Posted on:2009-07-02

Degree:Master

Type:Thesis

Country:China

Candidate:C P Li

Full Text:PDF

GTID:2178360242991022

Subject:Computer application technology

Abstract/Summary:


With the development of computing technology, the volume of information data keeps growing. Extracting useful information from large amounts of data has become an urgent problem, and data mining technology emerged in response. It is a valuable new research area and an interdisciplinary subject that draws on the theory and techniques of databases, artificial intelligence, machine learning, statistics, and other fields. Clustering analysis is one of the important areas of data mining research and is applicable in many fields, such as image processing, intrusion detection, and bioinformatics. However, clustering algorithms still face many problems and challenges, because of the complex distribution of clustering objects in high-dimensional feature spaces, the uncertainty and flexibility in evaluating clustering results, and the high computational complexity of clustering viewed as an optimization problem.

This paper proposes a new parallel data preprocessing algorithm based on the "90-10" rule. It improves on the existing algorithm and reduces both the data scale and the runtime, in the best case to about one tenth of the original.

Existing parallel hierarchical clustering algorithms based on SIMD are not adaptive and cannot handle memory conflicts among different processors. To overcome these shortcomings and to make better use of the preprocessing results, this paper proposes a new parallel algorithm based on the SIMD-EREW model. First, to reduce the number of vertices before the minimum spanning tree is constructed, the preprocessing method based on the 90-10 rule is improved and then applied to the input data points. The proposed algorithm clusters n objects with O(p) processors in time, where 1 ≤ p ≤ log n and 0.1 ≤ λ ≤ 0.3.
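The abstract builds clusters from a minimum spanning tree and relies on the EREW model, in which no two processors access the same memory cell at the same time. The sketch below is a sequential Python illustration of those two ideas under stated assumptions, not the thesis's algorithm: it uses Prim's MST over Euclidean distances, finds each step's minimum with a pairwise tree reduction (an access pattern commonly used on EREW machines, although here the rounds run one after another), and then cuts the k - 1 heaviest MST edges to obtain single-linkage clusters. The 90-10 preprocessing step is not modelled, and the function names and the parameter k are hypothetical.

```python
import math

# Hypothetical helpers for illustration only; not taken from the thesis.

def tree_min(values):
    """Return (value, index) of the minimum via pairwise tree reduction.

    In each round, slot i is compared only with slot i + stride, so every
    slot is read by at most one comparison per round (EREW-style access).
    The rounds are executed sequentially here.
    """
    pairs = [(v, i) for i, v in enumerate(values)]
    n, stride = len(pairs), 1
    while stride < n:
        for i in range(0, n - stride, 2 * stride):
            if pairs[i + stride] < pairs[i]:
                pairs[i] = pairs[i + stride]
        stride *= 2
    return pairs[0]

def mst_prim(points):
    """Prim's MST on the complete Euclidean graph; returns (w, u, v) edges."""
    n, inf = len(points), float("inf")
    in_tree = [False] * n
    best = [inf] * n      # distance from each vertex to the current tree
    parent = [-1] * n     # tree vertex that realizes best[v]
    edges = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        # Mask tree vertices, then select the closest outside vertex.
        masked = [inf if in_tree[v] else best[v] for v in range(n)]
        _, u = tree_min(masked)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Each vertex updates only its own entry, so this loop is data-parallel.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def single_linkage(points, k):
    """Label points with k clusters by dropping the k - 1 heaviest MST edges."""
    edges = sorted(mst_prim(points))
    keep = edges[: max(len(edges) - (k - 1), 0)]
    uf = list(range(len(points)))  # union-find forest

    def find(x):
        while uf[x] != x:
            uf[x] = uf[uf[x]]  # path halving
            x = uf[x]
        return x

    for _, a, b in keep:
        uf[find(a)] = find(b)
    roots = {}
    # Number clusters 0, 1, ... in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]
```

For example, `single_linkage([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` returns `[0, 0, 0, 1, 1, 1]`. Shrinking the point set before calling `mst_prim`, as the preprocessing step aims to do, directly reduces the O(n^2) distance work of this sequential version.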
Performance comparisons show that it is the first parallel hierarchical clustering algorithm that is both adaptive and free of memory conflicts, and it therefore improves on previous research. To validate the performance and scalability of the algorithm in practice, we ran several experiments and simulations on synthetic data sets and on benchmark data sets from the Internet, using an IBM P690 at a high-performance computing centre. The results show that the algorithm has a clear advantage in both computing time and space, and consequently stronger adaptability and operability on large-scale data sets.

Keywords/Search Tags:

Data preprocessing, Hierarchical clustering, Memory conflicts, Parallel algorithms, EREW-SIMD


Related items

1	Research On Parallel Hierarchical Clustering Algorithm
2	The Research And Realizing Of Parallel Cluster Analysis Algorithm
3	An Adaptive Parallel Hierarchical Clustering Algorithms Without Memory Conflicts
4	Researches On On-chip Parallel Data Access Techniques For SIMD DSPs With Very Wide Data Path
5	Supporting Applications Involving Dynamic Data Structures and Irregular Memory Access on Emerging Parallel Platforms
6	Research And Implementation Of Hierarchical Clustering Alogithm Based On MPI
7	Architectural Optimization Of Data Storage And Organization For SIMD Media Processors
8	Parallel algorithms for large-scale graph clustering on distributed memory architectures
9	Research On Parallel Optimization Technology For Accelerating Data-intensive Algorithms
10	The Design Of A 64-bit High-Performance DSP Parallel Memory Unit