Research On Dimensionality Reduction And Clustering Algorithm Of Commercial Data Streams

Posted on:2012-06-18

Degree:Master

Type:Thesis

Country:China

Candidate:Z Mei

Full Text:PDF

GTID:2189330332483136

Subject:Management Science and Engineering

Abstract/Summary:

Since the late 20th century, data streams have been widely used in business as a new and more realistic data model. Data streams are massive, unbounded, fast-changing, and subject to concept drift; they demand rapid response, and random access to them is costly. In addition, they contain valuable enterprise information, such as operating patterns, management requirements, influencing factors, and trends, and they better reflect dynamic changes in business operations, service content, and service targets. At the same time, these unbounded and variable data streams challenge computer storage space, computing speed, and communication capacity. Data mining technology has achieved a great deal on static data sets, but extending it to dynamic data streams, especially commercial ones, remains a great challenge.

In a dynamic data stream environment, rapid data growth and higher dimensionality sharply degrade the performance of existing algorithms designed for small, low-dimensional data, and similarity measures that work in low-dimensional space no longer hold. This paper uses the sliding window as a uniform management model for data streams. First, on dimensionality reduction for data streams, it thoroughly reviews high-dimensional reduction from two aspects, feature extraction and feature selection, and analyzes the six latest research trends in dimensionality reduction. On data clustering, it makes a comparative analysis of clustering algorithms for both traditional static data and dynamic data streams. Building on the review of previous research in Chapter II, we then design two methods for high-dimensional reduction. The first is based on rough set theory and compresses the data along two aspects: transactions and dimensions.
On the one hand, it compresses the transactions while preserving the characteristics of the dimensions, which improves the ability to distinguish between transactions. On the other hand, by hypothesis testing on the correlations between dimensions, it effectively removes the dimensions that have no influence on the decision result. The second is a commercial data processing method based on rough set equivalence classes, which exploits the relative independence of the condition attributes in a decision table to perform reduction. It is a new dimensionality reduction algorithm; a sample analysis on part of a customer evaluation table shows that it reduces dimensionality effectively while preserving the original information. Finally, we investigate data stream clustering under limited resources and design PDStream, an improved clustering algorithm for dynamic data streams based on principal component analysis and density. It uses a two-stage clustering model, in which summary data are used to perform a simple second-stage clustering and update the clustering results. Experiments show that PDStream handles massive data well and produces high-quality clusters. Applying PDStream to a commercial domain, following the data mining life cycle, achieved the anticipated effect.
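
The rough-set reduction described above can be illustrated with a minimal sketch. It is not the algorithm of the thesis, whose details the abstract does not give. It is a generic backward-elimination attribute reduction that drops a condition attribute whenever the dependency degree of the decision attribute stays unchanged, followed by merging of rows that become identical. The customer evaluation table, its attribute names, and all function names are hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes: rows that agree on every attribute in attrs."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def dependency(rows, cond_attrs, decision):
    """Fraction of rows whose equivalence class carries a single decision value (positive region)."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, cond_attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def reduce_attributes(rows, cond_attrs, decision):
    """Backward elimination: drop each attribute whose removal leaves the dependency unchanged."""
    full = dependency(rows, cond_attrs, decision)
    kept = list(cond_attrs)
    for a in cond_attrs:
        trial = [x for x in kept if x != a]
        if dependency(rows, trial, decision) == full:
            kept = trial
    return kept


def compress_rows(rows, attrs, decision):
    """Project rows onto the kept attributes plus the decision and merge duplicates."""
    seen = {}
    for row in rows:
        projected = {a: row[a] for a in list(attrs) + [decision]}
        seen.setdefault(tuple(projected.values()), projected)
    return list(seen.values())


# Hypothetical customer evaluation table (illustrative values only).
rows = [
    {"price": "high", "service": "good", "delivery": "fast", "packaging": "plain", "satisfied": "yes"},
    {"price": "high", "service": "poor", "delivery": "fast", "packaging": "plain", "satisfied": "no"},
    {"price": "low", "service": "good", "delivery": "slow", "packaging": "gift", "satisfied": "yes"},
    {"price": "low", "service": "poor", "delivery": "slow", "packaging": "gift", "satisfied": "no"},
    {"price": "low", "service": "good", "delivery": "fast", "packaging": "plain", "satisfied": "yes"},
    {"price": "high", "service": "poor", "delivery": "slow", "packaging": "gift", "satisfied": "no"},
]
conditions = ["price", "service", "delivery", "packaging"]

reduct = reduce_attributes(rows, conditions, "satisfied")
compressed = compress_rows(rows, reduct, "satisfied")
print(reduct)
print(compressed)
```

In this toy table the decision depends on a single condition attribute, so the other three are dropped and the six rows collapse to two. Backward elimination yields one reduct that depends on the order in which attributes are tried; it does not search for a minimal one.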

Keywords/Search Tags:

high dimensional data streams, dimension reduction, rough set, density, data streams clustering

Related items

1	Research And Application On Co-Clustering Algorithms For High Dimensional And Very Large Data
2	High-dimensional Data-driven Credit Risk Evaluation Of Online Loan
3	Smoothed Generalized Empirical Likelihood And Elliptical Sliced Inverse Regression In High-dimensional Data
4	Effects of stakeholder involvement in reduction of sedimentation in northern virginia streams
5	A Multiple Streams Analysis Of Corporate Pension System Change
6	Theory And Application Of Structured High-dimensional Multiple-index Models
7	A study of impact of urbanization on ephemeral streams in headwater watersheds in eastern Pima County, Arizona
8	The Process Of Land Transfer Policy. Multiple Streams Perspective
9	Research On Visual Analytics Towards Ecological Economics Data
10	Research On Spectral Clustering Methods And Their Applications In Financial Time Series Data Mining