Clustering Analysis Research Based On Uncertainty Data

Posted on: 2006-10-15  Degree: Master  Type: Thesis
Country: China  Candidate: F Zhang  Full Text: PDF
GTID: 2168360152494975  Subject: Agricultural mechanization project
Abstract/Summary:
With advances in computer hardware, data collection methods, and mass storage devices, the database and information industries have developed rapidly. Large amounts of database information are used in event management, information indexing, and data analysis. This situation has been described as "data rich, information poor". Such an abundance of data calls for sophisticated methods of data analysis. Rapidly growing data is collected and stored in large datasets or across many databases, and analyzing it is beyond human capacity without analytical tools. To turn this data into a useful resource, data mining technology was introduced to mine or extract knowledge from large, incomplete, noisy, and fuzzy data sets and databases. Data mining has a promising future.

Taking the uncertainty of data into account, this paper focuses on clustering analysis in data mining. The field concept and the cloud model are used to study clustering. A Gaussian influence function is applied to analyze the data field, data radiation, spontaneous clustering, and the data characteristics of the cloud model. Clustering analysis is improved on the basis of the field concept and cloud theory, and applied to cluster analysis of atmospheric quality.

The paper introduces the origin and development of data mining and summarizes its development trends and research content. Domestic data mining research is also described. Data mining is a multidisciplinary technology. Because different users need different information, different kinds of knowledge are mined or extracted by different mining methods. Commonly used data mining theories are introduced, including association rules, spatial sequence rules, classification analysis, prediction, clustering analysis, time series analysis, rough set methods, and cloud theory. Problems identified in the course of the study are raised.

Data clustering analysis is one of the most active research subjects. It can also serve as a preprocessing step for other algorithms.
Cluster mining requires flexibility, the ability to handle diverse attribute types, discovery of clusters of diverse shapes, minimal parameter input, and the ability to handle noisy and multi-dimensional data. The paper analyzes partitioning, hierarchical, density-based, grid-based, and model-based methods. It points out a defect of the k-means and k-medoids algorithms: the number of clusters must be specified before clustering, which reduces the flexibility of the data mining process. CLARA is a partitioning method. It can be applied to large datasets, but it has the drawback that its initial centers are chosen randomly. The CLARA clustering algorithm is therefore improved on the basis of the data field. The improved CLARA algorithm is more efficient and improves the quality of the data mining results.
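
The abstract does not give the influence formula or the seeding procedure, so what follows is only a minimal sketch under stated assumptions, not the thesis's method. It assumes a Gaussian-like potential of the form phi(x) = sum_i m_i * exp(-(||x - x_i|| / sigma)^2), which is a common form in the data field literature, and it assumes that initial medoids are taken at high-potential points kept at least sigma apart. The names `potential` and `seed_medoids`, the unit masses, and the exclusion radius are all hypothetical choices made for illustration.

```python
import math


def potential(points, sigma, masses=None):
    """Data-field potential at every point (assumed Gaussian-like influence)."""
    if masses is None:
        masses = [1.0] * len(points)  # assumption: every object has unit mass
    pots = []
    for p in points:
        total = 0.0
        for q, m in zip(points, masses):
            d = math.dist(p, q)  # Euclidean distance
            total += m * math.exp(-((d / sigma) ** 2))
        pots.append(total)
    return pots


def seed_medoids(points, k, sigma):
    """Return indices of up to k seeds chosen deterministically by potential.

    Points are visited from highest to lowest potential; a point is skipped
    if it lies within sigma of a seed already chosen (assumed heuristic).
    Fewer than k indices are returned if not enough separated points exist.
    """
    pots = potential(points, sigma)
    order = sorted(range(len(points)), key=lambda i: -pots[i])
    seeds = []
    for i in order:
        if all(math.dist(points[i], points[j]) > sigma for j in seeds):
            seeds.append(i)
        if len(seeds) == k:
            break
    return seeds


if __name__ == "__main__":
    # Two dense groups and one isolated point (made-up illustrative data).
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (9.0, 0.0)]
    print(seed_medoids(pts, k=2, sigma=1.0))
```

CLARA draws samples from the data and runs a k-medoids (PAM) step on each sample. In a variant along these lines, seeds such as those returned by `seed_medoids` would replace the random initial medoids within each sample, which makes the starting point deterministic and keeps it away from isolated noisy points.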
Keywords/Search Tags: clustering, uncertainty, data field, cloud theory