Study On Continuous Attributes Discretization Algorithm Based On The Nearest Neighbor-clustering

Posted on: 2010-06-20   Degree: Master   Type: Thesis
Country: China   Candidate: G Q Jiang   Full Text: PDF
GTID: 2198360302976228   Subject: Control theory and control engineering
Abstract/Summary:
Data mining is one of the most active fields in database and artificial intelligence research today. It is regarded as the core of knowledge discovery in databases and aims to discover hidden, previously unknown, and potentially useful knowledge in data. In essence, data mining finds common patterns and rules in large databases.

In real databases, records often contain attributes with continuous values. Because some existing data mining methods can handle only discrete attributes, the continuous attributes must be discretized first. For this reason, the study of algorithms for continuous attribute discretization has become important foundational work in data mining research and an important research direction in its own right, and it can strongly influence the result of the data mining process. To some extent, the quality of the discretization determines the quality of the data mining.

This thesis proposes a continuous attribute discretization algorithm based on the nearest neighbor clustering algorithm. The algorithm is a global discretization method that considers all attributes together, and it is completed with a two-step strategy.

The main work of the thesis is as follows. First, it reviews the background of continuous attribute discretization, introduces the development of the field in China and abroad, and points out its challenges and shortcomings. Second, it reviews related knowledge of data mining, such as its definition and basic process; most importantly, Chapter 2 gives the mathematical description of attribute discretization, the meaning and importance of continuous attribute discretization, its goals, its classifications, and the discretization methods currently in use. Third, it introduces the concept of cluster analysis, the clustering process, and some commonly used clustering algorithms, with a more detailed analysis of nearest neighbor clustering. Fourth, it analyzes the inherent relationship between nearest neighbor clustering and attribute discretization. Finally, it proposes some improvements to nearest neighbor clustering, designs a novel attribute discretization algorithm based on the improved method, and applies the algorithm to a weather information decision system and to data from the UCI machine learning repository.
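The general idea of using nearest neighbor clustering for discretization can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not give the details of the improved clustering or of the two-step global strategy, so the sketch handles a single attribute only. The function names, the distance threshold, and the sample temperature values are all hypothetical. For sorted one-dimensional values, the nearest already-clustered neighbor of each value is simply its predecessor, so a new cluster starts whenever the gap to the previous value exceeds the threshold; cut points are then placed midway between adjacent clusters.

```python
from bisect import bisect_right


def nn_cluster_1d(values, threshold):
    """Cluster 1-D values with a simplified nearest neighbor rule.

    Assumption for illustration: values are processed in sorted order, so
    the nearest already-clustered value is always the previous one. A value
    joins that neighbor's cluster if the gap is at most `threshold`;
    otherwise it starts a new cluster.
    """
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= threshold:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters


def cut_points(clusters):
    """Place one cut point midway between each pair of adjacent clusters."""
    return [(left[-1] + right[0]) / 2
            for left, right in zip(clusters, clusters[1:])]


def discretize(value, cuts):
    """Map a continuous value to the index of the interval it falls in."""
    return bisect_right(cuts, value)


# Hypothetical continuous attribute (e.g. temperature readings).
temperatures = [30.5, 12.0, 21.0, 13.5, 22.5, 31.0, 11.0, 23.0]
clusters = nn_cluster_1d(temperatures, threshold=3.0)
cuts = cut_points(clusters)
labels = [discretize(t, cuts) for t in temperatures]
print(cuts)    # [17.25, 26.75]
print(labels)  # [2, 0, 1, 0, 1, 2, 0, 1]
```

In this sketch the threshold controls how many intervals are produced: a smaller threshold splits the attribute into more intervals, a larger one into fewer. A global method such as the one the thesis describes would cluster on all attributes jointly rather than on each attribute in isolation.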
Keywords/Search Tags:continuous attributes discretization, cluster analysis, nearest neighbor-clustering, data mining