Font Size: a A A

Research On Cluster Boundary Detecting Technology For Categorical Data

Posted on:2013-11-25Degree:MasterType:Thesis
Country:ChinaCandidate:B WangFull Text:PDF
GTID:2248330371476209Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the rapid development of Internet and information technology, human society steps into the information age. At the information age, a notable feature is messages which individuals face everyday are abundant and demands for information are increasing. How to extract useful knowledge from these messages becomes a problem which should be solved by people. Data mining technology is a good way to extract the useful information from messages.Clustering is an important branch of data mining, cluster boundary detection is further refinement for clustering. The cluster boundary detecting technology is an new emerging research field which has widely applied at many prospects, which can be applied to many fields such as:biology, epidemiology, statistics, genetic engineering and so on. The cluster boundaries are data objects with multiple clustering characteristics, and they are valuable. For example:the company may be interested in consumers, who show great interest in products of the company but they did not buy anything before the product market survey. According to the survey information of the above consumers, the company may adjust the strategy of company to cater for those consumers than blindly fight users who are no interest in the products of the company. The boundary consumers are more chance to become customers of the company. The cluster boundary-detecting algorithm can effectively get the boundary objects. The article carries on research on the following two respects of cluster boundary detecting technology:Firstly, for the research on the cluster boundary is mainiy concentrated on the numeric attribute data, the research on the categorical attribute data is poor. Some relevant research on boundary detecting for categorical attribute data is proposed at the paper. Boundary degrees which represent the closeness between categorical attribute data objects and the class which they belong to. as well as a formal definition of the categorical data clustering boundary is given, and a categorical data boundary detection algorithm, called CBORDER which use the idea of evidence accumulated, is given. Experiments on real data sets show that the algorithm can effectively extract boundaries of high-dimensional categorical data.Second, the existed algorithms usually have a low quality on the results and poor ability to remove noises. To deal with the above shortcomings, a new algorithm, which combined with the BRIM algorithm in the positive and negative half of the neighborhood thought and the idea of joint entropy in the EDGE algorithm, named EBRIM, is provided. Experiments on synthetic data sets show that EBRIM algorithm can extract higher quality boundaries than the above two algorithms. It also can more effectively remove noise significantly better than the above two algorithms.
Keywords/Search Tags:Data mining, Clustering, Boundary points, Categorical dataset, Evidenceaccumulation
PDF Full Text Request
Related items