
Research On Several Key Issues In Unsupervised Knowledge Discovery

Posted on: 2006-02-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W D Dai
GTID: 1118360212989250
Subject: Computer application technology
Abstract/Summary:
In recent years, the rapid development of computer and communication technology has led every industry to produce and accumulate data of many kinds. Knowledge discovery methods are therefore urgently needed to extract new and valuable knowledge from these large volumes of data, so that the relationships between things reflected in the data can be understood comprehensively and at multiple levels. Traditional linear-transformation methods in knowledge discovery, such as PCA and CMDS, are inadequate for nonlinear, strongly correlated, high-dimensional data, which restricts their application. In data mining, most density-based methods are limited by a global density threshold, sensitivity to input parameters, and similar problems. To address these problems, this dissertation proposes corresponding methods and studies their application to document processing.

For data preprocessing, the dissertation proposes a new manifold learning method, the previous predictably increasing embedding (PrePIE) algorithm. The method combines global optimization with the local self-organizing principle, approaching globally optimal manifold reconstruction quality on the basis of locally optimal embedding. The reconstruction quality of the low-dimensional embedded manifold is improved in three respects: anchor set selection, anchor set embedding, and global set embedding. As a result, the stability and usability of low-dimensional manifold embedding are improved.

To overcome the global density threshold limitation of current density-based clustering methods, the dissertation presents the CABDET algorithm, which self-adjusts the neighborhood radius according to the local density distribution.
CABDET establishes the adjacency relationships between the objects being clustered and adjusts the neighborhood radius of the current node in real time according to the local density of its parent node, repeatedly searching for child nodes until no new child nodes can be found.

However, CABDET has a long running time and exhibits cluster splitting when its parameters are set to very small values. The dissertation therefore proposes LOCHDET, a hierarchical density-tree clustering method based on local computing. Through a pre-specified local computing coefficient, the method replaces the global computation of similarity between objects with a local one, which greatly improves running efficiency and, in the experiments, allows the sparse similarity matrix to be held in row-based compressed storage. On normally distributed test sets, LOCHDET runs 6 to 8 times faster than CABDET. Furthermore, LOCHDET uses hierarchical clustering to merge clusters under certain conditions, which markedly improves clustering quality and solves the cluster-splitting problem of CABDET.

After discussing the pattern completeness and pattern evaluation of LOCHDET, the dissertation examines the overall effectiveness of PrePIE and CABDET on standard high-dimensional text test sets, using the F-measure as the objective measure of pattern interestingness. The experimental results indicate that PrePIE can effectively reduce the dimensionality of nonlinear text data, and that CABDET can discover a variety of cluster patterns, with clustering results clearly superior to those of DBSCAN.
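
The abstract does not say which variant of the F-measure is used. As a minimal sketch, assuming the common class-weighted clustering F-measure (each true class is matched to its best-scoring cluster, and the per-class scores are averaged by class size), the evaluation could look like the following. The function name and the toy labels are illustrative and do not come from the dissertation.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_labels):
    """Class-weighted F-measure of a clustering against known classes.

    Assumed definition (not confirmed by the abstract):
        F = sum_i (n_i / n) * max_j F(i, j)
    where F(i, j) is the harmonic mean of precision n_ij / n_j
    and recall n_ij / n_i for class i and cluster j.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("label sequences must not be empty")

    class_sizes = Counter(true_labels)        # n_i
    cluster_sizes = Counter(cluster_labels)   # n_j
    joint = Counter(zip(true_labels, cluster_labels))  # n_ij

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint.get((cls, clu), 0)
            if n_ij == 0:
                continue  # no overlap, so F(i, j) is 0
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


# Toy labels: two classes of two documents each.
classes = ["a", "a", "b", "b"]
perfect = clustering_f_measure(classes, [0, 0, 1, 1])
merged = clustering_f_measure(classes, [0, 0, 0, 0])
```

A clustering that recovers every class exactly scores 1, while placing all objects in one cluster is penalized through low precision. Under this definition, comparing two algorithms such as CABDET and DBSCAN on the same labeled text set amounts to comparing the values returned for their respective cluster assignments.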
Keywords/Search Tags: Cluster Analysis, Data Mining, Knowledge Discovery, Speedup, Manifold Learning, Text Mining