Making the Internet easier to use is a real challenge. Information on the Internet is poorly organized and scattered across an enormous number of pages, while users want to obtain information quickly and accurately. AI-based clustering, classification, and abstracting techniques, together with the so-called "Knowledge Indexing" technique, appear to be good approaches to these problems. This thesis discusses clustering and classification techniques against the background of information retrieval.

First, we summarize the key techniques used for clustering and classification in fields such as statistics, machine learning, and pattern recognition.

We propose a new classification algorithm based on the theory of "information granularity". We find that a clustering corresponds to a particular equivalence relation on the sample set, and that a series of equivalence relations with different information granularities corresponds to a clustering diagram. From the viewpoint of granularity, it becomes clearer that clustering is a procedure carried out at a uniform granularity, while classification is carried out at different granularities.

After selecting terms to represent the samples, we can treat the samples as points in the term space that have the same weight but different coordinates. Considering the energy field generated by universal gravitation, we can obtain a topological structure from the relations among the equilibrium curves of different energies. This topological structure corresponds to a particular clustering diagram. We...
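The correspondence between equivalence relations and a clustering diagram can be illustrated with a small sketch. This is not the gravity-field algorithm of the thesis. It assumes, purely for illustration, that two samples are equivalent at a given granularity when they are linked by a chain of samples whose consecutive Euclidean distances stay within a threshold. The names `partition_at` and `refines`, the sample points, and the thresholds are all hypothetical.

```python
from itertools import combinations
from math import dist


def partition_at(points, threshold):
    """Equivalence classes of point indices at one granularity.

    Assumed relation (not from the thesis): i ~ j if a chain of points
    connects them with every consecutive distance <= threshold.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    classes = {}
    for i in range(len(points)):
        classes.setdefault(find(i), []).append(i)
    return sorted(classes.values())


def refines(fine, coarse):
    """True if every class of `fine` lies inside some class of `coarse`."""
    return all(any(set(b) <= set(c) for c in coarse) for b in fine)


# Hypothetical samples in a two-term space.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5), (20.0, 0.0)]
thresholds = (0.5, 1.2, 2.0, 6.0, 25.0)  # fine to coarse granularity
levels = [partition_at(points, t) for t in thresholds]

for t, level in zip(thresholds, levels):
    print(t, level)
```

Each threshold yields one partition, that is, one clustering at a uniform granularity. Because each partition refines the next coarser one, the sequence of partitions can be read as the levels of a clustering diagram.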