| With the development and spread of the Internet, "retrieval" has become a part of daily life. The Internet connects the whole world, but how can we find what we need? The answer is retrieval. Among the many kinds of retrieval systems, literature retrieval is the most useful for researchers. However, most current retrieval systems rely only on keyword matching and cannot capture the interests of their users. A system that could capture those interests would be more convenient, because it could rank the literature a user is interested in at the top of the results. Our team has started to design a system that infers users' interests from their behavior and groups users with similar interests into user groups, so that they can exchange and share resources. This paper discusses the basic components of the retrieval system our team is designing. My work covers text processing and clustering. I implemented the process that converts words into vectors; it manages the stop-word list and generates the vectors. I also improved AP clustering. Affinity propagation (AP) clustering has one advantage: the number of clusters does not need to be specified in advance. Sometimes, however, the number of clusters is known; how can this knowledge be used to improve the quality of AP clustering results? This paper proposes an improved AP method for such cases. Compared with AP, the improved AP performs better on data sets whose number of clusters is known. Experimental results show that the improved AP is effective and that the quality of its results is better than or equal to that of AP clustering. |
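
The word-to-vector step mentioned above can be illustrated with a minimal sketch. It assumes a plain bag-of-words count model, which may differ from the weighting the system actually uses. The stop-word list and the names `STOP_WORDS`, `tokenize`, `build_vocabulary`, and `to_vector` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the list used by the actual system is not given.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}


def tokenize(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into alphabetic words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop_words]


def build_vocabulary(docs, stop_words=STOP_WORDS):
    """Assign each distinct non-stop word a fixed index, in alphabetical order."""
    words = sorted({w for doc in docs for w in tokenize(doc, stop_words)})
    return {w: i for i, w in enumerate(words)}


def to_vector(text, vocab, stop_words=STOP_WORDS):
    """Return the term-count vector of a text over the given vocabulary."""
    counts = Counter(tokenize(text, stop_words))
    vector = [0] * len(vocab)
    for word, index in vocab.items():
        vector[index] = counts.get(word, 0)
    return vector


docs = [
    "The clustering of documents",
    "Retrieval and clustering in retrieval systems",
]
vocab = build_vocabulary(docs)
vectors = [to_vector(doc, vocab) for doc in docs]
```

Vectors of this kind could then be passed to a clustering step such as AP. Changing the stop-word set changes the vocabulary and therefore the dimensionality of every vector.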