| With the rapid development of the Internet, network resources are growing geometrically. Users are no longer satisfied with the services provided by the major general-purpose search engines, and attention has turned to topic-specific search engines. A topic-specific search engine focuses on a particular field, a specific group, or a specific theme, and provides focused, in-depth information and services. The topic spider is an important component of a topic-specific search engine and directly determines the quality of the collected resources, so how to design a high-quality topic spider has become an important research question. This paper studies the design and implementation of a topic spider. Its main contents are: the differences in structural design between a traditional web spider and a topic spider; analysis of the performance bottlenecks that the design must address; search strategies; and algorithms for calculating topic relevance. The following results were achieved: (1) a topic spider with a distributed, scalable architecture and dynamic allocation was designed and studied; (2) the concept of a seed website was defined, together with a fast and efficient method for collecting seed sites; (3) the vector space model algorithm was improved to meet the needs of the system; (4) on the basis of these theoretical studies, a topic spider for a topic-specific search engine was developed. |