| The Internet has become the largest information resource platform and an important channel for gathering public information. Information workers in different fields follow different themes, and they need to track a number of websites every day to obtain useful information. A topic site is a website whose theme is clear and focused. The online world changes quickly, however: new websites may be set up every day, and existing sites may go unnoticed. If information workers do not find the new sites related to their research topics, they may miss important information. They therefore cannot limit themselves to tracking a fixed or small set of sites; they need to keep discovering new information sources and websites to track. The question is how to discover these new sources. Because the Internet holds a vast amount of information, doing this entirely by hand means a heavy workload and low efficiency. We therefore propose letting the computer find topic-relevant websites automatically. Building on related theory, including keyword extraction, information acquisition, and similarity calculation, this paper designs a solution for finding topic sites automatically. The websites that a user is concerned with are called “sample sites”, and topic information is drawn from them. We design a topic description model to describe a website, use the model's keywords to search for web pages, and extract topic-relevant websites from the large set of pages returned. We then calculate the similarity of each site to the sample sites, and sites with high similarity are recommended to the user to choose from. Finally, we design experiments to validate the approach. The keyword extraction experiments show that the improved TF-IDF algorithm is superior to the traditional algorithm at extracting keywords. The results of the topic site discovery experiment and the similarity calculation experiment show that the approach is effective at finding topic websites automatically. |
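
The pipeline summarized above (extract keywords from a sample site, then rank candidate sites by similarity to it) can be illustrated with a minimal sketch. It rests on several assumptions: it uses plain smoothed TF-IDF rather than the improved variant the paper evaluates, it uses cosine similarity as one common choice of similarity measure, and the function names (`tfidf_vectors`, `top_keywords`, `rank_candidates`) and the toy site texts are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using plain smoothed TF-IDF.

    Assumption: this is the textbook weighting, not the paper's improved variant.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def top_keywords(vector, k=5):
    """Return the k highest-weighted terms, breaking ties alphabetically."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(sample_text, candidates):
    """Rank candidate sites (name -> text) by similarity to the sample site text."""
    names = list(candidates)
    vectors = tfidf_vectors([sample_text] + [candidates[n] for n in names])
    sample_vec = vectors[0]
    scores = [(n, cosine(sample_vec, v)) for n, v in zip(names, vectors[1:])]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data standing in for crawled site text.
    sample = "solar panel energy storage battery solar grid"
    candidates = {
        "energy-blog": "solar energy battery storage news and grid updates",
        "cooking": "recipes for pasta and bread baking",
    }
    for name, score in rank_candidates(sample, candidates):
        print(f"{name}: {score:.3f}")
```

In a real system the texts would come from crawled pages, and the top keywords of the sample site would drive the web search step before candidates are scored; the sketch covers only the weighting and ranking steps.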