
Internet Medicine Information Monitoring System Based On Focused Crawler

Posted on: 2012-07-04
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Yan
Full Text: PDF
GTID: 2218330368993356
Subject: Control theory and control engineering
Abstract/Summary:
With the rapid development of the World Wide Web in recent years, the network has become an important way to access and transmit information, and the amount of information on the Internet has grown exponentially as a result. The Internet has greatly improved people's lives. However, because online information comes from many sources, reaches a wide audience, costs little to publish, and is difficult to monitor, many sellers of fake goods who were driven out of physical markets by law enforcement agencies have moved their sales platforms to the network, where large numbers of counterfeit goods now appear with impunity.

To combat the rampant sale of counterfeit drugs, it is necessary to monitor drug trade information on the Internet. The key problem in monitoring this information is topic search, for which a focused crawler can be used. A focused crawler targets a particular field or a specific topic in order to obtain high recall and precision. Most search algorithms, however, are designed for broad topics, and search strategies applied to a specific, narrow topic do not perform well.

The main work of this thesis includes:

1. Different page-search algorithms are proposed for forum websites and general websites, according to the characteristics of their different link structures.

2. To address the poor performance of search strategies on specific, narrow topics, a combined strategy for searching a specific topic on the Internet is proposed, based on an analysis of focused-crawler search algorithms. The combined strategy consists of page searching and relevance analysis. Page searching uses an improved Fish-Search algorithm. Relevance analysis is carried out in two steps: the first step uses the vector space model to roughly identify pages on the broad topic, and the second step uses an improved Naive Bayes classification algorithm and the k-Nearest Neighbors algorithm, respectively, to select the relevant narrow topic from the results of the first step.

3. On the basis of this research, an information monitoring system for medicine on the Internet was developed. Tests on pages from several websites and forums show that the combined search strategy improves the harvest ratio of the focused-crawler system and its efficiency in searching narrow topics.
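The two-step relevance analysis described in item 2 can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes plain term-frequency vectors with cosine similarity for the vector space model step, and an unmodified k-Nearest Neighbors vote for the second step (the thesis uses improved variants whose details the abstract does not give). All function names, the threshold, and the sample data are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a whitespace-tokenized, lowercased text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def coarse_filter(pages, topic_text, threshold=0.1):
    """Step 1 (assumed form): keep pages close to the broad topic."""
    topic = tf_vector(topic_text)
    return [p for p in pages if cosine(tf_vector(p), topic) >= threshold]


def knn_label(page, labeled, k=3):
    """Step 2 (assumed form): majority label among the k most similar examples."""
    vec = tf_vector(page)
    ranked = sorted(labeled,
                    key=lambda ex: cosine(vec, tf_vector(ex[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def two_step_relevance(pages, topic_text, labeled, target, threshold=0.1, k=3):
    """Run the coarse filter, then keep pages whose kNN label is the target."""
    candidates = coarse_filter(pages, topic_text, threshold)
    return [p for p in candidates if knn_label(p, labeled, k) == target]


# Hypothetical sample data, for illustration only.
pages = [
    "cheap medicine pills for sale online discount",
    "football match results today",
    "medicine research news from hospital study",
]
labeled = [
    ("buy cheap pills online discount sale", "sale"),
    ("order medicine online cheap price", "sale"),
    ("hospital study on medicine research", "info"),
    ("clinical research news report", "info"),
]
selected = two_step_relevance(pages, "medicine drug pills pharmacy",
                              labeled, target="sale")
print(selected)
```

In this sketch the first step discards the page that shares no terms with the broad topic, and the second step separates pages about drug sales from other medicine-related pages, mirroring the coarse-then-fine structure the abstract describes.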
Keywords/Search Tags: focused crawler, medicine information monitoring, page search algorithm, correlation analysis algorithm