Application Research on Event-Driven and Protocol-Driven Topic Crawlers Oriented to a Given Field

Posted on: 2013-02-14 | Degree: Master | Type: Thesis
Country: China | Candidate: Y P Zhou | Full Text: PDF
GTID: 2248330392453467 | Subject: Computer Science and Technology
Abstract/Summary:
By structure, Web data is divided into surface Web data and deep Web data, and a large share of the data on the Internet is hidden in the deep Web. Traditional topic crawlers fetch only surface data and do not attempt to crawl the deep Web, so their recall is low. In addition, common topic search engines return too many results, many of them disorganized and unrelated to the topic. The main problem for a topic crawler is therefore to crawl deep Web data while returning results that are accurate and meet the user's requirements. For given topics, this thesis uses an event-driven mode and a protocol-driven mode, combined with the characteristics of each topic, to build the system model. The work of the thesis is as follows:
1. We studied the theory and the algorithms. We explored event-driven triggering to improve the recall of the traditional topic crawler. To judge topic relevance more precisely, we propose combining the Boolean Model with the Vector Space Model; comparing the Boolean Model alone against the combined models, the combination gives more accurate relevance judgments.
2. We propose a topic crawler model that uses event triggering to improve recall. The model uses an event driver to crawl deep Web data. For link analysis, it combines a BM-based link string matching algorithm with topic features and uses a Bloom filter to eliminate duplicates, which reduces the cost of matching link strings and of downloading. For content filtering, it uses an algorithm based on the Vector Space Model. With car parameters as the topic, an interpreter parses the asynchronous data returned, and a parser with regular expressions parses the tags. The crawler thereby extracts and parses the topic information and improves page recall.
3. We build a topic crawler model based on the protocol-driven mode to improve the accuracy of the result pages. For word segmentation, we propose combining the forward maximum string matching method with a forward topic keyword matching method, which cuts out more topic keywords, improves the accuracy of the result information, and reduces mistaken filtering. For link prediction, we combine the Boolean Model with the Vector Space Model, which improves prediction accuracy. This crawler takes rotating machinery fault diagnosis knowledge as its topic, extending the study of topic crawlers. Ultimately, the information in the result pages can serve as the knowledge base of an expert system.
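The abstract mentions a Bloom filter for eliminating duplicate links but gives no details. The following is a minimal sketch of that idea only, not the thesis's implementation: the class `BloomFilter`, the helper `filter_new_links`, the bit-array size, the number of hash functions, and the use of SHA-256 with double hashing are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; size and hash count are illustrative assumptions."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 5) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all bit positions from two slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly seen" (false positives can occur); False means "never seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def filter_new_links(links, seen: BloomFilter):
    """Return the links not recorded in `seen`, recording each one as it passes."""
    fresh = []
    for url in links:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh
```

A Bloom filter trades a small false-positive rate (a new link occasionally treated as already seen) for constant memory, which is why it suits a crawler's visited-link set; a link it reports as unseen is guaranteed to be new.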
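The abstract states that the Boolean Model and the Vector Space Model are combined for relevance judgment, without saying how. One plausible reading, sketched here purely as an assumption, is to use the Boolean Model as a hard gate (all required terms must appear) and then score the surviving text by cosine similarity against a topic term vector. The function names, the raw term-frequency weighting, the ASCII tokenizer, and the threshold value are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str):
    # Simplistic ASCII tokenizer; Chinese text would need a word segmenter instead.
    return re.findall(r"[a-z0-9]+", text.lower())


def boolean_match(tokens, required_terms) -> bool:
    """Boolean Model gate: every required term must occur at least once."""
    token_set = set(tokens)
    return all(term in token_set for term in required_terms)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Vector Space Model score over raw term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_relevant(text, required_terms, topic_terms, threshold=0.3) -> bool:
    """Accept text only if it passes the Boolean gate and the cosine threshold."""
    tokens = tokenize(text)
    if not boolean_match(tokens, required_terms):
        return False
    return cosine_similarity(Counter(tokens), Counter(topic_terms)) >= threshold
```

Under this reading, the Boolean gate cheaply rejects text that lacks the core terms, while the cosine score rejects text that mentions a core term only in passing. The same function could be applied to anchor text for link prediction or to page text for content filtering.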
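Forward maximum matching, one of the two segmentation methods the abstract names, can be sketched as follows. This shows only the standard dictionary-based method; the thesis's combination with forward topic keyword matching is not described in the abstract and is not reproduced here. The function name and the sample dictionary are hypothetical.

```python
def forward_max_match(text: str, dictionary: set, max_len=None):
    """Segment `text` left to right, always taking the longest dictionary word.

    Characters that start no dictionary word are emitted as single-character tokens.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a word (or one character) fits.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical topic dictionary for rotating machinery fault diagnosis.
topic_words = {"旋转", "机械", "旋转机械", "故障", "诊断", "故障诊断"}
segments = forward_max_match("旋转机械故障诊断知识", topic_words)
```

Because the longest match wins, a dictionary that contains multi-character topic keywords keeps them intact instead of splitting them into shorter common words, which is consistent with the abstract's stated goal of cutting out more topic keywords.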
Keywords/Search Tags: Topic Crawler, Deep Web Topic Crawler, Event-Driven, Protocol-Driven, Chinese Segmentation, Vector Space Model