Research On Design And Implementation Of The Extensible Distributed Vertical Search Engines

Posted on: 2009-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: B Li
Full Text: PDF
GTID: 2178360278957077
Subject: Electronics and Communications Engineering
Abstract/Summary:
The Internet contains a large body of hidden resources that users cannot easily explore, for many reasons. Because these hidden resources exceed ordinary ones in both quantity and quality, research on exploring them is becoming increasingly important. General search engines cannot capture this information fully because of restrictions on crawl depth. A general crawler is denied access to many web sites because of limited permissions, and it cannot adapt to the diversity of web pages. Vertical search engines are better than general ones at mining hidden information. They adopt crawling strategies and analysis methods tailored to the characteristics of the resources, can extract web information with high accuracy, and can provide users with specially selected information in a given field.
The technologies of search engines are studied in this dissertation. Through an analysis of the strategies of various focused crawlers, a crawler based on a tree structure is proposed for the web sites of foreign military forums. Forums usually follow a strict tree structure in how their pages are linked, so a link selection scheme can be added to steer the crawl toward pages that contain information. In information classification, forum postings contain a lot of useless information (post, malicious post), which statistically shares two features: few words and few paragraphs. A method of information classification based on fuzzy pattern recognition is proposed. The numbers of words and paragraphs are used as effect factors, and their effects and weights are determined by sample analysis. Computing the classification formula with an S-function effectively improves the quality of the classification. For index searching, a vertical search engine built on Lucene is studied, and a buffer (caching) method is proposed for serving user queries. Using OSCache greatly improves the response speed.
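The fuzzy classification step described above can be illustrated with a minimal Java sketch. It assumes the standard Zadeh S-function as the membership function and a weighted sum of the two factors; the abstract does not give the actual formula. The class name, the bounds, the weights, and the threshold are all hypothetical placeholders, not values from the thesis, which derives its weights from sample analysis.

```java
// Illustrative sketch only: constants below are assumptions, not thesis values.
final class FuzzyPostFilter {
    // Hypothetical bounds: at or below LOW the membership is 0, at or above HIGH it is 1.
    static final double WORD_LOW = 5, WORD_HIGH = 60;
    static final double PARA_LOW = 0, PARA_HIGH = 4;
    // Hypothetical factor weights and decision threshold.
    static final double WORD_WEIGHT = 0.7, PARA_WEIGHT = 0.3;
    static final double THRESHOLD = 0.5;

    // Zadeh S-function on [a, c] with its crossover point at the midpoint.
    static double sFunction(double x, double a, double c) {
        if (c <= a) {
            throw new IllegalArgumentException("c must be greater than a");
        }
        if (x <= a) return 0.0;
        if (x >= c) return 1.0;
        double mid = (a + c) / 2.0;
        if (x <= mid) {
            double t = (x - a) / (c - a);
            return 2.0 * t * t;
        }
        double u = (x - c) / (c - a);
        return 1.0 - 2.0 * u * u;
    }

    // Weighted membership of a post in the "useful" class.
    static double usefulness(int words, int paragraphs) {
        double wordScore = sFunction(words, WORD_LOW, WORD_HIGH);
        double paraScore = sFunction(paragraphs, PARA_LOW, PARA_HIGH);
        return WORD_WEIGHT * wordScore + PARA_WEIGHT * paraScore;
    }

    // A post is kept when its membership reaches the threshold.
    static boolean isUseful(int words, int paragraphs) {
        return usefulness(words, paragraphs) >= THRESHOLD;
    }
}
```

With these placeholder values, a three-word, one-paragraph post scores well below the threshold and is discarded, while a long multi-paragraph post is kept.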
Based on this study, a search engine for military information in the Military.com forum is designed and implemented in Java. Finally, the structures and operating schemes of various distributed search engines are studied, and a system framework for a distributed vertical search engine is proposed based on the CORBA distributed model.
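The query buffering idea mentioned in the abstract can be sketched in plain Java as follows. This is not OSCache and not the thesis's implementation: it swaps in a small LRU cache built on `java.util.LinkedHashMap`, and the class name `QueryResultCache`, the `searcher` function, and the capacity are hypothetical. The point is only that a repeated query is answered from memory instead of hitting the index again.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Illustrative LRU cache in front of a search backend (an assumption, not OSCache).
final class QueryResultCache {
    private final Map<String, List<String>> cache;
    private final Function<String, List<String>> searcher;
    private int misses = 0;

    QueryResultCache(final int capacity, Function<String, List<String>> searcher) {
        this.searcher = searcher;
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns cached results when present; otherwise queries the backend once.
    synchronized List<String> search(String query) {
        String key = query.trim().toLowerCase(Locale.ROOT);
        List<String> hit = cache.get(key);
        if (hit != null) {
            return hit;
        }
        misses++;
        List<String> results = searcher.apply(key);
        cache.put(key, results);
        return results;
    }

    // Number of queries that had to go to the backend.
    synchronized int misses() {
        return misses;
    }
}
```

In a real system the `searcher` function would wrap the index lookup, and cached entries would also need an expiry policy so that newly crawled posts become visible.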
Keywords/Search Tags: vertical search engines, distributed, focused crawler, fuzzy classification