
The Key Technologies And Realization Of Vertical Search Engine For Expert Information

Posted on: 2011-08-23 | Degree: Master | Type: Thesis
Country: China | Candidate: S B Liu | Full Text: PDF
GTID: 2178360305493588 | Subject: Computer Science and Technology
Abstract/Summary:
As science and technology award assessment and the evaluation of technical achievements and technology projects develop, the justice and fairness of this work depend on whether appropriate and authoritative experts take part. Vertical search is a hot research topic at present. A vertical search engine focuses on a specific, in-depth vertical service and is committed to covering the information and content of a particular field in full depth. Expert information search will improve greatly if vertical search is introduced into the search service. Following the general development process and taking into account the inherent characteristics of the expert information field, this thesis studies and implements two key issues.

First, the Deep Web spider. Expert information data sources contain a large number of Deep Web pages, so ordinary web spiders crawl them less effectively than desired. To ensure the accuracy and completeness of the crawled expert information, this thesis designs and implements a Deep Web spider based on WatiJ, a web automated testing tool. It elaborates the principle by which WatiJ is used to imitate human interaction, such as a user submitting query forms and repeatedly clicking the next-page button, and it gives examples of the key steps in crawling dynamic web pages. Experimental results indicate that the spider is an effective solution for crawling dynamic web pages from authorized data sources.

Second, Chinese word segmentation with the expert information cell thesaurus. The quality of Chinese word segmentation is largely determined by how well the vocabulary is built.
According to the characteristics of the fields related to expert information, this thesis gives the composition of the expert information cell thesaurus, together with a Lucene-based component that implements Chinese word segmentation using the forward maximum matching algorithm.

Having addressed these two key issues, the thesis finally presents the design and implementation of the system, which is based on the S2SH architecture, and states the prospects for future work.
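The forward maximum matching idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's Lucene-based component: the class name `ForwardMaxMatch`, the tiny in-memory dictionary, and the single-character fallback for unknown text are all assumptions made for illustration. At each position the sketch tries the longest candidate first and shortens it until a dictionary word is found.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not taken from the thesis or from Lucene.
final class ForwardMaxMatch {
    private final Set<String> dictionary;
    private final int maxWordLength;

    ForwardMaxMatch(Set<String> dictionary) {
        this.dictionary = new HashSet<>(dictionary);
        int longest = 1;
        for (String word : dictionary) {
            longest = Math.max(longest, word.length());
        }
        // No candidate longer than the longest dictionary word needs to be tried.
        this.maxWordLength = longest;
    }

    // Scans left to right; at each position takes the longest dictionary match.
    // Simplification: works on UTF-16 chars, so characters outside the BMP
    // are not handled specially.
    List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxWordLength);
            // Shrink the candidate from the right until it is a known word.
            // If nothing matches, a single character is emitted (assumed fallback).
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Toy stand-in for a domain vocabulary such as the cell thesaurus.
        Set<String> dict = Set.of("垂直", "搜索", "搜索引擎", "专家", "信息");
        System.out.println(new ForwardMaxMatch(dict).segment("专家信息垂直搜索引擎"));
    }
}
```

With this toy dictionary, the input 专家信息垂直搜索引擎 is split into 专家, 信息, 垂直, 搜索引擎: the longer entry 搜索引擎 wins over 搜索 because longer candidates are tried first. This also shows why vocabulary quality matters for this kind of segmentation, since a missing domain term falls back to shorter words or single characters.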
Keywords/Search Tags: vertical search, web spider, dynamic webs, cell thesaurus