Research On Key Techniques Of Vertical Search Engine

Posted on: 2008-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: X W Wang
GTID: 2178360212485012
Subject: Computer application technology
Abstract/Summary:
The rapid growth of the Internet poses unprecedented scaling challenges for general-purpose search engines. Because these engines serve all users, their results are overly broad: thousands of irrelevant results do not meet the needs of precise search. Vertical search engines, which serve a single field, emerged in response.

Rather than collecting and indexing all accessible Web documents in order to answer all possible queries, a focused crawler analyzes its crawl boundary to find the links that are likely to be most relevant to the crawl, and avoids irrelevant regions of the Web. Because only related pages are crawled, the accuracy and efficiency of vertical search engines improve remarkably. At present, however, the accuracy of Chinese word segmentation and of correlation prediction still needs improvement, and the search strategy of the focused crawler needs further refinement to raise search engine coverage and efficiency.

For Chinese word segmentation, this thesis presents a new algorithm, Adaptive Chinese Word Segmentation Based on Theme, which uses a candidate dictionary and a professional dictionary to guide segmentation and ambiguity elimination. It proved effective in raising the precision of segmentation for professional terms.

For correlation prediction, this thesis presents three models: the Correlation Prediction Algorithm Based on Parent Pages (CPAP), the Correlation Prediction Algorithm Based on Hyperlink (CPAH), and the TPR Correlation Prediction Algorithm. The CPAP model uses the anchor text and the correlation of parent pages; the CPAH model calculates correlation from the quantity and quality of pages; the TPR algorithm combines the correlation and authority of pages, and thereby effectively prevents the "theme drift" phenomenon.

For the Web search strategy, this thesis presents a sparse tunneling technique, which effectively addresses the exponential growth problem of the original tunneling technique. Sparse tunneling explores the entire Web sparsely and thereby greatly improves the probability of discovering new Web communities.

Finally, the design and implementation of the system are introduced, including the system structure and methods.
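
The general idea of a focused crawler with tunneling can be illustrated with a minimal sketch. This is not the thesis's CPAP, CPAH, TPR, or sparse tunneling algorithm, whose details the abstract does not give; it is a generic best-first crawl over a toy in-memory link graph, assuming a simple keyword-overlap relevance score. All names here (`relevance`, `focused_crawl`, `PAGES`, `LINKS`, `threshold`, `max_tunnel`) are hypothetical and chosen only for illustration.

```python
import heapq

# Hypothetical toy data: page text and outgoing links, keyed by URL.
PAGES = {
    "a": "vertical search engine crawler",
    "b": "cooking recipes",
    "c": "focused crawler for vertical search",
    "d": "sports news",
    "e": "search engine index",
}
LINKS = {"a": ["b"], "b": ["c", "d"], "d": ["e"]}


def relevance(text, topic_terms):
    """Toy relevance score: fraction of topic terms that appear in the text."""
    words = set(text.lower().split())
    return sum(term in words for term in topic_terms) / len(topic_terms)


def focused_crawl(pages, links, seeds, topic_terms, threshold=0.5, max_tunnel=1):
    """Best-first crawl that may cross a few irrelevant pages (tunneling).

    max_tunnel is an assumed budget: the number of consecutive
    below-threshold pages the crawler may pass through before it
    abandons a path. Returns relevant URLs in visiting order.
    """
    # heapq is a min-heap, so priorities are negated to pop the best first.
    frontier = [(-1.0, seed, 0) for seed in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    relevant = []
    while frontier:
        _, url, tunnel_depth = heapq.heappop(frontier)
        score = relevance(pages[url], topic_terms)
        if score >= threshold:
            relevant.append(url)
            tunnel_depth = 0  # back on topic: reset the tunnel budget
        else:
            tunnel_depth += 1
            if tunnel_depth > max_tunnel:
                continue  # budget exhausted: do not expand this page
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                # A link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, target, tunnel_depth))
    return relevant


print(focused_crawl(PAGES, LINKS, ["a"], ["search", "crawler"]))
```

In this toy graph the on-topic page `c` is reachable only through the off-topic page `b`, so a crawler with `max_tunnel=0` never finds it. Raising the budget lets the crawl cross more irrelevant pages, at the cost of visiting more of the graph, which is the trade-off a tunneling strategy has to manage.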
Keywords/Search Tags: Vertical Search Engine, Chinese Word Segmentation, Focused Crawler, Tunneling, Correlation Prediction