
Design and Implementation of a Witkey Vertical Search Engine

Posted on: 2011-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Wu
Full Text: PDF
GTID: 2178360308963867
Subject: Computer system architecture
Abstract/Summary:
Since 2006, Witkey has grown rapidly as an online phenomenon. Employers pay to have their problems solved, and millions of workers earn money through online Witkey platforms. Compared with general-purpose search engines, a vertical search engine is an effective way for users to query domain-specific information. This thesis researches and develops a vertical search engine for Witkey, so that employers and workers can query Witkey information quickly, accurately, and efficiently.

Major Witkey sites were investigated quantitatively in this thesis. The results show that the task counts follow a power-law distribution. Four Witkey sites (Zhubajie, Vikecn, Taskcn, and Toidea) together cover about 94.54% of tasks, so these four sites are used as the data sources of this thesis.

A Witkey vertical search engine was implemented in this thesis. The engine consists of several subsystems: crawling, parsing, storage, indexing, and ranking analysis. The crawling system is implemented with a self-developed spider. The parsing system is built on XPath, and a few primitives were developed to extract data from XHTML. The storage system, based on the object database db4o, stores intermediate data and some results. The indexing system builds and stores the index using Lucene. The ranking system is based on the relevance of the data and on other attribute values of tasks or workers.

This thesis emphasizes the detailed analysis of data extraction and ranking strategies. For tasks, the ranking strategies developed include relevance ranking, reward ranking, credit ranking, deadline ranking, view count ranking, candidate count ranking, difficulty ranking, and risk ranking. For workers, they include relevance ranking and income ranking. Further ranking strategies apply to both tasks and workers, for example weighted ranking for ad hoc queries.

This vertical search engine can serve as an effective tool for Witkey workers and employers. Finally, this thesis summarizes the results, identifies some shortcomings, and outlines prospects for future work on the engine.
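The abstract does not spell out how the weighted ranking combines its inputs. As a rough illustration only, the Python sketch below scores tasks by a weighted sum of min-max normalized attributes. The `Task` fields, the sample values, the weights, and the choice of min-max normalization are all assumptions made for this example, not the thesis's actual method.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task record; field names are illustrative assumptions.
    title: str
    relevance: float      # text-match score from the index
    reward: float         # payment offered by the employer
    view_count: int
    candidate_count: int


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_rank(tasks, weights):
    """Return tasks sorted by a weighted sum of normalized attributes.

    `weights` maps an attribute name of Task to its weight for this query.
    """
    if not tasks:
        return []
    # Normalize each attribute across the result set so that attributes
    # with different units (money, counts, scores) become comparable.
    columns = {
        name: min_max([getattr(t, name) for t in tasks]) for name in weights
    }
    scored = []
    for i, task in enumerate(tasks):
        score = sum(weights[name] * columns[name][i] for name in weights)
        scored.append((score, task))
    # Highest combined score first; sort is stable for equal scores.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [task for _, task in scored]


tasks = [
    Task("logo design", relevance=0.9, reward=100, view_count=50, candidate_count=5),
    Task("logo naming", relevance=0.5, reward=1000, view_count=500, candidate_count=50),
    Task("site build", relevance=0.1, reward=500, view_count=100, candidate_count=10),
]

by_relevance = weighted_rank(tasks, {"relevance": 1.0})
by_reward_mix = weighted_rank(tasks, {"relevance": 0.2, "reward": 0.8})
```

With all weight on `relevance`, the sketch reduces to plain relevance ranking. Shifting weight toward `reward` moves higher-paying tasks up, which is one way a per-query weighting could let employers and workers emphasize different attributes.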
Keywords/Search Tags: Witkey, Vertical Search Engine, Information Extraction, XPath