Design And Implementation Of A Vertical Search Engine For Time-Sensitive Content

To promote the development of its family (household) business and to build the Yunnan Mobile community marketing system, a unified business management platform is needed. This platform will operate community information content and control the household information terminal system. Its ultimate goal is to improve users' daily lives and increase user stickiness. The community information platform is the core of the system; it provides various types of information, including government news, livelihood information, technology news, and ticket prices, much of which is highly time-sensitive. The key problem is how to let users find the information they want efficiently and accurately.
Against this background, this paper studies time-sensitive data and its characteristics. Solutions are designed by analyzing each stage of the pipeline: data crawling, data extraction and cleaning, index creation, and result display, together with the related tasks of result scoring and sorting. After studying Lucene and Heritrix in depth, and building on experience gained during graduate study, these two open-source frameworks were chosen for the design and implementation of a vertical search engine for time-sensitive content.
According to the characteristics of the community data, this paper divides the related information into two types: active time-sensitive information and stable time-sensitive information. Requirements analysis and the overall architecture design are then completed.
During detailed design and implementation, time-sensitive content is first crawled from the Internet according to its characteristics.
The features of the crawling subsystem are customized and extended for this purpose. Second, information extraction and modeling are carried out on the basis of information relevance and content freshness, and solutions are proposed for full-text indexing, index maintenance, and incremental index updating. Third, a method for sorting search results by relevance and freshness is given, along with a solution for aggregating results. Finally, the individual subsystems and the complete system are tested, an evaluation of the search process is presented, and the work is summarized with directions for future research.
This study aims to provide better information services for community users.
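The idea of sorting results by both relevance and freshness can be illustrated with a minimal Python sketch. It is not the thesis's actual scoring method, which the abstract does not specify. The exponential half-life decay, the half-life values for the two content types, the weight `alpha`, and all names (`Hit`, `freshness`, `combined_score`, `rank`) are assumptions made for illustration only. No Lucene API is used; `relevance` stands in for a normalized text-relevance score from the index.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical half-lives in hours. The abstract names the two content
# types ("active" and "stable") but gives no numeric decay parameters.
HALF_LIFE_HOURS = {"active": 6.0, "stable": 24.0 * 30}


@dataclass
class Hit:
    doc_id: str
    relevance: float     # assumed normalized text relevance in [0, 1]
    published: datetime  # publication time of the document
    kind: str            # "active" or "stable"


def freshness(hit: Hit, now: datetime) -> float:
    """Exponential decay: 1.0 when just published, 0.5 after one half-life."""
    age_hours = max((now - hit.published).total_seconds() / 3600.0, 0.0)
    return 0.5 ** (age_hours / HALF_LIFE_HOURS[hit.kind])


def combined_score(hit: Hit, now: datetime, alpha: float = 0.6) -> float:
    """Weighted sum of relevance and freshness; alpha is an assumed weight."""
    return alpha * hit.relevance + (1.0 - alpha) * freshness(hit, now)


def rank(hits: list[Hit], now: datetime, alpha: float = 0.6) -> list[Hit]:
    """Return hits ordered from highest to lowest combined score."""
    return sorted(hits, key=lambda h: combined_score(h, now, alpha), reverse=True)
```

Under these assumptions, an "active" item such as a ticket price loses rank within hours, while a "stable" item of the same age keeps most of its freshness score. A multiplicative combination (relevance times freshness) would be an equally plausible design; the weighted sum is used here only because it keeps the two factors easy to tune independently.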
Keywords/Search Tags: community platform, time-sensitive content, vertical search engines, data crawling, index maintenance, result sorting, content aggregation
