
Research and Implementation of a Topic-Based News Search Engine

Posted on: 2007-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Zhao
Full Text: PDF
GTID: 2208360185961287
Subject: Computer application technology
Abstract/Summary:
It is widely recognized that news is the main area of competition among national portal websites, and it remains the core of that competition even though there are now many other sources of profit. Today, competition over news is not limited to websites; competition among search engines in this field is even fiercer. News search means retrieving related news by keyword. Alongside television, radio, and newspapers, the Internet has become an important medium and a main source of news. How to look up news conveniently is a widespread and troublesome problem, and solving it is the basic task of a news search engine. News services are also concerned with how to inform users promptly when major events happen.

A general-purpose search engine tries to index all Web pages and to serve queries on all topics. As a result, it struggles to keep pace with the growth of the Web, and its ability to respond to queries has steadily worsened. Topic-specific search engines emerged in response and have been a hot topic in Web information retrieval for years. Following the thread from general-purpose search engines through specialized search engines to news search engines, this thesis introduces the news search engine in detail, together with technologies such as natural language processing, text categorization, and personalized retrieval, and then studies and implements a prototype of a topic-based news search engine. The main contributions of the thesis are:
① A detailed discussion and study of the technologies involved in the system, including natural language processing, text categorization, user interest mining, and information push.
② To address the weakness of the VSM (Vector Space Model) assumption that terms are independent of one another, a method is presented that combines smoothing with information about the relationships between words and their order, based on the bigram model from statistical language modeling, to improve text categorization results when news pages are classified.
③ An active information service is implemented for email users who subscribe to news, to counter the passive nature of the majority of web...
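The bigram-based categorization idea in item ② can be illustrated with a minimal sketch. The abstract does not give the thesis's formulas, so everything here is an assumption made for illustration: the class name `BigramClassifier`, add-k smoothing as the smoothing technique, and the tiny token lists in the usage note. Each category gets its own bigram language model, and a news page is assigned to the category whose model gives its word sequence the highest smoothed probability, so that word order contributes to the decision, unlike a bag-of-words VSM.

```python
import math
from collections import Counter, defaultdict


class BigramClassifier:
    """Per-category bigram language model with add-k smoothing.

    Hypothetical illustration only; not the method from the thesis.
    """

    def __init__(self, k=1.0):
        self.k = k                            # smoothing constant (assumed)
        self.bigrams = defaultdict(Counter)   # category -> (prev, word) counts
        self.contexts = defaultdict(Counter)  # category -> counts of prev words
        self.docs = Counter()                 # category -> training documents
        self.vocab = set()

    @staticmethod
    def _pairs(tokens):
        # Prepend a start symbol so the first word also has a context.
        padded = ["<s>"] + list(tokens)
        return list(zip(padded, padded[1:]))

    def train(self, tokens, category):
        self.docs[category] += 1
        self.vocab.update(tokens)
        for prev, word in self._pairs(tokens):
            self.bigrams[category][(prev, word)] += 1
            self.contexts[category][prev] += 1

    def log_score(self, tokens, category):
        # log P(category) + sum of log P(word | prev, category), smoothed.
        v = len(self.vocab) + 1  # one extra slot for unseen words
        score = math.log(self.docs[category] / sum(self.docs.values()))
        for prev, word in self._pairs(tokens):
            num = self.bigrams[category][(prev, word)] + self.k
            den = self.contexts[category][prev] + self.k * v
            score += math.log(num / den)
        return score

    def classify(self, tokens):
        # Pick the category whose model scores the sequence highest.
        return max(self.docs, key=lambda c: self.log_score(tokens, c))
```

As a usage sketch, one would call `train(tokens, category)` once per labeled news page (for example with token lists produced by a word segmenter) and then `classify(tokens)` on a new page. Because the score is built from adjacent word pairs, `log_score` for the same words in a different order generally differs, which is the property a term-independence model lacks.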
Keywords/Search Tags: search engine, text categorization, user interest mining, information push, topic