Design And Implementation Of Retrieve System Of Query Recommendation About Chinese News

Posted on:2015-01-26

Degree:Master

Type:Thesis

Country:China

Candidate:J L Ji

Full Text:PDF

GTID:2298330422982074

Subject:Computer application technology

Abstract/Summary:

When foreign users search for information about Chinese news, they want a better query experience. Commissioned by the project team of “Research of Cross-cultural Influence of Confucius Institute”, this thesis implements a simple retrieval system with query recommendation for Chinese news. The system's ultimate purpose is to help users clarify their query intention while searching and to suggest further terms that may interest them. With the help of query recommendation, users can finally reach accurate and comprehensive web pages.

The thesis implements three main modules of the system: the web crawler, web page preprocessing, and query recommendation. The complete system also includes a web page ranking module, which was implemented by others.

The crawler module uses multiple threads built on HtmlUnit to fetch web pages, and the system uses a Bloom filter, an efficient algorithm for this task, to detect duplicate URLs.

Web page preprocessing consists of content extraction, duplicate removal, web page classification, and web page storage. The content extraction module uses features such as link density and text density to extract the main content, and the Simhash algorithm is used to remove duplicate web pages.

The query recommendation module is the focus of the work. Before giving recommendations, the system must correct errors in the query words. Query correction is based on a bigram language model, which combines Bayes' formula with dynamic programming to correct the query words.

To give query recommendations, the system must first extract the important terms. It uses the open-source Stanford POS tagger to help obtain them.
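
The abstract states only that the multithreaded crawler uses a Bloom filter to detect repeated URLs. The sketch below shows that check in Python rather than the thesis's Java/HtmlUnit code; the SHA-256 double-hashing scheme, the sizing defaults and the names BloomFilter and add_if_absent are hypothetical choices. A lock makes the test-and-set atomic so that several crawler threads cannot all claim the same URL.

```python
import hashlib
import math
import threading


class BloomFilter:
    """Minimal thread-safe Bloom filter for URL de-duplication (hypothetical sketch).

    A "seen" answer may be a false positive; an "unseen" answer is always correct.
    """

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions
        self.size = math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2)
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)
        self._lock = threading.Lock()

    def _positions(self, item: str) -> list[int]:
        # Double hashing: k bit positions derived from two 64-bit slices of one SHA-256 digest
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def _is_set(self, pos: int) -> bool:
        return bool(self.bits[pos // 8] & (1 << (pos % 8)))

    def __contains__(self, item: str) -> bool:
        with self._lock:
            return all(self._is_set(p) for p in self._positions(item))

    def add_if_absent(self, item: str) -> bool:
        """Atomically record item; return True only if it was (probably) not seen before."""
        positions = self._positions(item)
        with self._lock:
            if all(self._is_set(p) for p in positions):
                return False
            for p in positions:
                self.bits[p // 8] |= 1 << (p % 8)
            return True
```

Each crawler thread would call `seen.add_if_absent(url)` before queueing a link and skip the URL when it returns False. A rare false positive only means a new page is skipped, which is usually acceptable for a crawler.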
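
The content extraction step is described only as using link density and text density. One way to combine the two is sketched below under stated assumptions. The page is split into blocks at common block-level tags. Each block counts as "dense" if it has enough visible characters, and as "navigational" if too many of those characters sit inside `<a>` tags. The tag set, the thresholds (0.3 and 20) and the function names are hypothetical, and block length stands in for text density here.

```python
from html.parser import HTMLParser

# Hypothetical choice of tags that start a new text block
BLOCK_TAGS = {"p", "div", "td", "li", "article", "section", "h1", "h2", "h3"}
SKIP_TAGS = {"script", "style"}


def _visible_chars(text: str) -> int:
    return sum(not c.isspace() for c in text)


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and counts how many characters sit inside links."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[tuple[str, int]] = []  # (block text, visible chars inside <a>)
        self._parts: list[str] = []
        self._link_chars = 0
        self._link_depth = 0
        self._skip_depth = 0

    def _flush(self) -> None:
        text = " ".join("".join(self._parts).split())  # collapse whitespace
        if text:
            self.blocks.append((text, self._link_chars))
        self._parts, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._link_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a":
            self._link_depth = max(0, self._link_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style contents
        self._parts.append(data)
        if self._link_depth:
            self._link_chars += _visible_chars(data)

    def close(self) -> None:
        super().close()
        self._flush()


def extract_content(html: str, max_link_density: float = 0.3, min_chars: int = 20) -> str:
    """Keep blocks that are long enough and are mostly not link text."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    kept = []
    for text, link_chars in collector.blocks:
        chars = _visible_chars(text)  # > 0 because empty blocks are never stored
        if chars >= min_chars and link_chars / chars <= max_link_density:
            kept.append(text)
    return "\n".join(kept)
```

Navigation bars fail the link-density test, and short boilerplate such as copyright lines fails the length test, so mostly article paragraphs survive.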
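
For the Simhash de-duplication step, the sketch below shows the standard construction. Each token's 64-bit hash votes +weight or −weight on every bit, and the fingerprint keeps the bits whose total is positive. Two pages whose fingerprints differ in only a few bits are then treated as near-duplicates. The tokenizer (`\w+`, which would need a real word segmenter for Chinese text), the MD5-based token hash and the 3-bit threshold are assumptions, not details given in the abstract.

```python
import hashlib
import re
from collections import Counter


def simhash(text: str) -> int:
    """64-bit Simhash fingerprint, weighting each token by its frequency."""
    weights = Counter(re.findall(r"\w+", text.lower()))
    totals = [0] * 64
    for token, weight in weights.items():
        # First 8 bytes of MD5 as a stable 64-bit token hash (an assumed choice)
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for bit in range(64):
            totals[bit] += weight if (h >> bit) & 1 else -weight
    # Bit i of the fingerprint is 1 where the weighted vote for bit i is positive
    return sum(1 << bit for bit, total in enumerate(totals) if total > 0)


def hamming_distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def is_near_duplicate(text_a: str, text_b: str, max_distance: int = 3) -> bool:
    """Hypothetical threshold: at most max_distance differing fingerprint bits."""
    return hamming_distance(simhash(text_a), simhash(text_b)) <= max_distance
```

In a crawler, each new page's fingerprint would be compared against stored fingerprints rather than against raw text. Storing only one 64-bit integer per page is what makes Simhash cheap.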
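
The correction step can be read as a noisy-channel model. By Bayes' formula, the best intended query maximizes P(typed | intended) · P(intended). A bigram language model supplies P(intended), and dynamic programming (a Viterbi pass) finds the best word sequence. The sketch below illustrates that idea under hypothetical choices not stated in the abstract: candidates are vocabulary words within one edit of the typed word, a flat 5% channel error probability is used, bigram counts get add-one smoothing, and the query is split on whitespace.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


class BigramCorrector:
    def __init__(self, corpus_sentences: list[str], error_prob: float = 0.05) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sentence in corpus_sentences:
            tokens = ["<s>"] + sentence.lower().split()
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab = set(self.unigrams) - {"<s>"}
        self.error_prob = error_prob

    def log_bigram(self, prev: str, word: str) -> float:
        # Language model P(word | prev) with add-one smoothing
        return math.log((self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + len(self.vocab)))

    def log_channel(self, typed: str, intended: str) -> float:
        # Channel model P(typed | intended): flat error probability for any one-edit change
        return math.log(1 - self.error_prob if typed == intended else self.error_prob)

    def candidates(self, typed: str) -> set[str]:
        cands = {w for w in edits1(typed) if w in self.vocab}
        if typed in self.vocab or not cands:
            cands.add(typed)
        return cands

    def correct(self, query: str) -> str:
        # best[w] = (log score, path) of the best corrected prefix ending in candidate w
        best = {"<s>": (0.0, [])}
        for typed in query.lower().split():
            new_best = {}
            for cand in self.candidates(typed):
                emit = self.log_channel(typed, cand)
                score, path = max(
                    ((s + self.log_bigram(prev, cand) + emit, p) for prev, (s, p) in best.items()),
                    key=lambda x: x[0],
                )
                new_best[cand] = (score, path + [cand])
            best = new_best
        return " ".join(max(best.values(), key=lambda x: x[0])[1])
```

Because the dynamic program keeps only the best path into each candidate, its cost grows linearly with query length rather than with the number of candidate combinations.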

The Stanford tagger is based on a maximum entropy model and runs fast.

Each term is represented by a vector built from the contexts in which it occurs, so the system can compute the cosine similarity between term vectors to recommend related terms to the user.

Finally, the system implements an efficient index file for fast data access, and the index file supports batch updates.
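
The abstract does not say which tagged words count as terms. The sketch below assumes the tagger was run with its underscore-separated word_TAG output and English Penn Treebank tags. It keeps maximal adjective/noun runs that end in a noun. The tag sets and the names parse_tagged and extract_terms are hypothetical; the tagging itself is assumed to have been done already by the Stanford tool.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ", "JJR", "JJS"}


def parse_tagged(line: str) -> list[tuple[str, str]]:
    """Split assumed 'word_TAG word_TAG ...' tagger output into (word, tag) pairs."""
    return [(word, tag) for word, tag in (tok.rsplit("_", 1) for tok in line.split())]


def extract_terms(tagged: list[tuple[str, str]]) -> list[str]:
    """Collect maximal adjective/noun runs that end in a noun, e.g. 'cultural festival'."""
    terms: list[str] = []
    run: list[tuple[str, str]] = []
    for word, tag in tagged + [("", "<END>")]:  # sentinel flushes the final run
        if tag in NOUN_TAGS or tag in ADJ_TAGS:
            run.append((word, tag))
            continue
        while run and run[-1][1] not in NOUN_TAGS:  # drop trailing adjectives
            run.pop()
        if run:
            terms.append(" ".join(w for w, _ in run).lower())
        run = []
    return terms
```

For example, "The_DT Confucius_NNP Institute_NNP hosted_VBD a_DT cultural_JJ festival_NN" yields the terms "confucius institute" and "cultural festival".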
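
The recommendation step represents each term by its context and ranks other terms by cosine similarity. The sketch below builds co-occurrence count vectors within a fixed window and returns the most similar terms. The window size of 2, the raw counts (no weighting) and whitespace tokenization are hypothetical choices, since the abstract does not specify how the context vectors are formed.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences: list[str], window: int = 2) -> dict[str, Counter]:
    """Map each term to counts of the words within `window` positions of it."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, term in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(term: str, vectors: dict[str, Counter], k: int = 3) -> list[str]:
    """Return up to k other terms whose context vectors are most similar to term's."""
    if term not in vectors:
        return []
    target = vectors[term]
    scored = [(cosine(target, vec), other) for other, vec in vectors.items() if other != term]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:k] if score > 0]
```

Terms that appear in the same surroundings, such as "institute" and "classroom" after "confucius", get near-identical vectors even if they never co-occur. This is why context vectors suit suggesting related query terms.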
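
The abstract does not describe the index file's format. As a hypothetical illustration of "fast access plus batch updates", the sketch stores one JSON line per term, sorted by term, and keeps an in-memory table of byte offsets so that a lookup is a single seek. A batch update merges all pending additions, rewrites the file once and swaps it in with `os.replace`. The file layout and the class name IndexFile are assumptions.

```python
import json
import os
import tempfile


class IndexFile:
    """Hypothetical sorted term -> postings file with an in-memory offset table."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.offsets: dict[str, int] = {}
        if os.path.exists(path):
            self._load_offsets()

    def _load_offsets(self) -> None:
        self.offsets = {}
        offset = 0
        with open(self.path, "rb") as f:
            for line in f:
                self.offsets[json.loads(line)["term"]] = offset
                offset += len(line)

    def lookup(self, term: str) -> list[int]:
        offset = self.offsets.get(term)
        if offset is None:
            return []
        with open(self.path, "rb") as f:
            f.seek(offset)  # jump straight to the term's record
            return json.loads(f.readline())["postings"]

    def batch_update(self, additions: dict[str, list[int]]) -> None:
        """Merge many postings additions at once, then atomically replace the file."""
        merged: dict[str, list[int]] = {}
        if os.path.exists(self.path):
            with open(self.path, "rb") as f:
                for line in f:
                    record = json.loads(line)
                    merged[record["term"]] = record["postings"]
        for term, doc_ids in additions.items():
            merged[term] = sorted(set(merged.get(term, [])) | set(doc_ids))
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(self.path)))
        with os.fdopen(fd, "wb") as out:
            for term in sorted(merged):
                record = {"term": term, "postings": merged[term]}
                out.write(json.dumps(record, ensure_ascii=False).encode("utf-8") + b"\n")
        os.replace(tmp_path, self.path)  # readers see either the old or the new file
        self._load_offsets()
```

Batching amortizes the cost of rewriting the file over many updates, which fits a crawler that delivers new pages in bulk rather than one at a time.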

Keywords/Search Tags:

query recommendation, web crawler, retrieval system

Related items

1	Information Retrieval And Query Recommendation For Information Precise Service
2	Design And Implementation Of Automatic Gift Recommendation System
3	Research On The Methods Of Intelligent Retrieval And Recommendation For Literature
4	Research On Semantic Processing Technology Based Information Retrieval Model
5	Research On Text Material Recommendation Method Combining Label Classification And Semantic Query Expansion
6	Design And Implementation Of Retrieval Question Answering System Based On Intelligent Recommendation
7	Design And Implementation Of E-commerce Price Comparison And Recommendation System Based On Web Crawler
8	Information Retrieval System Based On Document Query
9	Research And Implementation Techniques Of Information Retrieval Based On User Query Intention
10	Design And Implementation Of Web Crawler For Personalized Recommendation System