Research And Implementation Of A Vertical Search Engine For Quiz Website

Posted on: 2014-02-27  Degree: Master  Type: Thesis
Country: China  Candidate: M Liang  Full Text: PDF
GTID: 2248330398472208  Subject: Electronic Science and Technology
Abstract/Summary:
With the proliferation of information on the Internet, people rely on search engines to find what they need in a mass of data. The shortcomings of general-purpose search engines are increasingly obvious, however: the accuracy and relevance of their results have declined, and users need vertical search engines to find specific information quickly and accurately. Numerous quiz websites have emerged in recent years. Unlike search engines, which produce results by machine, these sites offer "one question, many answers" based on users' knowledge. They are targeted, provide quick Q&A, and attract heavy traffic, which makes them an effective way for users to ask and answer questions. Most quiz websites, however, only provide search within their own site, which is inconvenient, and vertical search technology for quiz websites is not yet mature. A vertical search engine for quiz websites can meet users' need to find specific answers.

Using the car topic on quiz websites as experimental data, the author studied the key technologies of search engines and built a complete quiz website search engine. The main work of this thesis is as follows:

1) Evaluation mechanism for quiz site importance. Based on a study of quiz websites, an evaluation mechanism for the importance of a quiz site is proposed. This mechanism solves the problem that different quiz website sources are treated equally. The vector space model (VSM) is also improved: quiz website features are added to the vector representation, and the TFIDF algorithm in the VSM is modified according to the evaluation mechanism.

2) Topic crawler of the quiz website vertical search engine. After studying the general model of a topic crawler, the thesis proposes a link-filtering method based on the Nutch configuration file, adopts the Document Frequency algorithm to build the topic thesaurus, proposes a method of establishing the initial seed module based on expert experience and a meta-search engine, and adopts text classification to determine topic relevance. Measured by precision, the modified text classification shows a 10% improvement.

3) Information extraction module. An extraction method based on HTML structure is adopted, and Paoding is used for Chinese word segmentation.

4) Indexing and retrieval module. A quiz information index field is added to emphasize the search focus, and its weighting factor is set according to the evaluation mechanism for quiz site importance, which makes the results of the quiz search engine more reasonable.

Finally, the quiz vertical search engine is implemented on the Nutch framework. Experimental results show that both the crawler's fetching efficiency and the query precision improved.
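
The abstract does not give the modified TFIDF formula, so the following is only an illustrative sketch of the general idea in item 1: scaling each term weight by an importance score for the quiz site a document came from. The function name weighted_tfidf, the site_weight mapping, the multiplicative form, and the smoothed IDF are all assumptions made for this example, not the author's method.

```python
import math
from collections import Counter


def weighted_tfidf(docs, site_weight):
    """Return one {term: weight} dict per document.

    docs        -- list of (site, tokens) pairs (hypothetical input format)
    site_weight -- dict mapping site name to an importance score;
                   sites missing from the dict default to 1.0 (assumption)
    """
    n = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for _, tokens in docs:
        df.update(set(tokens))

    vectors = []
    for site, tokens in docs:
        if not tokens:
            vectors.append({})
            continue
        tf = Counter(tokens)
        total = len(tokens)
        w = site_weight.get(site, 1.0)
        vec = {}
        for term, count in tf.items():
            # Smoothed IDF; the thesis may well use a different variant.
            idf = math.log((1 + n) / (1 + df[term])) + 1.0
            # Assumed combination: site importance multiplies TF * IDF.
            vec[term] = w * (count / total) * idf
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    docs = [
        ("siteA", ["engine", "oil", "engine"]),
        ("siteB", ["engine", "tyre"]),
    ]
    for vec in weighted_tfidf(docs, {"siteA": 1.0, "siteB": 0.5}):
        print(vec)
```

In this sketch, a document from a lower-rated site gets uniformly smaller term weights, so sources are no longer treated equally when the vectors are used in unnormalized scoring. A similar per-source factor could in principle be used to set the weighting of the quiz information index field described in item 4.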
Keywords/Search Tags: vertical search engine, quiz website, topic crawler, information extraction, Nutch, text classification