Research and Implementation of an Individualized Vertical Search Engine

Posted on: 2017-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhou
Full Text: PDF
GTID: 2348330536968164
Subject: Engineering
Abstract/Summary:
Conventional search engines suffer from several problems, such as unclear orientation, unprofessional search results, and unreasonable result ordering. Vertical search engines, which target specific topics, have been developed to address these problems. This thesis covers three areas.

(1) Research on topical crawler techniques. Three algorithms are implemented: the Context Topic Description Algorithm based on SVM Taxonomy, the Topic Relevance Algorithm based on SVM Taxonomy, and the Topic Crawler Algorithm based on SVM Taxonomy. Precision and recall are then used to evaluate the topical crawler algorithms. The experimental results show that the proposed methods achieve better performance both in acquiring topic-related webpages and in avoiding topic drift.

(2) Research on webpage structural information extraction. Structural information is extracted with a label sequence algorithm that operates on the webpage source code. The method has two parts: a sample training module and a topical information extraction module. The sample training module first generates a topical label sequence, a position vector, and a topical attribute file for every sample, and then stores the generated rules in the rule base. The topical information extraction module first generates a label sequence for the webpage and then judges whether the content of each labeled region contains topical information. Finally, the extracted information is stored in the topical information base.

(3) Construction and implementation of the expert robot vertical search engine. The framework and the core modules are each designed separately. The webpage ranking module jointly considers the HITS ranking algorithm, the PageRank ranking algorithm, and the reference frequency of experts' research achievements. The duplicate webpage removal module judges document similarity with an improved hash algorithm. The cache module improves the user experience by temporarily storing searched webpages, which speeds up webpage access. Finally, an open-source framework is used to implement a working expert robot vertical search engine.
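
The precision and recall evaluation mentioned in part (1) can be illustrated with a minimal sketch. This is not the evaluation code from the thesis: the function name `precision_recall` and the sample page identifiers are hypothetical, and it assumes that a set of pages judged on-topic is available as ground truth.

```python
def precision_recall(crawled, relevant):
    """Return (precision, recall) for a set of crawled pages.

    crawled  -- pages the topical crawler actually fetched
    relevant -- pages judged to be on-topic (assumed ground truth)
    """
    crawled, relevant = set(crawled), set(relevant)
    hits = crawled & relevant  # on-topic pages that were fetched
    # Guard against empty sets to avoid division by zero.
    precision = len(hits) / len(crawled) if crawled else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example data: page identifiers only.
crawled_pages = {"p1", "p2", "p3", "p4"}
relevant_pages = {"p2", "p3", "p5"}
precision, recall = precision_recall(crawled_pages, relevant_pages)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this sketch, a crawler that drifts off topic fetches many pages outside the relevant set, which lowers precision, while a crawler that misses topic-related pages lowers recall.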
Keywords/Search Tags: vertical search engine, topical web crawler, structural information extraction, SVM taxonomy model, web label sequence, webpage ranking, duplicated webpages deletion
Related items