
Research And Improvement Of The Topic Meta Search Engine Sort Algorithm

Posted on: 2017-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: R Wang
Full Text: PDF
GTID: 2348330488488248
Subject: Software engineering
Abstract/Summary:
Topic search engines emerged to meet the needs of organizations and industries that search the internet for information in a specific domain. As science progresses and network information grows more diverse, no single search engine can cover every topic area, and the topic meta search engine is a good way to address this gap. Combining topic search with meta search improves both precision and recall. Word segmentation technology and result sorting rules also have a strong influence on search engine quality.

This paper takes the open-source Nutch search engine as its prototype. A theme extractor extracts seed sites from multiple search engines, and keywords are then searched across the resulting sub-sites. This makes the search both topic-focused and diversified, and improves its precision and recall.

To address Nutch's poor Chinese word segmentation and its low ranking accuracy, this paper makes two main contributions. First, after a review of the literature on Chinese word segmenters, Paoding, IKAnalyzer, and other Chinese segmenters were compared experimentally on speed, accuracy, and other criteria. The ICTCLAS2015 segmenter, which performed better in time and accuracy on large volumes of text and ships with a rich local lexicon, was selected to improve Nutch's Chinese word segmentation module. Second, the paper proposes using the PageRank algorithm together with the user's local browser bookmarks as reference factors, and modifies Nutch's scoring mechanism accordingly to improve the accuracy of the search results.
The improved algorithm was validated experimentally. Analysis of the experimental data shows that it not only raises the ranking of result pages with a higher PR value, but also improves the ranking of search results that are related to the local bookmarks. Integrating the ICTCLAS2015 Chinese word segmenter improved the Chinese word segmentation of the Nutch search engine system. Building on this secondary development, the Nutch sorting algorithm was improved by combining the website PR value with the local bookmark factor. Experimental tests show that the search results of the improved algorithm are more precise and better match users' needs.
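
The abstract does not give the modified scoring formula, so the following is only a minimal sketch of the general idea of re-ranking results by a base relevance score combined with a PR value and a local bookmark factor. It is written in Python for brevity rather than as a Nutch plugin. Every name in it (`Hit`, `combined_score`, `rerank`), the multiplicative form, the 0 to 10 PR scale, and the weights `pr_weight` and `bookmark_boost` are assumptions made for illustration and are not taken from the thesis.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Hit:
    url: str
    relevance: float  # base text-relevance score from the engine (assumed >= 0)
    pagerank: float   # PR value of the site, assumed to lie on a 0..10 scale


def host_of(url: str) -> str:
    """Return the lower-cased host part of a URL."""
    return urlparse(url).netloc.lower()


def combined_score(hit: Hit, bookmarked_hosts: set,
                   pr_weight: float = 0.3,
                   bookmark_boost: float = 0.5) -> float:
    """Hypothetical combination: relevance scaled by a PR factor and a bookmark factor."""
    # A higher PR value raises the score by at most pr_weight (at PR = 10).
    pr_factor = 1.0 + pr_weight * (hit.pagerank / 10.0)
    # Results hosted on a locally bookmarked site get a fixed extra boost.
    if host_of(hit.url) in bookmarked_hosts:
        bookmark_factor = 1.0 + bookmark_boost
    else:
        bookmark_factor = 1.0
    return hit.relevance * pr_factor * bookmark_factor


def rerank(hits, bookmark_urls, pr_weight: float = 0.3,
           bookmark_boost: float = 0.5):
    """Sort hits by the combined score, highest first."""
    hosts = {host_of(u) for u in bookmark_urls}
    return sorted(
        hits,
        key=lambda h: combined_score(h, hosts, pr_weight, bookmark_boost),
        reverse=True,
    )
```

With these assumed weights, a result hosted on a bookmarked site can overtake results with slightly higher base relevance, and among non-bookmarked results with equal relevance the one with the higher PR value ranks first. In a real system the weights would need to be tuned against measured precision.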
Keywords/Search Tags:topic meta search engine, ICTCLAS2015, Chinese Segmentation, Nutch, sort algorithm