
Research and Optimization of a Vertical Search Engine Based on Coreseek

Posted on: 2017-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Jiang
GTID: 2348330515964243
Subject: Software engineering
Abstract/Summary:
The Internet has become a primary way for people to obtain information, and everyday life increasingly depends on it. General-purpose search engines offer broad coverage and comprehensive content and meet users' basic information needs, but the results they return include a great deal of irrelevant information. Vertical search engines address this weakness by narrowing the domain: they index only information within a particular professional field or subject area, which keeps search results relevant. In addition, a vertical search engine preprocesses web pages and returns structured data to the user, so that search results are presented clearly.

This thesis first introduces the working principles, technologies, and implementation processes of general and vertical search engines, and then analyzes the basic concepts of web crawlers, structured information extraction, Chinese word segmentation, and Chinese full-text search engine tools. The main work is as follows.

The thesis uses the MMSEG word segmentation algorithm for text processing and, to segment book-related nouns more accurately, extends the LibMMSeg thesaurus. Comparative experiments between the improved and the original setup show that the extended thesaurus segments terms such as book authors and publishers well.

The thesis also modifies the Coreseek sorting (ranking) algorithm. Compared with Coreseek's basic sorting algorithm, the experimental results show that the modified CORE_RANK sorting algorithm is better suited to the short texts of book search and returns more satisfactory results to the user.

Finally, the thesis uses the DouCrawler crawler system to crawl book information from book websites, performs structured extraction and word segmentation, builds an index, and displays the search results, completing a search engine for book information.
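The abstract names MMSEG and an extended LibMMSeg thesaurus but gives no implementation detail. As a rough illustration only (not the thesis's code and not the LibMMSeg API), the Python sketch below implements MMSEG-style complex maximum matching over a toy dictionary. At each position it enumerates chunks of up to three dictionary words, picks the best chunk by the first three MMSEG rules (largest total length, largest average word length, smallest variance of word lengths), and commits only that chunk's first word. MMSEG's fourth rule, which compares single-character words by their degree of morphemic freedom, is omitted. The dictionaries and the names `words_at`, `chunks`, `score`, and `mmseg` are hypothetical.

```python
def words_at(text, i, dictionary, max_len):
    """All candidate words starting at position i; a single character is always allowed."""
    found = [text[i]]
    for j in range(i + 2, min(len(text), i + max_len) + 1):
        if text[i:j] in dictionary:
            found.append(text[i:j])
    return found


def chunks(text, i, dictionary, max_len, depth=3):
    """Enumerate every chunk of up to `depth` consecutive words starting at i."""
    if i >= len(text) or depth == 0:
        return [[]]
    result = []
    for w in words_at(text, i, dictionary, max_len):
        for rest in chunks(text, i + len(w), dictionary, max_len, depth - 1):
            result.append([w] + rest)
    return result


def score(chunk):
    """Sort key implementing MMSEG rules 1-3 (higher tuple is better)."""
    lengths = [len(w) for w in chunk]
    total = sum(lengths)                      # rule 1: maximum total length
    avg = total / len(lengths)                # rule 2: largest average word length
    variance = sum((n - avg) ** 2 for n in lengths) / len(lengths)
    return (total, avg, -variance)            # rule 3: smallest variance


def mmseg(text, dictionary):
    """Segment `text` greedily, committing the first word of the best chunk each step."""
    max_len = max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        best = max(chunks(text, i, dictionary, max_len), key=score)
        words.append(best[0])
        i += len(best[0])
    return words


# Hypothetical toy dictionaries: a base one, and one extended with a publisher name.
BASE_DICT = {"人民", "邮电", "出版", "出版社"}
EXTENDED_DICT = BASE_DICT | {"人民邮电出版社"}

if __name__ == "__main__":
    print(mmseg("人民邮电出版社", BASE_DICT))      # ['人民', '邮电', '出版社']
    print(mmseg("人民邮电出版社", EXTENDED_DICT))  # ['人民邮电出版社']
```

With the base dictionary the publisher name is split into three words; once it is added as a single dictionary entry, in the spirit of the thesaurus extension described above, it is kept whole and can be matched as one index term.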
Keywords/Search Tags: Vertical Search Engines, Crawler, LibMMSeg, BM25, Books
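BM25, listed among the keywords, is the term-weighting function behind Coreseek's default relevance ranking (Coreseek is built on Sphinx, whose default ranker combines phrase proximity with BM25). The abstract does not say what CORE_RANK changes, so the sketch below shows only the standard Okapi BM25 formula, not the thesis's modified algorithm. It assumes documents are already segmented into term lists, uses the common defaults k1 = 1.2 and b = 0.75, and uses the non-negative IDF variant found in Lucene; the function name `bm25_scores` and the sample records are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each pre-segmented document in `docs` for `query_terms`."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            # Non-negative IDF: log(1 + (N - n + 0.5) / (n + 0.5)).
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
            s += idf * norm
        scores.append(s)
    return scores


# Hypothetical short book records, already segmented into terms.
DOCS = [
    ["人民邮电出版社", "Python", "编程"],
    ["Python", "编程", "入门", "教程", "实例", "练习"],
    ["数据库", "原理"],
]

if __name__ == "__main__":
    print(bm25_scores(["Python", "编程"], DOCS))
```

The first two records contain each query term once, so only document length separates them: with b > 0 the shorter record scores higher, and the record without any query term scores zero.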