Design and Implementation of a Small Vertical Search Engine Based on Crawler Technology

Posted on: 2021-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: X K Li
GTID: 2518306050480394
Subject: Software engineering
Abstract/Summary:
The arrival of the information age has changed our lives dramatically. Where people once got the news only from newspapers and television, they can now find almost any information by opening a mobile phone. This change in lifestyle has made the Internet indispensable. At the same time, online resources keep growing, and it is becoming harder for people to find the information they want, so using search engines well can bring a great deal of convenience. When using a search engine, the user enters some keywords, and the search engine returns a large amount of information related to those keywords. The most important component in this process is the crawler, which fetches the desired information from the web. A typical search engine uses a crawler to collect massive amounts of data from the Internet, retrieves the information the user is looking for, and returns it as quickly as possible.

Stock information is also one of the hottest topics in daily life today. The number of individual investors keeps growing, and securities companies now allow accounts to be opened online. Most investors want access to a large amount of stock information, but each stock website has its own distinctive modules, so users have to visit different websites, such as Snowball.com, Oriental Fortune, and Netease Finance, to obtain it. This thesis therefore crawls the articles most characteristic of each website so that users can more easily search a large amount of information.

The main contents are as follows. First, the thesis explains the definition, principle, and characteristics of vertical search engines, and introduces key technologies in web crawler engineering, methods for solving common crawler problems, and the techniques used in later stages, such as word segmentation and the BM25 algorithm. Second, the crawler is developed mainly with the Scrapy framework. For the problems that Scrapy alone cannot solve, the thesis uses the most convenient and understandable methods known to the author to crawl the most complex dynamic pages, such as those that load dynamically encrypted data through Ajax; Selenium and Chrome's developer tools are used to solve the problem of crawling dynamic web page data. To extract the desired information, regular expressions and XPath selectors are used to filter out useless content. The problems encountered during crawling are then presented together with their solutions. Third, following the development process of a search engine, the thesis describes in detail the implementation of the search engine's most important modules, namely the index module, the retrieval module, and the user interaction module, which use Jieba word segmentation, the BM25 algorithm, and the SPIMI algorithm. Finally, the system is tested, including the word segmentation, the crawler results, and the search results.
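As an illustration of how the index and retrieval modules mentioned in the abstract might fit together, the following is a minimal sketch of BM25 ranking over an in-memory inverted index. It is not the thesis's implementation. The function names `build_index` and `bm25_search`, the sample documents, the whitespace tokenizer (standing in for Jieba word segmentation), the smoothed IDF variant, and the parameter values `k1=1.5` and `b=0.75` are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    doc_len = {}
    for doc_id, text in docs.items():
        # Assumption: whitespace splitting stands in for Jieba segmentation.
        tokens = text.lower().split()
        doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, doc_len


def bm25_search(query, index, doc_len, k1=1.5, b=0.75):
    """Score every document containing a query term and sort by score."""
    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        df = len(postings)
        # Smoothed IDF variant that stays non-negative for common terms.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # Length normalization: longer documents are penalized via b.
            norm = k1 * (1 - b + b * doc_len[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample documents, used only for illustration.
docs = {
    "a": "bank stock price rises after earnings report",
    "b": "stock market news and stock analysis",
    "c": "weather forecast for the weekend",
}
index, doc_len = build_index(docs)
results = bm25_search("stock analysis", index, doc_len)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

In this sketch, document "b" ranks first because it matches both query terms, and document "c" is not returned because it contains neither. A real system would replace the tokenizer with a proper segmenter and build the index on disk rather than in memory.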
Keywords/Search Tags: web crawler, search engine, BM25, Jieba