Design and Implementation of the Campus Vertical Search Engine Based on Scrapy

Posted on: 2021-03-31    Degree: Master    Type: Thesis
Country: China    Candidate: W Ma    Full Text: PDF
GTID: 2428330602986160    Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of campus digitization, the amount of campus information on the Internet keeps growing, and it has become harder for users to retrieve the campus information they need. At present, most teachers and students rely on the search function of the school website to look up information. General-purpose search engines, however, cover a very wide range of information categories and fields, which makes subject-focused searching difficult; moreover, some campus websites run only on the local area network, so their content cannot be indexed by general search engines at all. To address these problems, this thesis takes campus websites as the research object and, exploiting their structured characteristics, designs a campus vertical search engine based on Scrapy. The search engine consists of three main functional modules: page download, index retrieval, and search query. Its purpose is to provide convenient, fast, and specialized search services for teachers and students on campus and to promote the development of the digital campus.

The research work in this thesis includes:

(1) A customized web crawler is developed on the Scrapy framework. Scrapy's built-in method for removing duplicate links is analyzed and found to consume a large amount of memory when crawling website information on a large scale, so a Bloom filter is integrated into the Scrapy framework to improve the crawler's ability to remove duplicate links (see the sketch after this abstract). In addition, drawing on experience from practical use, a solution is proposed for the case in which the crawler is banned by the target website's server because of overly frequent access.

(2) The PageRank algorithm is improved to raise the ranking quality of search results. PageRank and HITS are studied and compared; because PageRank offers higher computational efficiency and can handle a larger amount of data, it is adopted as the ranking algorithm. PageRank's ranking results nevertheless have shortcomings such as a bias toward older pages, evenly divided weight values, and topic drift. A time factor is therefore added to improve page scores, a weight factor is added to improve the weights of out-linked pages, and a topic-relevance factor is added to reduce the "topic drift" phenomenon in search results (a sketch of such a weighted iteration follows below).

(3) The Whoosh search library is used to build the index, and the Jieba word segmenter is introduced into the index retrieval module to improve word segmentation. Flask is used to implement the search query module of the search engine, so that entering a query sentence quickly returns search results with high topic relevance and provides users with a good search service (see the indexing and query sketch below).

(4) Finally, the system is tested and analyzed. Tests of the Bloom filter applied to the Scrapy crawler show that it greatly reduces the memory occupied by the running program, and tests of the improved PageRank algorithm show that it improves the ranking quality of the retrieval results.
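The following is a minimal sketch of how a Bloom filter can replace Scrapy's default set-based duplicate-link removal, as described in point (1). The bit-array size, hash count, and class names are illustrative assumptions, not values taken from the thesis; the thesis's own Bloom-filter implementation may differ.

```python
import hashlib

from scrapy.dupefilters import RFPDupeFilter


class SimpleBloomFilter:
    """Bit-array Bloom filter built on hashlib, so no extra dependency is needed."""

    def __init__(self, bit_count=1 << 24, hash_count=7):
        self.bit_count = bit_count
        self.hash_count = hash_count
        self.bits = bytearray(bit_count // 8)

    def _positions(self, key):
        # Derive hash_count bit positions by salting the key with different seeds.
        for seed in range(self.hash_count):
            digest = hashlib.md5(f"{seed}:{key}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.bit_count

    def add(self, key):
        """Insert key; return True if it was (probably) already present."""
        seen = True
        for pos in self._positions(key):
            byte, bit = divmod(pos, 8)
            if not (self.bits[byte] >> bit) & 1:
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


class BloomDupeFilter(RFPDupeFilter):
    """Drop-in dupefilter: enable via DUPEFILTER_CLASS = '<your_module>.BloomDupeFilter'."""

    def __init__(self, path=None, debug=False, *args, **kwargs):
        super().__init__(path, debug, *args, **kwargs)
        self.bloom = SimpleBloomFilter()

    def request_seen(self, request):
        # Returning True tells Scrapy to drop the request as a duplicate.
        return self.bloom.add(self.request_fingerprint(request))
```

Because the Bloom filter stores only bits rather than full fingerprints, memory use stays roughly constant as the number of crawled links grows, at the cost of a small, tunable false-positive rate.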
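Below is a hedged sketch of the improved PageRank idea in point (2): instead of dividing a page's score evenly among its out-links, each link is weighted by the target page's time (freshness) factor and topic-relevance factor. The way the factors are combined (a simple product) and the default values are assumptions for illustration only; the thesis does not give its exact formula in this abstract.

```python
def improved_pagerank(out_links, time_factor, topic_relevance,
                      damping=0.85, iterations=50):
    """out_links: {url: [linked urls]}; time_factor / topic_relevance: {url: score in (0, 1]}."""
    # Collect every page that appears either as a source or as a link target.
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    def link_weight(target):
        # Weight an out-link by the freshness and topic relevance of its target;
        # 0.5 is an arbitrary default for pages with no recorded factor.
        return time_factor.get(target, 0.5) * topic_relevance.get(target, 0.5)

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = out_links.get(page, [])
            total = sum(link_weight(t) for t in targets)
            if total == 0:
                # Dangling page: spread its score evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
                continue
            for t in targets:
                new_rank[t] += damping * rank[page] * link_weight(t) / total
        rank = new_rank
    return rank
```

Compared with standard PageRank, fresher and more topic-relevant target pages receive a larger share of each source page's score, which is one way to counteract the bias toward old pages and the topic-drift effect mentioned above.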
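Finally, a minimal sketch of point (3): building a Whoosh index whose TEXT fields use Jieba's ChineseAnalyzer for segmentation, plus a small Flask route that answers queries against it. The field names, the "indexdir" directory, the page-dictionary format, and the /search endpoint are illustrative assumptions about how the crawler output and query module might be wired together.

```python
import os

from flask import Flask, jsonify, request
from jieba.analyse import ChineseAnalyzer
from whoosh.fields import ID, TEXT, Schema
from whoosh.index import create_in, open_dir
from whoosh.qparser import QueryParser

# Use Jieba for Chinese word segmentation inside Whoosh's text analysis chain.
analyzer = ChineseAnalyzer()
schema = Schema(url=ID(stored=True, unique=True),
                title=TEXT(stored=True, analyzer=analyzer),
                content=TEXT(stored=True, analyzer=analyzer))


def build_index(pages, index_dir="indexdir"):
    """pages: iterable of dicts with 'url', 'title', 'content' produced by the crawler."""
    os.makedirs(index_dir, exist_ok=True)
    ix = create_in(index_dir, schema)
    writer = ix.writer()
    for page in pages:
        writer.add_document(url=page["url"], title=page["title"], content=page["content"])
    writer.commit()


app = Flask(__name__)


@app.route("/search")
def search():
    # Parse the user's query sentence and return the top matching pages.
    query_text = request.args.get("q", "")
    ix = open_dir("indexdir")
    with ix.searcher() as searcher:
        query = QueryParser("content", ix.schema).parse(query_text)
        hits = searcher.search(query, limit=10)
        results = [{"url": hit["url"], "title": hit["title"]} for hit in hits]
    return jsonify(results)
```

In a complete system the results returned by Whoosh would additionally be re-ranked with the improved PageRank scores before being shown to the user.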
Keywords/Search Tags:Vertical search engine, Scrapy, PageRank algorithm