Design And Implementation Of A Distributed Vertical Search Engine For Blog

Posted on: 2022-06-07 | Degree: Master | Type: Thesis
Country: China | Candidate: R Lin | Full Text: PDF
GTID: 2518306338468674 | Subject: Computer technology
Abstract/Summary:
User satisfaction with search results is a key measure of a search engine's success. General-purpose search engines return an excessive amount of content with low relevance, high repetition, and mixed subject matter, so users who want pages from a particular field or profession still have to spend considerable effort filtering out useless information. Vertical search engines emerged to solve precisely this problem. Traditional search engines are inefficient at retrieving information from blog pages and cannot meet the needs of users searching for specific blogs.

This thesis focuses on improving the web page relevance ranking algorithm and the new word discovery algorithm. On this basis, it designs and implements a distributed vertical search engine for blogs that crawls and analyzes blog pages, builds a page index, and draws on users' search history to improve retrieval efficiency and accuracy. The main innovations and contributions are as follows:

(1) To address the low accuracy and low quality of current search result ranking algorithms, the web page relevance ranking algorithm is improved. Taking the characteristics of blog pages into account, a web page ranking algorithm is proposed in which a PageRank variant based on BM25 text relevance redistributes the weight of web links. It also uses attributes of the blog page itself to rank search results comprehensively, which increases the importance of new pages.

(2) To address the misclassification and low efficiency of current new word discovery algorithms based on mutual information and adjacency entropy, a Trie is used to build an index tree to improve lookup efficiency, and an N-Gram model is added to splice the fragments produced by word segmentation so that long words can be recognized. New words are then identified by computing the internal cohesion and the adjacency entropy of the spliced candidates.

(3) The search engine system is designed in detail and implemented. It comprises a web crawler module, a data indexing module, and a user search module. By crawling and analyzing web pages on the Internet, the system builds data indexes that support keyword suggestions, search result ranking, and personalized page recommendation. The main technologies used in design and implementation include a web page deduplication algorithm, the Elasticsearch framework, the new word discovery algorithm, the web page ranking algorithm, and a web page recommendation algorithm.

(4) After design and implementation, the practicality, effectiveness, and real-time performance of the system were verified through tests and analyses. Re-ranking the search results returns more satisfactory results to users and improves the user experience.
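To make item (1) more concrete, the following is a minimal Python sketch of the two ingredients the abstract names: Okapi BM25 text relevance and a PageRank whose link weights are redistributed instead of being split evenly. It is an illustration under stated assumptions, not the thesis's actual algorithm. The function names, the parameter defaults (`k1`, `b`, `damping`), and the idea of supplying link weights as a precomputed dictionary are all hypothetical; the abstract does not say how the weights are derived from BM25 or how blog attributes enter the final score.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document against a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (the "+ 1" inside the log).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = k1 * (1.0 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + norm)
        scores.append(score)
    return scores


def weighted_pagerank(out_links, damping=0.85, iters=50):
    """PageRank where each page splits its rank in proportion to link weights.

    out_links maps page -> {target: weight}. The weights are an assumption
    here: they stand in for whatever relevance-based weighting is used.
    """
    pages = set(out_links) | {t for ts in out_links.values() for t in ts}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = out_links.get(p, {})
            total = sum(targets.values())
            if total == 0:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
            else:
                for q, w in targets.items():
                    new_rank[q] += damping * rank[p] * w / total
        rank = new_rank
    return rank
```

In this sketch a link with a larger weight passes on a larger share of its source page's rank, which is one plausible reading of "redistributes the weight of web links". How the query-time BM25 score, the link-based rank, and blog attributes such as page age are blended into one ranking is left open because the abstract does not specify it.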
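The scoring in item (2) can be sketched as follows. The code splices adjacent segmenter tokens into N-Gram candidates and scores each by internal cohesion (the "solidification degree", computed here as the minimum pointwise mutual information over all split points) and by adjacency entropy (the smaller of the left and right neighbor entropies). This is a simplified illustration, not the thesis's implementation: it counts with plain dictionaries instead of a Trie, estimates every probability against the total token count, and the function names and the `max_n` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (natural log) of a neighbor-count distribution."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counter.values())


def candidate_scores(sentences, max_n=3):
    """Score spliced candidates built from segmented sentences.

    sentences: list of token lists (the output of a word segmenter).
    Returns {token_tuple: (cohesion, adjacency_entropy)} for candidates
    made of 2..max_n adjacent tokens.
    """
    ngram = Counter()
    left = defaultdict(Counter)   # candidate -> counts of left neighbors
    right = defaultdict(Counter)  # candidate -> counts of right neighbors
    total_tokens = 0
    for toks in sentences:
        total_tokens += len(toks)
        for i in range(len(toks)):
            for n in range(1, max_n + 1):
                if i + n > len(toks):
                    break
                gram = tuple(toks[i:i + n])
                ngram[gram] += 1
                if i > 0:
                    left[gram][toks[i - 1]] += 1
                if i + n < len(toks):
                    right[gram][toks[i + n]] += 1

    def prob(gram):
        # Simplification: one denominator for n-grams of every length.
        return ngram[gram] / total_tokens

    scores = {}
    for gram in ngram:
        if len(gram) < 2:
            continue
        # Cohesion: weakest PMI over every way to split the candidate in two.
        cohesion = min(
            math.log(prob(gram) / (prob(gram[:k]) * prob(gram[k:])))
            for k in range(1, len(gram))
        )
        # Adjacency entropy: a real word appears in varied contexts on both sides.
        branch = min(entropy(left[gram]), entropy(right[gram]))
        scores[gram] = (cohesion, branch)
    return scores
```

A candidate would be accepted as a new word when both scores exceed chosen thresholds; the thresholds themselves are not given in the abstract, so they are left out. Replacing the dictionaries with a Trie keyed on token sequences would keep the same scores while making prefix lookups cheaper, which matches the efficiency motivation described above.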
Keywords/Search Tags: vertical search engine, Elasticsearch, new word discovery algorithm, web page ranking algorithm