Design And Implementation Of Vertical Search Engine Based On Elasticsearch In The Construction Sector

Posted on: 2022-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Guang
Full Text: PDF
GTID: 2518306575960979
Subject: Software engineering
Abstract/Summary:
In daily life, people have become accustomed to obtaining information from the Internet, and search engines have become a bridge between people and Internet information. With the continuing informatization of various industries, data volumes have grown explosively, but the data suffer from many problems, such as inaccuracy, redundancy, and low relevance. How to quickly and accurately provide users with the information they need from massive data has therefore become an urgent problem for search engines. A vertical search engine is a tool that provides information retrieval in a specific field, and its retrieval results are more professional and accurate. Because traditional full-text search engines cannot provide sufficiently professional answers in the construction field and their search efficiency is low, this thesis proposes a vertical search engine for the construction field based on Elasticsearch.

First, the thesis introduces the research background of search engines and the state of research in China and abroad, and then introduces the technologies used to implement the search engine, focusing on the functions and principles of the Elasticsearch engine, the Scrapy framework, the Django framework, and Chinese word segmentation. Second, in the data collection part, the Scrapy framework and a depth-first crawling strategy are used to collect architectural texts. The collected data are then cleaned and processed according to the relevant data processing strategies, thereby improving the validity of the data. Third, in the Chinese word segmentation part, to solve the problem of inaccurate recognition of unregistered (out-of-vocabulary) words, a method is proposed that screens the candidate words generated from a first segmentation pass using mutual information and adjacency entropy. Candidate words that reach the threshold are added to a professional dictionary, which is then used for a second segmentation pass. Fourth, in the page ranking part, the PageRank algorithm is studied and analyzed, its problems are summarized, and it is improved in two respects: topic relevance is calculated and a user feedback factor is added to correct topic drift, and a time feedback factor is added to help new web pages rise faster in the ranking, correcting the bias toward old web pages over new ones. Finally, the vertical search engine for the construction field is designed and implemented, from the overall architecture down to each functional module, and its functionality and performance are tested. The test results show that the system meets users' needs and has high practical value.
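The screening of candidate words by mutual information and adjacency entropy can be illustrated with a minimal Python sketch. The abstract gives neither the exact formulas nor the thresholds, so everything here is an assumption: candidates are taken to be adjacent token pairs from the first segmentation pass, mutual information is computed as pointwise mutual information, adjacency entropy is the smaller of the left and right neighbor entropies, and the function name discover_words and the threshold values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def discover_words(tokens, min_count=2, min_pmi=1.0, min_entropy=0.5):
    """Screen adjacent-token pairs from a first segmentation pass.

    Thresholds are illustrative assumptions, not values from the thesis.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    # Collect the tokens seen immediately left and right of each pair.
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for i in range(n - 1):
        pair = (tokens[i], tokens[i + 1])
        if i > 0:
            left[pair][tokens[i - 1]] += 1
        if i + 2 < n:
            right[pair][tokens[i + 2]] += 1

    def entropy(counter):
        # Shannon entropy (natural log) of a neighbor distribution.
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return -sum(c / total * math.log(c / total) for c in counter.values())

    accepted = []
    total_bigrams = max(n - 1, 1)
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue
        # Pointwise mutual information: how strongly a and b stick together.
        p_ab = count / total_bigrams
        pmi = math.log(p_ab / ((unigrams[a] / n) * (unigrams[b] / n)))
        # Adjacency entropy: how freely the pair combines with its context.
        adjacency = min(entropy(left[(a, b)]), entropy(right[(a, b)]))
        if pmi >= min_pmi and adjacency >= min_entropy:
            accepted.append(a + b)
    return accepted


# Hypothetical first-pass output in which "装配式" was split into two tokens.
tokens = ["推广", "装配", "式", "建筑", "采用", "装配", "式", "构件",
          "发展", "装配", "式", "住宅"]
print(discover_words(tokens))  # ['装配式']
```

In this sketch the accepted strings would be added to the professional dictionary before the second segmentation pass. A pair that always appears in the same context has zero adjacency entropy and is rejected, even when its mutual information is high.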
Keywords/Search Tags:Vertical Search Engine, Chinese Word segmentation, Web page sorting, Adjacency entropy, Customer feedback