
The Design And Implementation Of Vertical Search Engine For Resold House

Posted on: 2019-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: X W Zhu
GTID: 2348330569988473
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet and the arrival of the big data era, every field is flooded with information, and retrieving the useful part of it has become a top priority. Building a resold-housing search platform on vertical search engine technology is therefore a problem that needs to be solved in this field. The topic crawler strategy integrates web page acquisition, web page segmentation, topic relevance determination and information extraction. For collecting web pages in a specific domain, this strategy has clear advantages: it makes better use of network resources and improves the accuracy of the collected information. When calculating the relevance of a web page, this thesis considers the similarity between the topic and the content block to which a link belongs, and filters out pages that are not relevant to the topic according to a weight that combines the similarity of the link with the similarity of the link's content. This also reduces the traffic the crawler service spends on unrelated links. In this thesis, the full-text search framework Lucene, a distributed crawler framework and an HBase cluster are combined to develop a vertical search engine for resold houses. The crawler framework captures data from multiple domestic resold-housing websites in real time. To date, tens of millions of resold-house records have been crawled. The data is stored in the HBase cluster for analysis and mining. The vertical search engine uses synonyms and the Rocchio relevance feedback algorithm to expand the original query, optimizes Lucene's default query, and implements several retrieval functions such as field queries and fuzzy queries. Finally, the web crawler and the retrieval module of the resold-house vertical search engine are function-tested, and the results are compared with those of a general-purpose search engine. The results show that the result set returned by this system is more accurate. The thesis also tests the expanded queries and the field (domain) queries, and finds that the expanded query achieves a higher recall rate.
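The Rocchio-based query expansion mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `rocchio_expand`, the normalized term-frequency weighting, the pre-tokenized documents and the coefficient values (`alpha`, `beta`, `gamma`) are all assumptions made for illustration, and the synonym step is left out.

```python
from collections import Counter


def rocchio_expand(query_terms, relevant_docs, nonrelevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15, top_k=3):
    """Return the query terms plus up to top_k Rocchio expansion terms.

    Documents are lists of tokens. Term weights are length-normalized
    term frequencies (an assumption; TF-IDF is also common).
    """
    weights = Counter()

    # Original query vector, scaled by alpha.
    for term in query_terms:
        weights[term] += alpha

    # Add the centroid of relevant documents, subtract the centroid
    # of non-relevant documents.
    for docs, coeff in ((relevant_docs, beta), (nonrelevant_docs, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, count in Counter(doc).items():
                weights[term] += coeff * count / (len(doc) * len(docs))

    # Keep only new terms with positive weight, highest weight first
    # (ties broken alphabetically so the result is deterministic).
    candidates = [(t, w) for t, w in weights.items()
                  if t not in query_terms and w > 0]
    candidates.sort(key=lambda tw: (-tw[1], tw[0]))
    return list(query_terms) + [t for t, _ in candidates[:top_k]]


if __name__ == "__main__":
    expanded = rocchio_expand(
        ["resold", "house"],
        relevant_docs=[["resold", "house", "apartment", "subway"],
                       ["apartment", "school", "house"]],
        nonrelevant_docs=[["car", "rental"]],
        top_k=2,
    )
    print(expanded)
```

In a Lucene-based system the expanded term list would then be turned into a query, typically with the added terms weighted lower than the original ones; that step is omitted here because the abstract does not describe it.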
Keywords/Search Tags: vertical search engine, topic crawler, relevance feedback algorithm, NoSQL