Design And Implementation Of The Vertical Search Engines With User Interest Model

Posted on:2018-04-27

Degree:Master

Type:Thesis

Country:China

Candidate:M X Yang

Full Text:PDF

GTID:2348330518494398

Subject:Electronics and Communications Engineering

Abstract/Summary:

In recent years the influence of the Internet has deepened, and the network is flooded with information of every kind. This brings the problem of information overload: users cannot quickly find the information they need, the availability of information falls, and much useful information is not found in time, resulting in a "waste of resources." This thesis presents the design and implementation of a vertical search engine that incorporates a user interest model. The work is as follows. First, the thesis clarifies the key problems the system is expected to solve, outlines the workflow of the search engine, and introduces the key technologies involved in its development, with particular attention to URL deduplication. Second, it describes in detail the analysis and modeling of the user interest model, then the collection of user data and the classification of user behavior in a Python environment. On this basis the author proposes a method for quantifying the interest model from hybrid behaviors; it gives special weight to page browsing time and, when the browsing time is abnormal, evaluates interest from the other behaviors instead. Third, it presents the architecture of the system, which consists of a web crawling module, an indexing and retrieval module, and a page display module. The system is developed in Python using Scrapy, BeautifulSoup, Whoosh and Flask. During development the author found that the Scrapy framework's original URL deduplication method can lead to serious memory consumption, and proposes a Bloom filter as an improvement. Drawing on practical experience, the author also developed two strategies to keep the crawler's URL requests from being blocked.
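The Bloom-filter deduplication mentioned above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the class `BloomFilter`, the helper `seen_before`, the choice of SHA-256 with double hashing, and the default error rate are all assumptions made for the example. It shows why memory use stays fixed: each URL sets a few bits in a preallocated bit array instead of being stored itself, at the cost of a small false-positive rate.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter for URL strings (hypothetical, illustrative)."""

    def __init__(self, capacity: int, error_rate: float = 0.001):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(8, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, url: str):
        # Double hashing: derive all bit positions from two halves of one digest.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, url: str) -> bool:
        # True means "probably seen"; False means "definitely not seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(url))


def seen_before(bloom: BloomFilter, url: str) -> bool:
    """Return True if url was probably seen already; otherwise record it and return False."""
    if url in bloom:
        return True
    bloom.add(url)
    return False
```

A crawler would call `seen_before` on each candidate URL and skip those that return True. Because false positives are possible, a small fraction of unseen URLs may be skipped; no seen URL is ever crawled twice.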
To improve Whoosh's Chinese word segmentation, the thesis proposes using the open-source jieba word segmentation component. Finally, the system was tested over 32 days and evaluated on four measures: recall, precision, response time and dead-link ratio. Conclusions were drawn from the collected user evaluations and feedback.
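Three of the four evaluation measures named above can be computed as in the sketch below. The abstract does not give the thesis's exact definitions, so this assumes the standard ones: precision is the share of retrieved results that are relevant, recall is the share of relevant documents that were retrieved, and the dead-link ratio is taken here to be the share of returned links that fail to load. The function names and the "status 400 or above, or no response" rule for a dead link are assumptions for illustration.

```python
from typing import Iterable, Optional, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Standard set-based precision and recall for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def dead_link_ratio(status_codes: Iterable[Optional[int]]) -> float:
    """Share of result links that are dead.

    Assumption: a link is dead if the request got no response (None)
    or an HTTP status code of 400 or above.
    """
    codes = list(status_codes)
    if not codes:
        return 0.0
    dead = sum(1 for code in codes if code is None or code >= 400)
    return dead / len(codes)
```

For example, if a query returns four documents of which two are relevant, and three relevant documents exist in total, precision is 0.5 and recall is 2/3.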

Keywords/Search Tags:

User Interest Model, Vertical Search Engine, User's Behavior, URL deduplication

Related items

1	Research And Implementation Of Book Search Engine Based On User Personalization
2	Research On Domain-Oriented Search Engines Based On User Behavior
3	Research On Domain-oriented Search Engines Based On User Behavior
4	Application Of User Behavior Analysis In Search Engine
5	Personalized Vertical Search Engine For Basic Education Resource
6	Based On User Behavior Log Analysis Of Search Engine Ranking Algorithm
7	Analysis And Research On Personal Search Engine Based On User Interest
8	Recommendation System Based On User Interest Model Of The Search Engine Results
9	The Research On Personalized Search System Based On User Interest Model
10	The Research And Design Personalized News Search Engine