Design And Implementation Of A Movie Search System Based On Distributed Crawler

Posted on:2019-05-21

Degree:Master

Type:Thesis

Country:China

Candidate:L Y Shi

Full Text:PDF

GTID:2428330590482839

Subject:Software engineering

Abstract/Summary:

With the advent of big data, the value of data is becoming increasingly important. Massive data, which includes film information, has enormous research and commercial value. In the past, administrators generally imported the relevant data manually. Now web crawlers can take the place of administrators and crawl the rich movie information available on the network. However, traditional crawlers are not distributed, so it usually takes a long time to crawl enough data. A distributed crawler has multiple crawlers work together, which increases crawling efficiency.

The movie search system uses a distributed crawler to capture movie data; the distributed crawler is built on the Redis database and the Scrapy crawler framework. The crawler is divided into a Master side and a Slave side. The Master-side crawler is mainly responsible for parsing catalog (directory) pages: it matches catalog page links and pushes them to Redis for subsequent crawling, and it matches detail page links and pushes them to Redis for the Slave side. The Slave-side crawler receives the links from the Master side, parses the web pages, and downloads the data. After the data is downloaded, a script formats it and stores it in a MySQL database for the website to access. Problems usually arise while a crawler is running, so several middleware components are designed to solve them. For example, access requests imitate different browsers to prevent the crawler from being blocked by the website, the status codes returned to the crawler are each handled differently, and crawler download errors are solved with proxy IPs.

The movie search system is designed using Django's MTV model and mainly includes movie search, movie evaluation, movie collection, user registration and login, and back-end management. After logging in, a user can search for movies by keyword, or click links to query by movie category, year, production region, and so on, which satisfies the query requirements of most users. Finally, the movie search system underwent functional and performance testing, which verified that most of the website's functions work normally. The movie search system not only saves administrators the time needed to import movie resources, but also gives users a place to search for movie information and discuss movies with others.
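The division of labor between the Master side and the Slave side can be illustrated with a small sketch. It is not the thesis implementation: two in-memory queues stand in for the Redis lists so the example runs without a Redis server or Scrapy, and the URL patterns, the `LinkQueues` class, the `SITE` link map, and the de-duplication set are all hypothetical names and choices introduced only for illustration.

```python
import re
from collections import deque

# Hypothetical URL patterns; a real site would need its own rules.
CATALOG_RE = re.compile(r"^https://movies\.example/list/\d+$")
DETAIL_RE = re.compile(r"^https://movies\.example/film/\d+$")


class LinkQueues:
    """In-memory stand-in for the Redis lists shared by Master and Slaves."""

    def __init__(self):
        self.catalog = deque()  # catalog pages still to be parsed by the Master
        self.detail = deque()   # detail pages waiting for a Slave
        self.seen = set()       # assumption: de-duplicate so no link is queued twice

    def push(self, url):
        """Route a link to the matching queue; return the queue name or None."""
        if url in self.seen:
            return None
        if CATALOG_RE.match(url):
            target, name = self.catalog, "catalog"
        elif DETAIL_RE.match(url):
            target, name = self.detail, "detail"
        else:
            return None  # neither a catalog nor a detail link: ignore it
        self.seen.add(url)
        target.append(url)
        return name


def master_step(queues, fetch_links):
    """Master: take one catalog page and queue every link found on it."""
    if not queues.catalog:
        return False
    page = queues.catalog.popleft()
    for link in fetch_links(page):
        queues.push(link)
    return True


def slave_step(queues, download):
    """Slave: take one detail link and download its data."""
    if not queues.detail:
        return None
    return download(queues.detail.popleft())


# Hypothetical link structure of a tiny site, used instead of real HTTP requests.
SITE = {
    "https://movies.example/list/1": [
        "https://movies.example/list/2",
        "https://movies.example/film/10",
        "https://movies.example/about",
    ],
    "https://movies.example/list/2": [
        "https://movies.example/list/1",
        "https://movies.example/film/10",
        "https://movies.example/film/11",
    ],
}


def run_demo():
    queues = LinkQueues()
    queues.push("https://movies.example/list/1")  # seed catalog page
    while master_step(queues, lambda page: SITE.get(page, [])):
        pass
    downloaded = []
    while queues.detail:
        downloaded.append(slave_step(queues, lambda url: {"url": url}))
    return downloaded
```

In this sketch, `run_demo()` lets the Master drain the catalog queue first and then lets a single Slave drain the detail queue. In a distributed deployment the two sides would presumably run concurrently on different machines against a shared Redis instance, which is what allows several Slaves to download in parallel.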

Keywords/Search Tags:

Distributed crawler, Movie search system, MTV model

Related items

1	Design And Implementation Of Film Integrated Search System Based On Web Crawler
2	Research On Key Techniques Of Distributed Vertical Search Engine
3	Distributed Web Crawler System
4	Design And Implementation Of Distributed Network Crawler System
5	Design And Implementation Of Distributed Online Travel Search Crawler System
6	Distributed Web Crawler System Design And Implementation
7	Research Of A Distributed Web Crawler Search Engine Based On Web Information Collection
8	The Research On Web Crawler Technology Based On Distributed Calculation
9	Research And Implementation Of Distributed Web Crawler
10	Research And Implement Of Distributed Crawler System Supporting AJAX