
Design and Implementation of a Movie Search System Based on Distributed Crawler

Posted on: 2019-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Shi
GTID: 2428330590482839
Subject: Software engineering
Abstract/Summary:
With the advent of the big data era, data is becoming increasingly valuable. Massive data, including film information data, has enormous research and commercial value. In the past, administrators generally imported such data manually; now, web crawlers can take their place and crawl the rich movie information available on the web. Traditional crawlers, however, do not support distributed operation, so collecting enough data usually takes a long time. A distributed crawler lets multiple crawlers work together, which multiplies the amount of data collected and raises overall crawling efficiency.

The movie search system captures movie data with a distributed crawler built on the Redis database and the Scrapy crawler framework. The crawler is divided into a Master side and a Slave side. The Master-side crawler is mainly responsible for parsing the website's catalog pages: it pushes the catalog page links it matches into Redis for subsequent crawling, and pushes the detail page links it matches into Redis for the Slave side. The Slave-side crawler receives these links from the Master side, parses the corresponding web pages, and downloads the data. After downloading, a script formats the data and stores it in a MySQL database for the website to access.

Crawlers commonly run into problems while running, so several middleware components were designed to solve them. For example, requests imitate different browsers to keep the crawler from being blocked by the website, different status codes returned to the crawler are handled differently, and download errors are solved by switching to proxy IPs.

The movie search system is designed with Django's MTV (Model-Template-View) pattern and mainly provides movie search, movie reviews, movie favorites, user registration and login, and back-end administration. After logging in, users can search for movies by keyword, or click links to run various kinds of queries, such as by movie category, release year, or production region, which satisfies the query requirements of most users. Finally, the movie search system was tested for both functionality and performance, verifying that most of the website's functions work normally. The system not only saves administrators the time it takes to import movie resources but also gives users a place to search for movie information and discuss movies with others.
Keywords/Search Tags: Distributed crawler, Movie search system, MTV model
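The Master/Slave division described in the abstract can be illustrated with a small, self-contained sketch. It is not the thesis's code: two in-memory `deque`s stand in for the Redis lists, a `seen` set stands in for Redis-side de-duplication, and `FAKE_SITE`, `InMemoryBroker`, `master_step`, `slave_step` and `crawl` are hypothetical names. It only shows the hand-off: the Master keeps catalog links for itself and passes detail links to the Slave queue.

```python
from collections import deque

# Hypothetical catalog pages: each lists detail-page links and an optional next page.
FAKE_SITE = {
    "/list/1": {"details": ["/movie/101", "/movie/102"], "next": "/list/2"},
    "/list/2": {"details": ["/movie/102", "/movie/103"], "next": None},
}


class InMemoryBroker:
    """Stand-in for Redis: two FIFO lists plus a 'seen' set for de-duplication."""

    def __init__(self) -> None:
        self.catalog_queue: deque = deque()  # links the Master will parse
        self.detail_queue: deque = deque()   # links handed to the Slaves
        self.seen: set = set()

    def push(self, queue: deque, url: str) -> None:
        # Only enqueue URLs that have never been scheduled before.
        if url not in self.seen:
            self.seen.add(url)
            queue.append(url)


def master_step(broker: InMemoryBroker) -> bool:
    """Parse one catalog page; return False when there is nothing to do."""
    if not broker.catalog_queue:
        return False
    page = FAKE_SITE[broker.catalog_queue.popleft()]
    for link in page["details"]:
        broker.push(broker.detail_queue, link)  # detail links go to the Slave side
    if page["next"]:
        broker.push(broker.catalog_queue, page["next"])  # catalog links stay with the Master
    return True


def slave_step(broker: InMemoryBroker, results: list) -> bool:
    """Take one detail link and 'download' it; return False when idle."""
    if not broker.detail_queue:
        return False
    url = broker.detail_queue.popleft()
    # Placeholder for fetching and parsing the real detail page.
    results.append({"url": url, "movie_id": int(url.rsplit("/", 1)[1])})
    return True


def crawl(start_url: str) -> list:
    broker = InMemoryBroker()
    broker.push(broker.catalog_queue, start_url)
    results: list = []
    # '|' (not 'or') so both roles get a turn on every iteration;
    # the loop ends once both queues are empty.
    while master_step(broker) | slave_step(broker, results):
        pass
    return results
```

Calling `crawl("/list/1")` yields records for `/movie/101`, `/movie/102` and `/movie/103`, with the duplicate `/movie/102` link on the second catalog page skipped. In a real deployment the two queues would be shared Redis lists and the Master and Slave would run as separate processes, possibly on different machines, which is where the speed-up the abstract describes comes from.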
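The anti-blocking middleware mentioned in the abstract (imitating different browsers, handling status codes differently, retrying through proxy IPs) can be sketched as plain functions. In Scrapy, logic like this would normally live in a downloader middleware's `process_request(self, request, spider)` and `process_response(self, request, response, spider)` methods, and Scrapy's built-in `HttpProxyMiddleware` reads `request.meta["proxy"]`. The sketch below is stdlib-only so it runs without Scrapy; the `Request` class, the User-Agent and proxy pools, `MAX_RETRIES`, and the particular status-code policy are all assumptions for illustration, not the thesis's actual rules.

```python
import random
from dataclasses import dataclass, field

# Hypothetical browser User-Agent strings to rotate through.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/74.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/12.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0",
]
# Hypothetical proxy pool (RFC 5737 documentation addresses, not real proxies).
PROXIES = ["http://192.0.2.10:8080", "http://198.51.100.7:3128"]
MAX_RETRIES = 2


@dataclass
class Request:
    """Minimal stand-in for a crawler request object."""
    url: str
    headers: dict = field(default_factory=dict)
    meta: dict = field(default_factory=dict)


def process_request(request: Request, rng: random.Random) -> Request:
    # Imitate a randomly chosen browser so requests look less uniform.
    request.headers["User-Agent"] = rng.choice(USER_AGENTS)
    return request


def process_response(request: Request, status: int, rng: random.Random) -> str:
    """Return 'parse', 'retry' or 'drop' depending on the status code."""
    if 200 <= status < 300:
        return "parse"
    # Likely blocking (403/429) or transient server errors (5xx): retry via a proxy.
    if status in (403, 429) or 500 <= status < 600:
        retries = int(request.meta.get("retries", 0))
        if retries >= MAX_RETRIES:
            return "drop"  # give up after MAX_RETRIES attempts
        request.meta["retries"] = retries + 1
        request.meta["proxy"] = rng.choice(PROXIES)
        process_request(request, rng)  # re-pick the User-Agent as well
        return "retry"
    # Anything else (e.g. 404) is treated as permanently unavailable.
    return "drop"
```

Passing the random generator in explicitly keeps the behavior reproducible in tests; a production middleware would more likely draw proxies from a maintained pool and drop proxies that keep failing.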
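The last two stages in the abstract, formatting the downloaded data with a script before storing it and then letting users query by keyword, category, year, or region, can be sketched together. This is only an illustration under stated assumptions: `sqlite3` stands in for MySQL, and the `movie` table schema, the raw field names (`title`, `category`, `release`, `region`), and the helpers `format_record` and `search` are hypothetical. In the MTV structure the abstract names, the table would correspond to a Model and the query to View logic.

```python
import re
import sqlite3

# Hypothetical schema; the thesis's actual MySQL tables are not described here.
SCHEMA = """CREATE TABLE movie (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    category TEXT,
    year INTEGER,
    region TEXT
)"""


def format_record(raw: dict) -> tuple:
    """Clean one scraped record: trim whitespace, extract a 4-digit year."""
    match = re.search(r"\d{4}", raw.get("release", ""))
    return (
        raw["title"].strip(),
        raw.get("category", "").strip() or None,
        int(match.group()) if match else None,
        raw.get("region", "").strip() or None,
    )


def search(conn, keyword=None, category=None, year=None, region=None) -> list:
    """Combine whichever filters are given into one parameterized query."""
    clauses, params = [], []
    if keyword:
        clauses.append("title LIKE ?")
        params.append(f"%{keyword}%")
    if category:
        clauses.append("category = ?")
        params.append(category)
    if year is not None:
        clauses.append("year = ?")
        params.append(year)
    if region:
        clauses.append("region = ?")
        params.append(region)
    sql = "SELECT title FROM movie"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    # Placeholders keep user input out of the SQL text (no injection).
    return [row[0] for row in conn.execute(sql + " ORDER BY title", params)]
```

Each filter is optional, which mirrors the site's behavior of searching by keyword or clicking a category, year, or region link. In Django, the same effect would typically come from chaining `.filter(...)` calls on a QuerySet (for example `title__icontains=keyword`) rather than building SQL by hand.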