Design And Implementation Of Inventory Data Collection Platform For Student Accommodation

Posted on:2018-07-08

Degree:Master

Type:Thesis

Country:China

Candidate:L S Li

GTID:2348330512497653

Subject:Software engineering

Abstract/Summary:

Beijing UNINOVA LTD is an O2O Internet start-up that provides student accommodation rental information services for overseas students in the United Kingdom. In this Internet business model, the company must offer a good user experience on the one hand and acquire accommodation information quickly and accurately on the other. Currently, accommodation data is obtained by email through cooperation with Unite-Students officials, or from business competitors, and staff then update the rental information manually. This is inefficient and carries a high administrative cost; moreover, in the peak rental season, room availability and tenancy terms change frequently. The business therefore needs an automated way to synchronize accommodation information between different platforms and keep it current and accurate.

Writing a web crawler is an effective means of collecting such web data. Although the information structure of the web pages is similar across accommodation platforms, their HTML presentation differs. Given this need for customized crawling, the key problems in this project are reducing the workload (and therefore the cost) of writing crawlers, designing the system architecture, controlling the complexity of the crawler modules, decoupling module functions, and cleaning, structuring, and importing the data.

During my internship, I took part in developing the accommodation backend data center. With reference to a legacy project, an unfinished Pyspider web crawler application, a new system based on Scrapy was developed. To distinguish it from the main hosting-site backend, called Livety, the data center is called Sharingan. Livety is responsible for selecting which accommodation data to show in the front end and for managing users. Sharingan serves as an accommodation database that stores, processes, and manages the data scraped from different platforms, and as a spider cloud platform that deploys and schedules spiders. The two backends communicate through a message system, which keeps the coupling between them low.

The development work included:
(1) Modeling the accommodation relational database. A structured data storage model was formulated, providing a foundation and standard for structuring and importing data.
(2) Designing the architecture of the data center. Based on the overall requirements and on experience with the legacy spider system, a general model for web page crawling and scraping was established. The architecture of the new system, the frameworks and technologies used, and the integration scheme for the functional modules were determined. As a result, the development requirements and the general design of the system architecture and modules were clearly defined.
(3) Implementing the concrete functional modules, and developing and integrating the subsystems, including the Scrapy spider's Fragment modules, processor modules, validator modules, spider scheduling, monitoring modules, the database import module, and the message system in the data center. As a result, a preliminary, workable integrated system was built.
(4) Unit testing, integration testing, and system testing of the related modules to ensure the system operates correctly. Program errors in the system and modules were found and corrected through testing.

Since launch, the system has run well. It scrapes data from several platforms to provide an accommodation data service for the internal presentation system. Its extensibility lays a foundation for a highly versatile data crawling center serving more data consumers.
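
The processor, validator, and database import steps named in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual implementation: it uses only the Python standard library rather than Scrapy, and every name in it (the field names, the `room` table, `process`, `validate`, `import_records`, `run_pipeline`) is a hypothetical choice made for illustration. It assumes each scraped item arrives as a dictionary of raw strings.

```python
import sqlite3


def process(raw):
    """Processor step: clean raw scraped strings into typed fields.

    The input keys ("price", "tenancy", ...) are assumed for this sketch.
    """
    return {
        "platform": raw["platform"].strip(),
        # Collapse runs of whitespace left over from the HTML layout.
        "room_name": " ".join(raw["room_name"].split()),
        "weekly_price_gbp": float(
            raw["price"].replace("£", "").replace(",", "").strip()
        ),
        # Expects text such as "44 weeks"; takes the leading number.
        "tenancy_weeks": int(raw["tenancy"].split()[0]),
        "available": int(raw["availability"].strip().lower() != "sold out"),
    }


def validate(record):
    """Validator step: return a list of problems (empty means valid)."""
    errors = []
    if not record["room_name"]:
        errors.append("room_name is empty")
    if record["weekly_price_gbp"] <= 0:
        errors.append("weekly_price_gbp must be positive")
    if not 1 <= record["tenancy_weeks"] <= 52:
        errors.append("tenancy_weeks out of range")
    return errors


def import_records(conn, records):
    """Import step: upsert structured records into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS room ("
        "platform TEXT, room_name TEXT, weekly_price_gbp REAL, "
        "tenancy_weeks INTEGER, available INTEGER, "
        "PRIMARY KEY (platform, room_name, tenancy_weeks))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO room VALUES "
        "(:platform, :room_name, :weekly_price_gbp, "
        ":tenancy_weeks, :available)",
        records,
    )
    conn.commit()


def run_pipeline(conn, raw_items):
    """Run process -> validate -> import; return (imported count, rejects)."""
    accepted, rejected = [], []
    for raw in raw_items:
        record = process(raw)
        errors = validate(record)
        if errors:
            rejected.append((record, errors))
        else:
            accepted.append(record)
    import_records(conn, accepted)
    return len(accepted), rejected
```

Under these assumptions, calling `run_pipeline(sqlite3.connect(":memory:"), items)` imports the valid records and returns the rejected ones together with their reasons. Keeping cleaning, validation, and import as separate functions mirrors the decoupling of module functions that the abstract lists among its key problems; re-running the import after a later crawl overwrites rows with the same key, so changed prices or availability replace stale values.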

Keywords/Search Tags:

Web Crawler, Web Data Extraction, Student Accommodation, Data Center, Message System

Related items

1	Design And Implementation Of The Crawler Log Data Information Extraction And Statistical System
2	Based On The Specific Web Crawler API Weather Data Fetching In The Research And Implementation
3	Platform Of Fund Data Extraction And Analysis Base On Web Crawler
4	Research On Key Technologies Of A High-performance Web Crawler System
5	The Design Of Liquidation Center Data Processing System In The Public Transportation Card
6	Multi-source Data Collection Center System Analysis And Design
7	Design And Implementation Of ETL System In Data Center
8	Research And Implementation Of Microblog Crawler
9	Research And Implementation Of Data Collection And Processing System For Big Data Precision Investment Promotion
10	Research On Key Technologies Of Data Center Physical Infrastructure Monitoring System