
Design And Implementation Of Commodity-Oriented Vertical Search System

Posted on: 2019-02-23  Degree: Master  Type: Thesis
Country: China  Candidate: H T Wang  Full Text: PDF
GTID: 2428330545965604  Subject: Software engineering
Abstract/Summary:
With the rapid development of e-commerce, the consuming habits of customers have changed, and the number of online shopping users has been growing. To purchase a product, a customer often needs to compare the price of the same product across different merchants, compare the preferential information, compare the favorable rates, and so on. Commodity-oriented vertical search systems are popular nowadays, but the relevant products on the market are characterized by a small amount of commodity data, low accuracy, and low timeliness. High-precision, time-sensitive search engine services for commodities are therefore needed. This work, carried out in cooperation with a number of well-known e-commerce companies, performs a series of processing steps on their product data. Through the vertical search system for products, it provides users with precise commodity search and purchase services, with functions such as product price comparison, preferential information comparison, and favorable rate comparison.

The vertical search system for commodities is a real project from Baidu's vertical search product line and belongs to the Internet search engine field. The system uses commodity data to clarify user needs and to provide vertical search and purchase services for commodities. The project consists of three main parts: data design, data processing, and the design and implementation of the vertical search system. First, it carries out the data work on a distributed platform, namely massive data introduction and data processing. Data introduction is composed of data pulling and data crawling, i.e., pulling product data from partners and crawling the product data that a partner cannot provide. Crawling is implemented with Python's Scrapy framework. Data processing consists of data cleaning, data classification, data deduplication, and data integration; Chinese word segmentation, classification algorithms, Simhash, and other related technologies are used throughout this process. After data processing, the vertical search system is designed and implemented, including index creation, search word processing, and search ordering. Finally, the project interacts with the user through the front-end search interface.

I am mainly responsible for the design and development of the data pulling and data crawling functions for products; the design and development of the data classification function modules; the design and development of the data deduplication function modules, including deduplication algorithm selection, Chinese word segmentation and keyword extraction, and the implementation of deduplication of massive data on a distributed computing platform; and the design of the commodity data integration program and the development of its function modules. The vertical search system for commodities completed in this thesis achieved the expected results in functional testing of each module, and it can provide more convenient, more efficient, and more affordable commodity search and purchase services for users.
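The abstract names Simhash as the deduplication technique but gives no detail. As a rough illustration only, the following is a minimal single-machine sketch of Simhash fingerprinting and near-duplicate detection. It is not the thesis implementation: the regex tokenizer stands in for the Chinese word segmentation step, all tokens carry equal weight rather than keyword-extraction weights, and the function names and the Hamming-distance threshold of 3 are assumptions chosen for this example.

```python
import hashlib
import re

FINGERPRINT_BITS = 64  # assumption: 64-bit fingerprints


def tokenize(text):
    # Assumption: a simple regex split stands in for Chinese word segmentation.
    return re.findall(r"\w+", text.lower())


def simhash(tokens):
    # Each token votes +1 or -1 on every bit position, based on its own hash.
    votes = [0] * FINGERPRINT_BITS
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")  # first 64 bits
        for i in range(FINGERPRINT_BITS):
            if (token_hash >> i) & 1:
                votes[i] += 1
            else:
                votes[i] -= 1
    # A fingerprint bit is set where the positive votes win.
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    # Number of bit positions in which the two fingerprints differ.
    return bin(a ^ b).count("1")


def is_near_duplicate(text_a, text_b, threshold=3):
    # Assumption: the threshold is illustrative and would need tuning.
    fp_a = simhash(tokenize(text_a))
    fp_b = simhash(tokenize(text_b))
    return hamming_distance(fp_a, fp_b) <= threshold
```

Two product titles can then be compared with `is_near_duplicate(title_a, title_b)`. Because similar token sets yield fingerprints that differ in only a few bits, comparing fingerprints is much cheaper than comparing full texts, which is what makes the approach attractive for deduplicating massive data.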
Keywords/Search Tags:Vertical Search Engine, Machine Learning, Text Categorization, Web Spider, Simhash