Design And Implementation Of E-commerce Information Extraction System Based On Knowledge Graph

Posted on: 2020-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: W H Jin
Full Text: PDF
GTID: 2428330626950734
Subject: Software engineering
Abstract/Summary:
After decades of development, the Internet has accumulated a vast amount of information, and making use of it can generate substantial economic benefits. Web pages, the carriers of this information, exist on the Internet in semi-structured form. Web information extraction, which treats web pages as a data source, has developed alongside the Internet itself. As one of the Internet's many application modes, e-commerce brings convenience to people's production and daily life and also generates a large amount of product information. Extracting product information from e-commerce websites is valuable in many fields, such as product recommendation and market analysis.

The product search result page and the product detail page are the two main pages on which an e-commerce website displays product information. On both types of page, noise introduced by advertisements and recommendations from the e-commerce platform and from merchants lowers the accuracy of existing web information extraction methods. In addition, these two types of page look similar in design, both within a single e-commerce website and across different websites, yet their underlying page structures differ, which causes existing extraction methods to fail.

To address these problems, this thesis proposes a knowledge-graph-based method that extracts e-commerce information by exploiting the rich concept and instance information in an existing knowledge graph. The method consists of two parts: knowledge graph preprocessing and page analysis and extraction. The work covers the following four points:

(1) For the knowledge graph preprocessing part, a preprocessing algorithm is proposed that provides feature information by computing the value domain of each attribute in the specified domain.

(2) For the page analysis and extraction part, the product search result page and the product detail page are taken as input. Page segmentation divides each page into several webpage blocks according to the characteristics of that type of page and the preprocessed knowledge graph. The blocks are then classified as noise blocks or non-noise blocks, and finally the non-noise blocks are processed to extract the product information. Because the search result page and the detail page differ in their noise and page characteristics, a separate extraction algorithm is proposed for each.

(3) Multiple sets of comparative experiments are conducted to verify the effectiveness of the extraction method. The experiments show that the proposed method effectively handles both search result page noise and detail page noise. The method also adapts better to cases where the two types of page share a similar design but differ in page structure.

(4) Based on the proposed extraction method, a knowledge-graph-based e-commerce information extraction system is designed and implemented. Testing shows that the system meets its functional and performance requirements.
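The block-level flow described in point (2) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract gives no implementation details, so the `Block` type, the `KG_VALUE_DOMAINS` dictionary, the exact-match scoring rule, and the `0.5` threshold are all hypothetical choices made for illustration. The sketch assumes the page has already been segmented into blocks and that preprocessing has produced a set of known values per attribute.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical output of knowledge graph preprocessing:
# attribute name -> set of known (lower-cased) values.
KG_VALUE_DOMAINS: Dict[str, Set[str]] = {
    "brand": {"acme", "globex"},
    "color": {"red", "black", "white"},
    "storage": {"64gb", "128gb"},
}


@dataclass
class Block:
    """One segmented webpage block, reduced to its text fragments."""
    texts: List[str]


def match_score(block: Block, domains: Dict[str, Set[str]]) -> float:
    """Fraction of text fragments that exactly match a known attribute value."""
    if not block.texts:
        return 0.0
    known = set().union(*domains.values())
    hits = sum(1 for t in block.texts if t.strip().lower() in known)
    return hits / len(block.texts)


def is_noise(block: Block, domains: Dict[str, Set[str]], threshold: float = 0.5) -> bool:
    """Assumed rule: a block is noise if too few fragments match the graph."""
    return match_score(block, domains) < threshold


def extract(block: Block, domains: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each matching fragment to the attribute whose value set contains it."""
    record: Dict[str, str] = {}
    for t in block.texts:
        value = t.strip().lower()
        for attr, values in domains.items():
            if value in values:
                record.setdefault(attr, value)  # keep the first match per attribute
    return record


def extract_page(blocks: List[Block], domains: Dict[str, Set[str]],
                 threshold: float = 0.5) -> List[Dict[str, str]]:
    """Drop noise blocks, then extract a record from each remaining block."""
    return [extract(b, domains) for b in blocks if not is_noise(b, domains, threshold)]


if __name__ == "__main__":
    page = [
        Block(["Acme", "Red", "128GB"]),            # product block
        Block(["Sponsored", "Buy now", "Red"]),     # advertisement-like block
    ]
    print(extract_page(page, KG_VALUE_DOMAINS))
```

In this sketch the advertisement-like block is discarded because only one of its three fragments matches a known value. A real system would presumably need fuzzier matching and page-type-specific features, which the abstract mentions but does not specify.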
Keywords/Search Tags:Information Extraction, Knowledge Graph, E-commerce Information Extraction