Research And System Development Of Content Duplicate Checking In E-business Website Based On Semantics

Posted on:2018-05-28

Degree:Master

Type:Thesis

Country:China

Candidate:P Xu

GTID:2348330518496482

Subject:Electronic Science and Technology

Abstract/Summary:

With the growth in Internet users and the flourishing of e-commerce, e-commerce websites keep getting larger and the business data they host is growing explosively. Because online shopping has become part of people's daily lives, the data on e-commerce websites has also become an important object for researchers studying everyday economic activity. Efficient collection of e-commerce website information is therefore very important. However, these sites hold not only a large amount of data but also a large amount of redundant data, which seriously lowers the time efficiency of data collection and reduces the accuracy of the data. To let users compare this information more effectively, the data must be checked for duplicates.

This thesis first introduces the technologies used throughout the work. Data capture, which is the basis of the entire system, is implemented with the automated testing framework Selenium. The thesis then introduces the semantic standard of WordNet, which is used to establish the nodes of the semantic tree model; this standard semantic tree is then used to compute the similarity between products. The main work is as follows.

(1) Crawling e-commerce website information with the Selenium framework. Selenium is an automated testing framework generally used to test web services; this thesis instead uses its ability to analyze page JavaScript, tags, and XPath expressions to extract page elements, combined with the PhantomJS headless browser. Applied to e-commerce data crawling, this approach greatly reduces front-end page rendering time and increases crawling speed.

(2) Building a semantic tree model that characterizes e-commerce websites. The thesis investigates the structure of major e-commerce websites, compares the similarity of their hierarchical classifications, and maps them onto semantic trees with the same structure. WordNet's standard semantics are used to unify the description of each layer's nodes for the goods of different e-commerce websites, so that the merchandise information of all the sites is mapped completely onto the same semantic tree.

(3) Using the semantic tree for duplicate checking of goods. Because the semantic tree already defines the standard expression of each commodity, whether two items are the same or similar goods can be determined by comparing whether the paths they map to on the semantic tree are the same.

(4) Design of the e-commerce data acquisition system and the product similarity comparison system. Because the e-commerce data are described by a tree structure, the database storage structure uses a hierarchical relationship model, which greatly reduces redundant data storage. The entire service is multithreaded, so data can be crawled from multiple e-commerce sites simultaneously. Since all sites are represented with the same model and stored in the same database, there is no risk of data from different sites being mixed up. Commodity similarity is compared node by node using this semantic tree model.
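The path-based duplicate check in (3), the node-by-node comparison in (4), and the storage saving from a shared tree can be illustrated with a minimal Python sketch. It is not the thesis's implementation: a small hand-written synonym table (`CANONICAL`) stands in for WordNet synsets, and the names `SemanticTree`, `path_similarity`, and `is_duplicate`, as well as the shared-prefix similarity score, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for WordNet: maps site-specific category labels to a
# canonical concept name. The thesis uses WordNet semantics; this table is an
# assumption made only for this sketch.
CANONICAL = {
    "electronics": "electronics",
    "digital": "electronics",
    "cell phones": "mobile_phone",
    "mobile phones": "mobile_phone",
    "smartphones": "mobile_phone",
    "apple": "apple",
    "iphone 8": "iphone_8",
    "apple iphone 8": "iphone_8",
}


@dataclass
class Node:
    """One node of the shared semantic tree."""
    concept: str
    children: dict = field(default_factory=dict)


class SemanticTree:
    def __init__(self):
        self.root = Node("root")

    def normalize(self, label):
        # Unknown labels fall back to their lower-cased form (an assumption).
        key = label.strip().lower()
        return CANONICAL.get(key, key)

    def insert(self, raw_path):
        """Map one site's category path onto the tree; return the canonical path."""
        node, path = self.root, []
        for label in raw_path:
            concept = self.normalize(label)
            # Existing nodes are reused, so shared prefixes are stored only once.
            node = node.children.setdefault(concept, Node(concept))
            path.append(concept)
        return tuple(path)

    def size(self):
        """Count stored nodes, excluding the root."""
        stack, count = [self.root], 0
        while stack:
            node = stack.pop()
            count += len(node.children)
            stack.extend(node.children.values())
        return count


def path_similarity(a, b):
    """Fraction of levels on which two canonical paths agree, from the root down."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b), 1)


def is_duplicate(a, b):
    # Products whose canonical paths are identical are treated as the same item.
    return a == b


if __name__ == "__main__":
    tree = SemanticTree()
    p1 = tree.insert(["Electronics", "Cell Phones", "Apple", "iPhone 8"])
    p2 = tree.insert(["Digital", "Smartphones", "Apple", "Apple iPhone 8"])
    p3 = tree.insert(["Digital", "Mobile Phones", "Apple", "iPhone X"])
    print(is_duplicate(p1, p2))     # True
    print(path_similarity(p1, p3))  # 0.75
    print(tree.size())              # 5
```

Here two listings from different (hypothetical) sites normalize to the same path and are flagged as duplicates, while a different model under the same category shares three of four levels. The three paths need only five stored nodes instead of twelve, which mirrors the redundancy reduction claimed for the hierarchical storage model.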

Keywords/Search Tags:

e-commerce data mining, semantic tree, similarity comparison, duplicate checking

Related items

1	Research Of XML Semantic Clustering Based On Weighted Edge Set Comparison Algorithm
2	Research On Technologies Of Duplicate Record Data Cleaning In Big Data Environment
3	Plagiarism Detection Algorithm Based On BiLSTM And Its Application In Duplicate Checking System
4	Research On Cleaning Method For XML Similarity Duplicate Data
5	Research And Implementation Of Duplicate Checking System Under Internet Environment
6	The Research On Semantic-driven Image Mining Using Statistical Learning
7	Research On Process Mining Method For Duplicate Tasks
8	The Research And Design Of Data Mining System Based On E-Commerce
9	Research On The Key Technology Of The Price Comparison System Based On Semantic Similarity
10	Studies On Algorithms Of Association Rule Mining In Data Mining