
The Design And Implementation Of Auto-categorizing Abundant Product Information For Compare Shopping Site

Posted on: 2009-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Ma
Full Text: PDF
GTID: 2178360242479369
Subject: Software engineering
Abstract/Summary:
Because traditional shopping sites can no longer meet users' needs, comparison shopping has emerged. On a comparison-shopping site, consumers can compare the price, freight, discounts, Three Guarantees (repair, replacement, and refund) coverage, and other services that hundreds of merchants offer for a given product, and so find the most cost-effective merchant to buy from. However, a comparison-shopping site needs a massive amount of product information covering every product sector, and some merchants cannot provide data files with complete product information. As a result, a large volume of product records cannot be matched against the site's own product database and fails to go online. Classifying these unmatched products by hand would be a huge workload, so a growing site urgently needs an intelligent system that catalogues products automatically.
The system is divided into three major modules: Generate Knowledge Dic DB, Generate Knowledge DNA DB, and Classify Data. The first module normalizes the data against a manually reviewed dictionary (the dictionary is stored in the database) using the learning machine. This module involves English and Chinese word segmentation techniques. English word segmentation proceeds in several steps: symbol replacement, stop-word removal, spell checking, stemming, and alias substitution, which together produce the final segmentation result. Its performance and accuracy are currently satisfactory. For Chinese word segmentation, we draw on and extend several well-established algorithms and data-structure models, but because the implementation is in Java, current performance is not very satisfactory. The second module is responsible for recording the DNA data weights, and these records play a leading role in the processing done by the third module. The third module calls the interfaces of the first two modules to classify the unmatched products automatically.
The successful launch of this system replaced the manual matching of massive product information with automatic processing, which saves the company substantial human and material resources and is of great significance. In the near future we will further improve the accuracy of the matching and the performance of the system.
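The English word-segmentation steps named in the abstract (symbol replacement, stop-word removal, spell checking, stemming, alias substitution) can be illustrated with a minimal sketch. It is not the thesis's implementation, which was written in Java; it is a Python illustration under stated assumptions. The word tables, the function names, and the naive plural-stripping stemmer are all hypothetical stand-ins for the manually reviewed dictionary the abstract describes.

```python
import re

# Hypothetical tables; in the described system these would come from
# the manually reviewed dictionary stored in the database.
STOP_WORDS = {"a", "an", "the", "for", "with", "of", "and"}
SPELLING_FIXES = {"blak": "black", "labtop": "laptop"}
ALIASES = {"notebook": "laptop", "tv": "television"}


def replace_symbols(text):
    """Lowercase the text and turn every run of non-alphanumerics into a space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower())


def stem(token):
    """Naive stemmer (an assumption): strip one trailing plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize(title):
    """Apply the five steps in the order the abstract lists them."""
    tokens = replace_symbols(title).split()               # 1. replace symbols
    tokens = [t for t in tokens if t not in STOP_WORDS]   # 2. stop words
    tokens = [SPELLING_FIXES.get(t, t) for t in tokens]   # 3. spell check
    tokens = [stem(t) for t in tokens]                    # 4. stem
    tokens = [ALIASES.get(t, t) for t in tokens]          # 5. alias
    return tokens


print(normalize("The Blak Notebooks with TV-Tuner"))
```

Running the steps in this order means alias lookup sees corrected, stemmed tokens, so a single alias entry such as `notebook` also covers misspelled or plural variants. A production system would likely use a real stemmer and a dictionary-backed spell checker in place of these toy tables.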
Keywords/Search Tags:Compare Shopping, Auto-categorizing, English Word-splitting, Chinese Word-splitting