
Synonym Recognition Based On User Behaviors In E-commerce

Posted on: 2012-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: S J Zhang
GTID: 2218330362950431
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet, e-commerce has flourished, and a wide range of e-commerce platforms providing online trading for individuals and enterprises have emerged. These platforms need to understand a buyer's search intention and display the corresponding goods. To achieve this goal, e-commerce platforms have increasingly turned to semantic relations, of which synonymy is an important part.

Synonyms in e-commerce are different expressions of the same thing, and they are interchangeable in product descriptions and searches. E-commerce synonyms include many new words and typos, and they are strictly defined. These characteristics make existing methods less effective in e-commerce and make identification more difficult.

This paper focuses on user behaviors in e-commerce and presents a method for generating a synonym candidate set based on those behaviors. Generating the candidate set involves two stages: acquisition and filtering. On the one hand, we segment product titles at the symbols that denote a parallel relationship to obtain candidate synonym pairs. On the other hand, we use the SimRank method to cluster queries and obtain further candidate pairs. We then split all candidates into Chinese-Chinese pairs and English-Chinese pairs. Finally, we filter the Chinese-Chinese pairs with a series of rules and the English-Chinese pairs with a synonymy probability.

After the candidate set is generated, we identify synonyms. Based on the characteristics of English-Chinese synonyms, we propose three methods for identifying them: a pronunciation similarity method, a Google Translate method, and a synonymy probability method.
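The two candidate-acquisition routes described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: the separator set (`/`, `|`, `、`), whitespace tokenization of titles, the decay constant `c = 0.8`, the iteration count, and the score threshold are all hypothetical choices. Thresholding pairwise bipartite SimRank scores stands in for the thesis's query clustering step, whose details the abstract does not give. SimRank is computed here on an assumed query-item click graph: two queries are similar if they lead to clicks on similar items, and two items are similar if they are clicked from similar queries.

```python
import re
from itertools import combinations

# Hypothetical separators assumed to signal a parallel (coordinate) relation in a title.
PARALLEL_SEPARATORS = r"[/|、]"


def title_candidates(title: str) -> set[tuple[str, str]]:
    """Split each whitespace token on parallel symbols and pair up the parts."""
    pairs = set()
    for token in title.split():
        parts = [p for p in re.split(PARALLEL_SEPARATORS, token) if p]
        for a, b in combinations(parts, 2):
            pairs.add(tuple(sorted((a, b))))
    return pairs


def _update(nbrs, other_scores, c):
    """One SimRank step for one side of the bipartite graph, reading the other side's scores."""
    scores = {}
    for a in nbrs:
        for b in nbrs:
            if a == b:
                scores[(a, b)] = 1.0
                continue
            total = sum(other_scores[(x, y)] for x in nbrs[a] for y in nbrs[b])
            scores[(a, b)] = c * total / (len(nbrs[a]) * len(nbrs[b]))
    return scores


def bipartite_simrank(clicks, c=0.8, iterations=5):
    """Bipartite SimRank over (query, item) click edges; returns query-pair scores."""
    q_nbrs, i_nbrs = {}, {}
    for q, i in clicks:
        q_nbrs.setdefault(q, set()).add(i)
        i_nbrs.setdefault(i, set()).add(q)
    # Start from the identity: each node is fully similar only to itself.
    sq = {(a, b): float(a == b) for a in q_nbrs for b in q_nbrs}
    si = {(a, b): float(a == b) for a in i_nbrs for b in i_nbrs}
    for _ in range(iterations):
        # Both sides are updated from the previous round's scores.
        sq, si = _update(q_nbrs, si, c), _update(i_nbrs, sq, c)
    return sq


def simrank_candidates(clicks, threshold=0.5):
    """Query pairs whose SimRank score reaches a (hypothetical) threshold."""
    scores = bipartite_simrank(clicks)
    return sorted((a, b) for (a, b), s in scores.items() if a < b and s >= threshold)


print(title_candidates("iphone/苹果手机 16G"))  # {('iphone', '苹果手机')}

clicks = [("iphone", "item1"), ("苹果手机", "item1"),
          ("iphone", "item2"), ("苹果手机", "item2"),
          ("手机壳", "item3")]
print(simrank_candidates(clicks))  # [('iphone', '苹果手机')]
```

In this toy click log, "iphone" and "苹果手机" share both clicked items, so their score climbs toward the fixed point 2/3 (about 0.66 after five iterations), while "手机壳" shares no items with them and stays at 0. A real click graph would need sparse storage rather than the all-pairs dictionaries used here.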
We focus on feature extraction, especially the title, query, and click features that reflect user behaviors, and we use both a Gradient Boosting Decision Tree (GBDT) model and a Support Vector Machine (SVM) model for synonym recognition.

Experiments show that the pronunciation similarity method effectively identifies transliterated English-Chinese synonyms, the Google Translate method effectively identifies translated English-Chinese synonyms, and the synonymy probability method effectively identifies high-frequency English-Chinese synonym pairs. With the GBDT model, adding the user behavior features (title, query, and click features) to the literal features increases accuracy by about 25%, recall by about 24%, and F value by about 30%. Comparing the two models, GBDT outperforms SVM in all respects...
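The GBDT classifier described above can be illustrated with a minimal from-scratch sketch in Python. It is neither the thesis's implementation nor any library's API: it boosts depth-1 regression trees (stumps) on the negative gradient of the logistic loss and uses plain mean-residual leaf values rather than the Newton-step leaves common in production GBDT libraries. The three feature columns (a literal string similarity, a title co-occurrence score, and a query co-click score) are hypothetical stand-ins for the literal and user-behavior features named in the abstract, and the six rows are toy values made up for illustration, not experimental data.

```python
import math

# Hypothetical feature columns for one candidate pair:
# [literal_similarity, title_cooccurrence, query_coclick]
Row = list[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X: list[Row], r: list[float]):
    """Depth-1 regression tree minimizing squared error on the residuals r."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [r[k] for k, row in enumerate(X) if row[f] <= t]
            right = [r[k] for k, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or err < best[0]:
                best = (err, f, t, lv, rv)
    if best is None:  # no valid split: a single leaf predicting the mean residual
        mean = sum(r) / len(r)
        return (0, math.inf, mean, mean)
    return best[1:]  # (feature, threshold, left_value, right_value)


def predict_stump(stump, row: Row) -> float:
    f, t, lv, rv = stump
    return lv if row[f] <= t else rv


def fit_gbdt(X: list[Row], y: list[int], n_rounds: int = 50, lr: float = 0.3):
    """Gradient boosting with logistic loss; y holds 1 (synonym) or 0 (not)."""
    p = sum(y) / len(y)  # assumes both classes are present
    base = math.log(p / (1 - p))  # initial log-odds
    F = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss with respect to the raw score F.
        residuals = [yi - sigmoid(Fi) for yi, Fi in zip(y, F)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        F = [Fi + lr * predict_stump(stump, row) for Fi, row in zip(F, X)]
    return base, lr, stumps


def predict_proba(model, row: Row) -> float:
    base, lr, stumps = model
    return sigmoid(base + lr * sum(predict_stump(s, row) for s in stumps))


# Toy rows: literal similarity alone does not separate the classes,
# while the behavior columns do.
X = [[0.1, 0.9, 0.8], [0.7, 0.7, 0.9], [0.0, 0.8, 0.6],
     [0.3, 0.1, 0.1], [0.6, 0.0, 0.2], [0.5, 0.2, 0.0]]
y = [1, 1, 1, 0, 0, 0]
model = fit_gbdt(X, y)
print(predict_proba(model, [0.0, 0.85, 0.7]) > 0.5)  # True
```

On these rows every boosting round splits on a behavior column, which mirrors, in miniature, the abstract's observation that user-behavior features add signal beyond literal similarity. A practical system would instead use a library implementation (for example scikit-learn's `GradientBoostingClassifier`), deeper trees, and evaluation on held-out pairs.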
Keywords/Search Tags: synonym recognition, user behaviors, e-commerce, Gradient Boosting Decision Tree (GBDT), Support Vector Machine (SVM)