
Designing A Ranking System For Product Search Engine Based On Mining UGC

Posted on: 2016-08-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Samira
Full Text: PDF
GTID: 1318330476455882
Subject: Management Science and Engineering
Abstract/Summary:
With the spread of e-commerce platforms, it has become extremely difficult for consumers to choose the right product among a large number of similar products and sellers based only on their own experience, product pictures or product metadata. Consumer reviews are a rich and valuable source of information for potential consumers and manufacturers, but reading all of the available reviews is hard and time-consuming. Automatically mining these reviews and extracting product features in order to build a ranking system therefore gives consumers a valuable tool for making a well-informed decision.

Customers usually read product reviews for one of two reasons: to find the product with the best reviews of its features and associated services (for example, a cellphone with positive opinions overall), or to look for reviews of a specific product feature they are interested in (for example, a review of a cellphone's battery life).

In this research we therefore develop a feature-based ranking system for product search, called “Tsearch”, that recommends the products that provide, on average, the best value for the consumer based on mining customer reviews. The proposed system improves the search result by considering the product features preferred by the consumer during the search process and by basing the result on previous consumers' opinions about the product rather than only on the information provided by the sellers. Furthermore, the system provides a visual opinion summarization for each product item, which helps the customer get a general idea of the overall opinions and identify the features that have received the most attention for a particular product.

The data was crawled from taobao.com, the leading e-commerce website in China.
The data set contains consumer reviews on 16 popular products from 5 domains (phone, digital camera, rice cooker, soymilk maker and laptop). Each product belongs to a single product category, and each category contains the top three popular products (brands) preferred by taobao.com consumers within that category. In total, more than 16,071 product pages with more than 537,638 reviews were crawled and then parsed to extract the product ID, title, metadata and all reviews in order to build a title-based search index. At query time, the index is used for efficient retrieval of the matching products.

The proposed system accepts a search keyword written in Chinese, English or both, and it supports a comma-separated string in which the first part is the search keyword and the remaining parts are the product features the user is interested in. The ranking is generated according to the relevance score of each product, which is calculated from the product features extracted from product reviews; this is one of the main contributions of this work.

The Stanford typed dependencies representation is used to extract product feature-opinion pairs from customer reviews. These dependencies were designed to provide a simple description of the grammatical relationships in a sentence that can be easily understood and effectively used by people without linguistic expertise who want to extract textual relations. Although the Chinese dependencies are similar in structure to those for English, there are many syntactic structures that exist only in Chinese. In this dissertation we consider five kinds of dependencies to select candidate product feature/opinion pairs: nsubj, dobj, ccomp, nn, and attr.

We classify the product features into two categories: product dependent features and product independent features.
Product dependent features describe the product itself or its components; product independent features describe the associated services. In the proposed approach, we assign a different weight to each category, assuming that customers are more concerned with product dependent features than with product independent features.

A website with a front end and a back end was also built for easy usage and administration. The homepage contains an input field where the user enters a search query and then clicks a “Search” button. The query is submitted to the server side of the system, where it is analyzed. The last step is to generate the ranked list of matching product items and send it back to the user. The ranking takes into account all dependent and independent features found in consumer reviews in response to the customer's query.

The evaluation of the proposed system proceeds at two levels: the first measures the accuracy of product feature extraction and classification, and the second determines the efficiency of the search results and the usability of the visual summarization. The results showed a high level of accuracy in feature-opinion pair extraction and a high level of participant satisfaction with the ranking and the summarization.

The main contributions of this work can be summarized in the following four aspects. First, from the ranking perspective, the system provides an opinion search that extracts the product features mentioned by existing customers, classifies them into dependent and independent features, and uses them in the ranking system of the product search engine. To the best of our knowledge, this has not been addressed in the Chinese literature, and this work is the first to examine the issue.
Second, from the mining perspective, much research has been conducted in the Chinese literature, especially at the document level and the sentence level. Research at the aspect level, however, is still limited, and where it exists the mining was aimed at determining the orientation of the whole document. This work instead determines in detail the population's opinions about the different aspects mentioned across documents and uses them as a key factor in the ranking algorithm. Third, the visual summarization provided on each product page is another contribution of this research: the user can clearly see the strengths and weaknesses of the product and its aspects in the minds of existing consumers. Finally, the proposed system is capable of real-time review processing and classification, so it can capture aspects mentioned by new reviewers and determine their opinions about them automatically.
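
The comma-separated query format and the weighted, feature-based relevance score described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual formula or weights, so everything here is an assumption made for illustration: the names `parse_query`, `relevance_score` and `rank`, the 0.7/0.3 category weights, the polarity measure (positive minus negative mentions over total mentions), the plain substring match standing in for the title-based index, and the handling of the full-width Chinese comma.

```python
import re
from dataclasses import dataclass

# Hypothetical weights. The abstract only states that product dependent
# features are assumed to matter more than product independent ones.
CATEGORY_WEIGHTS = {"dependent": 0.7, "independent": 0.3}


@dataclass
class FeatureOpinion:
    category: str  # "dependent" (the product itself) or "independent" (services)
    positive: int  # number of positive mentions mined from reviews
    negative: int  # number of negative mentions mined from reviews


def parse_query(query):
    """Split 'keyword, feature1, feature2' into (keyword, [features]).

    Accepting the full-width comma is an assumption for Chinese input.
    """
    parts = [p.strip() for p in re.split(r"[,，]", query) if p.strip()]
    if not parts:
        return "", []
    return parts[0], parts[1:]


def relevance_score(opinions, wanted):
    """Weighted sum of opinion polarity over the requested features.

    If no features are requested, every mined feature contributes.
    """
    names = wanted if wanted else list(opinions)
    score = 0.0
    for name in names:
        op = opinions.get(name)
        if op is None or op.positive + op.negative == 0:
            continue  # feature never mentioned in this product's reviews
        polarity = (op.positive - op.negative) / (op.positive + op.negative)
        score += CATEGORY_WEIGHTS[op.category] * polarity
    return score


def rank(products, query):
    """Return matching product titles, best relevance score first.

    `products` maps a title to a dict of feature name -> FeatureOpinion.
    A case-insensitive substring test stands in for the search index.
    """
    keyword, wanted = parse_query(query)
    matches = [
        (title, opinions)
        for title, opinions in products.items()
        if keyword.lower() in title.lower()
    ]
    matches.sort(key=lambda item: relevance_score(item[1], wanted), reverse=True)
    return [title for title, _ in matches]
```

With such a sketch, a query like `phone, battery` would rank matching phones by the mined opinions about their batteries, while `phone, delivery` would rank the same phones by a service-related feature that carries a lower weight.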
Keywords/Search Tags: product search engine, user generated content, opinion mining, opinion summarization, Stanford typed dependencies