Product Tag Extraction Based On User Reviews Under Distributed Crawler

Posted on: 2020-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: W P Zhou
Full Text: PDF
GTID: 2428330590495388
Subject: Information networks
Abstract/Summary:
With the growth of the Internet and the increasing popularity of smart devices, online shopping has become a mainstream way to shop. As consumers shop online, they generate a huge volume of review data with considerable mining value. For manufacturers, review data directly reflects how users evaluate product characteristics, so they can adjust those characteristics to user preferences and develop better products. For e-commerce platforms, product tags extracted from review data can improve the shopping experience and support recommendations based on user interests. For consumers, review data is the main source of information about a product's characteristics and helps them choose the product they want. Mining user reviews and extracting product tags can therefore be applied widely in product recommendation, personalized search, and similar scenarios; it helps manufacturers analyze product data, improves the user's shopping experience, and helps increase platform traffic. Research on mining user review data can thus improve the accuracy and comprehensiveness of product tags and has substantial practical value.

This thesis proposes a product tag extraction system based on user reviews under a distributed crawler. First, to handle massive user review data, a distributed crawler system based on an improved Bloom filter is built to capture and store the reviews efficiently. Then, feature words are extracted from the reviews by combining an improved TF-IDF algorithm with dependency grammar, yielding feature word pairs for each product (object word, evaluation word). Finally, the extracted feature word pairs are clustered and classified by sentiment, producing a comprehensive tag that combines a product attribute tag with a user sentiment tag.

The main innovations of this thesis are as follows:

1. A distributed crawler framework is designed around a URL deduplication algorithm based on an improved Bloom filter. Increasing the dimension of the Bloom filter reduces the false positive rate and improves the efficiency of the distributed crawler system.
2. An improved TF-IDF algorithm is combined with dependency grammar analysis to extract feature words from a large number of user reviews. TF-IDF is improved by buffering the IDF weights and adding a dispersion measure. Combined with dependency grammar analysis, this gives a feature word extraction method that is better suited to the characteristics of review data.
3. The selected feature words are vectorized into a representation a computer can process, and a distance function is defined. A hierarchical clustering model combining K-means and AP is designed to label the feature words.
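
For readers unfamiliar with Bloom-filter URL deduplication, the following is a minimal sketch of the standard technique, not the improved variant proposed in the thesis (the abstract does not specify how the filter dimension is increased). The class name `BloomFilter`, the helper `dedupe_urls`, the default sizes, and the SHA-256 double-hashing scheme are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings (illustrative, not the thesis's variant)."""

    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a nonzero step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "probably seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def dedupe_urls(urls, seen=None):
    """Return URLs not already recorded in the filter, recording them as we go."""
    seen = seen if seen is not None else BloomFilter()
    fresh = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh
```

A crawler would call `dedupe_urls` on each batch of discovered links before queueing them. Because a Bloom filter can report false positives, a small fraction of unseen URLs may be skipped; lowering that false positive rate is the stated goal of the thesis's improvement.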
Keywords/Search Tags: distributed crawler, duplicate URL detection, word vector, TF-IDF, dependency grammar, tag extraction, sentiment analysis