Product Tag Extraction Based On User Reviews Under Distributed Crawler

Posted on:2020-03-06

Degree:Master

Type:Thesis

Country:China

Candidate:W P Zhou

Full Text:PDF

GTID:2428330590495388

Subject:Information networks

Abstract/Summary:

With the growth of the Internet and the spread of smart devices, online shopping has become a mainstream way to shop. Online shoppers also generate a large volume of review data with considerable mining value. For manufacturers, reviews directly reflect how users evaluate product features, so products can be adjusted to user preferences and developed more effectively. For e-commerce platforms, product tags extracted from reviews can improve the shopping experience and support recommendations based on user interests. For consumers, reviews are the main source of information about a product's characteristics and a reference when choosing what to buy. Mining review data and extracting product tags applies to product recommendation, personalized search and similar scenarios: it helps manufacturers analyze product data, improves the user's shopping experience and helps increase platform traffic. Research on mining user reviews can therefore improve the accuracy and comprehensiveness of product tags, and it has substantial practical value.

This thesis proposes a product tag extraction system based on user reviews collected by a distributed crawler. First, to handle the massive volume of review data, a distributed crawler system based on an improved Bloom filter is built to capture and store reviews efficiently. Then, feature words are extracted from the reviews by combining an improved TF-IDF algorithm with dependency grammar, yielding feature word pairs (object word, evaluation word) for each product. Finally, the extracted feature word pairs are clustered and classified by sentiment, producing a comprehensive tag that combines a product attribute tag with a user sentiment tag.

The main innovations of this thesis are as follows:

1. A distributed crawler framework based on an improved Bloom filter URL deduplication algorithm. Increasing the dimension of the Bloom filter reduces the false positive rate and improves the efficiency of the distributed crawler system.

2. Feature word extraction from large numbers of user reviews using the improved TF-IDF algorithm together with dependency grammar analysis. The TF-IDF algorithm is improved by buffering the IDF weights and incorporating a dispersion measure. Combined with dependency grammar analysis, this gives a feature word extraction method that is better suited to the characteristics of review data.

3. The selected feature words are vectorized into a representation that a computer can process, and a distance function is defined. A hierarchical clustering model combining K-means and AP is designed to assign tags to the feature words.
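The URL deduplication step can be illustrated with a minimal Python sketch. It implements a standard Bloom filter, not the improved higher-dimensional variant proposed in the thesis, whose details the abstract does not give. The class name `BloomFilter`, the helper `should_crawl`, the SHA-256 double-hashing scheme and all parameter names are assumptions made for illustration. The sketch shows the property the thesis relies on: for n stored URLs, m bits and k hash functions, the false positive rate is approximately (1 - e^(-kn/m))^k, so a larger filter yields fewer URLs wrongly skipped as duplicates.

```python
import hashlib
import math


class BloomFilter:
    """Plain Bloom filter for URL deduplication (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)
        self.count = 0  # incremented on every add() call

    def _positions(self, url: str):
        # Double hashing: derive k bit positions from two 64-bit
        # slices of a single SHA-256 digest.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, url: str) -> bool:
        # True means "probably seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(url)
        )

    def expected_false_positive_rate(self) -> float:
        # Standard approximation: (1 - e^(-k*n/m))^k
        k, n, m = self.num_hashes, self.count, self.num_bits
        return (1.0 - math.exp(-k * n / m)) ** k


def should_crawl(seen: BloomFilter, url: str) -> bool:
    """Return True and record the URL if it has probably not been seen."""
    if url in seen:
        return False
    seen.add(url)
    return True
```

In a distributed crawler the bit array would typically live in a store shared by all workers rather than in one process's memory; that part is outside the scope of this sketch.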

Keywords/Search Tags:

distributed crawler, duplicated URL detection, word vector, TF-IDF, dependency grammar, tag extraction, emotion analysis

Related items

1	Research And Implementation Of Distributed Web Crawler
2	Design And Implementation Of Distributed Web Crawler Based On Hadoop
3	Research And Implementation On Key Technologies For Distributed Universal Web Crawler System
4	Study On The Evaluation System Based On The Emotion Of The Online Comments Of Scenic Spots
5	Research On Multidimensional Emotion Classification Algorithm Of Internet Comment Based On Word Vector
6	Design And Implementation Of Large-scale Internet Information Real-time Extraction System
7	Study On Emotion Cause Pair Extraction Based On Fusion Word Vectors
8	Emotion Analysis Of Chinese Microblogs Using Extended Emotion Lexicon
9	Research And Implement Of Distributed Focused Crawler
10	The Research Of Micro-Blog New Emotion Words Recognition And Orientation Judgment Based On Word2Vec