With the arrival of the Big Data era, popular products on each major e-commerce website accumulate hundreds of thousands of user reviews. Extracting information from these reviews costs a great deal of time and energy if it relies only on human reading. Opinion mining emerged to solve this problem and has increasingly become a hot spot in web information processing. Opinion mining is a new technology that combines text comprehension and data mining. It mainly includes the following steps: extraction of web information, classification of useful and useless reviews, sentiment analysis of reviews, and summarization of reviews. This paper is organized around these main steps and carries out the following research.

Firstly, this paper uses web crawler technology to collect a large number of user reviews of phones from the Jingdong website and stores them in a database. The original reviews mix the useful with the useless, and the useless reviews may adversely affect the follow-up steps. This paper therefore uses the Support Vector Machine (SVM) algorithm to select the reviews that express a sentiment orientation toward the product. The experiment classifies reviews using features such as emotional words, product features, product fault words, and the co-occurrence of emotional words and product features. With a different weight assigned to each feature, the precision of useful-review classification reaches 89.21%, which lays a foundation for the following steps.

Secondly, this paper describes the sentiment analysis module for reviews, whose goal is sentiment block identification and sentiment analysis. The traditional method identifies a sentiment block with a fixed-width sliding window centered on the sentiment word, which loses many colloquial and implicit sentiment blocks. To solve this problem, this paper mainly studies sentiment block labeling based on Conditional Random Fields (CRF). Because the selection and dimensionality of features have an important effect on word sequence labeling, this paper finally chooses the word, POS, emotional words, degree words, product features, and product fault words as CRF features, and the recall of sentiment block identification reaches 75.32%. The experimental results show that CRF-based sentiment block identification performs better than the traditional method, both on positive and negative sentiment blocks and on sentiment blocks of one, two, or more words.

Thirdly, this paper visualizes the summary of each product’s overall rating. Based on the research above, a product review system based on opinion mining was designed and implemented. By finding pairs of product features and sentiment blocks, it can comprehensively mine customers’ reviews of different products and of the detailed features of the same product, and show the visualized results to users.
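
As an illustration of the CRF feature set named above (word, POS, emotional words, degree words, product features, and product fault ("broken") words), the following minimal Python sketch builds one feature dictionary per token and recovers sentiment blocks from a label sequence. It is not the paper's implementation. The toy lexicons, the B/I/O label scheme, the neighbouring-word context features, and the names `token_features` and `blocks_from_bio` are all assumptions made for the example, and the sketch uses an English sentence for readability.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicons (assumption): the paper's real lexicons are not given.
EMOTION_WORDS = {"great", "terrible", "slow"}
DEGREE_WORDS = {"very", "extremely", "slightly"}
PRODUCT_FEATURES = {"screen", "battery", "camera"}
FAULT_WORDS = {"crash", "freeze"}


def token_features(words: List[str], pos: List[str], i: int) -> Dict[str, object]:
    """Build the feature dict for token i from the six feature types in the abstract."""
    w = words[i].lower()
    feats: Dict[str, object] = {
        "word": w,
        "pos": pos[i],
        "is_emotion": w in EMOTION_WORDS,
        "is_degree": w in DEGREE_WORDS,
        "is_product_feature": w in PRODUCT_FEATURES,
        "is_fault": w in FAULT_WORDS,
    }
    # Neighbouring words as local context (an assumption, not stated in the abstract).
    feats["prev_word"] = words[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_word"] = words[i + 1].lower() if i < len(words) - 1 else "<EOS>"
    return feats


def blocks_from_bio(words: List[str], labels: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, text) sentiment blocks from B/I/O labels."""
    blocks: List[Tuple[int, int, str]] = []
    start = None
    for i, label in enumerate(labels):
        if label == "B" or (label == "I" and start is None):
            # A new block begins; close any block that is still open.
            if start is not None:
                blocks.append((start, i, " ".join(words[start:i])))
            start = i
        elif label == "O" and start is not None:
            blocks.append((start, i, " ".join(words[start:i])))
            start = None
    if start is not None:
        blocks.append((start, len(words), " ".join(words[start:])))
    return blocks


# Usage: hand-written labels stand in for the output of a trained CRF.
words = ["the", "screen", "is", "very", "great", "but", "battery", "drains"]
pos = ["DT", "NN", "VBZ", "RB", "JJ", "CC", "NN", "VBZ"]
labels = ["O", "O", "O", "B", "I", "O", "O", "B"]
sentence_features = [token_features(words, pos, i) for i in range(len(words))]
sentiment_blocks = blocks_from_bio(words, labels)
```

Unlike a fixed-width window around a sentiment word, a label sequence can mark a block such as "drains" that contains no lexicon entry, which is the kind of colloquial or implicit block the abstract says the window method loses.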