Topic And Feature Extraction In Online Reviews Based On Word2Vec

Posted on: 2017-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: K Ye
GTID: 2348330485988123
Subject: Management Science and Engineering
Abstract/Summary:
The development of information technology and the Internet has greatly changed how people exchange information. It has also created a new way of trading: e-commerce. As e-commerce has grown, people have become increasingly willing to buy goods and services online. E-commerce has also become one of the most popular research topics in recent years.
With the wide adoption of Web 2.0 technology, web services now record user behavior, and online reviews are among the most important of these data. User reviews on e-commerce websites are valuable for research and useful to both consumers and enterprises. Before buying, consumers can use other buyers' opinions of a product to guide their decisions. Enterprises want to know consumers' feedback after a sale and users' preferences when designing products. Online reviews express users' evaluations of a product directly, for example by stating whether a particular product feature is good or bad. Compared with other methods such as questionnaires, online reviews reflect users' evaluations more directly and comprehensively.
Research on online reviews faces two challenges. 1) Users write online reviews casually. Because users differ in writing background and grammatical habits, reviews are complex, and the same feature may be described with different words. This user-generated content (UGC) makes feature extraction difficult. 2) There are many e-commerce websites, each selling tens of thousands of goods, and each item may have a large number of reviews. Analyzing online reviews therefore leads to information overload. For these reasons, a method that extracts product features from online reviews automatically is very useful.
Previous research has followed two main directions:
1) Research based on text structure: researchers parse the text using grammatical and syntactic characteristics, or rely on word-frequency statistics and other frequent patterns. 2) Latent semantics: the text is interpreted through probabilistic models. Both approaches overlook the semantics carried by the text. In recent years, neural network language models, which represent text in a high-dimensional semantic space, have been widely used. This thesis proposes a model built on a neural network language model. Using Word2Vec, a tool released by Google in 2013, it performs topic and feature extraction on online reviews. Experiments demonstrate the effectiveness and efficiency of the model.
Ground truth is often unavailable for big data, so this thesis proposes a modified perplexity for model evaluation. As an unsupervised index, it can be generalized to other models in big-data settings. The modified perplexity is a further contribution of this thesis.
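The abstract does not explain how Word2Vec vectors become extracted features, but its keywords include clustering. One plausible step, sketched here under that assumption, is to cluster candidate feature words by the cosine similarity of their vectors. This would place different words for the same feature (the UGC problem described above) in one group. The sketch uses a small spherical k-means in pure Python. The function names, the toy three-dimensional vectors, and the choice of k-means are hypothetical illustrations, not the thesis's method. In practice the vectors would come from a trained Word2Vec model; for example, gensim's `Word2Vec` exposes them through its `wv` attribute.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n else list(v)


def spherical_kmeans(vectors, k, iterations=20):
    """Hypothetical helper: cluster word vectors by cosine similarity.

    vectors: dict mapping word -> list of floats.
    Returns a dict mapping word -> cluster index.
    The first k words seed the centroids (deterministic, for illustration only).
    """
    words = list(vectors)
    unit = {w: normalize(vectors[w]) for w in words}
    centroids = [unit[w] for w in words[:k]]
    assign = {}
    for _ in range(iterations):
        # Assignment step: each word joins its most similar centroid.
        new_assign = {
            w: max(range(k), key=lambda c: cosine(unit[w], centroids[c]))
            for w in words
        }
        if new_assign == assign:
            break  # converged: no word changed cluster
        assign = new_assign
        # Update step: each centroid becomes the normalized mean of its members.
        for c in range(k):
            members = [unit[w] for w in words if assign[w] == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return assign


# Hypothetical hand-made vectors standing in for trained Word2Vec vectors.
toy_vectors = {
    "battery": [0.9, 0.1, 0.0],
    "screen": [0.1, 0.9, 0.1],
    "power": [0.85, 0.15, 0.05],
    "display": [0.05, 0.95, 0.0],
    "charge": [0.8, 0.0, 0.1],
}
groups = spherical_kmeans(toy_vectors, k=2)
print(groups)
```

Cosine similarity is used instead of Euclidean distance because word-vector similarity is usually judged by direction rather than length. Real review vocabularies would also need a principled choice of k and a better initialization than "first k words."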
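The abstract does not define the modified perplexity. The sketch below therefore shows only the standard perplexity it presumably builds on: the exponential of the average negative log-likelihood a model assigns to held-out tokens, PP = exp(-(1/N) Σ log p(w_i)). Lower values mean the model finds unseen reviews less surprising. The function name and example probabilities are hypothetical. Whatever change the thesis makes to adapt the index to data without ground truth is not reproduced here.

```python
import math


def perplexity(token_probs):
    """Standard (unmodified) perplexity of a model on held-out tokens.

    token_probs: the probability the model assigns to each held-out token.
    Returns exp(-(1/N) * sum(log p_i)); lower is better.
    """
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    if any(p <= 0 for p in token_probs):
        raise ValueError("probabilities must be positive")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# A model that is uniform over 4 outcomes has perplexity 4 (up to float rounding).
uniform = perplexity([0.25] * 8)
# A sharper model that puts more mass on the observed tokens scores lower.
sharper = perplexity([0.5, 0.9, 0.7, 0.8])
print(uniform, sharper)
```

Perplexity needs only the model's own probabilities on held-out text, not labeled data. This is why a perplexity-style index suits evaluation where ground truth is missing.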
Keywords/Search Tags:Online Review, Feature Extraction, Word Vector, Clustering