
The Research And Application Of HMM In Chinese Reviews Datamining

Posted on: 2018-09-01  Degree: Master  Type: Thesis
Country: China  Candidate: L Wang
GTID: 2348330512483077  Subject: Computer software and theory
Abstract/Summary:
With the rapid development of e-commerce, user reviews are accumulating quickly. Potential consumers care about how to make effective use of the various features contained in this review information. Merchants care about how to use the same information to track users' evaluations of goods, sales trends, market influence and so on. Mining Chinese user reviews therefore has considerable practical significance.

In recent years a great deal of work related to Chinese review mining has been carried out. Some researchers apply the LSA model or n-gram models at the document level. These approaches return only a bipolar outcome for each document (recommended or not recommended), so the granularity is relatively coarse and the extracted information is insufficient. Other researchers mine reviews at the level of feature words, but these methods still have some difficulty identifying low-frequency words and phrase structures. Some work focuses on named entity recognition to extract the object words from documents, but few studies take into account the sentiment orientation of words, that is, their subjective information. As large-scale machine learning technology has matured, a number of machine learning algorithms have gradually been applied to review mining and related fields with good results. Examples include hidden Markov models and cascaded (layered) hidden Markov models for named entity recognition, and natural language processing research based on the maximum entropy Markov model.

Building on this earlier work, this thesis takes the linguistic features of natural language into account and incorporates lexical features into the standard hidden Markov model. The result is a hidden Markov model based on lexical (word) and part-of-speech features.
The model achieves good results in tasks such as extracting the evaluated objects and polar (sentiment-bearing) comments. The main work of this thesis is as follows:
1. An in-depth study of the three basic problems of the hidden Markov model, namely the evaluation problem, the decoding (sequence) problem and the learning problem, together with the algorithms that solve them. The forward-backward algorithm solves the evaluation problem. The Viterbi algorithm solves the decoding problem. Maximum likelihood estimation and the expectation-maximization algorithm solve the learning problem.
2. A hidden Markov model based on lexical information and part of speech is proposed, and its calculation formulas are derived. Solutions are also given for the major problems that arise when training the model in engineering practice:
   - Good-Turing estimation solves the zero-probability problem in model training.
   - Logarithmic operations replace multiplication to solve the floating-point underflow problem.
   - An LDA model handles the large number of unregistered (out-of-vocabulary) words.
   Together these improve the data mining results on commodity review text.
3. A set of labeling rules for e-commerce review content is proposed, and these rules play a useful role in building the model's training data sets. In addition, distributed representations (word embeddings) are used to merge synonyms and near-synonyms in the mining results. This effectively prevents groups of similar high-frequency features from drowning out features with lower frequency.
4. The proposed algorithm is evaluated comparatively in terms of precision, recall and F1 score. The experimental results show that the proposed algorithm performs better.
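The evaluation problem (item 1) asks for the probability of an observation sequence under a model, P(O | λ). The forward pass of the forward-backward algorithm computes it in time linear in the sequence length. The following is a minimal, hedged sketch of that pass in log space. The two-state toy model, the tag names `OBJ`/`OTHER`, and the helper names (`forward_log_prob`, `brute_force_prob`, `log_table`) are hypothetical and are not taken from the thesis. The sketch also assumes that every probability is nonzero, which is the situation that smoothing is meant to guarantee.

```python
import itertools
import math

# Toy two-state model (hypothetical labels and numbers, for illustration only).
STATES = ["OBJ", "OTHER"]
START = {"OBJ": 0.4, "OTHER": 0.6}
TRANS = {"OBJ": {"OBJ": 0.3, "OTHER": 0.7},
         "OTHER": {"OBJ": 0.4, "OTHER": 0.6}}
EMIT = {"OBJ": {"screen": 0.6, "very": 0.3, "good": 0.1},
        "OTHER": {"screen": 0.1, "very": 0.4, "good": 0.5}}


def log_table(table):
    """Convert a (possibly nested) table of nonzero probabilities to natural logs."""
    return {k: log_table(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_prob(obs, states, log_start, log_trans, log_emit):
    """Evaluation problem: log P(obs | model) by the forward algorithm.

    alpha[s] holds log P(o_1..o_t, q_t = s) for the current position t.
    """
    alpha = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: logsumexp([alpha[r] + log_trans[r][s] for r in states])
                    + log_emit[s][o]
                 for s in states}
    return logsumexp(list(alpha.values()))


def brute_force_prob(obs, states, start, trans, emit):
    """Sum P(obs, path) over every state path; exponential cost, for checking only."""
    total = 0.0
    for path in itertools.product(states, repeat=len(obs)):
        p = start[path[0]] * emit[path[0]][obs[0]]
        for prev, cur, o in zip(path, path[1:], obs[1:]):
            p *= trans[prev][cur] * emit[cur][o]
        total += p
    return total


obs = ["screen", "very", "good"]
lp = forward_log_prob(obs, STATES, log_table(START), log_table(TRANS), log_table(EMIT))
print(math.exp(lp), brute_force_prob(obs, STATES, START, TRANS, EMIT))
```

The brute-force enumeration is included only as a cross-check on tiny inputs. On toy data the two printed numbers should agree up to rounding.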
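Items 1 and 2 mention two related techniques: Viterbi decoding, and replacing multiplication with sums of logarithms to avoid underflow. The sketch below combines them. Each observation is a (word, POS) pair, which is one plausible way to feed lexical and part-of-speech information into an HMM. This is an assumption: the abstract does not give the thesis's exact formulation. The tag set `OBJ`/`OPN`/`O`, all probabilities, and the `UNSEEN` floor are hypothetical. A real system would put a smoothed estimate in place of the floor.

```python
import math

# Hypothetical tag set: OBJ = evaluated object, OPN = opinion word, O = other.
STATES = ["OBJ", "OPN", "O"]
START = {"OBJ": 0.5, "OPN": 0.1, "O": 0.4}
TRANS = {"OBJ": {"OBJ": 0.1, "OPN": 0.3, "O": 0.6},
         "OPN": {"OBJ": 0.2, "OPN": 0.2, "O": 0.6},
         "O":   {"OBJ": 0.3, "OPN": 0.4, "O": 0.3}}
# Observations are (word, POS) pairs; remaining mass belongs to pairs not listed.
EMIT = {"OBJ": {("屏幕", "n"): 0.7, ("很", "d"): 0.05, ("清晰", "a"): 0.05},
        "OPN": {("屏幕", "n"): 0.05, ("很", "d"): 0.05, ("清晰", "a"): 0.8},
        "O":   {("屏幕", "n"): 0.1, ("很", "d"): 0.7, ("清晰", "a"): 0.1}}
UNSEEN = math.log(1e-6)  # hypothetical floor for unseen pairs (placeholder for smoothing)


def viterbi(obs, states, start, trans, emit):
    """Decoding problem: most probable state path, scored with summed logs."""
    def log_e(s, o):
        p = emit[s].get(o)
        return math.log(p) if p else UNSEEN

    delta = {s: math.log(start[s]) + log_e(s, obs[0]) for s in states}
    backpointers = []
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            # Best predecessor r of state s at this position.
            best = max(states, key=lambda r: prev[r] + math.log(trans[r][s]))
            delta[s] = prev[best] + math.log(trans[best][s]) + log_e(s, o)
            ptr[s] = best
        backpointers.append(ptr)
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backpointers):  # follow back-pointers from the end
        path.append(ptr[path[-1]])
    return path[::-1], delta[last]


review = [("屏幕", "n"), ("很", "d"), ("清晰", "a")]  # "the screen is very clear"
path, score = viterbi(review, STATES, START, TRANS, EMIT)
print(path)  # ['OBJ', 'O', 'OPN']
# A long product of small probabilities underflows; the equivalent log sum does not.
print(0.01 ** 200, 200 * math.log(0.01))
```

In this toy model the best path scores 0.5 × 0.7 × 0.6 × 0.7 × 0.4 × 0.8 = 0.04704. The last line shows why log space matters. Two hundred factors of 0.01 give 10^-400, which is below the smallest positive double and prints as 0.0. The corresponding log sum remains an ordinary finite number.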
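For the zero-probability problem in item 2, the basic Good-Turing estimate can be sketched as follows. It replaces each raw count c with c* = (c + 1) N_{c+1} / N_c, where N_c is the number of distinct items seen exactly c times. It also reserves N_1 / N of the probability mass for unseen items. This is a minimal sketch under stated assumptions, not the thesis's implementation:

- The name `good_turing` is hypothetical.
- When N_{c+1} = 0 the raw count is kept.
- The seen-item probabilities are rescaled to sum to 1 - N_1 / N.

Both the fallback and the rescaling are common simplifications. They are not part of Gale's "Simple Good-Turing" smoothing.

```python
from collections import Counter


def good_turing(counts):
    """Basic Good-Turing re-estimation of a frequency table.

    counts maps each observed item (e.g. a word or (word, POS) pair emitted by
    one state) to its count c >= 1.  Returns (probabilities for seen items,
    total probability mass reserved for all unseen items).
    """
    n = sum(counts.values())
    n_c = Counter(counts.values())  # N_c: number of items seen exactly c times
    p_unseen = n_c[1] / n           # mass for unseen items: N_1 / N
    adjusted = {}
    for item, c in counts.items():
        if n_c[c + 1]:
            adjusted[item] = (c + 1) * n_c[c + 1] / n_c[c]  # c* = (c+1) N_{c+1} / N_c
        else:
            adjusted[item] = c  # N_{c+1} = 0: keep the raw count (simplification)
    # Rescale so seen items share exactly 1 - p_unseen of the probability mass.
    total = sum(adjusted.values())
    probs = {item: (1 - p_unseen) * a / total for item, a in adjusted.items()}
    return probs, p_unseen


emissions = Counter(["屏幕", "屏幕", "屏幕", "电池", "价格", "外观", "外观"])
probs, p_unseen = good_turing(emissions)
print(p_unseen)  # N_1 / N = 2/7
print(probs)
```

With sparse counts the basic estimate can be erratic. For example, an item seen twice may receive the same probability as one seen three times. This behaviour is why smoothed variants are usually preferred in practice.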
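Item 3 uses distributed representations to merge synonyms and near-synonyms among the mined features. One simple way to do this is a greedy pass that compares embeddings by cosine similarity, sketched below. The greedy strategy, the 0.9 threshold, the function names and the 3-dimensional vectors are all hypothetical stand-ins for trained word embeddings. The abstract does not specify the merging procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def merge_synonyms(freqs, vectors, threshold=0.9):
    """Fold each feature word into the first earlier (at least as frequent)
    representative whose embedding has cosine similarity >= threshold.
    Counts of merged words are summed into the representative."""
    merged, members = {}, {}
    for word in sorted(freqs, key=lambda w: -freqs[w]):  # most frequent first
        for rep in merged:
            if cosine(vectors[word], vectors[rep]) >= threshold:
                merged[rep] += freqs[word]
                members[rep].append(word)
                break
        else:  # no close representative: the word starts its own group
            merged[word] = freqs[word]
            members[word] = [word]
    return merged, members


# Toy 3-d vectors standing in for trained word embeddings (hypothetical values).
vectors = {"屏幕": [0.9, 0.1, 0.0], "显示屏": [0.85, 0.15, 0.05],
           "价格": [0.05, 0.1, 0.95], "价钱": [0.1, 0.2, 0.9]}
freqs = {"屏幕": 10, "价格": 6, "显示屏": 4, "价钱": 3}
merged, members = merge_synonyms(freqs, vectors)
print(merged)  # {'屏幕': 14, '价格': 9}
print(members)
```

Processing the most frequent words first makes the common surface form the representative of each group. Once merged, a feature mentioned under several names is counted once, so it no longer occupies several top-ranked slots in the results.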
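Item 4 reports precision, recall and F1. For extraction tasks these are commonly computed by comparing the set of extracted items with a gold-standard annotation. The sketch below assumes exact matching of extracted strings. The function name and example items are hypothetical, and the thesis may use a different matching criterion.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 for extracted items (exact match)."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # items both extracted and annotated
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


predicted = {"屏幕", "电池", "价格", "物流"}
gold = {"屏幕", "电池", "外观"}
print(precision_recall_f1(predicted, gold))
```

In this example two of the four extracted items are correct, so precision is 2/4 = 0.5. Two of the three annotated items are found, so recall is 2/3. F1 is their harmonic mean: 2 · 0.5 · (2/3) / (0.5 + 2/3) = 4/7.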
Keywords/Search Tags: HMM, Chinese reviews mining, Viterbi algorithm