Parameter Estimation of Hidden Markov Model and Its Application in News Classification

Posted on: 2021-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: J F Huang
Full Text: PDF
GTID: 2428330605950714
Subject: Applied Statistics
Abstract/Summary:
With the rapid development of Internet technology, the volume of network data has grown explosively. While enjoying the convenience of rich online information, we also face vast, disorganized text data, and how to extract valuable information quickly and accurately has become a focus of attention. The Hidden Markov Model (HMM) is one of the most successful statistical models applied in text analysis. The hidden states of an HMM carry a specific interpretation of the underlying phenomenon being modeled, so the states are very important to the model, and research on them has both theoretical and practical significance. Applying HMM to movie recommendation is of practical significance for giving users personalized and accurate recommendations, and applying HMM to news classification is conducive to the efficient classification, organization, and management of massive text data. This thesis therefore studies the statistical properties of the Hidden Markov Model and its application in text categorization.

First, the order of the HMM is estimated. The method is based on the excursion theory of HMMs and on the principle that the mean hitting times of different observable elements have the same distribution when they are calculated from observable elements starting from the same state; the order is estimated by clustering the average hitting times. Three numerical simulations were carried out. The results showed that the newly proposed method performs better and more stably than other methods: precision is greatly improved and computational complexity is significantly reduced. Applying the proposed method to the real-world MovieLens dataset, we made recommendations based on users' viewing histories, and accuracy improved significantly.

This thesis also conducted news classification research based on HMM. The experiment used six categories of news data (agriculture, computer, economics, environment, politics, and sports) from the International Database collected by the Department of Computer Information and Technology of Fudan University. We also wrote a crawler to collect comments from the Weibo and Bilibili platforms. The corpus was processed in sequence by noise filtering, custom dictionary generation, word-term matrix generation, SVD dimensionality reduction, and K-Means clustering. Finally, we designed the HMM classifier and compared it with the naive Bayesian model, the k-nearest neighbor model, and the logistic model in terms of accuracy, recall, and F1-score. The HMM classifier performed better than all three baselines, with significant improvements in accuracy, recall, and F1-score.
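
The abstract does not describe how the HMM classifier is constructed. One common construction, given here only as an illustrative sketch and not as the thesis's method, trains one HMM per news category and assigns a document to the category whose model gives its observation sequence the highest likelihood, computed with the scaled forward algorithm. The sketch assumes that each document has already been reduced to a sequence of integer symbols (for example, K-Means cluster IDs; the abstract lists K-Means as a step but does not state this role). The names `log_likelihood`, `classify`, and `MODELS`, and all probabilities in the toy models, are hypothetical.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) using the scaled forward algorithm.

    obs   : list of integer observation symbols
    start : start[i]    = P(first hidden state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    if not obs:
        return 0.0  # the empty sequence has probability 1
    n_states = len(start)
    # Initialise with the start distribution and the first emission.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transition matrix, then emit.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states))
                * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Accumulate the log of each scaling factor to avoid underflow.
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


def classify(obs, models):
    """Pick the label whose HMM gives obs the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))


# Hypothetical toy models: 2 hidden states, 3 observation symbols each.
# In practice these parameters would be estimated from training data.
MODELS = {
    "sports": (
        [0.6, 0.4],
        [[0.7, 0.3], [0.4, 0.6]],
        [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]],
    ),
    "economics": (
        [0.5, 0.5],
        [[0.6, 0.4], [0.3, 0.7]],
        [[0.1, 0.1, 0.8], [0.1, 0.3, 0.6]],
    ),
}

if __name__ == "__main__":
    print(classify([0, 0, 1, 0], MODELS))
    print(classify([2, 2, 1, 2], MODELS))
```

Working in log space with per-step rescaling keeps long symbol sequences from underflowing, which matters when documents are long. The number of hidden states per model is fixed at 2 here purely for illustration; choosing it is the order-estimation problem the thesis addresses.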
Keywords/Search Tags: Hidden Markov model, Markov chain, Order estimation, Movie recommendation, Text classification