
Research And Implementation On Chinese News Sentiment Classification System

Posted on: 2012-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Shi
GTID: 2218330338953243
Subject: Computer Science and Technology
Abstract/Summary:
Internet news is timely and rich in content, and more and more people are used to reading news on the Internet. However, a great deal of unhealthy information can affect people's thinking and mislead the young, so social media monitoring based on news information has drawn much attention in the NLP field, and opinion mining has become one of the key technologies for handling and analyzing Internet news. One of the most widely studied sub-problems of opinion mining is sentiment classification.

The objects of sentiment classification generally include words, sentences, documents and so on. Because the Chinese language is complex, sentiment classification research at the word level, sentence level and document level faces a great many difficulties. The thesis therefore studies text sentiment classification in Chinese news and carries out the following research at both the sentence level and the document level.

First, a corpus is constructed and a sentiment dictionary is built from Chinese news.

Second, sentiment classification is studied at the sentence level. The thesis presents three sentence-level sentiment classification models. The Syntactic Path model: first, syntactic path patterns that link topic words to sentiment words are collected and stored in a database; then each input sentence is parsed, and its parse tree is used to determine whether a relationship exists between the topic word and the sentiment word. The Vector Space model: starting from the topic word, the distance between the topic word and each sentiment word is calculated in the forward and backward directions respectively, and the sentiment score is then calculated from the vector distance. The Force model: the topic words, the sentiment words and the noise words are searched for in order to determine the sentiment orientation.
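The distance-based scoring of the Vector Space model can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything here is an assumption made for illustration: the 1/distance weighting, the window size, the function names `distance_weighted_score` and `orientation`, and the tiny lexicon. English tokens stand in for Chinese text that has already been word-segmented.

```python
from typing import Dict, List


def distance_weighted_score(tokens: List[str], topic: str,
                            lexicon: Dict[str, float], window: int = 5) -> float:
    """Sum sentiment polarities around each topic word, weighted by 1/distance.

    The 1/distance weighting and the window size are illustrative assumptions,
    not the formula used in the thesis.
    """
    score = 0.0
    for i, token in enumerate(tokens):
        if token != topic:
            continue
        # Scan backward (-1) and forward (+1) from this occurrence of the topic word.
        for direction in (-1, 1):
            for dist in range(1, window + 1):
                j = i + direction * dist
                if j < 0 or j >= len(tokens):
                    break  # ran off the sentence in this direction
                score += lexicon.get(tokens[j], 0.0) / dist
    return score


def orientation(score: float) -> str:
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical lexicon and pre-segmented sentence (English stand-ins for Chinese tokens).
LEXICON = {"strong": 1.0, "growth": 1.0, "loss": -1.0}
tokens = ["the", "company", "reported", "strong", "growth"]
print(orientation(distance_weighted_score(tokens, "company", LEXICON)))
```

Sentiment words closer to the topic word contribute more to the score, which is one plausible reading of "calculate the sentiment score based on the vector distance"; other weightings would fit the description equally well.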
Third, sentiment classification is studied at the document level. Document-level sentiment classification builds on the sentence-level work, and the thesis presents three document-level sentiment classification models. The Semantic Orientation model: the approach has four steps: (1) the news documents are preprocessed; (2) the sentiment words and the negation words are processed together; (3) the topic words and the sentiment words are processed together; (4) the weight is calculated from a sentiment word dictionary and the context information. The Machine Learning model: an SVM classifier is used; features are first selected from the training documents to train the classifier, and the test documents are then input to the SVM classifier to obtain their sentiment orientation. The Force model: the topic words and the sentiment words in the document are searched for in order to determine the sentiment orientation.

The above models were tested on the constructed Chinese news corpus. The experimental results indicate that the Syntactic Path model performs best at the sentence level and the Semantic Orientation model performs best at the document level.
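Steps (2) to (4) of the Semantic Orientation model can be sketched roughly as follows. This is a sketch under stated assumptions, not the thesis's implementation: it reads the "negative words" of step (2) as negation words such as "not", flips a sentiment word's polarity when an odd number of negation words precede it within a small window, and boosts sentences that mention a topic word. The window size, the boost factor, the function names and the word lists are all hypothetical, and preprocessing (step 1) is assumed to have produced tokenized sentences already.

```python
from typing import Dict, List, Set


def sentence_score(tokens: List[str], lexicon: Dict[str, float],
                   negations: Set[str], topics: Set[str],
                   neg_window: int = 2, topic_boost: float = 2.0) -> float:
    """Score one tokenized sentence using a lexicon, negation and topic context."""
    # Step (3), assumed form: sentences that mention a topic word count more.
    weight = topic_boost if any(t in topics for t in tokens) else 1.0
    score = 0.0
    for i, token in enumerate(tokens):
        polarity = lexicon.get(token)
        if polarity is None:
            continue  # not a sentiment word
        # Step (2), assumed form: an odd number of nearby negations flips polarity.
        preceding = tokens[max(0, i - neg_window):i]
        if sum(1 for t in preceding if t in negations) % 2 == 1:
            polarity = -polarity
        # Step (4), assumed form: dictionary polarity scaled by the context weight.
        score += weight * polarity
    return score


def document_orientation(sentences: List[List[str]], lexicon: Dict[str, float],
                         negations: Set[str], topics: Set[str]) -> str:
    """Sum sentence scores and map the total to a coarse label."""
    total = sum(sentence_score(s, lexicon, negations, topics) for s in sentences)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


# Hypothetical resources; English stand-ins for segmented Chinese news text.
LEXICON = {"good": 1.0, "happy": 1.0, "bad": -1.0}
NEGATIONS = {"not", "never"}
TOPICS = {"policy"}
doc = [["the", "policy", "is", "not", "good"], ["residents", "are", "happy"]]
print(document_orientation(doc, LEXICON, NEGATIONS, TOPICS))
```

Summing sentence scores is only one way to aggregate; it reflects the abstract's statement that the document-level research builds on the sentence level.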
Keywords/Search Tags:Sentiment Classification, Semantic Orientation, Machine Learning, Chinese News, Document Level