
Research on Chinese-Vietnamese (Han-Yue) Online News Text Extraction Methods and Sentiment Orientation Classification

Posted on: 2018-02-05 | Degree: Master | Type: Thesis
Country: China | Candidate: Y L Wang | Full Text: PDF
GTID: 2438330572452583 | Subject: Measuring and Testing Technology and Instruments
Abstract/Summary:
In recent years, as relations between China and Vietnam have grown closer, the governments and enterprises of both countries have an increasingly urgent need to understand each other's domestic social, cultural, and economic development. Online news is an important channel for following political and economic affairs at home and abroad, so it is necessary to analyze the sentiment orientation of Chinese and Vietnamese news. This thesis studies methods for extracting Chinese and Vietnamese online news, constructing a Chinese-Vietnamese sentiment dictionary, and classifying news sentiment. The work covers the following three aspects.

(1) Extraction of online news text elements by fusing structural and content features. Based on the relationship between the design structure of a web page and its text content, this thesis proposes a method that extracts the elements of an online news page from both structural and content features. The title is extracted by relating the <title> element in the page head to the HTML tags and content of the page body, using tag features and the repeated occurrence of the title text. For the body text, the structural and content features of DOM-tree nodes in a news corpus are extracted and used to pre-train an SVM classifier. The SVM then pre-labels the DOM nodes of the test corpus; rules for expanding the pre-labeled nodes and merging text nodes yield candidate blocks, and a density value and an impact factor computed for each candidate block are used to judge which block is the body text. The release time is extracted with regular expressions from a region located by its position relative to the title and the body text. Tests on Chinese and Vietnamese news websites, blogs, and similar pages show that the method performs well.

(2) Construction of a domain-specific Chinese-Vietnamese sentiment dictionary. When sentiment is propagated between words, the more heuristic information is available, the more accurate and reliable the result. Most semi-supervised word-level sentiment propagation methods use only seed words as heuristic information; the method in this thesis uses the seed words and also the other language. Working from a data set of Chinese and Vietnamese news, it proposes a mutual reinforcement learning method that constructs the Chinese-Vietnamese sentiment dictionary automatically. Seed-word selection is improved: seed words should have maximum coverage, and cluster centers have this property, so K-means is used to select the Chinese and Vietnamese seed words. The original model is also improved: guided by the principle that propagating information across a weak link between two words is meaningless, the connections of the model are rebuilt in a way that differs from the traditional scheme. Finally, translation is introduced: a Chinese-Vietnamese bilingual dictionary serves as the bridge that carries sentiment information between the two languages, so that this information can cross the language barrier.

(3) Sentiment orientation classification of Chinese and Vietnamese news. Building on the basic sentiment resource obtained above, the Chinese-Vietnamese (Han-Yue) sentiment dictionary, this thesis introduces an unsupervised sentiment classification method that incorporates prior knowledge into word weights. The method addresses a shortcoming of the traditional JST (joint sentiment/topic) classification model, which treats every word in a document as having the same effect on topics and sentiments; in fact, sentiment words and the words associated with them have a key influence on sentiment classification. The thesis therefore adds weights for sentiment-bearing words to the sampling process and uses prior knowledge to assist the model's iterations, with the aim of improving classification accuracy. Experiments show that the proposed sentiment classification model is effective.
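
The release-time step in aspect (1) can be illustrated with a minimal Python sketch. The abstract states only that regular expressions are applied to a region located relative to the title and body text; it does not give the expressions. The patterns below, the function name `extract_release_time`, and the assumption that Chinese pages write year-month-day while Vietnamese pages write day/month/year are all hypothetical choices made for illustration, not the thesis's actual rules.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical patterns (not taken from the thesis).
# Year-first dates such as 2018-02-05 or 2018年2月5日.
_YMD = re.compile(r"(\d{4})\s*[-/.年]\s*(\d{1,2})\s*[-/.月]\s*(\d{1,2})")
# Day-first dates such as 05/02/2018, assumed here for Vietnamese pages.
_DMY = re.compile(r"(\d{1,2})\s*[-/.]\s*(\d{1,2})\s*[-/.]\s*(\d{4})")
# Optional clock time such as 10:30.
_HM = re.compile(r"(\d{1,2}):(\d{2})")


def extract_release_time(region: str) -> Optional[datetime]:
    """Return the first plausible date (and time, if present) in `region`.

    `region` stands for the text of the area already located between the
    title and the body text; locating that area is outside this sketch.
    """
    match = _YMD.search(region)
    if match:
        year, month, day = (int(g) for g in match.groups())
    else:
        match = _DMY.search(region)
        if match is None:
            return None
        day, month, year = (int(g) for g in match.groups())

    # Only look for a clock time that follows the date.
    clock = _HM.search(region, match.end())
    hour, minute = (int(clock.group(1)), int(clock.group(2))) if clock else (0, 0)

    try:
        return datetime(year, month, day, hour, minute)
    except ValueError:
        # Digits matched the pattern but do not form a valid date or time.
        return None
```

Calling `extract_release_time("Thứ hai, 05/02/2018, 14:05 (GMT+7)")` under these assumptions yields a `datetime` for 5 February 2018 at 14:05. Restricting the search to a small region near the title, as the abstract describes, is what keeps such loose patterns from matching dates that appear inside the article body.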
Keywords/Search Tags: Online news extraction, Sentiment lexicon, Sentiment/topic model, News sentiment classification