A Study on Methods for Extracting and Clustering Viewpoints in Chinese-Vietnamese News

Posted on: 2017-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Yang
Full Text: PDF
GTID: 2278330488450000
Subject: Control Engineering
Abstract/Summary:
Vietnam borders China. Under the "Belt and Road" strategy, exchanges between the two countries are increasingly frequent, and news is the main carrier of information about national conditions and of opinions on events. Because Chinese and Vietnamese are different languages, it is hard to obtain and analyze the events and viewpoints reported in the two countries' news. To address this problem, the thesis first preprocesses the news text and uses existing linguistic knowledge of Chinese and Vietnamese to describe and identify the characteristics of perspective sentences in news. Second, it exploits the translation correspondences between concepts in the Wikipedia knowledge base to calculate the similarity between Chinese and Vietnamese words. Finally, the associations between perspective sentences are combined with semi-supervised information to build a semi-supervised graph clustering model that clusters perspective sentences from mixed Chinese and Vietnamese news. The specific research work is as follows:
(1) An SVM-based method for extracting perspective sentences is proposed. It first analyzes characteristics such as the position of a perspective sentence in the news text, its association with the headline, and the sentiment words it contains, and uses them as extraction features. An SVM model is then trained on manually annotated perspective sentences from news to discriminate perspective sentences from other sentences. Experiments show that the method extracts perspective sentences effectively.
(2) A Wikipedia-based method for calculating the similarity of Chinese and Vietnamese words is proposed.
The method relies on several properties of Wikipedia: concepts are described in multiple languages, many concepts have translation correspondences, each language has its own concept pages, and words co-occur with other concepts. First, the set of concepts that have corresponding Chinese and Vietnamese pages is extracted from Wikipedia to build a bilingual concept feature space. Second, a concept vector is built for each word from the word's frequency in the description text of the relevant concepts and from the co-occurrence of the word and a concept in the text of other concepts. Finally, the similarity of two words is calculated as the cosine of the angle between their concept vectors. The experimental results indicate that the proposed method works well for Chinese-Vietnamese word similarity calculation and that concept co-occurrence relations increase the accuracy of word similarity. This method provides the bridge for relating attributes across languages that the cross-language semi-supervised graph clustering of viewpoints requires.
(3) A semi-supervised graph model for clustering perspective sentences from Chinese and Vietnamese news is established. It uses the similarity and the association relations between sentences as features, where the association relations include the co-occurrence of names, places, and times. When the model is built, Wikipedia is used to calculate Chinese-Vietnamese word similarity, from which the similarity between cross-language perspective sentences and the similarity of corresponding attributes in different sentences are computed to construct the edges between perspective sentences. "Must-link" and "cannot-link" constraints are introduced as supervisory information to obtain perspective sentence clusters in a mixed Chinese and Vietnamese environment. Experiments show that the method clusters perspective sentences effectively in a bilingual environment.
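The concept-vector similarity described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the concept IDs, the toy token lists, and the function names `concept_vector` and `cosine` are hypothetical, and only the word-frequency component of the vector is modeled (the co-occurrence component is omitted). It assumes each shared concept has an already tokenized description in both languages.

```python
import math
from collections import Counter

def concept_vector(word, concept_texts):
    """Map a word to {concept_id: frequency of the word in that concept's text}."""
    vec = {}
    for concept_id, tokens in concept_texts.items():
        count = Counter(tokens)[word]
        if count:
            vec[concept_id] = float(count)
    return vec

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy data: two concepts (C1, C2) that have both a Chinese
# and a Vietnamese page, with each description already tokenized.
zh_texts = {"C1": ["经济", "贸易", "经济"], "C2": ["贸易", "足球"]}
vi_texts = {"C1": ["kinh_tế", "thương_mại", "kinh_tế"], "C2": ["bóng_đá", "thương_mại"]}

zh_economy = concept_vector("经济", zh_texts)
vi_economy = concept_vector("kinh_tế", vi_texts)
vi_football = concept_vector("bóng_đá", vi_texts)

print(cosine(zh_economy, vi_economy))   # words concentrated in the same concept
print(cosine(zh_economy, vi_football))  # words that share no concept
```

Because both vectors are indexed by the same bilingual concept IDs, words from different languages become directly comparable without a translation dictionary, which is the property the abstract attributes to the shared concept space.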
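How "must-link" and "cannot-link" supervision can act on a sentence graph is sketched below in Python. The abstract does not specify the clustering algorithm, so this is a deliberately simplified stand-in and not the thesis's model: constraints overwrite edge weights, and clusters are taken as connected components over edges above a threshold. The function name `constrained_components`, the threshold, and the toy similarity values are all assumptions.

```python
def constrained_components(n, sim, must_link, cannot_link, threshold=0.5):
    """Cluster n sentences by thresholded connectivity after applying constraints.

    sim maps a pair (i, j) with i != j to a similarity in [0, 1].
    A must-link pair is forced to weight 1.0, a cannot-link pair to 0.0.
    """
    weights = {}
    for (i, j), s in sim.items():
        weights[(min(i, j), max(i, j))] = s
    for i, j in must_link:
        weights[(min(i, j), max(i, j))] = 1.0
    for i, j in cannot_link:
        weights[(min(i, j), max(i, j))] = 0.0

    # Union-find over sentence indices.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), w in weights.items():
        if w >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for node in range(n):
        clusters.setdefault(find(node), []).append(node)
    return sorted(clusters.values())

# Hypothetical similarities between five perspective sentences.
sim = {(0, 1): 0.8, (1, 2): 0.6, (2, 3): 0.2, (3, 4): 0.4}

print(constrained_components(5, sim, must_link=[], cannot_link=[]))
print(constrained_components(5, sim, must_link=[(3, 4)], cannot_link=[(1, 2)]))
```

One limitation of this simplification is that a cannot-link pair can still end up in the same component through a chain of other edges; a full semi-supervised graph clustering model would need to enforce the constraints globally.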
Keywords/Search Tags: Word similarity, Wikipedia, Viewpoint extraction, Attribute graph clustering