
Research On The Calculation Method Of Han-Thai Bilingual News Text Similarity With News Elements

Posted on: 2017-11-14  Degree: Master  Type: Thesis
Country: China  Candidate: Z X Hou  Full Text: PDF
GTID: 2358330488464875  Subject: Software engineering
Abstract/Summary:
Text similarity computation is one of the important topics in natural language processing. It is widely used in text mining, duplicate-text checking, machine translation, and text classification. Text similarity measures the degree of matching between two or more texts. Monolingual text similarity computation has already produced considerable results. As society develops, people have become accustomed to getting information from the Internet and are no longer satisfied with information in a single language; the diversity of languages on the Internet helps meet their demand for large amounts of information. As a result, cross-language news retrieval and cross-language text detection have attracted great attention.

First, this thesis studies similarity computation for Chinese news texts. The conventional vector space model (VSM) approach suffers from high dimensionality and high computational cost. Analysis of news reports shows that a news report has five factors. Based on this feature, the thesis proposes a news text similarity method that integrates news factors. The method fully considers the effect of the five factors on text similarity, effectively reduces interference from low-similarity texts, and improves on the efficiency of traditional text similarity computation. It extracts the news factors from each news text and classifies them into separate sets, then uses set similarity computation and data fusion to compute the similarity of two news texts. Comparison experiments against methods based on the vector space cosine coefficient and the Jaccard coefficient validate the effectiveness and accuracy of the proposed method for news text similarity.

Building on the Chinese method, this thesis then studies Chinese-Thai cross-language news text similarity. The cross-language method also considers the five news factors and computes set similarity using HowNet. Unlike the Chinese method, processing Thai news text requires a translation tool: the tagged factor sets to be compared are mapped to Chinese sets through an intermediate layer. After semantic disambiguation, the problem is transformed into set similarity computation over Chinese news text. The transformation and disambiguation rely on mutual information and the part of speech of the tagged words. Word senses are chosen by a double screening, using mutual information and part-of-speech tagging, which helps ensure the accuracy of the cross-language word mapping and therefore of the similarity computation. The experiments lead to the conclusion that integrating news factors into cross-language news text similarity computation outperforms common text similarity methods on news text.
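
The set-similarity and data-fusion step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the five news factors, the set measure, or the fusion rule. The element names in `ELEMENTS`, the exact-match Jaccard coefficient (standing in for the HowNet-based set similarity), and the weighted average (standing in for the data-fusion step) are all assumptions made only for illustration.

```python
from typing import Dict, Set

# Hypothetical element names: the abstract says there are five news
# factors but does not list them.
ELEMENTS = ("who", "what", "when", "where", "why")


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def fused_similarity(
    doc_a: Dict[str, Set[str]],
    doc_b: Dict[str, Set[str]],
    weights: Dict[str, float],
) -> float:
    """Weighted average of per-element set similarities.

    Each document is a mapping from element name to the set of terms
    extracted for that element. The weighted average is an assumed
    fusion rule, not one taken from the thesis.
    """
    total_weight = sum(weights[e] for e in ELEMENTS)
    if total_weight == 0:
        raise ValueError("weights must not all be zero")
    score = sum(
        weights[e] * jaccard(doc_a.get(e, set()), doc_b.get(e, set()))
        for e in ELEMENTS
    )
    return score / total_weight
```

For the cross-language case, the Thai element sets would first be mapped to Chinese terms (translation plus disambiguation) and then passed to the same function. Replacing `jaccard` with a semantic, HowNet-based set measure would bring the sketch closer to what the abstract describes.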
Keywords/Search Tags: text similarity, cross-language text similarity, news factor, semantic disambiguation, data fusion