Research Of Cross-Language Text Correlation Detection Technology

Posted on:2015-10-27

Degree:Master

Type:Thesis

Country:China

Candidate:Z Peng

GTID:2298330434954131

Subject:Information and Communication Engineering

Abstract/Summary:

Text similarity detection has long been an important task in natural language processing (NLP). Research on monolingual text similarity detection algorithms has matured, but with the development of cross-language information retrieval and increasingly close international academic exchange, the number of cross-language similar texts keeps growing. Measuring the correlation between cross-language texts has therefore become particularly important.

In this thesis we summarize the methods of monolingual text similarity detection, and we study cross-language information retrieval and several popular cross-language text correlation detection algorithms. Considering that cross-language plagiarism in China occurs mainly between English and Chinese, our main work is as follows.

First, because cross-language text correlation detection needs machine translation to retrieve the candidate set of similar texts quickly, we explore the feasibility of using mainstream machine translation tools in English-Chinese text correlation detection. We test different text granularities in the machine translation step and analyze the experimental results for each. Based on these results, we propose a binary-text, sentence-based algorithm that achieves appropriate precision and recall and is also more efficient. Finally, we develop a system that retrieves the candidate set quickly by combining this algorithm with Minwise Hash.

Second, because the performance of the cross-language text correlation detection algorithm CL-ESA is mainly influenced by the index document collection, we propose a clustering-based algorithm to assist in building the index documents. The algorithm applies clustering while building the index document collection so that the documents are more distinct and of better quality. Experimental results show that our algorithm not only raises the recall of CL-ESA but also improves its running time.
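
The abstract does not give the details of how Minwise Hash is used to retrieve candidates, so the following is only a minimal sketch of the general technique, not the thesis's implementation. It assumes both texts are already in the same language (for example, after machine translation into English) and are compared by word 3-gram shingles. All function names, the shingle size, and the number of hash functions are illustrative assumptions.

```python
import hashlib
import random

# Large Mersenne prime used as the modulus of the hash family (an assumption
# of this sketch, not a value taken from the thesis).
_PRIME = (1 << 61) - 1


def shingles(text, k=3):
    """Return the set of word k-grams of a whitespace-tokenized text."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _stable_hash(item):
    """Map a string to an integer reproducibly (Python's hash() is salted)."""
    digest = hashlib.md5(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % _PRIME


def make_hash_params(num_hashes=128, seed=0):
    """Draw (a, b) pairs defining the functions h(x) = (a*x + b) mod p."""
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME))
            for _ in range(num_hashes)]


def minhash_signature(items, params):
    """Keep, for every hash function, the minimum hash over the set."""
    hashed = [_stable_hash(x) for x in items]
    if not hashed:
        raise ValueError("cannot build a signature for an empty set")
    return [min((a * h + b) % _PRIME for h in hashed) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching positions estimates the Jaccard similarity."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    params = make_hash_params()
    source = "text similarity detection is an important task in language processing"
    suspect = "text similarity detection is an important task in information retrieval"
    sig_source = minhash_signature(shingles(source), params)
    sig_suspect = minhash_signature(shingles(suspect), params)
    print(estimate_jaccard(sig_source, sig_suspect))
```

Because signatures have a fixed length, a candidate set can be filtered by comparing signatures instead of full shingle sets, which is the usual reason for pairing a similarity measure with MinHash.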

Keywords/Search Tags:

cross-language correlation detection, machine translation, CL-ESA algorithm, text clustering

Related items

1	Building Comparable Corpora Based On Cross-language Text Similarity Metrics
2	Research On Unsupervised Neural Machine Translation
3	Low-Resource Machine Translation Techniques For Distant Language Pair
4	Research On The Method And Technique Of Chinese And Thai Cross - Language Topic Detection
5	Design And Implementation Of A Cross-lingual Text Summary System Based On Deep Learning
6	Research Of Ch-En Cross-Lingual Plagiarism Detection Based On Translation Features And Contents
7	Research On Bilingual Topic Model And Its Algorithm In Cross-language Information Retrieval
8	Query Translation Based On Visual Information For Cross Language Retrieval
9	Research On The Application Of Machine Translation In Cross-lingual Document Classification
10	Research Of Some Key Issues In Highly Adaptive Example-Based Machine Translation