A Privacy-Aware Online Statistical Machine Translation

Posted on: 2017-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Fang
Full Text: PDF
GTID: 2308330485461038
Subject: Computer Science and Technology
Abstract/Summary:
Machine translation is widely used in daily life. The most promising approach to machine translation, Statistical Machine Translation (SMT), learns statistical models from bilingual text corpora and uses them to translate. Generally speaking, the larger the corpora, the better the translation quality. Because SMT requires huge amounts of text data, SMT services are generally cloud-based: users must upload the text they want translated to the servers in order to obtain the results. Privacy concerns may therefore deter users from relying on SMT when the text is confidential. Little existing work addresses this problem, and an SMT framework that protects users' private text is needed.

This paper proposes PrivSMT, a privacy-aware SMT framework that decouples the decoder from the servers that hold the statistical parameters learned from the bilingual text corpora. The decoder runs on the client side and the translation process is performed there, so the client does not need to upload the text to the servers. Instead, the client sends queries to the servers to obtain the statistical parameters needed during translation. These queries still contain parts of the user's text and must therefore be protected as well. PrivSMT distributes the parameter queries across multiple servers operated by different providers, which restricts the information that any single server can learn. This paper develops a privacy metric that quantitatively measures privacy leaks and designs an algorithm that finds the distribution strategy for parameter queries that minimizes those leaks. It also provides several measures to strengthen privacy protection on individual servers. The evaluation results show that the PrivSMT framework is efficient and greatly reduces privacy leaks.
Keywords/Search Tags: Online Machine Translation, Privacy Protection, Privacy Measurement
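
The abstract does not give the thesis's privacy metric or its optimization algorithm, so the following is only a minimal sketch of the general idea of spreading parameter queries over several servers. Everything in it is an assumption made for illustration: the function names (`phrase_queries`, `leak`, `distribute`), the toy leak score (the fraction of token positions a single server has seen), and the greedy heuristic, which stands in for the optimal strategy the thesis describes and is not guaranteed to be optimal.

```python
from collections import defaultdict


def phrase_queries(tokens, max_len=2):
    """Enumerate the (start, phrase) lookups a client-side decoder might issue.

    Hypothetical: a real decoder decides which phrases to look up.
    """
    return [
        (i, tuple(tokens[i:i + n]))
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    ]


def leak(positions, sentence_len):
    """Toy leak score (an assumption, not the thesis's metric):
    the fraction of token positions that one server has seen."""
    return len(positions) / sentence_len if sentence_len else 0.0


def distribute(tokens, num_servers, max_len=2):
    """Greedy heuristic: send each query to the server whose exposure
    would be smallest after receiving it."""
    seen = [set() for _ in range(num_servers)]  # token positions seen per server
    plan = defaultdict(list)                    # server id -> phrases sent to it
    # Handle longer phrases first, since they reveal the most context at once.
    queries = sorted(phrase_queries(tokens, max_len), key=lambda q: -len(q[1]))
    for start, phrase in queries:
        span = set(range(start, start + len(phrase)))
        # Prefer the smallest resulting exposure, then the smallest growth,
        # then the lowest server id.
        best = min(
            range(num_servers),
            key=lambda s: (len(seen[s] | span), len(span - seen[s]), s),
        )
        seen[best] |= span
        plan[best].append(phrase)
    worst = max(leak(p, len(tokens)) for p in seen)
    return dict(plan), worst


if __name__ == "__main__":
    sentence = "the contract is secret".split()
    for k in (1, 2, 4):
        _, worst = distribute(sentence, k)
        print(f"{k} server(s): worst-case leak = {worst:.2f}")
```

With a single server the worst-case score is 1.0 by construction, since that server sees every position. Adding servers lowers the score in this toy setting, which mirrors the intuition in the abstract that each provider should learn only part of the text.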