Automatic Extraction And Translation Of Popular Words On The Internet

Posted on: 2016-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhao
GTID: 2308330464972787
Subject: Computer technology
Abstract/Summary:
The rapid development of the Internet has made communication more convenient and efficient. With this new form of communication, people's command of language and their capacity for linguistic innovation have developed to an unprecedented degree. Popular Internet words, the main symbol of Internet culture, have flourished in recent years. They originate online but have penetrated into people's daily lives.

In recent years, the study of popular Internet words has attracted wide attention at home and abroad, from the perspectives of sociology, communication studies and linguistics. These studies have revealed the general mechanism by which popular Internet words arise, but they are mostly confined to qualitative analysis from a social science perspective. In essence, popular words are a special kind of new word, and recognizing them automatically is the basis for further processing and analysis. Meanwhile, as international exchange grows closer, automatically translating popular Internet words into other languages has become an urgent task, with significant impact on statistical machine translation, cross-language information retrieval and related tasks.

For this reason, this thesis uses natural language processing techniques to extract and translate popular words automatically, from a quantitative point of view. Extraction relies on a characteristic usage pattern of popular words: their usage rises rapidly over a short period and then falls. By analyzing real data from online forums, the thesis characterizes how sharply a word's usage rises over a multi-year period and uses this to quantify and measure the word's popularity. Translation relies on the observation that words with similar meanings usually appear in similar contexts. Comparable corpora, a bilingual resource that is easy to obtain, are used to build large-scale context vectors for words, and candidate translations are extracted with a similarity measure.

The experimental results show that the popular words extracted from real, large-scale forum data are consistent with the lists published by experts from various agencies, and that popular words can be translated more accurately using contextual information drawn from comparable corpora.

The main contributions of this thesis are:
(1) A method for automatically extracting popular words from real language-use data. The method takes the usage characteristics of popular words into account: it designs dynamic features, static features and other indicators, applies them to usage data from real online forums, and achieves accurate extraction of popular words.
(2) A strategy for automatically translating popular words based on comparable corpora. The strategy automatically collects comparable corpora that contain the popular words in order to obtain their contexts, and then obtains candidate translations by comparing context similarity.

This work is a first attempt at the automatic translation of popular Internet words and is pioneering in this field.
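The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not give the actual features or formulas, so everything here is an assumption made for illustration: the function names, the peak-to-mean burst score, the window sizes, and the use of a seed dictionary to map source-language context words into the target language.

```python
from collections import Counter
from math import sqrt


def burst_score(counts, window=3):
    """Peak windowed usage divided by mean usage (hypothetical measure).

    `counts` holds one word's usage per period, e.g. forum posts per month.
    Flat usage gives 1.0; usage concentrated in a short span gives more.
    """
    if len(counts) < window or sum(counts) == 0:
        return 0.0
    mean = sum(counts) / len(counts)
    peak = max(sum(counts[i:i + window]) / window
               for i in range(len(counts) - window + 1))
    return peak / mean


def context_vector(tokens, target, window=2):
    """Count the words within `window` tokens of each occurrence of `target`."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            vec.update(tokens[lo:i] + tokens[i + 1:hi])
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (sqrt(sum(x * x for x in u.values()))
            * sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def rank_translations(source_vec, seed_dict, candidate_vecs):
    """Rank target-language candidates by context similarity.

    `seed_dict` (an assumption, not stated in the abstract) maps known
    source-language context words to target-language words so that the
    two context vectors can be compared in the same space.
    """
    mapped = Counter()
    for word, weight in source_vec.items():
        if word in seed_dict:
            mapped[seed_dict[word]] += weight
    return sorted(candidate_vecs,
                  key=lambda c: cosine(mapped, candidate_vecs[c]),
                  reverse=True)
```

In this sketch, a word whose forum usage is concentrated in a short window gets a higher `burst_score` than a word with steady usage. A candidate translation ranks first when its target-language context overlaps most with the mapped context of the popular word.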
Keywords/Search Tags: Popular Words On The Internet, New Word Extraction, Comparable Corpus, Dictionary Extraction