
Research On The Extraction Of Cross-Language Semantic Similar Words

Posted on: 2017-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y G Wang
GTID: 2308330488497748
Subject: Computer application technology

Abstract/Summary:
With the development of computer applications, people are eager to make computers think and handle many kinds of tasks. Natural language processing is one of the most important parts of artificial intelligence, and its applications can address many kinds of problems. Semantic similarity describes how closely a set of documents, phrases, or words are related in meaning. Between words from different languages, semantic similarity measures the degree to which the words express the same meaning, that is, refer to similar content. Cross-language semantic similarity plays an increasingly important role in information processing. With the development of big data, it has also shown great value in many research areas and applications, such as artificial intelligence, natural language processing, and information retrieval. Moreover, Chinese and English are two widely used languages, applied extensively in economic, cultural, and educational fields. Semantic similarity across these two languages therefore deserves further study.

At present, measures of cross-language semantic similarity fall into three categories:
- measures based on semantic knowledge rules
- measures based on corpus statistics
- hybrid measures that combine the two

Semantic knowledge bases contain many human-designed semantic rules organized into complex semantic networks. Knowledge-based measures make full use of this semantic knowledge to compute the degree of semantic similarity between words in different languages. Corpus-statistics-based measures instead gather sufficient data through manual entry, web crawling, crowdsourcing, and similar means, and compute the degree of semantic similarity from that data. However, this approach suffers from the uneven distribution of words, which biases the computed results.
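The following minimal Python sketch makes the three categories concrete. It is an illustration only, not the method of this thesis.

- **Knowledge-based side:** a hypothetical toy is-a hierarchy stands in for WordNet or CCD. Two words score 1 / (shortest path length + 1), which is the form of WordNet's path similarity.
- **Corpus-based side:** cosine similarity over hypothetical co-occurrence counts.
- **Hybrid:** `hybrid_similarity` is one simple option, a weighted average whose weight `alpha` is an assumption.

All words, counts, and function names are illustrative.

```python
import math
from collections import deque

# Hypothetical toy is-a hierarchy (child -> parent) standing in for a
# knowledge base such as WordNet or CCD; not real WordNet data.
HYPERNYM = {
    "dog": "canine", "wolf": "canine", "canine": "carnivore",
    "cat": "feline", "feline": "carnivore", "carnivore": "animal",
    "car": "vehicle", "vehicle": "artifact",
    "animal": "entity", "artifact": "entity",
}

# Hypothetical co-occurrence counts (context word -> count) from a corpus.
COOC = {
    "dog": {"bark": 4, "pet": 3, "run": 2},
    "wolf": {"bark": 1, "wild": 3, "run": 2},
    "car": {"drive": 5, "road": 3, "run": 1},
}


def neighbors(node):
    """Parent and children of a node, treating the hierarchy as undirected."""
    result = [HYPERNYM[node]] if node in HYPERNYM else []
    result.extend(child for child, parent in HYPERNYM.items() if parent == node)
    return result


def path_similarity(a, b):
    """Knowledge-based score: 1 / (shortest path length + 1), found by BFS."""
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return 1.0 / (dist[node] + 1)
        for nxt in neighbors(node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return 0.0  # no path between the two words


def cosine_similarity(u, v):
    """Corpus-based score: cosine of two sparse count vectors."""
    dot = sum(count * v.get(word, 0) for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_similarity(kb_score, corpus_score, alpha=0.5):
    """One simple hybrid: weighted average of the two scores (alpha is assumed)."""
    return alpha * kb_score + (1 - alpha) * corpus_score


print(path_similarity("dog", "wolf"), path_similarity("dog", "cat"))
print(cosine_similarity(COOC["dog"], COOC["wolf"]))
```

In this toy data, dog/wolf scores 1/3 on the path measure and dog/cat scores 0.2, because the path through `carnivore` is longer. With sparse or unevenly distributed counts, the cosine score becomes unreliable for rare words. This is the weakness of corpus-based measures described above.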
Hybrid measures that combine the knowledge-based and corpus-based approaches can compensate well for this shortcoming, and they have attracted the attention of other researchers.

In this thesis, we take Chinese and English as our research objects and explore the hidden semantic relations between the two languages. We first study the extraction of semantically similar words within a single language. The Chinese Concept Dictionary (CCD) and WordNet are used to construct two models:
- the Chinese Semantic Similar Words Extraction (CSWE) model, which extracts similar words from CCD
- the English Semantic Similar Words Extraction (ESWE) model, which extracts similar words from WordNet

Building one model per language verifies the approach's performance in different languages. Our experimental results show that CSWE and ESWE achieve the same correctness as the baseline model.

We then extend CSWE and ESWE to cross-language similar-word extraction, yielding a new method called the Cross-language Semantic Similar Words Extraction (CLSWE) model. To verify the performance of CLSWE, we use datasets of different sizes (WordSim353 and RW) to extract similar words across the two languages. We first extract a dataset of 77 words from WordSim353 and RW and then translate these words into Chinese. We use these two datasets to verify the correctness of CLSWE. The experimental results show that CLSWE achieves the same similarity values. They also show that the larger the dataset, the faster CLSWE performs extraction.

Our proposed models show advantages over the baseline models and therefore have good application prospects.
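The abstract does not describe how CLSWE extends the monolingual models. As a hedged illustration only, one simple way to carry a monolingual measure across languages is to bridge through a bilingual dictionary:

1. Score an English word against a Chinese word by its best similarity to any English translation of the Chinese word.
2. Keep the Chinese candidates whose score reaches a threshold.

The dictionary `ZH_TO_EN`, the score table `EN_SIM`, the threshold, and all function names are hypothetical.

```python
# Hypothetical Chinese -> English dictionary entries (illustrative only).
ZH_TO_EN = {
    "汽车": ["car", "automobile"],
    "车辆": ["vehicle"],
    "道路": ["road"],
    "苹果": ["apple"],
}

# Hypothetical English word-pair similarities, e.g. produced by a
# monolingual measure; unordered pairs are stored as frozensets.
EN_SIM = {
    frozenset({"car", "automobile"}): 0.95,
    frozenset({"car", "vehicle"}): 0.70,
    frozenset({"car", "road"}): 0.30,
}


def en_similarity(a, b):
    """Monolingual English score; identical words score 1.0, unknown pairs 0.0."""
    if a == b:
        return 1.0
    return EN_SIM.get(frozenset({a, b}), 0.0)


def cross_similarity(en_word, zh_word):
    """Best English score between en_word and any translation of zh_word."""
    translations = ZH_TO_EN.get(zh_word, [])
    return max((en_similarity(en_word, t) for t in translations), default=0.0)


def extract_cross_similar(en_word, threshold=0.5):
    """Chinese words whose cross-language score reaches threshold, best first."""
    scored = [(zh, cross_similarity(en_word, zh)) for zh in ZH_TO_EN]
    hits = [(zh, score) for zh, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


print(extract_cross_similar("car"))
```

Here `extract_cross_similar("car")` keeps 汽车 and 车辆 and drops 道路, whose bridged score of 0.30 falls below the assumed threshold. A bridge like this depends entirely on the dictionary's coverage: a Chinese word with no entry always scores 0.0.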
Keywords/Search Tags: Cross-language, semantic similarity, WordNet, Chinese Concept Dictionary