Research On Cross-language Information Matching Technology Based On The Term Extraction

Posted on: 2017-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Sun
GTID: 2348330518970764
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of internet technology, terms are widely used in various fields. Domain term extraction has attracted close attention from scholars and has become an important task in natural language processing. In this thesis, term extraction is combined with cross-language information matching to address the difficulty of establishing relevance between Chinese and English texts. Based on an analysis and summary of current work on term extraction, this thesis proposes a term extraction method based on multi-feature fusion. Because term word-formation rules are defined over parts of speech, the method first preprocesses the Chinese text with natural language processing tools and performs part-of-speech tagging on each sentence, laying the foundation for term extraction. It then filters the preprocessed words using the term word-formation rules and determines word boundaries according to information entropy. Because information entropy alone cannot extract low-frequency words, the method also uses IDF values from the domain corpus to measure the domain relevance of words, takes a weighted average of the two measures, and finally selects candidate terms according to a preset threshold and the term scores. This algorithm addresses the problem of domain term extraction. Next, building on the extracted terms, the thesis introduces the concept of word co-occurrence, motivated by the characteristics of domain terms, and uses a terminology translation method to align Chinese and English terms, obtaining the corresponding translated terms for the domain. Finally, it uses the term alignment results in the retrieval model to establish the link between Chinese and English.
To improve the efficiency of retrieval over the English text index, it uses a retrieval query to search the full English text and determines the best-matching English text according to the matching results, thereby achieving cross-language information retrieval through the alignment results of domain terms. Finally, this thesis carries out repeated experiments on a set of policy texts. The experiments verify the usability of the method through analysis and comparison between the method introduced in this thesis and traditional methods.
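As a rough illustration of the matching step, the sketch below maps the Chinese terms of a query text to English through an aligned term table and picks the English text that contains the largest share of them. The alignment table, the document dictionary, and the substring-based scoring are hypothetical stand-ins; the abstract does not specify the retrieval model actually used.

```python
def translate_terms(zh_terms, alignment):
    """Map Chinese terms to English using an aligned term table (a dict)."""
    return {alignment[t] for t in zh_terms if t in alignment}


def best_match(zh_terms, alignment, english_docs):
    """Return (doc_id, score) of the English text matching the most terms.

    The score is the fraction of translated terms found in the text, using
    plain case-insensitive substring matching (a simplifying assumption).
    Returns (None, 0.0) when no text matches any term.
    """
    query = translate_terms(zh_terms, alignment)
    best_id, best_score = None, 0.0
    for doc_id, text in english_docs.items():
        lowered = text.lower()
        hits = sum(1 for term in query if term.lower() in lowered)
        score = hits / len(query) if query else 0.0
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id, best_score


# Hypothetical example data.
alignment = {"信息熵": "information entropy", "术语抽取": "term extraction"}
english_docs = {
    "d1": "Term extraction with information entropy.",
    "d2": "A policy on trade.",
}
print(best_match(["信息熵", "术语抽取"], alignment, english_docs))
```

A real system would more likely query an inverted index than scan every text, but the scan keeps the dependence on the term alignment results easy to see.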
Keywords/Search Tags: Term extraction, Information entropy, Alignment of Chinese and English, Cross-language information matching