Research On Similar Sentence Retrieval Technology For Patents

Posted on:2011-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:Y K Lu

Full Text:PDF

GTID:2178360302488549

Subject:Computer application technology

Abstract/Summary:

In natural language processing, sentence retrieval is widely applied and has attracted considerable attention. In question answering (QA), automatic text summarization, example-based machine translation (EBMT), and translation memory systems, the quality of the sentence retrieval module directly affects the performance of the whole system. However, there is no unified standard for judging whether two sentences are similar, and none is likely to emerge, because the criteria depend on the specific application. For example, in an example retrieval system two sentences can be considered similar if their syntactic structures are similar, whereas in FAQ-based automatic question answering they are considered similar when they have similar meanings.

With growing awareness of intellectual property rights and the urgent need for international exchange, traditional manual patent translation can no longer keep up with the demand for patent translation. To some extent this also hinders the dissemination and exchange of patented technology between China and the rest of the world. With the rapid development of machine translation, fully automatic machine translation and computer-human cooperative translation have become effective ways to address the problem.

The main task of this thesis is to design a sentence retrieval algorithm for a computer-human cooperative translation system, tailored to the features of patents, so as to improve the performance of the system. Compared with ordinary documents, patent documents have a canonical format, precise wording, and an abundance of technical terms. Based on these characteristics, this thesis presents a sentence similarity computation method based on pseudo-LCS. The method extends the conventional longest common subsequence (LCS) algorithm so that it supports fuzzy alignment. It also incorporates word meaning, part of speech, term similarity, and other related information. Experimental results show that it is more effective for computing sentence similarity in patent documents: its accuracy reaches 83.5%, compared with 63.5% for the improved edit-distance method and 66.5% for the vector space model (VSM) method.
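The abstract does not spell out the pseudo-LCS formulation, so the following is only a minimal sketch of the general idea of an LCS relaxed to allow fuzzy alignment, not the thesis's actual algorithm. The function names, the 0.5 threshold, the Dice-style normalization, and the toy synonym table are all assumptions made for illustration. In the thesis the word-level score would combine word meaning, part of speech, and term similarity.

```python
from typing import Callable, Sequence

WordSim = Callable[[str, str], float]


def exact_match(a: str, b: str) -> float:
    """Classic LCS behaviour: words align only if they are identical."""
    return 1.0 if a == b else 0.0


def soft_lcs(xs: Sequence[str], ys: Sequence[str],
             word_sim: WordSim = exact_match,
             threshold: float = 0.5) -> float:
    """Weighted LCS: a pair of words may be aligned when their similarity
    reaches `threshold`, and then contributes that similarity to the score.
    The threshold value is an assumption, not taken from the thesis."""
    m, n = len(xs), len(ys)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Best score when one of the two words is left unaligned.
            best = max(dp[i - 1][j], dp[i][j - 1])
            s = word_sim(xs[i - 1], ys[j - 1])
            if s >= threshold:
                # Fuzzy alignment of the two current words.
                best = max(best, dp[i - 1][j - 1] + s)
            dp[i][j] = best
    return dp[m][n]


def sentence_similarity(xs: Sequence[str], ys: Sequence[str],
                        word_sim: WordSim = exact_match,
                        threshold: float = 0.5) -> float:
    """Normalize the soft LCS score to [0, 1] (Dice-style, an assumption)."""
    if not xs or not ys:
        return 0.0
    return 2.0 * soft_lcs(xs, ys, word_sim, threshold) / (len(xs) + len(ys))


# Hypothetical stand-in for a real lexical resource or term dictionary.
TOY_SYNONYMS = {
    frozenset(("device", "apparatus")): 0.8,
    frozenset(("housing", "casing")): 0.8,
}


def toy_word_sim(a: str, b: str) -> float:
    if a == b:
        return 1.0
    return TOY_SYNONYMS.get(frozenset((a, b)), 0.0)


s1 = "the device comprises a housing".split()
s2 = "the apparatus comprises a casing".split()
strict = sentence_similarity(s1, s2)                # exact LCS only
fuzzy = sentence_similarity(s1, s2, toy_word_sim)   # with fuzzy alignment
```

In this toy pair, the strict variant aligns only "the", "comprises", and "a", while the fuzzy variant also aligns the two near-synonym pairs, so it scores the sentences as considerably more similar. That is the kind of effect a fuzzy-alignment LCS is meant to capture for terminology-heavy patent text.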

Related items

1	The Research On Chinese Sentence Similarity Algorithm Based On HNC
2	Study And Application On Chinese Sentence Similarity Computation
3	Research On Computation Method Of Chinese Question Similarity Based On Deep Learning
4	The Design And Implementation Of Multi-features Combination In Sentence Similarity Computation
5	Sentence Similarity Computing Theory And Applied Research
6	Research On Sentences Similarity Computation Based On Multi-information Fusion
7	Research On Semantic Similarity Computation And Applications
8	Chinese Sentences Similarity Computation And Its Application In Question-Answering System
9	Research Of Sentence Similarity Computation Based On Semantic Analysis
10	Research On Question Similarity Computation Based On User Intention And Syntactic Roles