Chinese Text Similarity Matching Based On Domain Dictionary

Posted on: 2015-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: H C Zhang
GTID: 2268330431453455
Subject: Computer application technology
Abstract/Summary:
Text similarity computation is a key task in natural language processing and is widely used in applications such as information retrieval and machine translation. Unlike English, Chinese has no explicit word delimiters and a comparatively free grammar, which makes it harder to process algorithmically. This thesis studies the computation of Chinese text similarity based on a domain dictionary. We first constructed a domain dictionary from Wikipedia. Then, after analyzing the advantages and disadvantages of common algorithms for Chinese text similarity computation, we proposed a new method based on the semantic similarity of keywords.

The contributions of the thesis are as follows:

(1) Common algorithms for text similarity computation are introduced and their performance is compared in detail. The new method is motivated by this analysis.

(2) A method for constructing a domain dictionary from Wikipedia is proposed. Wikipedia is a crowd-sourced knowledge base with high-quality content contributed by its users. The method extracts the terms related to one specific domain together with the relations between them, which is useful in many applications.

(3) Based on the domain dictionary constructed above, an algorithm for computing text similarity is proposed. Its basic steps are: extract the keywords and compute the similarity between keywords, compute sentence similarity from word similarity, and compute document similarity from sentence similarity. Experimental results are also reported.
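The layered computation described in the abstract (keyword similarity, then sentence similarity, then document similarity) can be illustrated with a minimal sketch. This is not the thesis's actual algorithm, whose formulas are not given here. It assumes that keywords have already been extracted, that the domain dictionary maps each term to a set of related terms, that word similarity is the Jaccard overlap of those sets, and that higher levels are aggregated by a symmetric best-match average. The dictionary entries and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set, TypeVar

T = TypeVar("T")

# Hypothetical toy domain dictionary: term -> set of related terms.
# The thesis builds its dictionary from Wikipedia; these entries are invented.
DOMAIN_DICT: Dict[str, Set[str]] = {
    "neural network": {"machine learning", "deep learning", "model"},
    "deep learning": {"machine learning", "neural network", "model"},
    "database": {"sql", "storage", "index"},
}


def word_similarity(a: str, b: str) -> float:
    """Assumed measure: 1.0 for identical terms, otherwise the Jaccard
    overlap of the two terms' related-term sets in the dictionary."""
    if a == b:
        return 1.0
    related_a = DOMAIN_DICT.get(a, set())
    related_b = DOMAIN_DICT.get(b, set())
    union = related_a | related_b
    return len(related_a & related_b) / len(union) if union else 0.0


def best_match_average(
    xs: Sequence[T], ys: Sequence[T], sim: Callable[[T, T], float]
) -> float:
    """Average, in both directions, of each item's best match on the other side."""
    if not xs or not ys:
        return 0.0
    forward = sum(max(sim(x, y) for y in ys) for x in xs) / len(xs)
    backward = sum(max(sim(y, x) for x in xs) for y in ys) / len(ys)
    return (forward + backward) / 2


def sentence_similarity(s1: List[str], s2: List[str]) -> float:
    """Sentence similarity from word similarity (a sentence = list of keywords)."""
    return best_match_average(s1, s2, word_similarity)


def document_similarity(d1: List[List[str]], d2: List[List[str]]) -> float:
    """Document similarity from sentence similarity (a document = list of sentences)."""
    return best_match_average(d1, d2, sentence_similarity)
```

In this sketch a document is a list of sentences and a sentence is a list of keywords, so `document_similarity([["neural network"]], [["deep learning"]])` compares two one-sentence documents. Chinese word segmentation and keyword extraction are left out and would have to run beforehand.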
Keywords/Search Tags: Text Similarity, Domain Dictionary, Semantic Similarity