Chinese Text Similarity Matching Based On Domain Dictionary

Posted on:2015-02-21

Degree:Master

Type:Thesis

Country:China

Candidate:H C Zhang

Full Text:PDF

GTID:2268330431453455

Subject:Computer application technology

Abstract/Summary:

Text similarity computation is a key task in natural language processing and is widely used in applications such as information retrieval and machine translation. Compared with English, Chinese has no explicit word delimiters and a relatively free grammar, which makes it harder to process algorithmically. This thesis studies the computation of Chinese text similarity based on a domain dictionary. We first construct a domain dictionary from Wikipedia. Then, after analyzing the advantages and disadvantages of common algorithms for Chinese text similarity computation, we propose a new method based on the semantic similarity of keywords.

The contributions of the thesis are as follows:

(1) Common algorithms for text similarity computation are introduced and their performance is compared in detail. This analysis motivates our new method.

(2) A method for constructing a domain dictionary from Wikipedia is proposed. Wikipedia is a crowd-sourced knowledge base with high-quality content contributed by users. Our method extracts the terms related to a specific domain together with the relations between them, which is useful in many applications.

(3) Based on this domain dictionary, an algorithm for computing text similarity is proposed. Its basic steps are: extract the keywords and compute the similarity between keywords; compute sentence similarity from word similarity; and compute document similarity from sentence similarity. Experimental results are also reported.
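The layered idea in the abstract (keyword similarity, then sentence similarity, then document similarity) can be illustrated with a minimal Python sketch. It is not the thesis's actual algorithm: the toy dictionary `RELATED`, the 0.5 score for related terms, the forward-maximum-matching keyword extractor, and the symmetric best-match averaging are all assumptions chosen only to make the three levels concrete.

```python
import re
from typing import Callable, Dict, List, Sequence, Set, TypeVar

T = TypeVar("T")

# Hypothetical toy domain dictionary: term -> set of related terms.
# A real dictionary would be built from Wikipedia as the abstract describes.
RELATED: Dict[str, Set[str]] = {
    "检索": {"搜索", "查询"},
    "搜索": {"检索", "查询"},
    "查询": {"检索", "搜索"},
    "翻译": {"译文"},
    "译文": {"翻译"},
}
MAX_TERM_LEN = max(len(term) for term in RELATED)


def extract_keywords(sentence: str) -> List[str]:
    """Forward maximum matching: keep only substrings found in the dictionary."""
    keywords: List[str] = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(MAX_TERM_LEN, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + n]
            if candidate in RELATED:
                keywords.append(candidate)
                i += n
                break
        else:
            i += 1  # no dictionary term starts here; skip one character
    return keywords


def word_sim(a: str, b: str) -> float:
    """Assumed scoring: 1.0 if identical, 0.5 if related in the dictionary, else 0.0."""
    if a == b:
        return 1.0
    if b in RELATED.get(a, set()) or a in RELATED.get(b, set()):
        return 0.5
    return 0.0


def best_match_avg(xs: Sequence[T], ys: Sequence[T],
                   sim: Callable[[T, T], float]) -> float:
    """Average each item's best match in the other sequence, in both directions."""
    if not xs or not ys:
        return 0.0
    forward = sum(max(sim(x, y) for y in ys) for x in xs) / len(xs)
    backward = sum(max(sim(x, y) for x in xs) for y in ys) / len(ys)
    return (forward + backward) / 2


def sentence_sim(s1: str, s2: str) -> float:
    """Sentence similarity built from keyword similarity."""
    return best_match_avg(extract_keywords(s1), extract_keywords(s2), word_sim)


def split_sentences(doc: str) -> List[str]:
    """Split on Chinese sentence-final punctuation and drop empty pieces."""
    return [s for s in re.split(r"[。！？]", doc) if s.strip()]


def document_sim(d1: str, d2: str) -> float:
    """Document similarity built from sentence similarity."""
    return best_match_avg(split_sentences(d1), split_sentences(d2), sentence_sim)
```

For example, `sentence_sim("用户检索文档", "用户搜索文档")` scores the two sentences through the related pair 检索/搜索, and `document_sim` applies the same aggregation one level up. Reusing one aggregation function at both levels keeps the sketch short; a real system would likely weight keywords and sentences differently.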

Keywords/Search Tags:

Related items

1	Research On Similarity Computing Method For Domain Texts
2	The Study Of Measures And Applications Of Short Text Semantic Similarity
3	Ontology-based Domain-specific Semantic Similarity Analysis and Application
4	Chinese-Old Bilingual Text And Sentence Similarity Calculation Research
5	Research On Semantic Similarity Measurement For Text
6	Research And Implementation Of Semantic Similarity Computing By Combining Knowledge-based And Corpus-based Methods
7	The Research And Application On Text Similarity Measurement Based On Semantic Analysis
8	Text Similarity Computing Theory And Applied Research
9	Research On Text Similarity Measure Method Of Combining New Word Analysis And Semantic Analysis
10	Research On Semantic Similarity Computation And Applications