Text Similarity Computing Theory And Applied Research

Posted on:2012-10-08

Degree:Master

Type:Thesis

Country:China

Candidate:J H Ma

GTID:2218330368489010

Subject:Computer software and theory

Abstract/Summary:

Text similarity computation is a fundamental and important task in information processing. It is a key technology in text data mining and underpins many important applications, such as document copy detection, text categorization, text clustering, and information retrieval. Because its applications are so wide, it merits further research and discussion.

Existing text similarity computation models have weaknesses, such as an insufficient theoretical basis and an incomplete fit to the properties of documents. In addition, understanding and processing Chinese text is more challenging than processing English text. The aim of this thesis is to improve the existing algorithms. To this end, we compare in detail a variety of text similarity computation methods used in Chinese information processing and analyze their characteristics and shortcomings. We then propose an improved method, phased integrated semantic similarity computation. Finally, drawing on text similarity theory, methods, and applications, together with the characteristics of text clustering, we discuss how text similarity can be applied to text clustering.

The main achievements of this thesis are as follows:

1) A study of existing text similarity algorithms. The thesis discusses several existing similarity computation methods and introduces the key techniques and main problems in similarity computation, laying the theoretical foundation for the subsequent work.

2) A phased integrated semantic similarity computation method. Understanding Chinese from a semantic perspective is more appropriate than relying on statistical methods. The proposed method computes similarity in phases, from sentences to paragraphs to the whole text. At each phase, semantic factors of the text are incorporated according to the characteristics of that level, with the aim of making the similarity computation as accurate as possible.

3) An application to text clustering. Using text clustering as an example, the thesis applies the proposed similarity computation method and demonstrates the practical effect of text similarity computation. The results indicate that the improved algorithm achieves better clustering results.
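
As a rough illustration of the phased idea in item 2 (sentence, then paragraph, then whole text) and of plugging a document similarity into clustering (item 3), the Python sketch below aggregates similarity level by level. It is not the author's method, and several pieces are assumptions: `word_sim` with its tiny `SYNONYMS` table is a hypothetical stand-in for a real semantic word-similarity resource; the symmetric best-match averaging in `best_match_sim` is one common aggregation rule, not a formula from the thesis; and `single_pass_cluster` is a simple leader-style single-pass clustering chosen only for demonstration. The example uses pre-segmented English tokens for readability; Chinese text would first need word segmentation.

```python
from typing import Callable, Dict, List, Sequence, Set, TypeVar

T = TypeVar("T")

# HYPOTHETICAL word-level measure: a stand-in for a real semantic resource.
# Identical words score 1.0, pairs in this tiny made-up synonym table score 0.8.
SYNONYMS: Dict[str, Set[str]] = {
    "car": {"automobile"},
    "automobile": {"car"},
}


def word_sim(a: str, b: str) -> float:
    """Similarity of two (already segmented) words in [0, 1]."""
    if a == b:
        return 1.0
    if b in SYNONYMS.get(a, set()):
        return 0.8
    return 0.0


def best_match_sim(xs: Sequence[T], ys: Sequence[T],
                   unit_sim: Callable[[T, T], float]) -> float:
    """Symmetric best-match average (an assumed aggregation rule): pair every
    unit with its most similar unit on the other side, average each
    direction, then average the two directions."""
    if not xs or not ys:
        return 0.0
    forward = sum(max(unit_sim(x, y) for y in ys) for x in xs) / len(xs)
    backward = sum(max(unit_sim(x, y) for x in xs) for y in ys) / len(ys)
    return (forward + backward) / 2


# A document is a list of paragraphs, a paragraph a list of sentences,
# and a sentence a list of words.
Sentence = List[str]
Paragraph = List[Sentence]
Document = List[Paragraph]


def sentence_sim(s1: Sentence, s2: Sentence) -> float:
    # Phase 1: sentence similarity built from word similarities.
    return best_match_sim(s1, s2, word_sim)


def paragraph_sim(p1: Paragraph, p2: Paragraph) -> float:
    # Phase 2: paragraph similarity built from sentence similarities.
    return best_match_sim(p1, p2, sentence_sim)


def document_sim(d1: Document, d2: Document) -> float:
    # Phase 3: whole-text similarity built from paragraph similarities.
    return best_match_sim(d1, d2, paragraph_sim)


def single_pass_cluster(docs: List[Document], threshold: float = 0.5) -> List[List[int]]:
    """Simple single-pass (leader) clustering, used only as an example
    consumer of document_sim: each document joins the first cluster whose
    first member is at least `threshold` similar, else starts a new cluster."""
    clusters: List[List[int]] = []
    for i, doc in enumerate(docs):
        for cluster in clusters:
            if document_sim(docs[cluster[0]], doc) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


doc_a = [[["the", "car", "is", "red"]], [["it", "runs", "fast"]]]
doc_b = [[["the", "automobile", "is", "red"]]]
doc_c = [[["stock", "prices", "fell"]]]

print(document_sim(doc_a, doc_b))                   # about 0.71
print(single_pass_cluster([doc_a, doc_b, doc_c]))   # [[0, 1], [2]]
```

Because every level reuses the same aggregation function, a richer semantic word measure could replace `word_sim` without changing the sentence, paragraph, and document layers, and the threshold in `single_pass_cluster` is an illustrative value rather than a tuned one.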

Related items

1	Semantic Similarity Calculation Text Field Vector Space Model
2	Study On Similarity-based Text Clustering Algorithm And It's Application
3	Research And Implementation Of Text Similarity Algorithm Based On Semantic Fusion
4	Chinese-Old Bilingual Text And Sentence Similarity Calculation Research
5	Research On Semantic Similarity Computation And Applications
6	Research On Semantic Similarity Between Words And Between Short Texts Based On WordNet
7	Research On Document Clustering Based On Semantic Similarity Of Hownet
8	The Research About Text Similarity Measuring Through Hamming-Distance And Semantics
9	Research On English Text Clustering Method Based On Vector Space
10	Study On The Chinese Text Clustering Algorithm Based On Semantic Similarity