Text Similarity Computing Theory And Applied Research

Posted on: 2012-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: J H Ma
Full Text: PDF
GTID: 2218330368489010
Subject: Computer software and theory
Abstract/Summary:
Text similarity computation is a fundamental task in information processing and a key technology in text data mining. It underlies many important applications, such as document copy detection, text categorization, text clustering, and information retrieval, and this wide applicability makes it worth further study.

Existing text similarity models have weaknesses: their theoretical basis is insufficient, and they do not fully capture document properties. Chinese text is also more challenging to understand and process than English text. This thesis aims to improve the existing algorithms. To that end, we compare in detail the various text similarity computation methods used in Chinese information processing and analyze their characteristics and defects. We then put forward an improved method, phased integrated semantic similarity computation. Finally, drawing on the theory, methods, and applications of text similarity and on the characteristics of text clustering, we discuss the application of text similarity to text clustering.

The main achievements of this thesis are as follows:

1) A study of existing text similarity algorithms. The thesis discusses several existing similarity computation methods and introduces the key technologies and problems of similarity computation, which lays the theoretical basis for the subsequent work.

2) A proposed method of phased integrated semantic similarity computation. Understanding Chinese from a semantic point of view is more appropriate than a statistical approach. The proposed method computes similarity in stages, from sentences to paragraphs to the whole text. At each stage it takes the characteristics of that level into account and blends in the semantic factors of the text, striving for the best accuracy in the similarity computation.

3) An application to text clustering. The similarity computation method proposed in this thesis is applied to text clustering as an example. This demonstrates the influence and application of text similarity computation in practice and indicates that the improved algorithm achieves better results.
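
The staged computation described in the abstract (sentences, then paragraphs, then the whole text) can be illustrated with a minimal sketch. The abstract does not give the thesis's semantic measure or its aggregation rule, so everything here is an assumption for illustration: the sentence-level score is a plain Vector Space Model cosine over term counts standing in for the semantic measure, the aggregation is a symmetric best-match average, and all function names (`tokenize`, `sentence_similarity`, `best_match_average`, and so on) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Hypothetical tokenizer: lowercase runs of word characters.
    # Chinese text would need a proper word segmenter here instead.
    return re.findall(r"\w+", sentence.lower())


def sentence_similarity(a, b):
    # Stand-in for the thesis's semantic measure (an assumption):
    # cosine of term-frequency vectors, as in the Vector Space Model.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match_average(units_a, units_b, unit_similarity):
    # Assumed aggregation rule: each unit is paired with its best match on
    # the other side, and the two directional averages are averaged.
    if not units_a or not units_b:
        return 0.0
    forward = sum(max(unit_similarity(x, y) for y in units_b) for x in units_a)
    backward = sum(max(unit_similarity(x, y) for x in units_a) for y in units_b)
    return (forward / len(units_a) + backward / len(units_b)) / 2


def split_sentences(paragraph):
    # Split on Western and Chinese sentence-ending punctuation.
    return [s.strip() for s in re.split(r"[.!?。！？]+", paragraph) if s.strip()]


def split_paragraphs(text):
    # Paragraphs are assumed to be separated by blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def paragraph_similarity(p, q):
    # Stage 2: aggregate sentence-level scores into a paragraph score.
    return best_match_average(split_sentences(p), split_sentences(q), sentence_similarity)


def text_similarity(t, u):
    # Stage 3: aggregate paragraph-level scores into a whole-text score.
    return best_match_average(split_paragraphs(t), split_paragraphs(u), paragraph_similarity)
```

Calling `text_similarity(doc_a, doc_b)` yields a score between 0 and 1 that could feed a clustering step as a pairwise similarity. Swapping `sentence_similarity` for a semantic measure would leave the two aggregation stages unchanged, which is the point of separating the phases.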
Keywords/Search Tags: Text similarity, Vector Space Model, Semantic similarity, Text clustering