Chinese Document Content Similarity Detection Methods Research

Posted on: 2011-10-21 | Degree: Master | Type: Thesis
Country: China | Candidate: D Y Xu | Full Text: PDF
GTID: 2208330332972993 | Subject: Computer application technology
Abstract/Summary:
Detecting content similarity between Chinese documents is a basic technology in Chinese information processing. In an era of information explosion in particular, both plagiarism checking and the retrieval of similar documents depend on it. Detection methods developed in recent years provide a good foundation for this research, but no fully satisfactory method exists yet.

This thesis addresses methods for detecting content similarity in Chinese documents, a persistently difficult problem that differs considerably from measuring the similarity of a single object. A document is a collection of objects, and that collection carries many uncertainties. Single-object similarity calculation can serve as one component of document similarity calculation, but on its own it does not give accurate results.

The thesis first reviews traditional similarity calculation methods. Online news articles were chosen as the test documents because they are easy to obtain, rich in content, and show the typical characteristics of Chinese documents. By analyzing the characteristics of Chinese documents, we identify the key issues in detection, examine them in detail, and propose a method for calculating the similarity of Chinese documents. We then give a similarity standard, derived from a large number of analyses with the similarity detection model and based on the characteristics of Chinese documents. Finally, the detection model is verified on 30 articles with varying levels of similarity, which further confirms its effectiveness.

The approach builds on a variety of similarity calculation methods: through in-depth analysis of the different parts of an article and the combined use of these techniques, it aims at accurate detection of content similarity in Chinese documents.
A large number of experimental results show that the method does achieve the goal of detecting document content similarity...
Keywords/Search Tags:Chinese documents, similarity, detection