Research on Copy Detection of Chinese Scientific Papers Based on Text Structure and Content

Posted on: 2008-10-22; Degree: Master; Type: Thesis
Country: China; Candidate: K M Cheng
GTID: 2178360215951091; Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of digital libraries and the widespread use of the Internet, the network has become an important source of information for most people, especially scientists and students. Scientific information available online has made scientific communication more efficient. On the other hand, easy access to scientific information and the convenience of "copy-and-paste" create opportunities for plagiarism, misuse, and the illegal redistribution of information. It is therefore necessary to research techniques for detecting copies of scientific documents. Document copy detection is a powerful means of protecting intellectual property and improving the efficiency of information retrieval. Document Copy Detection (DCD) determines whether a given document plagiarizes the contents of other documents in a database, whether by duplicating part or all of their contents or by using different words or sentences to express the same meaning as earlier documents in the database.

First, this paper gives an overview of the theory of document copy detection and analyzes the key technologies of current copy detection systems, such as system structure, the computer representation of documents, and document similarity algorithms. It uses the Vector Space Model (VSM) to represent Chinese scientific documents and computes similarity based on the VSM. Second, it analyzes the characteristics of Chinese scientific documents and presents a document representation method based on structure and content, combining a tree-structure representation with a VSM content representation. Third, it presents a system structure for Chinese scientific document copy detection based on this tree structure, with weights assigned to the features extracted from the document, and gives a new definition of document similarity and a new overall copy detection function.

Finally, exploratory experiments demonstrate the validity of the system built on this research.
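The abstract does not spell out the thesis's similarity definition or its overall copy detection function. The following is therefore only a minimal Python sketch of the general idea, under stated assumptions. Each paper is reduced to a one-level structure tree with hypothetical parts `title`, `abstract`, and `body`. Each part is represented as a VSM term-frequency vector, and corresponding parts are compared with cosine similarity. The per-part scores are then combined with weights. Character bigrams stand in for real Chinese word segmentation. `PART_WEIGHTS`, the 0.5 threshold, and all function names are illustrative assumptions, not values or definitions taken from the thesis.

```python
import math
from collections import Counter

# Hypothetical weights for the top-level nodes of a paper's structure tree;
# the thesis's actual weights are not given in the abstract.
PART_WEIGHTS = {"title": 0.2, "abstract": 0.3, "body": 0.5}


def char_bigrams(text):
    """Overlapping character bigrams: a crude stand-in for Chinese word segmentation."""
    chars = [c for c in text if not c.isspace()]
    return [a + b for a, b in zip(chars, chars[1:])]


def tf_vector(text):
    """Term-frequency vector of `text` in the vector space model (VSM)."""
    return Counter(char_bigrams(text))


def cosine(u, v):
    """Cosine similarity of two sparse term-frequency vectors (0.0 if either is empty)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def structured_similarity(doc_a, doc_b, weights=PART_WEIGHTS):
    """Weighted mean of per-part cosine similarities; a part missing from either document scores 0."""
    total = sum(weights.values())
    score = 0.0
    for part, w in weights.items():
        score += w * cosine(tf_vector(doc_a.get(part, "")), tf_vector(doc_b.get(part, "")))
    return score / total


def is_copy(doc, database, threshold=0.5):
    """Flag `doc` if its structured similarity to any database document reaches the (hypothetical) threshold."""
    return any(structured_similarity(doc, other) >= threshold for other in database)


if __name__ == "__main__":
    suspect = {"title": "文本复制检测", "abstract": "向量空间模型", "body": "计算文档相似度"}
    database = [suspect.copy(), {"title": "数字图书馆", "abstract": "知识产权保护", "body": "信息检索效率"}]
    print(is_copy(suspect, database))  # True: the first database entry is an exact copy
```

Scoring each part separately lets overlap in, say, the abstract count differently from overlap in the body. A fuller version would recurse into the sections and paragraphs of the tree and would typically use TF-IDF weights rather than raw term frequencies.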
Keywords/Search Tags: Document Copy Detection, Vector Space Model (VSM), Similarity, Text Representation