| As the computer industry has developed, software has delivered enormous economic and social benefits, and its value has attracted growing attention. How to protect software intellectual property and the rights and interests of software developers is therefore worth studying. Source code homology identification measures the similarity between two or more software systems and can provide strong evidence for software intellectual property protection. At present, source code homology identification techniques fall into two categories: those that work at the text level and those that work at the level of grammatical structure. Text-based techniques detect identical or similar character sequences in the source files and derive from them the similarity of two or more files. They are simple and effective against low-level code plagiarism, and many mature tools exist. Techniques based on grammatical structure generate an abstract syntax tree for each source file and then compare the structural information of the trees to determine the similarity of the files. These are deeper identification techniques that can detect much more complex code plagiarism; however, because of their complexity, few usable tools are available. Integrated measurement of software homology is a relatively new field. How to use a variety of homology identification methods effectively and integrate their results reasonably is worth studying. Integrated measurement methods are divided into two categories, quantitative and qualitative, which suit different application scenarios. Based on theoretical study and a large number of experiments, this paper describes a system for identifying source code homology. 
The system implements identification techniques at both the text level and the abstract syntax tree level, and proposes two algorithms for the integrated measurement of homology, multiple linear regression analysis and hierarchical analysis, to identify homologous source code. Together they provide quantitative and qualitative analysis of the results, making the identification more comprehensive. |
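
The two identification levels and their integration can be illustrated with a minimal sketch. It is not the system described in this paper. It assumes Python source files, uses the standard `difflib` and `ast` modules as stand-ins for the text-level and syntax-tree-level detectors, and combines the two scores with a plain weighted sum. The function names and the weights `w_text` and `w_ast` are hypothetical; in a regression-based approach such weights would be fitted from labelled examples rather than chosen by hand.

```python
import ast
import difflib
from typing import List


def text_similarity(src_a: str, src_b: str) -> float:
    """Text-level similarity: share of matching characters, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, src_a, src_b).ratio()


def node_types(src: str) -> List[str]:
    """Flatten the syntax tree into node type names, ignoring identifiers and literals."""
    return [type(node).__name__ for node in ast.walk(ast.parse(src))]


def ast_similarity(src_a: str, src_b: str) -> float:
    """Structure-level similarity: compare the two sequences of node types."""
    return difflib.SequenceMatcher(None, node_types(src_a), node_types(src_b)).ratio()


def combined_score(src_a: str, src_b: str,
                   w_text: float = 0.4, w_ast: float = 0.6) -> float:
    """Weighted sum of both scores; the weights are illustrative assumptions."""
    return w_text * text_similarity(src_a, src_b) + w_ast * ast_similarity(src_a, src_b)


# Two functions that differ only in their identifier names.
original = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        s += x\n"
    "    return s\n"
)
renamed = (
    "def accumulate(values):\n"
    "    result = 0\n"
    "    for item in values:\n"
    "        result += item\n"
    "    return result\n"
)

if __name__ == "__main__":
    print("text:", text_similarity(original, renamed))
    print("ast:", ast_similarity(original, renamed))
    print("combined:", combined_score(original, renamed))
```

In this example the renamed copy lowers the text-level score, while the syntax-tree score stays at its maximum because the sequence of node types is unchanged. This is the kind of disguise that structure-based identification is meant to see through.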