| At present, there are many legal disputes over the similarity of literary works both at home and abroad. Most research on the problem, however, is conducted in the field of computer science, and few scholars pay attention to linguistic methods, which are basic and essential for such language-centered cases. The case concerning the violation of Zhuang Yu’s copyright by Guo Jingming’s “Meng” has sparked public controversy. Drawing on both the principle of the simhash algorithm and Discourse Information Theory, this paper analyzes the text similarity of the two novels in the case in which Guo Jingming was sued for infringing Zhuang Yu’s copyright, aiming to provide a feasible set of methods for identifying text similarity effectively. It combines qualitative and quantitative analysis to compare the two novels at the lexical level and the discourse level, specifically examining the frequencies and importance of WO, WA and WF, so as to select a more suitable method for similarity analysis of Chinese texts. The findings are as follows. First, the simhash results show that the two novels are not similar at the lexical level, whether the whole texts or only the similar plots are compared. The linguistic method, however, shows that in both the whole texts and the similar plots, each novel has a very low percentage of lexical hapaxes, the two novels share a comparatively high percentage of hapax legomena (words that occur only once in a text), and they have similar lexical richness. Second, at the discourse level, in both the whole texts and the pinpointed similar plots, the distribution of the protagonists in the two novels is similar, and WO, WA and WF appear most frequently in the novel texts. An independent-samples t-test on the frequencies of WO, WA and WF likewise indicates that the two novels are similar. The innovation of this paper lies in combining computational linguistics and general linguistics to determine the similarity of two texts. It also offers a new research perspective for text similarity analysis and extends the application of Discourse Information Theory. Moreover, it demonstrates a practical way to test similarity, namely through the frequencies of the main character, WO, WA and WF, and provides a set of linguistic features that is effective in similarity analysis. Owing to the length of the novels and limited time, however, not all 15 Ws were annotated; further study will test similarity using all the information knots. |
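
The two lexical measures contrasted above can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes the texts have already been segmented into word tokens, uses a 64-bit fingerprint built from MD5 as one common way to realize simhash, and defines the shared-hapax ratio as a Jaccard-style overlap, which is an illustrative choice. All function names are hypothetical.

```python
import hashlib
from collections import Counter

BITS = 64  # fingerprint width (an assumption; the paper does not state one)


def simhash(tokens):
    """Return a 64-bit simhash fingerprint, weighting each token by its frequency."""
    weights = [0] * BITS
    for token, freq in Counter(tokens).items():
        # Take the first 8 bytes of the MD5 digest as a 64-bit token hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            weights[i] += freq if (h >> i) & 1 else -freq
    # A bit is set in the fingerprint when its accumulated weight is positive.
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def hamming_distance(a, b):
    """Number of differing bits between two fingerprints (smaller means more similar)."""
    return bin(a ^ b).count("1")


def hapaxes(tokens):
    """Set of words that occur exactly once in the token list."""
    return {t for t, c in Counter(tokens).items() if c == 1}


def shared_hapax_ratio(tokens_a, tokens_b):
    """Shared hapaxes divided by all hapaxes of either text (illustrative definition)."""
    ha, hb = hapaxes(tokens_a), hapaxes(tokens_b)
    union = ha | hb
    return len(ha & hb) / len(union) if union else 0.0
```

With pre-segmented token lists `a` and `b`, `hamming_distance(simhash(a), simhash(b))` gives the fingerprint-level comparison, while `shared_hapax_ratio(a, b)` gives a word-level one. The two can disagree, as the abstract reports, because simhash is dominated by frequent tokens whereas hapaxes are by definition the rarest ones.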