| University students must now write a professional thesis in order to graduate. In the information age, shared resources make information easy to obtain. One negative consequence is that some students opportunistically plagiarize other people's research results, so thesis plagiarism has long been a serious problem for academia. Thesis detection is therefore necessary, and it is widely applied in fields such as patent protection, intelligent retrieval, and text classification. At present, the VSM (vector space model) similarity algorithm and the HowNet-based similarity algorithm are widely used. However, the former ignores the relationships between words and produces high-dimensional, sparse data, while the latter remains at the level of word similarity and ignores the importance of words. Paper similarity algorithms are therefore worth studying. Building on existing research into thesis similarity algorithms, this work improves the algorithms to increase computing efficiency and applies them to thesis similarity detection. The main research is as follows: 1. Study the theory of similarity computing, along with the current state and research results of similarity algorithms. 2. Study the most common similarity algorithms, focusing on VSM and the HowNet similarity algorithm, analyze their advantages and disadvantages, and improve on their shortcomings. Term position is added to the TF-IDF algorithm to remedy the limitation of term frequency, and semantic density and depth are added to the HowNet primitive similarity algorithm to remedy the limitation of the primitives' relative position. 3. Propose a new model that combines VSM and HowNet. The model treats identical or similar words as the same dimension, which remedies VSM's limitation at the semantic level and HowNet's neglect of word importance. 4. The algorithm consists of three layers (word, sentence, and paragraph) that are merged together: word similarity feeds into sentence similarity, sentence similarity feeds into paragraph similarity, and finally paragraph similarity feeds into thesis similarity. 
HowNet word similarity computing is thereby extended to thesis similarity computing. 5. Design a paper detection system, and run experiments to verify its effectiveness and draw conclusions. |
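The combined model described in points 2 and 3 can be illustrated with a minimal sketch. It is not the thesis's implementation: the `SYNONYMS` table is a hypothetical stand-in for a HowNet-based word similarity, and the position weight (terms in the first fifth of a text count double), the smoothed IDF formula, and the 0.8 merge threshold are all assumptions chosen only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical stand-in for a HowNet-based word similarity (toy synonym table).
SYNONYMS = {frozenset(("paper", "thesis")): 0.9}

def word_sim(a, b):
    """Return 1.0 for identical words, a table score for known pairs, else 0.0."""
    if a == b:
        return 1.0
    return SYNONYMS.get(frozenset((a, b)), 0.0)

def position_weight(index, length):
    # Assumed weighting: terms in the first fifth of the text count double.
    return 2.0 if index < max(1, length // 5) else 1.0

def idf(term, docs):
    # Smoothed inverse document frequency (an assumed, common variant).
    df = sum(1 for d in docs if term in d)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0

def vectorize(tokens, docs):
    """Position-weighted TF multiplied by IDF, as a sparse dict vector."""
    tf = defaultdict(float)
    for i, tok in enumerate(tokens):
        tf[tok] += position_weight(i, len(tokens))
    return {t: w * idf(t, docs) for t, w in tf.items()}

def merge_dimensions(vec_a, vec_b, threshold=0.8):
    """Fold each term of vec_b into the most similar term of vec_a, if similar enough."""
    merged = defaultdict(float)
    for term_b, weight in vec_b.items():
        best = max(vec_a, key=lambda t: word_sim(t, term_b), default=None)
        if best is not None and word_sim(best, term_b) >= threshold:
            merged[best] += weight
        else:
            merged[term_b] += weight
    return merged

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity(tokens_a, tokens_b, docs):
    """Cosine similarity after similar words are treated as one dimension."""
    vec_a = vectorize(tokens_a, docs)
    vec_b = merge_dimensions(vec_a, vectorize(tokens_b, docs))
    return cosine(vec_a, vec_b)

docs = [["thesis", "similarity", "detection"],
        ["paper", "similarity", "detection"]]
plain = cosine(vectorize(docs[0], docs), vectorize(docs[1], docs))
merged = similarity(docs[0], docs[1], docs)
print(plain < merged)
```

In this toy corpus, plain VSM cosine treats "thesis" and "paper" as unrelated dimensions, while the merged version maps them onto one dimension and scores the two texts as more similar. The same function could be applied per sentence and the scores aggregated upward to paragraphs and the whole thesis, in the spirit of point 4.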