Research And Application Of The Algorithm Of Text Comparison Based On Statistical Theory

Posted on: 2007-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: J H Yu
Full Text: PDF
GTID: 2178360212968035
Subject: Computer application technology
Abstract/Summary:
We have developed a paper comparison system to curb plagiarism on campus. Given a student's paper, the system downloads similar articles from an Internet search engine and automatically compares the paper against them to judge whether it contains plagiarism. Because the system can download and compare a large number of articles in a short time, it greatly improves the efficiency of work that was previously done by hand.

The core techniques of the paper comparison system are a thread pool mechanism and a text comparison algorithm derived from statistical theory.

This thesis implements a specialized thread pool in C#, based on the common thread pool described in "Java multithread pattern", to download related articles from the Internet efficiently. To meet the requirements of the paper comparison system, the pool is extended with management functions that change the number of threads in the pool dynamically. It can also regulate the system load according to the degree of task assignment so that the system runs efficiently.

The text comparison algorithm, derived from the theory of significance testing in regression analysis, is the focus of this thesis. The significance test, which regression analysis uses to verify whether a fitted model agrees with the underlying model, is applied here to analyze the difference in the distribution of related keywords between the student's paper and the downloaded articles. Using statistical theory, the author derives two functions that follow the χ² distribution and uses them to analyze the keyword distributions in the student's paper and the downloaded articles in order to judge the similarity between them.

Finally, the author tests the algorithm by checking three articles. The results show that the algorithm works very well.
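The abstract does not give the two χ²-distributed functions the author derived, so they cannot be reproduced here. As a rough illustration of the general idea of comparing keyword distributions between two texts, the following minimal Python sketch substitutes the standard Pearson chi-square test of homogeneity, which is not the author's method. The function names `keyword_counts` and `chi_square_statistic` and the tokenization rule are assumptions made for this example only.

```python
import re
from collections import Counter


def keyword_counts(text, keywords):
    """Count occurrences of each keyword in text (case-insensitive, whole words).

    Hypothetical helper: the tokenization rule is an assumption of this sketch.
    """
    words = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return [words[k.lower()] for k in keywords]


def chi_square_statistic(counts_a, counts_b):
    """Pearson chi-square statistic for a 2 x k table of keyword counts.

    This is the textbook homogeneity test, used here as a stand-in for the
    thesis's own functions. Returns (statistic, degrees_of_freedom).
    """
    # Keywords absent from both texts carry no information; drop them.
    pairs = [(a, b) for a, b in zip(counts_a, counts_b) if a + b > 0]
    total_a = sum(a for a, _ in pairs)
    total_b = sum(b for _, b in pairs)
    if total_a == 0 or total_b == 0 or len(pairs) < 2:
        raise ValueError("need at least two keywords and counts in both texts")

    grand_total = total_a + total_b
    statistic = 0.0
    for a, b in pairs:
        column_total = a + b
        # Expected counts if both texts shared one keyword distribution.
        expected_a = total_a * column_total / grand_total
        expected_b = total_b * column_total / grand_total
        statistic += (a - expected_a) ** 2 / expected_a
        statistic += (b - expected_b) ** 2 / expected_b
    return statistic, len(pairs) - 1


if __name__ == "__main__":
    keywords = ["thread", "pool", "regression"]
    paper = "The thread pool manages each thread. Regression is used later."
    article = "A thread pool reuses a thread. Regression analysis follows."
    stat, dof = chi_square_statistic(
        keyword_counts(paper, keywords), keyword_counts(article, keywords)
    )
    print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
```

Under this stand-in test, a statistic that is small relative to the χ² critical value (for example 5.991 at the 5% level with 2 degrees of freedom) means the two keyword distributions are not significantly different, which would point to similar texts; a large statistic points to different keyword usage. Very small expected counts make the χ² approximation unreliable, so a real system would need enough keyword occurrences per text.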
Keywords/Search Tags: Multithread, Thread pool, Linear regression, F test, χ² distribution function