With the development of computer networks and application technologies, the Internet has become the primary channel for storing and exchanging information, but it has also brought the problem of explosive information growth. Information processing technologies such as data mining, information retrieval and text classification have emerged in response. As the basis of these technologies, text similarity measurement is of considerable research significance and has broad application prospects.

The parameters of text similarity measurement, such as the similarity threshold, precision, recall, moving window size, shingle measure coefficient threshold, extraction rate and text length, are interrelated in complicated ways. Following the implementation process of text similarity measurement, the thesis first analyses its key technologies: mathematical representation of text, feature generation, feature selection and similarity calculation. On this basis, it implements and compares the two most typical kinds of algorithm. It then studies the correlation between these parameters through experiments with the shingling algorithm. Finally, it offers suggestions for optimizing the parameters, and proposes and analyses an adaptive algorithm for parameters such as the similarity threshold.

The algorithm is applied to a text similarity measurement system for a fund that received 7378 proposals in 2009. The results show that the algorithm performs well in practical use, with both precision and recall exceeding 95% for long and short texts alike.
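As an illustration of how the moving window size and the similarity threshold interact in shingling, the following is a minimal sketch, not the thesis's implementation. It assumes word-level shingles and set-based (Jaccard) resemblance; the function names `shingles`, `resemblance` and `is_similar`, the default window of 3 and the default threshold of 0.5 are all hypothetical choices made for this example.

```python
import re


def shingles(text, window=3):
    """Return the set of word-level shingles obtained by sliding a
    window of `window` words over the text (assumed tokenization:
    lowercase alphanumeric runs)."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return set()
    if len(words) < window:
        # Text shorter than the window: treat the whole text as one shingle.
        return {tuple(words)}
    return {tuple(words[i:i + window]) for i in range(len(words) - window + 1)}


def resemblance(a, b, window=3):
    """Jaccard resemblance of the two shingle sets: |A ∩ B| / |A ∪ B|."""
    sa, sb = shingles(a, window), shingles(b, window)
    union = sa | sb
    if not union:
        # Both texts are empty: no evidence of overlap (a choice made here).
        return 0.0
    return len(sa & sb) / len(union)


def is_similar(a, b, window=3, threshold=0.5):
    """Flag a pair as similar when its resemblance reaches the threshold."""
    return resemblance(a, b, window) >= threshold
```

In this sketch, a larger window makes shingles more specific, so the resemblance of partially overlapping texts drops and a lower threshold is needed to flag the same pairs. Short texts yield few shingles, so a single changed word shifts the resemblance sharply, which is one reason a fixed threshold may not suit texts of all lengths.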