Chinese Phrase Similarity Algorithm and Its Applications

Posted on: 2009-07-16 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Y Wang
GTID: 2178360242492658 | Subject: Computer software and theory
Abstract/Summary:
As a fundamental technology in Chinese information processing, text similarity calculation is used in fields such as text classification, text clustering and information retrieval, and it has long attracted the attention of researchers. Much of the huge volume of text produced by the rapid development of information technology consists of short texts or phrases, so phrase similarity calculation is becoming increasingly important in short-text processing. This thesis proposes a new algorithm for Chinese phrase similarity as a solution to the processing of Chinese phrases. In developing the algorithm, it analyzes the positions at which two phrases match, the offsets between those matching positions, the length of the matched text, and other factors. It then proposes a function for calculating Chinese phrase similarity and describes how it is implemented.

The contributions of this thesis around Chinese phrase similarity calculation are as follows:

Firstly, several text similarity algorithms are studied. The current state of phrase similarity research and the classic phrase similarity algorithms are analyzed, together with their application fields and characteristics. The use of text similarity calculation in text clustering is discussed, and several text clustering algorithms are introduced.

Secondly, based on an analysis and comparison of commonly used similarity algorithms, a new Chinese phrase similarity calculation method is proposed and its soundness is tested. Its effectiveness is then evaluated by using different text similarity algorithms within the same clustering method.

Finally, the Chinese phrase similarity algorithm is applied in the similar-course retrieval and elimination module of a school training-plan management information system (MIS), where it is used to cluster similar courses. The complete system is also designed and implemented.
This research and its results should serve as a useful reference for many areas of Chinese information processing, particularly Chinese phrase processing, and have good application prospects.
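The abstract names the factors the algorithm considers (matching position, offset between matching positions, matched length) but does not give the function itself. Purely as an illustration, the following Python sketch assumes one possible way to combine matched length and matching offset: it takes the common blocks reported by the standard library's `difflib.SequenceMatcher`, discounts each block by 1 / (1 + alpha * |i - j|), where i and j are the block's start positions in the two phrases, and normalizes the total Dice-style. The function names `phrase_similarity` and `cluster_phrases`, the weighting, `alpha` and the 0.6 threshold are all hypothetical and are not the thesis's formula. The greedy single-pass grouping likewise stands in for the unspecified clustering method used for similar courses.

```python
from difflib import SequenceMatcher


def phrase_similarity(a: str, b: str, alpha: float = 0.5) -> float:
    """Hypothetical offset-weighted phrase similarity in [0, 1].

    Each common block found by difflib contributes its length, discounted
    by how far apart it starts in the two phrases. With alpha = 0 this
    equals SequenceMatcher(None, a, b).ratio().
    """
    if not a and not b:
        return 1.0  # two empty phrases are treated as identical, as ratio() does
    weighted = 0.0
    for i, j, size in SequenceMatcher(None, a, b).get_matching_blocks():
        # The last block is a zero-length sentinel, so it contributes nothing.
        weighted += size / (1.0 + alpha * abs(i - j))
    return 2.0 * weighted / (len(a) + len(b))


def cluster_phrases(phrases: list[str], threshold: float = 0.6,
                    alpha: float = 0.5) -> list[list[str]]:
    """Hypothetical greedy single-pass clustering of short phrases.

    Each phrase joins the cluster whose first member it is most similar to,
    provided that score reaches the threshold; otherwise it starts a new one.
    """
    clusters: list[list[str]] = []
    for p in phrases:
        scores = [phrase_similarity(p, c[0], alpha) for c in clusters]
        if scores and max(scores) >= threshold:
            clusters[scores.index(max(scores))].append(p)  # first best on ties
        else:
            clusters.append([p])
    return clusters


if __name__ == "__main__":
    courses = ["数据结构", "数据结构与算法", "操作系统", "操作系统原理", "高等数学"]
    print(cluster_phrases(courses))
    # [['数据结构', '数据结构与算法'], ['操作系统', '操作系统原理'], ['高等数学']]
```

With `alpha = 0` the score reduces to plain matched-length overlap; a positive `alpha` penalizes common substrings that sit at different positions. For example, "ab" versus "xab" scores 0.8 with `alpha = 0` but about 0.53 with `alpha = 0.5`.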
Keywords/Search Tags: information processing, text clustering, Chinese phrase similarity, matching offset, similar-course retrieval and elimination