
Research Of Similarity Based On Relative Word Frequency

Posted on: 2009-02-20 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Zhang | Full Text: PDF
GTID: 2178360245953597 | Subject: Computer application technology
Abstract/Summary:
In Chinese information processing, word similarity computation is widely applied in information retrieval, machine translation, automatic question answering, text mining, and related areas. It is a fundamental problem that has long been both a research hotspot and a difficulty. Many similarity methods exist, but they share a weakness: they neglect semantic analysis of the context. This thesis tries to overcome that weakness and improve the accuracy of the result, mainly by computing context similarity from the Chinese semantic knowledge supplied by How-net.

Automatic Chinese word segmentation, the basic step of Chinese information processing, is an essential precondition for context similarity computation, and the accuracy of the final result depends on the segmentation. This thesis introduces the background and development of automatic Chinese word segmentation technology. To resolve segmentation ambiguities, we adopt a relative word frequency model supported by semantic information. The model draws its disambiguation information from the context in which the ambiguous words occur, which helps resolve the ambiguity and improves the efficiency of segmentation disambiguation compared with earlier methods.

After segmentation disambiguation, the semantic similarity among the feature terms is analyzed first, and semantically similar features are then grouped together, so that the original feature set is partitioned into several sub-clusters. The semantic similarity between features in the same sub-cluster is higher than that between features in different sub-clusters. Finally, the sub-clusters are condensed to reduce the feature dimension.
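The feature-clustering step can be illustrated with a minimal sketch. The abstract does not give the clustering algorithm, the similarity threshold, or the way sub-clusters are condensed, so everything below is an assumption: the greedy grouping rule, the threshold of 0.7, the choice of the first member as the representative of a sub-cluster, and the toy similarity table that stands in for a How-net based similarity measure.

```python
# Hypothetical pairwise similarities; a real system would compute these
# from a semantic resource such as How-net instead of a lookup table.
TOY_SIMILARITY = {
    frozenset(("computer", "laptop")): 0.9,
    frozenset(("network", "internet")): 0.85,
}


def similarity(a, b):
    """Return the toy similarity of two feature words (1.0 if identical)."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.1)


def cluster_features(features, sim, threshold=0.7):
    """Greedily group features: a feature joins the first sub-cluster in
    which it is similar enough to every member, else starts a new one."""
    clusters = []
    for feature in features:
        for cluster in clusters:
            if all(sim(feature, member) >= threshold for member in cluster):
                cluster.append(feature)
                break
        else:
            clusters.append([feature])
    return clusters


def condense(clusters):
    """Map every feature to the first member of its sub-cluster, so each
    sub-cluster contributes a single dimension."""
    return {feature: cluster[0] for cluster in clusters for feature in cluster}


features = ["computer", "network", "laptop", "internet"]
clusters = cluster_features(features, similarity)
mapping = condense(clusters)
print(clusters)
print(mapping)
```

With this toy data the four original features collapse into two dimensions, which is the kind of reduction the abstract describes; the actual gain on real text depends on the similarity measure and threshold used.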
The experimental results show that the method achieves better performance in text classification. We have also achieved the goal of having the computer system carry out Chinese word segmentation and similarity computation. Scientific and technical texts are used as test examples to validate the method. As the research shows, the semantic similarity computation is efficient. This research can contribute to several areas of Chinese information processing and, to a certain extent, is valuable and has good prospects.
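The segmentation disambiguation step mentioned in the abstract can also be sketched in a few lines. The abstract does not define the relative word frequency model, so this is only one plausible reading: each candidate segmentation of an ambiguous string is scored by the smoothed relative frequency of its words, and the highest-scoring candidate is kept. The function names, the add-one smoothing, and the frequency counts are all hypothetical; the semantic support from How-net described in the thesis is not modelled here.

```python
import math


def relative_frequency(word, freq, total):
    """Add-one smoothed share of `word` among all counted words."""
    return (freq.get(word, 0) + 1) / (total + len(freq) + 1)


def score(segmentation, freq):
    """Sum of log relative frequencies of the words in one candidate."""
    total = sum(freq.values())
    return sum(math.log(relative_frequency(w, freq, total)) for w in segmentation)


def choose_segmentation(candidates, freq):
    """Return the candidate segmentation with the highest score."""
    return max(candidates, key=lambda seg: score(seg, freq))


# Hypothetical counts, imagined as collected from the surrounding context.
toy_freq = {"研究": 50, "生命": 40, "研究生": 20, "命": 5}

# Two competing segmentations of the ambiguous string "研究生命".
candidates = [["研究", "生命"], ["研究生", "命"]]

best = choose_segmentation(candidates, toy_freq)
print(best)
```

Under these made-up counts the first candidate wins because both of its words are frequent in the context, whereas the second candidate contains a rare single-character word.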
Keywords/Search Tags: Word segmentation disambiguation, Relative word frequency, Semantic similarity, How-net