Research Of Similarity Based On Relative Word Frequency

Posted on:2009-02-20

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhang

Full Text:PDF

GTID:2178360245953597

Subject:Computer application technology

Abstract/Summary:

In Chinese information processing, word similarity computation is widely applied in information retrieval, machine translation, automatic question answering, text mining, and related areas. It is a fundamental and important problem that has long been both a research hotspot and a difficulty. Many similarity methods exist, but they share a weakness: they neglect semantic analysis of the context. This thesis tries to overcome that weakness and improve the accuracy of the results, mainly by computing context similarity from the perspective of Chinese semantics using How-net.

Automatic Chinese word segmentation, a basic component of Chinese information processing, is an essential precondition for context similarity computation: the accuracy of the final result depends on the segmentation step. The thesis reviews the background and development of automatic Chinese word segmentation technology. To disambiguate segmentation in context, we adopt a relative word frequency model supported by semantic information. This process draws disambiguation information from the context in which an ambiguous word occurs, which helps resolve the ambiguity and improves the efficiency of segmentation disambiguation compared with earlier methods.

After segmentation disambiguation, the semantic similarity among the feature terms is analyzed first, and semantically similar features are then grouped into sub-clusters, so that the original feature set is divided into several sub-clusters. Features within the same sub-cluster are more similar semantically than features in different sub-clusters. Finally, the sub-clusters are condensed to reduce the feature dimension.

The experimental results show that the method achieves better performance in text classification. We have also achieved the goal of having a computer system carry out Chinese text word segmentation and similarity computation. Technical texts are used as test examples to validate the method. The research shows that the semantic similarity computation is efficient. The work can contribute to several areas of Chinese information processing and, to a certain extent, has practical value and good prospects.
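The abstract does not define its relative word frequency model, so the following is only a minimal sketch of the general idea under stated assumptions: relative frequency is taken to be a word's count divided by the total token count of a corpus, and competing segmentations of an ambiguous string are ranked by the sum of log relative frequencies. The function names, the toy corpus, and the `floor` value for unseen words are all hypothetical, and the How-net semantic information used in the thesis is not modeled here.

```python
import math
from collections import Counter


def relative_frequencies(corpus_tokens):
    """Map each word to count / total tokens (assumed definition)."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def score(segmentation, rel_freq, floor=1e-8):
    """Sum of log relative frequencies; unseen words get a small floor."""
    return sum(math.log(rel_freq.get(word, floor)) for word in segmentation)


def pick_segmentation(candidates, rel_freq):
    """Return the candidate segmentation with the highest score."""
    return max(candidates, key=lambda seg: score(seg, rel_freq))


# Toy, hand-made corpus of already segmented words (illustrative only).
corpus = ["研究", "生命", "起源", "研究", "生命", "研究生", "起源"]
rel_freq = relative_frequencies(corpus)

# Two competing segmentations of the ambiguous string "研究生命起源".
candidates = [
    ["研究", "生命", "起源"],
    ["研究生", "命", "起源"],
]
best = pick_segmentation(candidates, rel_freq)
print(best)
```

In this toy setting the second candidate is penalized because "命" never appears as a standalone word in the corpus. A model closer to the one described in the abstract would presumably also weight candidates by semantic information from the surrounding context.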

Keywords/Search Tags:

Word segmentation disambiguation, Relative word frequency, Semantic similarity, How-net

Related items

1	Context Computing Applications, Word Disambiguation
2	Research Of Word Sense Disambiguation Based On Hybird Features And Rules
3	Research On Chinese Word Sense Disambiguation Based On Semantic Analysis
4	The Research Of Chinese Word Segmentation Disambiguation Based On Word Environment Information
5	A Chinese Unsupervised Word Sense Disambiguation Method Based On Semantic Vector
6	Chinese Word Semantic Similarity Measure And Its Application In Cross-language Information Retrieval
7	Study On Multi-sense Word Vector And Semantic Similarity
8	Research On Word Sense Disambiguation Based On DBN
9	An Algorithm For Optimizing Word Similarity In "Knowledge Network"
10	Research On Query Expansion & Key Technologies Based On Semantic Analysis