
Research On Disambiguation Of Chinese Words And Phrases

Posted on: 2009-12-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Qin
GTID: 1118360245469488
Subject: Signal and Information Processing
Abstract/Summary:
The boom in network technology has allowed the publication and sharing of information to transcend the limits of space and time, which has spurred intense research into automatic, high-quality document processing based on natural language processing. Ambiguity is pervasive in natural language, so a computer must confront and resolve various ambiguities whenever it processes natural language text. Disambiguation is therefore one of the vital problems of natural language processing.

Because language is complex, we believe that no single, simple analysis can cope with all ambiguity and achieve high disambiguation performance; studying ambiguity from several angles may reach that goal. We therefore approach the research from two angles.

The first is a horizontal angle based on the theory of comprehensive information. According to this theory, information consists of syntactic, semantic and pragmatic information, and so does language. Ambiguity exists at each level: syntactic ambiguity, semantic ambiguity and pragmatic ambiguity. This angle guides a clear study of the characteristics of ambiguity from the standpoint of the logic of language.

The second is a vertical angle based on the Chinese syntactic system. The main syntactic units include the word, the phrase, the sentence and so on. Ambiguity occurs at each unit: lexical ambiguity, phrasal ambiguity and sentential ambiguity. From this angle disambiguation can be carried out conveniently, and we also hope to find a general disambiguation approach.

Combining these two angles, this paper mainly covers word and phrase boundary ambiguity at the syntactic level, and word polysemy and the functional and structural ambiguity of phrases at the semantic level.

The research focuses on three tasks: categorizing ambiguity, detecting ambiguity, and disambiguation. Categorization identifies the characteristics of each kind of ambiguity. Detection locates the positions where ambiguity occurs.
Finally, building on detection, disambiguation tuned to the characteristics of each ambiguity is the primary task of language processing. The two angles of analysis and the three tasks form the outline of this paper.

The outcomes of this paper are as follows.

1. A study of the categories of Chinese ambiguity. Syntactic ambiguity falls into two classes: overlapping ambiguity and embedding ambiguity. An approach to automatically collecting ambiguous clusters, and to disambiguation in Chinese word segmentation, is studied.
2. Improved performance of Chinese chunking. Cascaded classifiers are applied to the chunking task, sharply reducing the learner's training time. Semantic collocation among the words in a noun phrase is further studied for recognizing noun phrase boundaries. Some work is also done on chunking across different corpora and on the disambiguation that accompanies it.
3. The idea of granular phrases, put forward to cope with the vague definition of the Chinese phrase. Granularity noun phrases are defined, and an approach to recognizing them separately is given.
4. Sentence skeleton recognition based on functional phrases. The functional phrases of sub-sentences are recognized according to the syntactic functions of the phrases.
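The two classes of syntactic ambiguity named in outcome 1 can be illustrated with a small sketch. It is not the dissertation's collection method, which the abstract does not describe; it assumes the common textbook definitions: an overlapping ambiguity is a pair of lexicon words that share characters without either containing the other, and an embedding ambiguity is a lexicon word that can also be split into two lexicon words. The toy `LEXICON`, the function names and the `max_len` parameter are all hypothetical.

```python
# Toy lexicon (an assumption for illustration only).
LEXICON = {"结合", "合成", "成分", "分子", "将来", "将", "来", "他"}


def find_words(text, lexicon, max_len=4):
    """Return (start, end) spans of every lexicon word of length >= 2 in text."""
    spans = []
    for i in range(len(text)):
        for j in range(i + 2, min(len(text), i + max_len) + 1):
            if text[i:j] in lexicon:
                spans.append((i, j))
    return spans


def overlapping_ambiguities(text, lexicon):
    """Pairs of lexicon words that partially overlap (neither contains the other)."""
    spans = find_words(text, lexicon)
    found = []
    for a, b in spans:
        for c, d in spans:
            # The second word starts inside the first and ends after it.
            if a < c < b < d:
                found.append((text[a:b], text[c:d]))
    return found


def embedding_ambiguities(text, lexicon):
    """Lexicon words that can also be split into two lexicon words."""
    found = []
    for a, b in find_words(text, lexicon):
        for k in range(a + 1, b):
            if text[a:k] in lexicon and text[k:b] in lexicon:
                found.append((text[a:b], text[a:k], text[k:b]))
    return found


if __name__ == "__main__":
    print(overlapping_ambiguities("结合成分子", LEXICON))
    print(embedding_ambiguities("他将来", LEXICON))
```

Under these assumptions the sketch only detects candidate positions, which corresponds to the detection task; choosing the correct segmentation at each position is the separate disambiguation step.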
Keywords/Search Tags: overlapping ambiguity, embedding ambiguity, Chinese word segmentation, word sense disambiguation, chunking, granularity phrase, function phrase