Spoken Language Analysis In Chinese Spoken Dialogue System

Posted on: 2009-10-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 1118360305956336
Subject: Computer software and theory
Abstract/Summary:
Man-machine spoken dialogue systems (SDS) are an active research field with wide application demand. Spoken language contains many phenomena not found in written language, such as ellipsis, pauses, parentheticals, repetitions and restarts, and most spoken sentences are grammatically incorrect or ill-formed. The key difficulty in a Chinese SDS is therefore how to understand spoken language. Template matching is a popular method for this in current Chinese SDSs, but the flexibility of spoken language makes the number of templates very large, which degrades system accuracy. This thesis focuses on understanding spoken language in a Chinese SDS and attempts to analyze spoken language phenomena at the conceptual level. The background of this research is a Chinese SDS, SHJTQ (Shanghai Jiaotong Query System), which provides information about the best route between any two sites in Shanghai.

There are two approaches to handling spoken language: statistical methods and rule-based analytical methods. Statistical analysis relies mainly on the statistical characteristics of language structure, regardless of semantic features, so it lacks intelligence and reliability. Rule-based methods can be divided into two types: logic analysis and concept analysis. Montague semantics is the representative method of logic analysis. Using model theory, it successfully gives the meaning of a fragment of English, but it fails on real text, especially in explaining Chinese. Concept analysis was put forward by philosophers such as Wittgenstein, Austin and Searle. Philosophers of language and of mind are concerned with analyzing the concepts of words for mental states, feelings and emotions, but they seldom pay attention to the concepts of words that refer to entities. Most current SDSs analyze spoken language at the application level, adopting string matching, sometimes combined with other processing methods.
The severe disadvantage of these methods is that the analysis fails when the string, or the order of its parts, changes, so they cannot deal with the flexibility of spoken language. In this thesis, we put forward the idea of intensional concept analysis and analyze spoken language at the upper concept level. This explains why such different character strings (expression types) express the same concept.

From the implementation side, storing the pronunciation of one string (for example a phrase or sentence) takes about 1K of storage, ignoring tonal information, so processing the pronunciation of ordinary dialogue would require an enormous amount of storage. If instead each Chinese character corresponds to one template, the pronunciation information for 2000 common characters takes a bounded 2000 × 1K of space. This works because in Chinese, concepts are expressed by combinations of characters, so the method greatly reduces the pronunciation data. It raises a new issue, however: language processing becomes more complex and more important. In this thesis, using the Chinese intensional concept model, we implemented conceptual analysis of words in a domain-specific SDS, with successful results.

The intensional characteristics of words in the SHJTQ domain (mainly words for vehicles) are analyzed in this thesis. We propose that a noun has two concept characteristics: the "definition characteristic" and the "situation distinction characteristic". The appearance characteristic of a word (its situation distinction characteristic) varies across situations. We propose an E-A-V (entity-attribute-value) method to represent a noun's concept. In our domain-specific SDS, SHJTQ, most user utterances are interrogative sentences. Drawing on speech act theory, we reclassified the users' query sentences in SHJTQ, which directly helped our spoken language analysis.
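The E-A-V (entity-attribute-value) representation of a noun's concept described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the thesis's implementation: the class name `NounConcept`, its fields, the attribute names and the sample values for "bus" are all hypothetical. The one idea taken from the abstract is that a noun carries a stable "definition characteristic" plus a "situation distinction characteristic" that varies with the situation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NounConcept:
    """Hypothetical E-A-V record for one noun (the entity)."""

    entity: str
    # Attributes that hold in every situation ("definition characteristic").
    definition: Dict[str, str] = field(default_factory=dict)
    # Attributes that depend on the situation ("situation distinction
    # characteristic"), keyed by an illustrative situation name.
    situational: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def triples(self, situation: Optional[str] = None) -> List[Tuple[str, str, str]]:
        """Return the (entity, attribute, value) triples visible in a situation."""
        merged = dict(self.definition)
        if situation is not None:
            # Unknown situations contribute nothing beyond the definition.
            merged.update(self.situational.get(situation, {}))
        return [(self.entity, attr, value) for attr, value in merged.items()]


# Illustrative values only; they are not taken from the thesis.
bus = NounConcept(
    entity="bus",
    definition={"upper_concept": "vehicle", "runs_on": "road"},
    situational={
        "route_query": {"role": "means_of_transport"},
        "stop_query": {"role": "thing_to_board"},
    },
)

for triple in bus.triples("route_query"):
    print(triple)
```

Keeping the two kinds of characteristic in separate fields means the same noun can be queried with or without a situation, which is one plausible way to model a word whose "appearance" changes by context.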
We analyzed the intensional concepts of spoken sentences in SHJTQ. Following the classification of user queries, we analyzed real user query sentences at the upper concept level one by one, and we also analyzed several variant phenomena in spoken language. This is a new way of thinking about understanding Chinese spoken language.

Concept analysis of spoken language in a domain-specific spoken dialogue system is applied in SHJTQ. We present the overall design of the system and describe in particular the sub-modules of the language-understanding module, such as POS tagging, robust parsing and concept analysis. System performance is tested and analyzed.

The novelties of this thesis lie in the following aspects.

1. We propose a method called conceptual analysis for analyzing Chinese spoken language, which differs from traditional string-matching analysis at the application level. Analyzing the spoken language in SHJTQ from this perspective explains how spoken language can be formally flexible yet express the same meaning. A second advantage of concept analysis is that it helps in building multi-language SDSs. For example, Chinese has no tense or singular/plural inflection, whereas other languages (like English) vary in form, tense and so on; analyzed at the upper concept level, these surface phenomena become less significant. Third, in implementing an SDS, this method can greatly reduce the number of templates that speech recognition requires. It may give impetus to the development of spoken dialogue systems.

2. We propose using the E-A-V (Entity-Attribute-Value) model to represent the polysemy of noun modification. Adopting the idea of intensional logic analysis (put forward by Prof. Lu Ru-zhan), we decompose the concept a word expresses into upper concept, lower concept, defined attributes and expanded features. This lets us explain the relation among word, denoted entity and concept.
Our research also indicates that a noun is a phrase item of a denoted entity and has two concept characteristics: the "definition characteristic" and the "situation distinction characteristic". Next, we analyzed the intensional concepts of spoken sentences in SHJTQ. This is a new way of understanding Chinese spoken language.

3. Using the intensional concept method, we analyzed real user query sentences at the upper concept level, along with several variant phenomena in spoken language. Our solutions for general language phenomena such as ellipsis, reference and negation are also described. The system performance is tested and analyzed.
Keywords/Search Tags: spoken dialogue system, dialogue understanding, Chinese interrogatives, concept analysis, intention characteristic