The Semantic Interpretation Of Chinese Verb Nominalization Compound

Posted on: 2009-07-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J L Zhao
Full Text: PDF
GTID: 1118360305956624
Subject: Computer software and theory
Abstract/Summary:
Information on the Web and information retrieval have become an essential part of daily life. Language is the main carrier of information, so there is an urgent need for computers to understand the content and semantics of language data. Computing the semantic relations between natural language structures is the key to natural language understanding. The essence of semantic computation is to compute the correspondence between structures and semantic representations. Although language structures take many forms, they can all be reduced to word combinations, which accords with the trend toward lexical approaches in linguistic theory.

A compound is a consecutive sequence of nominal words that functions as a whole as a new nominal word. The semantics of word combination has been a major concern for scholars in this area, because research on it is significant for both theory and application. However, unlike other language structures, compounds contain no semantic clues such as function words, which makes computing the semantics of compound expressions a major challenge. This dissertation focuses on the semantic interpretation of a subset of Chinese compounds: those that involve a verb nominalization. First, as data preparation, it explores the problem of compound identification. Second, it constructs interpretation models for the two basic types of verb nominalization compounds, NV compounds and VN compounds. Finally, it explores the application of attribute knowledge to compound interpretation.

Specifically, the contributions of this dissertation are as follows:

1. The author proposes a method for compound identification based on a thesaurus and the Web. To resolve the structural ambiguity in Chinese verb-noun combinations, the identification model introduces two novel feature sets: compounding ability and referential patterns. Acquiring these features does not rely on the context of the compound candidate; instead, it uses two independent sources, a thesaurus and the Web. The advantage of this approach is that it can recognize compounds that occur with low frequency in text. Machine learning experiments show that the novel features greatly improve the performance of compound identification.

2. The author introduces the semantic relations involved in Chinese NV compounds and then implements a compound interpretation model based on lexical syntactic patterns (LSPs). A new form of LSP, called functional lexicalized patterns (FLPs), is defined. The FLP vector of an NV compound serves as the feature set for labeling its semantic relation. Unlike previous approaches, which rely mainly on ontologies or treebanks, the model acquires its classification features from plain text, which makes it more robust and easier to generalize.

3. The author presents the set of semantic relations of Chinese VN compounds and then proposes a translation-driven bilingual compound interpretation model. The model first translates Chinese compounds into their English equivalents. It then uses the English compounds as additional information for interpreting the Chinese VN compounds. The main merit of the model is that it can use cross-linguistic resources to interpret multilingual compounds at the same time. Experiments confirm that the bilingual model outperforms the monolingual model.

4. For application to compound interpretation, the author designs a framework for constructing an attribute knowledge base. It has two phases: attribute word acquisition and attribute host computation. For the first phase, the author proposes a multi-resource bootstrapping algorithm that starts from a seed set of Chinese morphemes and exploits both a machine-readable dictionary (MRD) and the Web as resources. The second phase is modeled as a problem of selectional constraint resolution. The distinguishing characteristic of the framework is that it can dynamically acquire attribute-centered knowledge. The experiments show that the algorithms are effective.

5. Applying the acquired attribute knowledge, the author presents an attribute-based word similarity model. Unlike previous models, which mainly exploit IS-A taxonomies, the proposed model represents a word concept by the attribute words it can take. The more important attributes two concepts share, the more similar they are. Such an attribute representation captures more fine-grained differences between word concepts. The attribute-based word similarity model achieves good results both on standard datasets and when applied to compound interpretation.
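The attribute-based similarity idea in the last contribution can be illustrated with a small sketch. The abstract does not give the actual formula, so the code below is only an assumption: each concept is represented as a mapping from attribute words to importance weights, and similarity is taken to be the cosine of the two weight vectors. The function name `attribute_similarity`, the English example words, and all weights are hypothetical and are not taken from the dissertation.

```python
import math


def attribute_similarity(attrs_a, attrs_b):
    """Cosine similarity between two attribute-weight mappings.

    Each argument maps an attribute word to a non-negative importance
    weight. This is an illustrative stand-in, not the dissertation's model.
    """
    shared = set(attrs_a) & set(attrs_b)
    # Only attributes that both concepts can take contribute to the score.
    dot = sum(attrs_a[k] * attrs_b[k] for k in shared)
    norm_a = math.sqrt(sum(w * w for w in attrs_a.values()))
    norm_b = math.sqrt(sum(w * w for w in attrs_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical attribute profiles (made-up words and weights).
concepts = {
    "car": {"speed": 0.9, "price": 0.8, "color": 0.5},
    "bicycle": {"speed": 0.7, "price": 0.6, "weight": 0.4},
    "river": {"length": 0.9, "depth": 0.8},
}

if __name__ == "__main__":
    for other in ("bicycle", "river"):
        score = attribute_similarity(concepts["car"], concepts[other])
        print(f"car vs {other}: {score:.3f}")
```

With these toy profiles, "car" and "bicycle" share two heavily weighted attributes and so score higher than "car" and "river", which share none. Weighting attributes by importance, rather than counting overlap, is one way to reflect the abstract's point that sharing more important attributes makes concepts more similar.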
Keywords/Search Tags:Compound, Verb Nominalization, Semantic Interpretation, Attribute Knowledge, Word Similarity