The Study And Application Of Graphic Semantic Representation For Words

Posted on: 2016-09-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Zhang
Full Text: PDF
GTID: 1108330482957858
Subject: Signal and Information Processing
Abstract/Summary:
Traditional lexical semantic theories usually explain words in terms of other words. Lexical semantic information of this kind cannot solve natural language processing problems oriented to physical scenarios, such as automatically generating language descriptions of a physical scenario, because it is a purely symbolic description with no connection to the perceptual information obtained from the scenario. Building lexical semantics on perceptual information, that is, establishing perceptually based lexical definitions, is called "word grounding". Research on word grounding extends the scope of lexical semantics and helps reveal the mechanism of children's lexical development, and the resulting lexical semantics can be used in many natural language processing applications based on physical scenarios. The research therefore has both academic and practical value.

In this dissertation we study word grounding based on visual information and choose graph features as the foundation for defining lexical semantics. We start with simple lexicons and single geometric shapes and build a lexical semantic representation based on graph features. For complicated lexicons and synthetic graphs, we further develop the corresponding word grounding techniques. All of these methods are tested in the application of automatic scenario description generation. The main content and innovations of this dissertation are as follows:

This dissertation presents a conditional probabilistic model based on graph features to represent lexical semantics. For simple lexicons and single geometric shapes, the model is learned with a cross-situational learning strategy.
Based on two-channel data that pairs graphs with language descriptions, the graph features are aligned to lexicons through image feature extraction, word clustering, semantic association vector calculation and feature selection. A multivariate Gaussian distribution over the selected features is then used to model the meaning of each word in a lexicon. Two methods of computing the semantic association vector are compared: the symmetric Kullback-Leibler distance and the DBC (distances between classes) measure. Experimental results show that the symmetric Kullback-Leibler distance is better suited to measuring the semantic association between visual features and elementary attribute lexicons.

A feature selection method based on the MSAV (Mean Semantic Association Vector) is presented for elementary attribute lexicons. Feature selection results directly influence the quality of the lexical semantic representation and the accuracy of word selection based on it. The word selection experiments show that the MSAV method performs better than the forward search algorithm based on multivariate K-L divergence. Under a strict evaluation criterion (a result that differs from any one of the three standard results is counted as a mistake), the average generation accuracy over 5 elementary attribute lexicon categories is 70%. For evaluation, this dissertation presents a method for automatically generating natural language phrase descriptions based on the joint probability of graph features and words.

A lexical semantic representation method based on synthetic graphs is presented. For a synthetic graph, a "structure graph" is first constructed, which records both the local and the global features of the synthetic graph object, including the size, shape and location of each region part as well as the distances and relationships between regions.
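The abstract does not give the data layout of a structure graph, so the following is only a minimal sketch of one way such a structure could be held in memory. The class names `Region` and `StructureGraph`, the centroid-based distance, and the relation labels ("left-of", "above", and so on) are all assumptions made for illustration, not the dissertation's definitions.

```python
from dataclasses import dataclass, field
from itertools import combinations
from math import hypot


@dataclass
class Region:
    """One region part of a synthetic graph (local features). Hypothetical layout."""
    name: str
    shape: str    # e.g. "triangle", "rectangle"
    size: float   # area in arbitrary units
    cx: float     # centroid x
    cy: float     # centroid y (y grows downward, as in image coordinates)


@dataclass
class StructureGraph:
    """Regions as nodes; pairwise distance and relation as edges (global features)."""
    regions: list
    edges: dict = field(default_factory=dict)

    def build_edges(self):
        for a, b in combinations(self.regions, 2):
            distance = hypot(a.cx - b.cx, a.cy - b.cy)
            # Assumed relation labels: compare centroids along the dominant axis.
            if abs(a.cx - b.cx) >= abs(a.cy - b.cy):
                relation = "left-of" if a.cx < b.cx else "right-of"
            else:
                relation = "above" if a.cy < b.cy else "below"
            self.edges[(a.name, b.name)] = (distance, relation)
        return self


# Illustrative two-region drawing: a triangle placed over a rectangle.
roof = Region("roof", "triangle", size=50.0, cx=10.0, cy=5.0)
wall = Region("wall", "rectangle", size=120.0, cx=10.0, cy=15.0)
graph = StructureGraph([roof, wall]).build_edges()
print(graph.edges[("roof", "wall")])  # (10.0, 'above')
```

Keeping local features on the nodes and pairwise features on the edges mirrors the local/global split described above; a real system would likely need richer relations than this centroid comparison.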
Furthermore, drawing on prototype theory, lexical semantic representations for object category lexicons and complicated attribute category lexicons are each built on the structure graph. For an object category lexicon, the positive structure graph instances are used to build the semantic prototype graph set. A word selection algorithm is then presented for selecting proper words for a new graph; the average accuracy of object category word selection is 85%.

In addition, a prototype selection method based on hierarchical clustering is presented, and the experimental results show that it greatly reduces the time cost of word selection with an acceptable loss in accuracy. For complicated attribute category lexicons, we present a focal entity selection method based on the entities vector to align entities to lexicons automatically. We also study a method of building the prototype graph set based on maximum similar sub-graph extraction, in which a few representative sub-graphs are extracted from the positive structure graph instances of each complicated attribute lexicon as its prototype graphs. In the experiments, the average accuracy of complicated attribute category word selection is 86.7%.

The lexicon grounding representations, the description text generation algorithms and the syntax rules described above are all integrated in the Graph Description Demo System (GDS). With this tool, users can draw a describable single geometric shape or synthetic graph, and GDS automatically generates its description in natural language.
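As a rough illustration of the symmetric Kullback-Leibler distance named earlier in the abstract, the sketch below scores how strongly a single visual feature is associated with a word. It assumes, purely for illustration, that each feature is summarized by a univariate Gaussian over situations where the word is used and another over situations where it is not. The feature names (`hue`, `area`), the statistics, and the helper `association_vector` are hypothetical, and the dissertation's actual procedure may differ.

```python
from math import log


def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for univariate Gaussians p = N(mu_p, var_p), q = N(mu_q, var_q)."""
    return 0.5 * (log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Sum of the two directed divergences (some authors halve this sum)."""
    return kl_gaussian(mu_p, var_p, mu_q, var_q) + kl_gaussian(mu_q, var_q, mu_p, var_p)


def association_vector(stats):
    """Map each feature name to its symmetric KL score.

    stats: {feature: ((mean_with_word, var_with_word), (mean_without, var_without))}
    """
    return {name: symmetric_kl(*with_word, *without)
            for name, (with_word, without) in stats.items()}


# Made-up statistics for one word: 'hue' shifts when the word is used, 'area' does not.
stats = {
    "hue": ((0.0, 1.0), (2.0, 1.0)),
    "area": ((5.0, 4.0), (5.0, 4.0)),
}
scores = association_vector(stats)
best = max(scores, key=scores.get)
print(scores, best)  # {'hue': 4.0, 'area': 0.0} hue
```

A feature whose distribution is unchanged by the presence of the word scores zero, so a feature selection step could rank features by this score and keep the highest ones.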
Keywords/Search Tags: word grounding, graph features, structure graph, prototype, automatic scene description