Measuring Soft Semantic Similarity For Text Labeling

Posted on: 2015-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: X C Nie
Full Text: PDF
GTID: 2348330485994226
Subject: Software engineering
Abstract/Summary:
With the exponential growth of multimedia content on the Internet, there is an urgent demand for automatic and effective techniques to organize and manage this massive volume of data. Effective semantic retrieval depends on helping users find what they are interested in, and a well-segmented multimedia document is an important prerequisite for high-level semantic browsing. The aim of story segmentation and co-segmentation is to segment broadcast news transcripts into a sequence of topically coherent stories. Technically, two factors strongly affect segmentation accuracy: (1) the word-level and sentence-level semantic similarity; (2) the criterion and model used to segment the documents. Previous studies mainly focus on designing proper partitioning criteria and models, while the measurement of semantic similarity has been largely ignored.
In this dissertation, we focus on the study of Chinese semantic similarity measurement and its application to story segmentation and co-segmentation. We divide prior knowledge into three levels: (1) 0-prior knowledge; (2) 0.5-prior knowledge; (3) 1-prior knowledge. We then construct a uniform correlated affinity graph to encode these semantic correlations together, and conduct parallel affinity propagation to derive a more reliable semantic affinity measurement. To deal with synonymy, we measure the semantic similarity between two words by the similarity of the subgraphs centered at them in the affinity graph; to deal with polysemy, we use the similarity of their contexts in the given sentences. Based on this, we extend the classical cosine similarity to encode both the latent semantic similarity among words and their contextual similarities.
Experiments on the benchmark CCTV and TDT2 datasets demonstrate that the proposed semantic similarity measurements outperform other state-of-the-art data-driven and common-sense-driven semantic similarity measurements when applied to story segmentation and story co-segmentation.
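The idea of extending cosine similarity with word-to-word similarity can be illustrated with the generic "soft cosine" form, sim(a, b) = aᵀSb / sqrt(aᵀSa · bᵀSb), where S holds pairwise word similarities. The sketch below is only an illustration under assumptions: the function name `soft_cosine`, the three-word vocabulary, and the 0.9 similarity value are hypothetical, and the matrix S stands in for whatever affinity the thesis derives from its graph. It is not the formula used in the dissertation, which also incorporates contextual similarity.

```python
import math


def soft_cosine(a, b, sim):
    """Generic soft cosine between two bag-of-words vectors.

    a, b : lists of term weights over the same vocabulary.
    sim  : square matrix, sim[i][j] is the similarity between word i and
           word j (assumed symmetric with ones on the diagonal).
    """
    def bilinear(x, y):
        # Computes x^T * sim * y.
        return sum(
            x[i] * sim[i][j] * y[j]
            for i in range(len(x))
            for j in range(len(y))
        )

    norm_product = bilinear(a, a) * bilinear(b, b)
    if norm_product <= 0:
        # Empty vectors (or a degenerate similarity matrix): no similarity.
        return 0.0
    return bilinear(a, b) / math.sqrt(norm_product)


# Hypothetical vocabulary: ["car", "automobile", "banana"]
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
soft = [[1.0, 0.9, 0.0],   # "car" and "automobile" are treated as near-synonyms
        [0.9, 1.0, 0.0],
        [0.0, 0.0, 1.0]]

sentence_1 = [1, 0, 0]  # contains only "car"
sentence_2 = [0, 1, 0]  # contains only "automobile"

classical = soft_cosine(sentence_1, sentence_2, identity)
softened = soft_cosine(sentence_1, sentence_2, soft)
```

With the identity matrix the measure reduces to the classical cosine, so two sentences that share no words score zero; with a non-trivial S, sentences built from synonymous words receive a positive score.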
Keywords/Search Tags: semantic similarity, story segmentation, story co-segmentation, correlated affinity graph, parallel affinity propagation