Text understanding via semantic structure analysis

Posted on: 2008-02-21
Degree: Ph.D
Type: Dissertation
University: University of Southern California
Candidate: Kwon, Namhee
Full Text: PDF
GTID: 1448390005459605
Subject: Computer Science
Abstract/Summary:
To understand and manage large collections of documents about the same topic, people have to interpret various aspects and levels of information simultaneously and integrate them into a coherent overall picture. To support these diverse tasks, we need tools that extract the "important" information and combine it into a coherent overview. There is a substantial need for corpus-level overview analysis of large document sets in a variety of applications, including discussion, political debate, and public comment. Most of these applications exhibit tight causal and/or hierarchical relations among and within the texts, so structure analysis plays an important role in document understanding. This kind of text also frequently contains subjective, opinionated, and biased language, necessitating opinion analysis and clustering.

Prior research has addressed many of these topics, but never together. In this work, we focus on the semantic structure of sentences and discourses to identify "important" information, in order to achieve balanced extraction and to find inter-relations between information units. We apply a domain-independent sentence structure analysis based on frame semantics, and provide a discourse-level structure analysis for subjective or argumentative texts. By integrating these analyses, we provide a novel approach to identifying frame and argument structures, classifying arguments, and integrating the structure for the whole data collection.

Each structure identification and classification module separately shows substantial improvement over the baseline, and the integration module shows the value of using the individual analyses to automatically build a visual, categorized summary of a large document collection. This work points the way to future multifaceted text collection analysis technology that will be required to assist people in managing increasingly large and diversified sets of documents.
Keywords/Search Tags:Structure, Text, Large, Document, Collection, Information