Automated content assessment of text using Latent Semantic Analysis to simulate human cognition

Posted on:2001-08-04

Degree:Ph.D

Type:Dissertation

University:University of Colorado at Boulder

Candidate:Laham, Robert Darrell

GTID:1468390014959047

Subject:Psychology

Abstract/Summary:

Latent Semantic Analysis (LSA) is both a theory of human knowledge representation and a method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text. The underlying idea is that the aggregate of all the word contexts in which a given word does and does not appear provides a set of mutual constraints that largely determines the similarity of meaning of words and sets of words to each other. Simulations of psycholinguistic phenomena show that LSA reflects similarities of human meaning effectively. The adequacy of LSA's reflection of human knowledge has been established in a variety of ways. For example, its scores overlap those of humans on standard vocabulary and subject matter tests; it mimics human word sorting and category judgments; it simulates word-word and passage-word lexical priming data; and it accurately estimates the learnability of passages by individual students, as well as the quality and quantity of knowledge contained in an essay.

To assess essay quality, LSA is first trained on domain-representative text. Student essays are then characterized by LSA representations of the meaning of their contained words and compared with essays of known quality on degree of conceptual relevance and amount of relevant content. Over many diverse topics, LSA scores agreed with human experts as accurately as expert scores agreed with each other.

LSA has also been used to characterize tasks, occupations and personnel, and to measure the overlap in content between instructional courses covering the full range of tasks performed in many different occupations. It extracts semantic information about people, occupations, and task experience contained in natural-text databases. The various kinds of information are all represented in the same way in a common semantic space. As a result, the system can match or compare any of these objects with any one or more of the others. LSA-based agent software can help to identify required job knowledge, determine which members of the workforce have the knowledge, pinpoint needed retraining content, and maximize training and retraining efficiency.

Computational models of concept relations using LSA representations demonstrate that categories can be emergent and self-organizing based exclusively on the way language is used in the corpus, without explicit hand-coding of category membership or semantic features. LSA modeling also shows that the categories most often impaired in category-specific semantic dysnomias are those that show the most internal coherence in LSA representational structure. If brain structure corresponds to LSA structure, the identification of concepts belonging to strongly clustered categories should suffer more than the identification of concepts in weakly clustered categories when their representations are partially damaged.
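
The essay-assessment procedure described in the abstract (train LSA on domain text, represent each essay by the meanings of its words, compare it with essays of known quality) can be illustrated with a minimal sketch. This is not the dissertation's implementation. The log weighting, the choice to represent a text as the sum of its word vectors, the similarity-weighted averaging of neighbor grades, and all function and parameter names (`train_lsa`, `embed`, `score_essay`, `k`, `n_neighbors`) are assumptions made for illustration.

```python
import numpy as np


def build_term_doc_matrix(docs):
    """Count how often each word occurs in each document (terms x docs)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc.lower().split():
            counts[index[w], j] += 1
    return counts, index


def train_lsa(docs, k):
    """Reduce the term-document matrix to k dimensions with a truncated SVD."""
    counts, index = build_term_doc_matrix(docs)
    weighted = np.log1p(counts)  # assumed weighting; other schemes are possible
    u, s, _ = np.linalg.svd(weighted, full_matrices=False)
    term_vectors = u[:, :k] * s[:k]  # one k-dimensional vector per word
    return term_vectors, index


def embed(text, term_vectors, index):
    """Represent a text as the sum of the vectors of its known words."""
    vec = np.zeros(term_vectors.shape[1])
    for w in text.lower().split():
        if w in index:
            vec += term_vectors[index[w]]
    return vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def score_essay(essay, graded, term_vectors, index, n_neighbors=2):
    """Estimate a grade from the most similar essays of known quality."""
    target = embed(essay, term_vectors, index)
    sims = [
        (cosine(target, embed(text, term_vectors, index)), grade)
        for text, grade in graded
    ]
    top = sorted(sims, reverse=True)[:n_neighbors]
    weights = [max(sim, 0.0) for sim, _ in top]
    if sum(weights) == 0:
        # No usable similarity: fall back to the plain mean of the neighbors.
        return sum(grade for _, grade in top) / len(top)
    return sum(w * grade for w, (_, grade) in zip(weights, top)) / sum(weights)


# Toy data, invented for this sketch only.
corpus = [
    "the heart pumps blood through arteries",
    "blood carries oxygen to the body",
    "stocks and bonds trade on the market",
    "the market sets prices for stocks",
]
graded_essays = [
    ("the heart pumps blood", 5.0),
    ("blood carries oxygen", 4.0),
    ("stocks trade on the market", 1.0),
]
term_vectors, index = train_lsa(corpus, k=2)
estimate = score_essay("the heart pumps blood to the body",
                       graded_essays, term_vectors, index)
print(f"estimated grade: {estimate:.2f}")
```

Because every text is mapped into the same reduced space, the same `embed` and `cosine` helpers could in principle compare other kinds of text objects, such as task descriptions and course content. A realistic system would need a far larger training corpus and a much higher dimensionality than this toy example.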

Keywords/Search Tags:

LSA, Semantic, Human, Content, Text

Related items

1	Study On Method To Automatically Analyze The Text Structure Based On The Relevancy Computing Of Text Content
2	Sensitivity of Semantic Signatures in Text Mining
3	Web-based Chinese News Video Semantic Content Security Analysis
4	A Designing And Implementation Of Network Content Audit System Based On Text Analysis
5	Research On Image Semantic Knowledge Extraction And Application Of UGC Combined Image And Text
6	Research On Ontology-Based Semantic Text Categorization
7	Research On Content Sifting And Storage Mechanism Of Cross-modal Image And Text Data Based On Semantic Similarity
8	The Research And Implementation Of Assistant Reading System
9	Research Of Chinese Text Preprocessing Based On Semantic
10	Research On Key Technologies Of Semantic Calculation Of Sports Human Behavior