LSA-Based Text Analysis

Posted on: 2009-01-25 | Degree: Master | Type: Thesis
Country: China | Candidate: X J Dong | Full Text: PDF
GTID: 2208360242491103 | Subject: Computer application technology
Abstract/Summary:
In Chinese information processing, text analysis is a fundamental problem that covers lexical analysis, parsing, semantic analysis, and more. It is an important prerequisite for advanced applications and a bottleneck for information extraction, information retrieval, and automatic question answering, and it has become a challenging research topic.
This thesis introduces in detail the essential technologies related to text analysis, their stages of development, the existing problems, and the directions for future development. Latent semantic analysis (LSA) is studied in depth, its point of integration with text analysis is identified, and mutual information theory and ontology technology are introduced alongside it. These theories are used to learn the relationships among words in sentences and to analyze which kinds of words best help to distinguish texts.
Based on LSA, combined with mutual information and ontology, a statistics-based text analysis system is implemented. The thesis describes the design and construction of the sentence-splitting module, the ontology module, and the other modules.
In the system implementation, the words stored in the corpus must be processed multiple times for statistical purposes. For this case, an optimized algorithm is designed, which can also be used for semantic analysis of some words. Then the procedure for computing the SVD of the word matrix with Matcom is introduced. Finally, the experimental results are compared via Level technologies.
Keywords/Search Tags: latent semantic analysis, correlation, entropy, weight value, SVD
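The abstract names LSA, entropy, weighting, and SVD but gives no formulas. The following is therefore only a minimal sketch of the general technique, not the thesis's implementation. It assumes the common log-entropy weighting scheme and uses NumPy's SVD in place of Matcom. The function names `log_entropy_weight` and `lsa` and the toy count matrix are hypothetical.

```python
import numpy as np


def log_entropy_weight(counts):
    """Log-entropy weighting of a term-by-document count matrix.

    Assumption: this is the standard log-entropy scheme often paired with
    LSA; the abstract does not say which weighting the thesis used.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    row_sums = counts.sum(axis=1, keepdims=True)
    # p[i, j]: share of term i's occurrences that fall in document j
    p = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    plogp = np.zeros_like(p)
    mask = p > 0
    plogp[mask] = p[mask] * np.log(p[mask])
    # Global weight is 1 for a term confined to one document and 0 for a
    # term spread evenly over all documents.
    global_w = 1.0 + plogp.sum(axis=1) / np.log(n_docs)
    return np.log1p(counts) * global_w[:, None]


def lsa(matrix, k):
    """Return the rank-k truncated SVD factors (U_k, s_k, Vt_k)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


# Toy term-by-document counts (rows: terms, columns: documents).
counts = np.array([
    [2, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 3, 0, 0],
    [0, 1, 0, 2],
])
weighted = log_entropy_weight(counts)
u_k, s_k, vt_k = lsa(weighted, k=2)
# Each row of term_vectors places one term in the 2-dimensional latent space.
term_vectors = u_k * s_k
```

In this sketch, a term that occurs equally often in every document receives zero weight, which matches the abstract's goal of finding the words that best distinguish texts.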