Research On Semantic Orientation Of Chinese Texts Based On Topic Correlation

Posted on: 2010-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Lai
Full Text: PDF
GTID: 2178360275470363
Subject: Communication and Information System
Abstract/Summary:
Semantic orientation analysis is an important area of text processing, with applications in information filtering and retrieval, network supervision and control, and related tasks. This paper focuses on techniques for representing text vectors and computing their weights, and implements a semantic classification system for Chinese texts based on topic correlation. Building on traditional topic classification systems, we take the semantic information of texts into account and propose several new methods for semantic analysis. Our contributions are as follows.

Firstly, we improve the vector representation of texts for semantic classification by presenting the concept space vector model (CSVM). CSVM is motivated by the observation that semantic orientation is revealed by the author's evaluation of the topic object, of its features, or of the relationships between features. The new model is designed to retain the full semantic information of a text.

Secondly, we discuss and propose algorithms for concept extraction and induction. Concepts represent the main frame of the text and are retrieved based on HowNet. Concept induction is used to solve the problem of term correlation. Pseudocode and flow charts are given in the paper. Our experiments show that concept induction increases precision by 4%.

Thirdly, we introduce a topic correlation function and use it as the criterion for term selection. We discuss the connection between similarity and correlation of concepts in detail, and then construct a computation model for concept correlation based on HowNet.

Fourthly, we propose an algorithm for computing the semantic weights of concepts, based on a semantic word dictionary and dependency relationship analysis of sentence chunks. The algorithm takes the effect of adverbs of degree into account and resolves the problem introduced by insignificant weights using the inverse document semantic value (ids). Experiments show that considering adverbs of degree increases precision by 2%.

In the last part, the scheme of the semantic classification system for Chinese texts is illustrated and implemented. We use KNN, Naïve Bayes and SVM to train classifiers for texts with different topics and evaluate the effect of the dimension of the concept space. The experimental data show good performance, with average precision and recall of 83% and 84%, respectively.
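
As a rough illustration of how adverbs of degree can scale semantic weights, the following Python sketch sums the polarity of sentiment words and multiplies each by the degree adverb that directly precedes it. It is a minimal sketch under stated assumptions, not the algorithm from the thesis: the lexicons `POLARITY` and `DEGREE`, their values, and the functions `semantic_weight` and `orientation` are all hypothetical, English tokens stand in for segmented Chinese text, and simple adjacency replaces the dependency relationship analysis described above. HowNet and the ids correction are not modeled.

```python
# Hypothetical toy lexicons (assumption); the thesis relies on a semantic
# word dictionary and HowNet, neither of which is reproduced here.
POLARITY = {"good": 1.0, "excellent": 1.0, "bad": -1.0, "poor": -1.0}
DEGREE = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}


def semantic_weight(tokens):
    """Sum the polarity of sentiment words in a token list.

    Each polarity is scaled by the degree adverb immediately before the
    word, if there is one. Adjacency is a simplification (assumption);
    the thesis uses dependency analysis of sentence chunks instead.
    """
    total = 0.0
    for i, token in enumerate(tokens):
        polarity = POLARITY.get(token)
        if polarity is None:
            continue  # not a sentiment word
        scale = DEGREE.get(tokens[i - 1], 1.0) if i > 0 else 1.0
        total += scale * polarity
    return total


def orientation(tokens):
    """Map the summed weight to a coarse orientation label."""
    weight = semantic_weight(tokens)
    if weight > 0:
        return "positive"
    if weight < 0:
        return "negative"
    return "neutral"
```

For example, `semantic_weight(["the", "screen", "is", "very", "good"])` scales the polarity of "good" by the multiplier for "very", so the degree adverb strengthens the positive weight instead of being discarded as a stop word.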
Keywords/Search Tags:semantic orientation, concept space vector model, concept extraction and induction