A single document-based term weighting scheme by supporting terms

Posted on: 2007-02-15
Degree: M.S.
Type: Thesis
University: Utah State University
Candidate: Cheng, Juan
Full Text: PDF
GTID: 2458390005989083
Subject: Computer Science
Abstract/Summary:
Term weighting is an important step in successful text data mining and information retrieval. Classical term frequency-based weighting schemes, which consider the local and/or global word distribution, perform stably, but they are constrained by simple statistical characteristics that do not reveal the detailed contextual information in a document. This thesis presents a new term weighting scheme, term context density (TCD), to improve the capability of discriminating among terms by mining the contextual information in a single document. To obtain the TCD of a document, a novel information transfer model is designed to describe mathematically how contextual information is transferred among the supporting terms, which are the carriers conveying relevant semantic information in the context of the textual units of a document. Comprehensive controlled experiments on several well-known text collections, such as Reuters and Newsgroups, validate the effectiveness of the new weighting scheme.
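For orientation, the kind of classical frequency-based weighting the abstract contrasts itself with can be sketched as below. This is a minimal TF-IDF illustration, not the TCD scheme or the information transfer model, neither of which is specified in the abstract. The function name `tf_idf`, the toy documents, and the particular variant (length-normalized term frequency times the natural log of inverse document frequency) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length (local distribution),
    idf = ln(N / document frequency) (global distribution).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy collection, already tokenized.
docs = [
    ["term", "weighting", "for", "retrieval"],
    ["term", "frequency", "and", "document", "frequency"],
    ["contextual", "information", "in", "a", "document"],
]
weights = tf_idf(docs)
```

In this sketch a term's weight depends only on how often it occurs and in how many documents, so shuffling the words inside a document leaves every weight unchanged. That is the limitation the abstract points to: such schemes use no contextual information, and they need a collection to compute the global part, whereas TCD is described as working from a single document.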
Keywords/Search Tags:Weighting, Information, Document