
Text mining with the exploitation of user's background knowledge: Discovering novel association rules from text

Posted on: 2007-12-25
Degree: Ph.D.
Type: Dissertation
University: New Jersey Institute of Technology
Candidate: Chen, Xin
Full Text: PDF
GTID: 1458390005488135
Subject: Computer Science
Abstract/Summary:
The goal of text mining is to find interesting, non-trivial patterns or knowledge in unstructured documents. Both objective and subjective measures have been proposed in the literature to evaluate the interestingness of discovered patterns. However, objective measures alone are insufficient because they do not consider the knowledge and interests of users. Subjective measures, on the other hand, require explicit input of user expectations, which is difficult or even impossible to obtain in text-mining environments.

This study proposes a user-oriented text-mining framework and applies it to the problem of discovering novel association rules from documents. The developed system, uMining, consists of two major components: a background knowledge developer and a novel association rule miner. The background knowledge developer learns a user's background knowledge by extracting keywords from documents already known to the user (background documents) and building a concept hierarchy to organize popular keywords. The novel association rule miner discovers association rules among noun phrases extracted from relevant documents (target documents) and compares the rules with the background knowledge to predict how novel each rule is to the particular user (user-oriented novelty).

The user-oriented novelty measure is defined as the semantic distance between the antecedent and the consequent of a rule in the background knowledge. It consists of two components: occurrence distance and connection distance. The former considers the co-occurrence of two keywords in the background documents: the more often they co-occur, the shorter the distance. The latter considers the common connections of two keywords with other keywords in the concept hierarchy and is defined as the length of the shortest path connecting the two keywords in the hierarchy: the longer the path, the larger the distance.

The user-oriented novelty measure is evaluated from two perspectives: novelty prediction accuracy and usefulness indication power. The results show that the user-oriented novelty measure outperforms the WordNet novelty measure and the compared objective measures in terms of predicting novel rules and identifying useful rules.
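The two-part user-oriented novelty measure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the uMining implementation. The abstract does not give a formula for occurrence distance or say how the two components are combined, so the `1 / (1 + co-occurrence count)` form, the normalization of path length by a hypothetical `max_path`, and the equal `weight` are all assumptions. Rules are simplified to a single keyword on each side, and the concept hierarchy is given as hypothetical (parent, child) edge pairs.

```python
from collections import deque
from itertools import combinations


def cooccurrence_counts(background_docs):
    """Count how many background documents contain each unordered keyword pair."""
    counts = {}
    for doc in background_docs:
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


def occurrence_distance(a, b, counts):
    """Assumed form: the more two keywords co-occur, the shorter the distance (range (0, 1])."""
    return 1.0 / (1 + counts.get(tuple(sorted((a, b))), 0))


def connection_distance(a, b, hierarchy):
    """Shortest-path length (in edges) between a and b in the concept hierarchy,
    treated as an undirected graph; returns None if they are not connected."""
    if a == b:
        return 0
    adj = {}
    for parent, child in hierarchy:
        adj.setdefault(parent, set()).add(child)
        adj.setdefault(child, set()).add(parent)
    if a not in adj or b not in adj:
        return None
    seen, queue = {a}, deque([(a, 0)])
    while queue:  # breadth-first search reaches b along a shortest path first
        node, dist = queue.popleft()
        for nxt in adj[node]:
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def novelty(antecedent, consequent, counts, hierarchy, max_path=10, weight=0.5):
    """Hypothetical combination: weighted sum of the two distances, each scaled to [0, 1]."""
    occ = occurrence_distance(antecedent, consequent, counts)
    path = connection_distance(antecedent, consequent, hierarchy)
    # Unconnected keywords get the maximum connection distance.
    conn = 1.0 if path is None else min(path, max_path) / max_path
    return weight * occ + (1 - weight) * conn


# Toy background documents (keyword lists) and concept hierarchy (parent, child) edges.
docs = [["data", "mining", "rules"], ["data", "mining"], ["text", "documents"]]
hierarchy_edges = [("computing", "data"), ("data", "mining"),
                   ("computing", "text"), ("text", "documents")]
counts = cooccurrence_counts(docs)
print(novelty("data", "mining", counts, hierarchy_edges))       # co-occur often, adjacent
print(novelty("mining", "documents", counts, hierarchy_edges))  # never co-occur, 4 hops apart
```

With this toy data, the rule `mining -> documents` scores as more novel than `data -> mining`: the first pair never co-occurs in the background documents and is four hops apart in the hierarchy, while the second co-occurs in two documents and is directly linked. Breadth-first search is used because the hierarchy edges are unweighted, so the first time it reaches the target keyword is along a shortest path.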
Keywords/Search Tags: Rules, Text mining, Background knowledge, Documents, Objective, Measures