A Research On Underlying Topics Visualization Based On MDS Model

Posted on: 2014-01-04 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Y M Zhao
GTID: 1228330467985026 | Subject: Information Science
Abstract/Summary:
The purpose of this study is to mine, represent and explain underlying topics in text collections by means of visualization; to represent these topics at different levels; to reveal the relationships among them; and to apply underlying-topic visualization to knowledge discovery in a specific domain. The author demonstrated the rationale for using terms to indicate underlying topics, for using the proximity of terms in the transposed vector space (in which each term is a vector over the documents) to represent term cohesion, and for mapping this proximity into a visual space with the Multidimensional Scaling (MDS) algorithm, and then constructed a method process for underlying-topic visualization. The author argued that a set of terms linked by cohesion relationships can be found to represent each underlying topic in a text collection, so that the terms belonging to the same topic can be extracted once these cohesion relationships are abstracted. The proximity among terms in the transposed vector space was calculated to represent and abstract the cohesion relationships, so that cohesive terms lie close to one another in that space. Because a high-dimensional space cannot be viewed directly, MDS was employed to map the proximities from the vector space onto a visible MDS graph. Since the topological structure of the high-dimensional space is largely preserved, terms attached to the same topic remain neighbors in the MDS graph, and underlying topics therefore emerge as clusters. This method process removes the dependence on co-occurrence counts and on pre-selected seed terms that co-word analysis and Database Tomography require.

Open coding was introduced to overcome the limited area of the visual space. Parent underlying topics were generated by coding analysis, so that each parent underlying topic can be represented in its own MDS graph. Two strategies were designed: "pre-segmentation and post-coding" and "pre-coding and post-segmentation".
The former codes the terms obtained from segmentation into several categories corresponding to parent underlying topics; it suits text collections made up of short, entirely unstructured texts. The latter codes the text collection into several sub-collections by coding chapter headings; it suits text collections made up of long, semi-structured texts such as business documents and scientific literature.

To address the difficulty of interpreting MDS results, principles of grounded theory were introduced into the visualization process after the topics had been displayed in the MDS graph. The analyst can return to the original texts to recover the real meaning of the topics, locate the topics in the texts, obtain more context for the topics and their terms, and analyze typical individual cases.

The context dependence of topics and of the words within them was examined in order to find ways to improve the visualization method process. Three levels of context were identified: domain context, theme context and linguistic context. A centroid proximity matrix, in which the centroid of the term vectors of a parent underlying topic represents all the terms of that topic, was designed to observe topics at a higher level and to discover relationships among parent underlying topics from a global view. An attribute accumulative proximity matrix, in which the attributes of the matrix associated with different topics are integrated, was developed to find new child underlying topics, to explain the strong relationships among certain parent underlying topics, and to provide more linguistic context.

Finally, the underlying-topic visualization method was applied to risk identification for public companies in the computer application services industry, using the risk-factor sections of their prospectuses as the text collection.
The results showed that the underlying-topic visualization method system can be applied successfully to knowledge discovery in a specific domain, revealing underlying topics at different levels, their internal structure, and the connections among topics.
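The core pipeline described above (terms as vectors in the transposed vector space, term-to-term proximity, projection onto a 2-D MDS graph, and a centroid proximity matrix over parent topics) can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: raw term counts, cosine similarity as the proximity measure, 1 − cosine as the distance, and classical (Torgerson) MDS are all assumptions, since the abstract does not name the proximity measure or the MDS variant. The function names are hypothetical.

```python
import numpy as np


def unit_rows(m: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (all-zero rows are left as zeros)."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return m / norms


def term_proximity(doc_term: np.ndarray) -> np.ndarray:
    """Cosine similarity between terms in the transposed vector space.

    doc_term has documents as rows and terms as columns (raw counts are an
    assumption); transposing it turns every term into a vector over documents.
    """
    term_vecs = unit_rows(doc_term.T.astype(float))
    return term_vecs @ term_vecs.T


def classical_mds(dist: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: k-dimensional coordinates for a distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dist ** 2) @ centering  # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)                  # eigh returns ascending eigenvalues
    top = np.argsort(vals)[::-1][:k]                # indices of the k largest
    scale = np.sqrt(np.clip(vals[top], 0.0, None))  # negative eigenvalues contribute nothing
    return vecs[:, top] * scale


def centroid_proximity(doc_term: np.ndarray, groups: list[list[int]]) -> np.ndarray:
    """Cosine similarity between parent-topic centroids.

    Each group lists the term indices coded into one parent topic; its centroid
    is the mean of the unit-length term vectors in that group (an assumption).
    """
    term_vecs = unit_rows(doc_term.T.astype(float))
    centroids = unit_rows(np.array([term_vecs[g].mean(axis=0) for g in groups]))
    return centroids @ centroids.T


if __name__ == "__main__":
    # Toy data: 4 documents x 4 terms; terms 0-1 and terms 2-3 co-occur.
    doc_term = np.array([[2, 1, 0, 0],
                         [1, 2, 0, 0],
                         [0, 0, 2, 1],
                         [0, 0, 1, 2]])
    sim = term_proximity(doc_term)
    coords = classical_mds(np.clip(1.0 - sim, 0.0, None), k=2)
    print(np.round(coords, 3))  # terms 0-1 and terms 2-3 should form two clusters
    print(np.round(centroid_proximity(doc_term, [[0, 1], [2, 3]]), 3))
```

Classical MDS is used here only because it reduces to a single eigendecomposition; a non-metric MDS, which fits just the rank order of the proximities, would be an equally reasonable choice for drawing this kind of topic graph.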
Keywords/Search Tags: Underlying topic, Visualization, Multidimensional Scaling, MDS, Text visualization, Knowledge discovery, Grounded theory, Open coding