Studying Journal Subjects With Self-Organizing Map

Posted on: 2010-04-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L An
Full Text: PDF
GTID: 1118330332985516
Subject: Information Science
Abstract/Summary:
Academic journals are important carriers of scientific communication. With the development of science and the accumulation of human knowledge, the volume of academic journals and journal articles is increasing rapidly, which leads to overlapping content among journals. The same discipline or research field may be covered by many journals. How to collect, use and manage academic journals effectively from the perspective of their subjects has attracted the attention of many organizations and individuals, and people's concerns have shifted from the number of journals to their subject content. Research on journal subjects therefore has both academic significance and practical value. It can assist libraries in purchasing academic journals, novice researchers in deciding which research topics or groups to pursue, researchers in choosing relevant journals to submit to, academic journals in developing appropriate policies, and research funding agencies in making decisions.

Academic journals usually cover a huge number of subjects, and the high dimensionality of the resulting data makes journal subjects difficult to study. In this dissertation, a visual dimension-reduction method, the Self-Organizing Map (SOM), is therefore adopted to study journal subjects; it enables users to observe high-dimensional journal subjects conveniently in the low-dimensional SOM space.

The dissertation is composed of seven chapters, as follows.

1. The Theoretical Foundation of Journal Subject Research

This chapter expounds the research objects, main contents, research methods and development trends of journal subject research. There are two research objects for journal subject research, namely journals and their subjects. The contents of journal subject research can be grouped into eight aspects.
They are: 1) the indexing of journal subjects; 2) the clustering of journal subjects; 3) the distribution of a certain category of subjects in journals; 4) the classification and clustering of journals based on their subjects; 5) the subject composition of a specific journal; 6) the comparison among subjects of journals from different countries or districts; 7) the analysis of hot subjects of journals; and 8) the development trends of journal subjects. The research methods of journal subject research mainly include bibliometrics, content analysis and expert surveys. In addition, Latent Semantic Analysis (LSA), Multidimensional Scaling (MDS) and artificial neural network (ANN) techniques can also be employed to study journal subjects. The development trends of journal subject research can be summarized as follows. First, researchers need to free journal subject studies from heavy and complex statistical tasks and introduce novel, effective methods that are capable of processing high-dimensional data. Second, the research contents need to be broadened to include the clustering of subjects, the clustering of journals based on their subjects, and so on. Finally, the level of research needs to be raised. For example, when studying the development trends of journal subjects, in addition to summarizing the development status of individual subjects, researchers need to measure how much the journal subjects as a whole have changed over time.

2. The Methodology of Applying SOM to Journal Subject Research

This chapter describes the principle of the Self-Organizing Map (SOM), compares the advantages and disadvantages of two learning algorithms, summarizes several display styles, discusses three high-performance SOM tools and elaborates on how the SOM technique is applied to journal subject research.
SOM is an unsupervised artificial neural network technique with two main learning algorithms, namely sequential learning and batch learning. The U-matrix and the component plane are two common SOM display styles. Compared with plane output, three-dimensional output can avoid the "border effect" and is more accurate. A comprehensive survey and some trials reveal that three SOM tools have high performance: SOM Toolbox, Viscovery SOMine and Databionic ESOM Tools. In this study, SOM Toolbox is used. To study journal subjects with the SOM technique, four SOM input matrices are constructed and a novel enhanced U-matrix is defined, based on the U-matrix defined by Ultsch in 2003. The author presents four new SOM display styles and explains how they are employed to study journal subjects. They are named Integrative Component Plane, Attribute Accumulative Matrix, Attribute Variance Matrix and Key Attribute Projection.

3. The Clustering Analysis of Journal Subjects

This chapter clusters journal subjects with the SOM technique and generates a hierarchical subject directory that helps users locate relevant subjects, browse relevant literature and modify search terms. Fifty-three English-language journals in the field of library and information science are selected as samples, from which the subjects reported in 2007 are extracted. A Subject-Journal input matrix is constructed and trained with the SOM technique, so that 2330 subjects are projected onto 163 non-empty SOM nodes. A comparison between the self-defined enhanced U-matrix and the U-matrix presented by Ultsch in 2003 verifies the effectiveness and advantage of the self-defined enhanced U-matrix. The subjects are clustered into 21 categories based on the vicinity of SOM nodes, for example computer information management, computer information system and education. The size and distribution characteristics of the subject clusters are analyzed and the clustering effect is evaluated.
The clustering results are also compared with relevant research findings.

4. Analysis of Hot Subjects of Journals

This chapter aims to discover the hot subjects and their distribution among journals. The Attribute Accumulative Matrix is applied to the SOM display in Chapter 3 to identify the hot subjects among the 53 journals in 2007. The results show that although these journals covered a large number of subjects, hot subjects accounted for only 1.1% of all subjects and were concentrated in the fields of library, computer information system, education and enterprise information. The identified hot subjects are compared with relevant domestic research, and the differences between the hot subjects of Chinese LIS journals and those of English LIS journals are revealed. Then three important journals are selected and their hot subjects are analyzed. Finally, three groups of hot subjects, namely library, information technology and management information, are selected, and the corresponding Integrative Component Planes are analyzed to discover the important journals in which these three groups of hot subjects are mainly distributed.

5. Similarities and Differences of Journals in Terms of Subjects

This chapter clusters journals with the SOM technique based on their subjects, identifies the key subjects that differentiate individual journals and determines the subject characteristics of journal clusters. The Journal-Subject input matrix is constructed and trained with the SOM technique. Fifty-three journals are projected onto 140 SOM nodes and clustered into 19 categories based on the self-defined enhanced U-matrix and the vicinities of SOM nodes. The clustering effect is evaluated. Then the Attribute Variance Matrix is applied to the SOM display obtained from the Subject-Journal matrix in Chapter 2 to identify the key subjects that contribute the most to the differences among individual journals.
The journal SOM display is projected onto the three-dimensional space formed by library, information technology and management information-related subjects to analyze the subject characteristics of journal clusters.

6. Development Trends of Journal Subjects

This chapter employs the SOM technique to determine how much the journal subjects as a whole have changed over a certain period, and to analyze the activeness of subjects in that period and the development trends of active subjects. The Journal of Information Science (JIS) is selected as the sample and its subjects from 1981 through 2007 are collected. A Year-Subject input matrix is constructed and trained with the SOM technique. The 27 years are projected onto 26 non-empty SOM nodes and the learning results are displayed in comet mode. The vicinity of the SOM nodes onto which consecutive years are projected is analyzed, and the 27 years are clustered into 13 categories based on the self-defined enhanced U-matrix. The development course of the subjects of JIS is revealed. Then a Subject-Year input matrix is constructed and trained with the SOM technique. Nine hundred and ninety subjects are projected onto 153 SOM nodes. The Attribute Variance Matrix is applied to identify the active subjects that changed substantially over time. With the help of the Attribute Accumulative Matrix, the hot subjects that developed smoothly are identified. Finally, the Integrative Component Plane is applied to analyze the development trends of three kinds of active subjects, namely information, computer & network, and library.

7. Limitations and Future Research Directions

This chapter points out the limitations of this study in terms of data collection and research contents. More journals and a longer period will be covered in future research.
The Attribute Accumulative Matrix and the Attribute Variance Matrix can be employed to analyze the total number of subjects, the differences in subject focus and how these indexes change over time. Moreover, comparing the subjects of journals from different countries or districts will help domestic research and journals develop in the field of library and information science.

Figures: 24, Tables: 22...
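
For readers unfamiliar with the technique, the SOM principle and the plain U-matrix mentioned under Chapter 2 can be illustrated with a minimal sketch. It assumes a rectangular grid, sequential (online) learning with a Gaussian neighbourhood, and a simplified U-matrix that averages the distance from each node to its four grid neighbours. The function names (`train_som`, `best_matching_unit`, `u_matrix`), the parameter defaults and the toy matrix are hypothetical; this is neither the SOM Toolbox implementation used in the dissertation nor the enhanced U-matrix it defines.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Sequential SOM training on a rows x cols rectangular grid (assumed setup)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # One weight vector per grid node, initialised at random in [0, 1).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    # Pull the node toward the sample, scaled by grid proximity to the BMU.
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def u_matrix(weights):
    """Mean Euclidean distance from each node to its 4-connected grid neighbours."""
    rows, cols = len(weights), len(weights[0])
    um = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            um[r][c] = sum(dists) / len(dists) if dists else 0.0
    return um


# Toy journal-by-subject matrix with made-up values: rows are journals, columns are subjects.
toy = [
    [1.0, 0.9, 0.0, 0.1],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
]
som = train_som(toy, rows=3, cols=3)
positions = [best_matching_unit(som, row) for row in toy]  # grid node for each journal
um = u_matrix(som)  # high values suggest boundaries between clusters
```

In this sketch each row of the input matrix is projected onto its best-matching node, and nodes separated by high U-matrix values can be read as belonging to different clusters, which is the general idea behind clustering "based on the vicinity of SOM nodes".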
Keywords/Search Tags:Journal, Subject, Self-Organizing Map, SOM, Library and Information Science