
Layout Algorithm Based On Document Vectorization For Academic Paper Visualization System

Posted on: 2019-09-02 | Degree: Master | Type: Thesis
Country: China | Candidate: X H Yue | Full Text: PDF
GTID: 2428330590467391 | Subject: Computer Science and Technology
Abstract/Summary:
Visualizing papers is an important way to uncover their potential value and to help people understand the underlying patterns of knowledge creation. However, a paper carries no coordinate information of its own, so it must be assigned coordinates before it can be displayed. Designing a layout algorithm that captures the relationships between papers in depth is therefore very important for a visualization system. All paper visualization systems known to us generate coordinate layouts with force-directed graph methods based on citation relations. However, citation data are often missing or inaccurate and require heavy manual adjustment, so the generated layouts contain many isolated points and artifacts and separate categories poorly. The fundamental problem with this layout method is that it ignores the content of the papers. We propose, for the first time, a layout algorithm based on the content of papers: papers are first transformed into high-dimensional document vectors by a natural language processing model, and the document vectors are then reduced to two-dimensional or three-dimensional coordinates by dimension reduction methods. We design supervised layout algorithms based purely on content, unsupervised layout algorithms based purely on content, and a fusion layout algorithm that predicts citation relationships from content. Together, the proposed layout algorithms can effectively process paper data with category annotations, paper data with citation relationships, and paper data with neither citations nor annotations. We conducted a large number of experiments on the arXiv dataset and the IEEE dataset and compared the visualization results of each model. We also designed quantitative evaluation metrics that assess the quality of the generated coordinates in terms of both the overall layout and the local layout. The content-based layout algorithms outperform the previous citation-based algorithms in both visual and quantitative evaluation. We also extend citation prediction to text relation prediction and propose new text relation classification models. Experiments on the PDTB, a standard dataset of discourse relations, also verify the effectiveness of our algorithms.
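
The two-step idea in the abstract (document vectors first, then dimension reduction to 2D coordinates) can be illustrated with a minimal sketch. The abstract does not name the natural language processing model or the dimension reduction method, so this example swaps in TF-IDF and PCA (computed by power iteration) as simple stand-ins; the function names and the toy documents are hypothetical and are not taken from the thesis.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Map each document to a TF-IDF vector (a stand-in for the thesis's NLP model)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(docs)
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([
            (counts[t] / len(toks)) * (math.log(n_docs / doc_freq[t]) + 1.0)
            for t in vocab
        ])
    return vectors


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_component(rows, iters=200, seed=0):
    """Leading eigenvector of X^T X by power iteration (X is never squared explicitly)."""
    rng = random.Random(seed)
    dim = len(rows[0])
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        scores = [dot(row, v) for row in rows]                       # X v
        w = [sum(s * row[j] for s, row in zip(scores, rows))         # X^T (X v)
             for j in range(dim)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def pca_layout(vectors, n_components=2):
    """Reduce document vectors to low-dimensional coordinates with PCA."""
    n = len(vectors)
    means = [sum(col) / n for col in zip(*vectors)]
    rows = [[x - m for x, m in zip(row, means)] for row in vectors]  # center the data
    coords = [[] for _ in rows]
    for k in range(n_components):
        v = top_component(rows, seed=k)
        scores = [dot(row, v) for row in rows]
        for point, s in zip(coords, scores):
            point.append(s)
        # Deflate: remove the found direction before extracting the next one.
        rows = [[x - s * vj for x, vj in zip(row, v)]
                for row, s in zip(rows, scores)]
    return coords


# Toy corpus (hypothetical): two documents on graph layout, two on language models.
docs = [
    "graph layout force directed graph layout",
    "force directed graph layout citation graph",
    "neural language model word embedding",
    "word embedding neural language model text",
]
coords = pca_layout(tfidf_vectors(docs))
for doc, (x, y) in zip(docs, coords):
    print(f"({x:+.3f}, {y:+.3f})  {doc}")
```

In this sketch, documents with similar vocabulary end up close together in the plane without any citation data, which is the property the content-based layout relies on. A real system would likely replace TF-IDF with a learned document embedding and PCA with a nonlinear reduction method.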
Keywords/Search Tags: visualization, document vector, layout algorithm, relation classification