A Study On Visual Clue In Web Tables Based On Graph Model

Posted on: 2015-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Li
Full Text: PDF
GTID: 2308330464955512
Subject: Computer software and theory
Abstract/Summary:
The Web contains not only a large amount of text data but also a huge amount of tabular data. Natural language understanding of Web-scale, free-format text remains a difficult and time-consuming problem. Compared with free-format text, data in Web tables is more concise and more structured, which makes it easier to mine. Web table mining has therefore become a research hotspot. There are two popular approaches to information retrieval from Web tables. One is to interpret the table with the help of a corpus; however, this approach is limited by the richness of the corpus. The other is to mine information with the help of visual clues, and this is the main approach taken in this thesis.

A graph model is proposed, together with its construction method, to represent the various visual features of Web tables, including the structure of rows and columns, the background and color of cells, and the font and size of text, in order to uncover the meaning of a table from its visual appearance. Based on the graph model, two problems are studied. First, several kinds of visually parallel relationships in Web tables are formally defined, and an algorithm that extracts these relationships automatically is provided. Experimental results show a significant correlation between the extracted visually parallel relationships and semantic relatedness. Second, the orientation of a table carries rich semantic meaning. Table orientation is defined, and a set of visual features of the table is defined and used to train several classifiers. Experimental results show that the random forest classifier achieves a precision above 92% on the table orientation problem. These two studies show that the visual information of Web tables can benefit other semantic analysis work.
Keywords/Search Tags:Web table mining, visual clue, graph model, visually parallel relationship, table orientation
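
The abstract does not give the formal definitions used in the thesis, so the following is only a minimal sketch of the general idea, under stated assumptions. It treats each cell as a graph node carrying visual attributes (background, font, size), links adjacent cells with edges, and groups cells that are connected through same-style edges as a rough stand-in for a "visually parallel relationship". The names `Cell`, `build_graph`, `parallel_groups`, and `style_consistency` are hypothetical and do not come from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One table cell with a few visual attributes (assumed feature set)."""
    row: int
    col: int
    text: str
    background: str = "white"
    font: str = "serif"
    size: int = 12

    def style(self):
        return (self.background, self.font, self.size)


def build_graph(cells):
    """Nodes are keyed by (row, col); edges link adjacent cells.

    Each edge is (a, b, kind, same_style), where kind is "row" for
    horizontal neighbours and "col" for vertical neighbours.
    """
    nodes = {(c.row, c.col): c for c in cells}
    edges = []
    for (r, c), cell in nodes.items():
        for dr, dc, kind in ((0, 1, "row"), (1, 0, "col")):
            other = nodes.get((r + dr, c + dc))
            if other is not None:
                same = cell.style() == other.style()
                edges.append(((r, c), (r + dr, c + dc), kind, same))
    return nodes, edges


def parallel_groups(nodes, edges):
    """Group cells connected through same-style edges (union-find).

    Assumption: shared style between neighbours approximates a
    visually parallel relationship; the thesis may define it differently.
    """
    parent = {k: k for k in nodes}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for a, b, _kind, same in edges:
        if same:
            parent[find(a)] = find(b)

    groups = {}
    for k in nodes:
        groups.setdefault(find(k), []).append(k)
    return sorted(sorted(g) for g in groups.values())


def style_consistency(edges):
    """Fraction of same-style edges along rows and along columns.

    A hypothetical example of a visual feature that an orientation
    classifier could be trained on.
    """
    totals = {"row": [0, 0], "col": [0, 0]}
    for _a, _b, kind, same in edges:
        totals[kind][0] += int(same)
        totals[kind][1] += 1
    return {k: (s / n if n else 0.0) for k, (s, n) in totals.items()}
```

For a table whose header row is shaded differently from its data rows, `parallel_groups` would place the header cells in one group and the data cells in another, while `style_consistency` would report high consistency along rows and low consistency along columns. Features of this kind could then be passed to a classifier such as a random forest, which is the classifier the abstract reports as performing best.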