Automatic Ranking List Extraction From Web Pages Based On Visual And Semantic Information

Posted on: 2018-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Zhang
GTID: 2428330596989139
Subject: Computer technology
Abstract/Summary:
This thesis addresses the automatic extraction of a common type of structured web content, the ranking list. Compared with other web structures, ranking lists often carry more information, cover a richer variety of topics, and are of higher quality. They can serve as an important data source for general-domain knowledge bases and for question-answering (Q&A) systems. We propose an efficient, end-to-end ranking list extraction algorithm that uses both visual and semantic information. With this algorithm, we extract more than 1.7 million ranking lists from 1.7 billion web pages, with 92.0% accuracy and 72.3% recall.
Keywords/Search Tags: Data Mining, Web Information Extraction, Knowledge Base