Automatic Ranking List Extraction From Web Pages Based On Visual And Semantic Information

Posted on:2018-08-28

Degree:Master

Type:Thesis

Country:China

Candidate:Z X Zhang

Full Text:PDF

GTID:2428330596989139

Subject:Computer technology

Abstract/Summary:


In this thesis, we focus on a typical kind of structured web information, commonly called ranking lists, and on extracting it automatically. Compared with other web structures, ranking lists often contain more information, cover a richer variety of topics, and are of higher quality. They can serve as an important data source for general-domain knowledge bases and for question-answering (Q&A) systems. We propose an efficient, end-to-end ranking list extraction algorithm that draws on both visual and semantic information. Using this algorithm, we extracted more than 1.7 million ranking lists from 1.7 billion web pages, with 92.0% accuracy and a 72.3% recall rate.
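The abstract does not say how the visual and semantic information are combined, so the following is only an illustrative sketch, not the thesis's end-to-end algorithm. It assumes a candidate list has already been cut out of a rendered page as a sequence of items with a text string and a bounding box, and scores it with two hand-written cues: a visual cue (items share a left edge, have uniform heights and run top to bottom) and a semantic cue (item i starts with the number i, and the list title mentions a ranking). The names `Item`, `visual_score`, `semantic_score`, `is_ranking_list`, the keyword list, the weights and the threshold are all hypothetical choices made for this example.

```python
import re
from dataclasses import dataclass
from statistics import pstdev

# Hypothetical representation of one rendered item of a candidate list.
@dataclass
class Item:
    text: str
    x: float       # left edge of the item's box on the rendered page, in pixels
    y: float       # top edge of the box, in pixels
    height: float  # box height, in pixels

# Optional "No." or "#", a number, optional ".", ")" or ":", then whitespace.
RANK_PREFIX = re.compile(r"^\s*(?:no\.\s*|#)?(\d+)[.):]?\s", re.IGNORECASE)
# Assumed title keywords that hint at a ranking (illustrative only).
TITLE_WORDS = ("top", "best", "ranking", "rank", "list of")

def visual_score(items, x_tolerance=5.0):
    """Average of three 0..1 cues: shared left edge, uniform heights, top-to-bottom order."""
    if len(items) < 3:
        return 0.0  # too short to look like a list
    xs = [it.x for it in items]
    hs = [it.height for it in items]
    aligned = 1.0 if max(xs) - min(xs) <= x_tolerance else 0.0
    mean_h = sum(hs) / len(hs)
    uniform = max(0.0, 1.0 - pstdev(hs) / mean_h) if mean_h > 0 else 0.0
    ordered = 1.0 if all(a.y < b.y for a, b in zip(items, items[1:])) else 0.0
    return (aligned + uniform + ordered) / 3

def semantic_score(title, items):
    """Weighted mix of 'item i starts with rank i' and 'title mentions a ranking'."""
    if not items:
        return 0.0
    in_place = 0
    for position, it in enumerate(items, start=1):
        m = RANK_PREFIX.match(it.text)
        if m and int(m.group(1)) == position:
            in_place += 1
    rank_ratio = in_place / len(items)
    title_hit = 1.0 if any(w in title.lower() for w in TITLE_WORDS) else 0.0
    return 0.7 * rank_ratio + 0.3 * title_hit

def is_ranking_list(title, items, threshold=0.7):
    """Accept the candidate when the equally weighted combined score reaches the threshold."""
    score = 0.5 * visual_score(items) + 0.5 * semantic_score(title, items)
    return score >= threshold

if __name__ == "__main__":
    cities = [Item("1. Tokyo", 10, 100, 20), Item("2. Delhi", 10, 130, 20),
              Item("3. Shanghai", 10, 160, 20)]
    menu = [Item("Home", 10, 100, 20), Item("News", 120, 100, 20),
            Item("About", 230, 100, 20)]
    print(is_ranking_list("Top 3 largest cities", cities))  # prints True
    print(is_ranking_list("Site menu", menu))               # prints False
```

The horizontal navigation menu fails both cues (its items are not left-aligned, not stacked vertically and carry no rank numbers), while the numbered city list passes both. The fixed weights only keep the example readable; a system trained end to end, as the abstract describes, would more plausibly learn how much each kind of evidence should count.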

Keywords/Search Tags:

Data Mining, Web Information Extraction, Knowledge Base


Related items

1	Research And Design Of Key Technology Of The Domain-specific Knowledge Base System For Vertical Searching
2	Building A Relation Knowledge Base For Open Information Extraction
3	Design And Implementation Of Resume Information Extraction System Based On Domain Knowledge Base
4	Design And Realization Of Domain Specific Knowledge Base Extraction System
5	Study Of Knowledge Processes Of The University Library And Knowledge Mining Process
6	Research On The Method Of Knowledge Extraction And Knowledge Base Construction From Hudong Encyclopedia
7	The Design And Implementation Of Maintenance System About Information Extraction Knowledge Base In 3D Animation-oriented Mobile Phone Text Messages
8	Automatic Construction Method Of Historical Knowledge Base Based On Timeline
9	Research On Knowledge Base Construction Method For Scientific And Technical Information Analysis
10	Data Mining Techniques Guided By Domain Knowledge And Its Application In Extracting Of Traditional Chinese Pharmacy