Research On Vision Feature Based Information Extraction Of BBS Posting

Posted on: 2010-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z H He
GTID: 2178360272991629
Subject: Computer application technology
Abstract/Summary:
This paper studies how to automatically extract posting information from the topic pages of a BBS. Traditional solutions to this problem rely primarily on analyzing the HTML DOM tree and tag structure of a page, and therefore depend heavily on the HTML standard. Their extraction accuracy is strongly affected by whether the page is well formed, and the approach may have to change whenever the version of the scripting language evolves.

Here, a language-independent technique is proposed. Our solution performs the extraction based solely on the visual information of topic pages. We summarize the visual features of BBS postings, which guide the entire extraction process. The process is carried out in three steps: first, construct the visual block tree of the topic page; then, locate the posting region in the tree; finally, extract the information of each posting from the posting region. Experimental results indicate that the vision-feature-based approach can achieve high extraction accuracy.

The study combines BBS data mining with vision-feature parsing of web pages, and is significant for the resource integration and social administration of BBS.
Keywords/Search Tags: BBS posting, vision feature, information extraction, cluster, visual block
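
The three-step process described in the abstract (build a visual block tree, locate the posting region, extract each posting) can be illustrated with a minimal Python sketch. The abstract does not state which visual features the thesis uses, so everything here is an assumption: the `VisualBlock` structure, the function names, and the heuristic that postings appear as sibling blocks sharing a left edge and width are hypothetical stand-ins, not the author's method. The tree is built by hand rather than from a rendered page.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class VisualBlock:
    """A rendered rectangle on the page (hypothetical structure)."""
    x: int
    y: int
    width: int
    height: int
    text: str = ""
    children: List["VisualBlock"] = field(default_factory=list)


def aligned_group(blocks: List[VisualBlock], tol: int = 5) -> List[VisualBlock]:
    """Largest set of blocks sharing left edge and width (within tol), top to bottom."""
    best: List[VisualBlock] = []
    for ref in blocks:
        group = [b for b in blocks
                 if abs(b.x - ref.x) <= tol and abs(b.width - ref.width) <= tol]
        if len(group) > len(best):
            best = group
    return sorted(best, key=lambda b: b.y)


def locate_posting_region(
    root: VisualBlock, min_posts: int = 2
) -> Tuple[Optional[VisualBlock], List[VisualBlock]]:
    """Step 2 (assumed heuristic): the block whose children form the largest aligned group."""
    best_node: Optional[VisualBlock] = None
    best_group: List[VisualBlock] = []
    stack = [root]
    while stack:
        node = stack.pop()
        group = aligned_group(node.children)
        if len(group) >= min_posts and len(group) > len(best_group):
            best_node, best_group = node, group
        stack.extend(node.children)
    return best_node, best_group


def extract_postings(root: VisualBlock) -> List[str]:
    """Step 3: read each posting block inside the located region."""
    _, group = locate_posting_region(root)
    return [b.text for b in group]


# Step 1 (hand-built here): a visual block tree for a toy topic page.
posts = [
    VisualBlock(50, 120, 900, 300, "post 1"),
    VisualBlock(50, 430, 900, 500, "post 2"),
    VisualBlock(50, 940, 900, 200, "post 3"),
]
page = VisualBlock(0, 0, 1000, 2000, children=[
    VisualBlock(0, 0, 1000, 100, "header"),
    VisualBlock(50, 120, 900, 1500, children=posts),
    VisualBlock(0, 1700, 1000, 100, "footer"),
])

print(extract_postings(page))  # ['post 1', 'post 2', 'post 3']
```

Because the sketch works only on block geometry, it never inspects HTML tags, which is the property the abstract emphasizes. A real page would need a richer similarity measure than left edge and width; in this toy heuristic, for example, a header and footer of equal width can compete with a region that holds only two postings.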