Research Of Web Information Extraction Technique Based On REIE

Posted on: 2012-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
Full Text: PDF
GTID: 2248330395955582
Subject: Computer application technology
Abstract/Summary:
Web information extraction has developed rapidly in recent years, and extraction based on regular expressions has become an active topic in data mining. After studying this technology in depth and comparing the classic methods of Web information extraction, this thesis improves on some existing algorithms and proposes an extraction technique based on the REIE (Regular Expression Information Extraction) algorithm. First, the thesis introduces the background and structure of Web information extraction technology. By analyzing and comparing several classic extraction methods, it proposes an REIE-based extraction technique and gives evaluation criteria for information extraction systems. Next, it analyzes in detail the parsing method and extraction principles of HTMLParser, presents the HTMLParser data structure through an analysis of Web text, and introduces methods of Web text data mining and their relevance. Then, following the principles of regular expression extraction, it proposes REIE, the core algorithm of the system. Finally, it implements a regular-expression-based system for extracting Web content, mainly news headlines, hyperlinks, and body text. The system extracts from Web pages in real time and displays the results to users, and its validity is checked experimentally. The experiments show that the proposed method extracts comprehensively and accurately and improves both the timeliness and the accuracy of Web information extraction.
Keywords/Search Tags: HTMLParser, Regular Expression, Information Extraction, REIE Algorithm
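
The abstract does not give the REIE rules themselves, so the following is only a minimal sketch of the general idea it describes: using regular expressions to pull headlines and hyperlinks out of a Web page. The patterns, the function names `strip_tags` and `extract`, and the sample page are all assumptions made for illustration, and Python's standard `re` module stands in for whatever tooling the thesis used alongside HTMLParser.

```python
import re
from html import unescape

# Hypothetical patterns for illustration only; these are NOT the REIE rules
# from the thesis, which the abstract does not specify.
TITLE_RE = re.compile(r"<h[1-3][^>]*>(.*?)</h[1-3]>", re.IGNORECASE | re.DOTALL)
LINK_RE = re.compile(
    r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\'][^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(fragment):
    """Remove nested tags, decode entities, and collapse whitespace."""
    return " ".join(unescape(TAG_RE.sub("", fragment)).split())


def extract(html):
    """Return headlines (h1-h3 text) and (href, anchor text) pairs."""
    headlines = [strip_tags(m) for m in TITLE_RE.findall(html)]
    links = [(href, strip_tags(text)) for href, text in LINK_RE.findall(html)]
    return {"headlines": headlines, "links": links}


# Made-up sample page used only to exercise the patterns.
sample = """
<html><body>
<h1>Local team <b>wins</b> final</h1>
<p>Read the <a href="/sports/final.html">full report</a> or
<a class="ext" href='http://example.com/archive'>browse the archive</a>.</p>
</body></html>
"""

result = extract(sample)
print(result["headlines"])
print(result["links"])
```

Patterns like these are brittle on malformed or deeply nested markup, which is presumably one reason the thesis pairs regular expressions with a parser such as HTMLParser rather than relying on them alone.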