A Web-based News And Information Extraction System Design And Realization

Posted on: 2009-01-28; Degree: Master; Type: Thesis
Country: China; Candidate: P Y Lei; Full Text: PDF
GTID: 2208360272978782; Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, the Web has become the densest and richest source of news and information. As Internet use spreads, the volume of online news grows rapidly, and so does the demand for it, yet people find it difficult to locate news and information with search engines. Many search engines now offer dedicated news search services. However, traditional browsers and search engines alone still struggle to help people find the specific news that meets their needs. This thesis therefore proposes a Web News and Information Extraction System (WebNE) to address the problem of extracting news and information from the Web. The thesis puts forward an information extraction method based on web page structure. The method uses regular expressions to describe the textual characteristics of the information to be extracted; on this basis, it devises an algorithm that generates the news extraction rules for a target website semi-automatically, through human-computer interaction. The extracted information is stored in a relational database, so that it can support various kinds of search and other extended applications. The extraction process is divided into three phases: information source creation, information extraction, and information management. In the information source creation phase, the user selects a website as an information source, defines its extraction rules, and saves them in the database. In the information extraction phase, the system automatically extracts information in batches according to the saved rules. In the information management phase, the user processes the extraction results through a Web graphical user interface (GUI).
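The rule-based extraction described above can be illustrated with a minimal sketch. It assumes that an extraction rule is a mapping from field names to regular expressions with one capture group each, and that results go into a single relational table. The patterns, the field names, the `news` table, and the sample page are all hypothetical and are not taken from the thesis; SQLite stands in for whatever database WebNE actually used.

```python
import re
import sqlite3

# Hypothetical extraction rule for one news site: each field maps to a
# regular expression with a single capture group. These patterns are
# illustrative assumptions, not rules from the thesis.
RULE = {
    "title": r"<h1[^>]*>(.*?)</h1>",
    "date": r'<span class="date">(.*?)</span>',
    "body": r'<div class="content">(.*?)</div>',
}


def extract(html, rule):
    """Apply each field's pattern to the page; unmatched fields become None."""
    record = {}
    for field, pattern in rule.items():
        match = re.search(pattern, html, re.DOTALL)
        record[field] = match.group(1).strip() if match else None
    return record


def save(conn, record):
    """Store one extracted record in a relational table (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news (title TEXT, date TEXT, body TEXT)"
    )
    conn.execute("INSERT INTO news VALUES (:title, :date, :body)", record)
    conn.commit()


# Made-up page standing in for a downloaded news article.
SAMPLE_PAGE = """
<html><body>
<h1>Sample headline</h1>
<span class="date">2009-01-28</span>
<div class="content">First paragraph of the story.</div>
</body></html>
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    save(conn, extract(SAMPLE_PAGE, RULE))
    print(conn.execute("SELECT title, date FROM news").fetchall())
```

In this sketch, the rule dictionary corresponds to what the user would define in the information source creation phase, `extract` and `save` correspond to the information extraction phase, and the stored rows are what a management interface would later query.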
A prototype system based on this extraction method can be applied directly to Web query and search, and can also prepare data for other applications. Using this method, WebNE was designed and implemented, then tested on a large number of Web pages; all of them could be extracted, with good results. Compared with traditional manual information extraction, the system reduces duplicated work and extraction time, saves manpower and resource costs, and avoids omissions and deviations. It is an efficient information extraction and management system with good prospects for development.
Keywords/Search Tags:information extraction, regular expression