Study On Web Information Extraction Based On Automobile Industry

Posted on: 2008-08-17
Degree: Master
Type: Thesis
Country: China
Candidate: W F Ji
GTID: 2178360242975487
Subject: Computer application technology
Abstract:
The Web is now the main channel for obtaining information. HTML, the language used to express Web content, has an inherent shortcoming: its markup only tells the browser how to display the information and says nothing about what the information means. As a precursor to deep data mining, Web information extraction (Web IE) can rapidly and accurately extract the information of interest from vast resources. Most Web IE systems are based on inductive learning and suffer from poor extensibility. Against the background of the automobile industry, this thesis studies information extraction from semi-structured and free text. For semi-structured text, we take the path of the extracted information in the DOM hierarchy as its coordinate and, on this principle, design a semi-automatic method for generating extraction rules based on explanation-based learning. For free text, we propose a multi-slot rule that is readily extensible and addresses the bottleneck of knowledge acquisition and representation. Experimental results show that the rules acquired by this algorithm achieve higher precision and recall.
Keywords: Information Extraction, Explanation-Based Learning, DOM Hierarchy, Multi-slot Rule
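
The idea of using a DOM path as the coordinate of an extracted value can be illustrated with a small sketch. This is not the thesis's rule-learning algorithm, which the abstract does not spell out; it only shows how a path recorded from one labeled sample page might be reused on another page built from the same template. The class and function names (`PathIndexer`, `learn_rule`, `apply_rule`), the path notation, and the two sample pages are all hypothetical, and the parser assumes well-formed markup in which every non-void tag is closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they are counted but not descended into.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class PathIndexer(HTMLParser):
    """Map a DOM path such as 'html[1]/body[1]/table[1]/tr[2]/td[2]' to its text."""

    def __init__(self):
        super().__init__()
        self.stack = []     # path steps of the currently open elements
        self.counts = [{}]  # per open element: child tag -> number seen so far
        self.texts = {}     # DOM path -> first non-empty text found there

    def handle_starttag(self, tag, attrs):
        siblings = self.counts[-1]
        siblings[tag] = siblings.get(tag, 0) + 1
        if tag in VOID_TAGS:
            return
        self.stack.append(f"{tag}[{siblings[tag]}]")
        self.counts.append({})

    def handle_endtag(self, tag):
        # Assumes well-formed markup: the closing tag matches the innermost open tag.
        if self.stack and self.stack[-1].startswith(tag + "["):
            self.stack.pop()
            self.counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.setdefault("/".join(self.stack), text)


def index_paths(html):
    """Return a dict of DOM path -> text for one page."""
    indexer = PathIndexer()
    indexer.feed(html)
    indexer.close()
    return indexer.texts


def learn_rule(sample_html, labeled_value):
    """Return the DOM path of the text node a user labeled, or None if absent."""
    for path, text in index_paths(sample_html).items():
        if text == labeled_value:
            return path
    return None


def apply_rule(page_html, path):
    """Extract the text found at the given DOM path, or None if the path is absent."""
    return index_paths(page_html).get(path)


# Two made-up pages sharing one template (illustrative data only).
SAMPLE = ("<html><body><table>"
          "<tr><td>Model</td><td>Sedan A</td></tr>"
          "<tr><td>Price</td><td>120000</td></tr>"
          "</table></body></html>")
OTHER = ("<html><body><table>"
         "<tr><td>Model</td><td>Hatchback B</td></tr>"
         "<tr><td>Price</td><td>98000</td></tr>"
         "</table></body></html>")

rule = learn_rule(SAMPLE, "120000")
print(rule)                     # html[1]/body[1]/table[1]/tr[2]/td[2]
print(apply_rule(OTHER, rule))  # 98000
```

A purely positional path like this breaks as soon as the template changes, which is one reason a practical system would generalize the rule beyond a single recorded path.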