
Research On Method Of Deep Web Oriented Based On Web-Page Blocking

Posted on: 2016-07-15  Degree: Master  Type: Thesis
Country: China  Candidate: H M Zhang  Full Text: PDF
GTID: 2348330536455074  Subject: Computer technology
Abstract/Summary:
With the continuing development and maturation of web technology, the number of web pages of all kinds has grown exponentially. Because the information held in the Deep Web is growing rapidly, research on the Deep Web is attracting more and more attention. Deep Web pages cannot be reached by traditional search engines; they can only be accessed by submitting a form. Examples include many shopping websites, professional literature retrieval systems, and BBS forums of all kinds. Applying web-page blocking techniques suited to the type of page can improve the accuracy of information extraction, which makes web-page blocking a necessary research method. This thesis first introduces the research status and development direction of information extraction technology. It then reviews recent approaches to Deep Web information extraction and discusses their advantages and disadvantages. It describes several aspects of web-page blocking and introduces the related technologies, which mainly include Deep Web entry discovery, Deep Web page identification and classification, Deep Web data integration and fusion, and classification of web-page block content. Adopting a data integration approach, we build a Deep Web data integration system framework that consists mainly of a query result processing module, a query processing module, and an integrated query interface module. Building on a study of web-page blocking algorithms, and taking into account the inherent properties of web pages and developers' habits in using tags, we propose a web-page blocking method that partitions pages based on separators and tag attributes. Finally, experiments with an information extraction system based on web-page blocking show that information can be extracted from pages of different categories. A comparison with other algorithms shows that the proposed algorithm achieves higher recall and precision, laying a solid foundation for the next stage of research.
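The abstract does not give the details of the partitioning method, so the following is only a minimal sketch of the general idea of splitting a page at separators and at containers that carry labelling attributes. The separator tags, container tags, attribute names, and the names `BlockSplitter` and `split_blocks` are all illustrative assumptions, not taken from the thesis.

```python
from html.parser import HTMLParser

# Assumed, illustrative choices (not taken from the thesis):
SEPARATOR_TAGS = {"hr"}                          # tags treated as visual separators
BLOCK_TAGS = {"div", "table", "ul", "section"}   # containers that may open a new block
LABEL_ATTRS = ("id", "class")                    # attributes used to label a block


class BlockSplitter(HTMLParser):
    """Split a page into (label, text) blocks at separators and labelled containers."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # finished (label, text) pairs
        self._label = None   # label of the block currently being collected
        self._parts = []     # text fragments of the current block

    def _flush(self, next_label=None):
        # Close the current block (if it has any text) and start a new one.
        text = " ".join(self._parts).strip()
        if text:
            self.blocks.append((self._label, text))
        self._label = next_label
        self._parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in SEPARATOR_TAGS:
            # A separator ends the current block; the next block is unlabelled.
            self._flush()
        elif tag in BLOCK_TAGS:
            # A container with an id or class starts a new labelled block.
            label = next((attrs[a] for a in LABEL_ATTRS if attrs.get(a)), None)
            if label:
                self._flush(label)

    def handle_data(self, data):
        if data.strip():
            self._parts.append(data.strip())

    def close(self):
        super().close()
        self._flush()  # emit whatever text is still pending


def split_blocks(html):
    """Return the list of (label, text) blocks found in an HTML string."""
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Calling `split_blocks` on a page fragment returns the blocks in document order, each paired with the `id` or `class` value that opened it (or `None` for text that follows a separator). A real system would likely also need nesting depth, visual cues, and per-site tuning of the tag sets; those are omitted here to keep the sketch short.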
Keywords/Search Tags:web information extraction, Deep Web, Web block, data integration, cluster