
Research On Entity Information Extraction And Recognition On Deep Web

Posted on: 2014-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: X W Dang
GTID: 2268330401462266
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and the growth of Web information, more and more researchers are paying attention to the Deep Web. Querying is the main way for users to obtain information from the Deep Web, but the returned detail result pages contain a lot of noise that users do not need, such as advertisements, related links, and pictures. Research on Deep Web entity information extraction is therefore important. The Deep Web also contains many data sources, even within a single domain, and the returned detail results may come from different data sources. Because different sources can describe the same entity differently and present it in different forms, the results contain a large amount of redundant data that is useless to users. Entity recognition identifies records that refer to the same entity and removes the duplicates. It is a prerequisite for data integration, and it has become one of the hot topics in Deep Web research.

Based on an analysis of the current state of Deep Web research, and addressing the existing problems of entity extraction and entity recognition, this thesis proposes a template-based method for Deep Web entity information extraction and an entity recognition method based on a BP neural network. Before entities can be recognized, the useful information must first be extracted from the returned detail result page and the interfering information removed. Entity information extraction first obtains a sample page and converts the format of the sample file, and then builds a template. To build the template, structural analysis retains the blocks whose structure is similar to that of the entity information, and semantic analysis then locates the entity information within them. Finally, the entity information is extracted from the target page according to the corresponding template.
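
The template idea described above can be illustrated with a small sketch. It is not the thesis's implementation: it assumes that entity information appears as label/value text nodes, approximates "structure" by the tag path of each text node, and approximates "semantic analysis" by matching against a hypothetical set of known attribute labels. All function and variable names here are illustrative.

```python
from html.parser import HTMLParser


class PathText(HTMLParser):
    """Collect (tag_path, text) pairs for every non-empty text node."""

    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag (tolerates sloppy HTML).
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def parse(html):
    parser = PathText()
    parser.feed(html)
    parser.close()
    return parser.nodes


def build_template(sample_html, known_labels):
    """Map each known label to the tag paths of its label node and value node."""
    nodes = parse(sample_html)
    template = {}
    for (path, text), (next_path, _) in zip(nodes, nodes[1:]):
        label = text.rstrip(":").lower()
        if label in known_labels:  # assumed stand-in for semantic analysis
            template[label] = (path, next_path)
    return template


def extract(target_html, template):
    """Keep only label/value pairs whose tag paths match the template."""
    nodes = parse(target_html)
    entity = {}
    for (path, text), (next_path, value) in zip(nodes, nodes[1:]):
        label = text.rstrip(":").lower()
        if template.get(label) == (path, next_path):
            entity[label] = value
    return entity


sample = ('<html><body><div>Buy now!</div><table>'
          '<tr><td>Title:</td><td>Book A</td></tr>'
          '<tr><td>Author:</td><td>Ann</td></tr>'
          '</table><a href="#">Related links</a></body></html>')
target = ('<html><body><table>'
          '<tr><td>Title:</td><td>Book B</td></tr>'
          '<tr><td>Author:</td><td>Bob</td></tr>'
          '</table><div>Author:</div><div>spam</div></body></html>')

template = build_template(sample, {"title", "author"})
entity = extract(target, template)
```

In this sketch the advertisement, the related link, and the stray `Author:` block outside the table are ignored because their tag paths do not match the template built from the sample page.
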
The extracted entity information is then passed to the BP neural network, which judges whether an entity is a duplicate. The extracted entities are first divided by label into attribute values. The similarity between an extracted entity and the other entities is then calculated for each attribute. These per-attribute similarities are fed into the constructed BP neural network for training, and duplicate entities are recognized according to the training result.

Finally, experiments validate the effectiveness of the proposed template-based Deep Web entity information extraction method. The method improves user satisfaction, is broadly applicable, and achieves higher accuracy and efficiency.
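
The recognition step (per-attribute similarities fed into a BP neural network) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the similarity measure (`difflib.SequenceMatcher` ratio), the network size, the learning rate, and the toy training pairs are all assumptions, and the names `attribute_similarities` and `BPNetwork` are hypothetical.

```python
import math
import random
from difflib import SequenceMatcher


def attribute_similarities(a, b, attributes):
    """One similarity in [0, 1] per attribute (string ratio is an assumed measure)."""
    return [SequenceMatcher(None, a.get(k, "").lower(), b.get(k, "").lower()).ratio()
            for k in attributes]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BPNetwork:
    """One hidden layer of sigmoid units, trained by plain backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one hidden unit; the last entry is the bias.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def train(self, samples, epochs=2000, lr=0.5):
        for _ in range(epochs):
            for x, target in samples:
                hidden, out = self.forward(x)
                # Error terms for the squared-error loss 0.5 * (out - target) ** 2.
                delta_out = (out - target) * out * (1 - out)
                # Hidden deltas use the output weights before they are updated.
                delta_hidden = [delta_out * self.w_out[j] * h * (1 - h)
                                for j, h in enumerate(hidden)]
                for j, v in enumerate(hidden + [1.0]):
                    self.w_out[j] -= lr * delta_out * v
                for j, row in enumerate(self.w_hidden):
                    for i, v in enumerate(x + [1.0]):
                        row[i] -= lr * delta_hidden[j] * v

    def is_duplicate(self, x, threshold=0.5):
        return self.forward(x)[1] >= threshold


# Toy training set: similarity vectors labeled 1 (same entity) or 0 (different).
samples = [([1.0, 0.9], 1), ([0.9, 1.0], 1), ([0.8, 0.85], 1),
           ([0.1, 0.2], 0), ([0.2, 0.0], 0), ([0.3, 0.25], 0)]
net = BPNetwork(n_in=2, n_hidden=3)
net.train(samples)

a = {"title": "Deep Web Mining", "author": "X W Dang"}
b = {"title": "deep web mining", "author": "X W Dang"}
sims = attribute_similarities(a, b, ["title", "author"])
```

Calling `net.is_duplicate(sims)` then gives the network's judgment for the pair. In practice the training pairs would come from labeled entity pairs extracted from real result pages rather than hand-written vectors.
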
Keywords/Search Tags: Deep Web, entity information extraction, template, entity recognition, BP neural network