
Result Pattern Semantic Annotation Based on CPN Network in Deep Web Integrated System

Posted on: 2009-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: B Liu
GTID: 2198360308979400
Subject: Computer software and theory
Abstract/Summary:
With the development of the Internet, the Web has become a main source of information worldwide and the most effective way for people to obtain the information they are interested in. The Deep Web, as one part of the Web, holds a large amount of data and covers nearly all the information people want. However, each user cares about only a small part of it, so effective search engines or information integration systems are needed to help people find information quickly and precisely.
Today, e-commerce websites are representative of the Deep Web. Each has its own web database, whose data are filled into template pages to form result pages. Existing techniques can extract the data from result pages, but a major problem remains: the computer cannot understand the semantic meaning of the data. The extracted data therefore need to be assigned semantic labels. To solve this problem, we propose a new method that achieves a high recall ratio and is comparable to classic annotation methods in precision.
In this thesis, we first define the result pattern and discuss Deep Web semantic annotation based on it. We then give rules for assessing a Deep Web semantic annotation method. By observing a large number of result pages, we put forward seven attribute characteristics. We propose an attribute data classification model to support computing these characteristics, and discuss the necessity of characteristic standardization. After that, we train a CPN network on samples and apply it to the semantic annotation task. The thesis therefore mainly covers the following techniques: extracting information from result pages, analyzing the characteristics of result page attributes, standardizing the characteristic vectors, assigning semantic labels to attributes with a CPN network, and improving the classic CPN network.
The algorithm used in this thesis takes less time than the classic ones to build and train a CPN network, which is then applied to annotate the attributes. In practical application, training may fall into an endless loop; we improve the algorithm with a new approach that effectively reduces the probability of this problem.
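As a rough illustration of the pipeline described above (standardize characteristic vectors, train a CPN network, assign labels), the following is a minimal sketch only. It assumes that CPN denotes a counterpropagation network with a competitive (Kohonen) layer and an output (Grossberg) layer, and that standardization means min-max scaling; the abstract confirms neither. The two features, the labels, the learning rates, the fixed epoch count, and all names (`min_max_scale`, `CPN`, `fit`, `predict`) are hypothetical, and the sketch shows the classic training rule, not the thesis's improved algorithm.

```python
def min_max_scale(vectors):
    """Rescale each feature column to [0, 1]; constant columns map to 0.0.
    Min-max scaling is an assumption, not the thesis's stated method."""
    cols = list(zip(*vectors))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in vectors
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CPN:
    """Classic counterpropagation network: winner-take-all competitive
    layer followed by an outstar layer holding per-unit label scores."""

    def __init__(self, n_hidden, n_labels, alpha=0.3, beta=0.3):
        self.n_hidden = n_hidden
        self.n_labels = n_labels
        self.alpha = alpha  # learning rate of the competitive layer
        self.beta = beta    # learning rate of the output layer
        self.w = []         # competitive-layer weight vectors
        self.v = []         # output-layer weight vectors

    def winner(self, x):
        """Index of the competitive unit closest to x."""
        return min(range(len(self.w)), key=lambda j: sq_dist(self.w[j], x))

    def fit(self, xs, ys, epochs=20):
        # Simplification: seed units from evenly spaced training samples
        # (requires n_hidden <= len(xs)) and stop after a fixed epoch count.
        step = len(xs) // self.n_hidden
        self.w = [list(xs[i * step]) for i in range(self.n_hidden)]
        self.v = [[0.0] * self.n_labels for _ in range(self.n_hidden)]
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                j = self.winner(x)
                # Move the winning unit toward the input vector.
                self.w[j] = [w + self.alpha * (xi - w)
                             for w, xi in zip(self.w[j], x)]
                # Move the winner's output weights toward the one-hot label.
                target = [1.0 if k == y else 0.0 for k in range(self.n_labels)]
                self.v[j] = [v + self.beta * (t - v)
                             for v, t in zip(self.v[j], target)]
        return self

    def predict(self, x):
        """Label index with the largest output weight at the winning unit."""
        out = self.v[self.winner(x)]
        return max(range(self.n_labels), key=lambda k: out[k])


# Hypothetical attribute characteristics: [mean text length, digit ratio].
raw = [[40, 0.05], [35, 0.0], [50, 0.1], [6, 0.9], [5, 1.0], [7, 0.8]]
labels = [0, 0, 0, 1, 1, 1]  # hypothetical labels: 0 = "title", 1 = "price"

xs = min_max_scale(raw)
net = CPN(n_hidden=2, n_labels=2).fit(xs, labels)
predictions = [net.predict(x) for x in xs]
```

The fixed epoch count is one simple way to guarantee that training terminates; it is not the remedy for the endless-loop problem that the thesis proposes.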
Keywords/Search Tags:Deep Web, Result Pattern, Semantic Annotation, CPN Network, Characteristics Selection