With the rapid development of the World Wide Web, the information it contains has grown explosively. Web spiders crawl this information and search engines help people find what they want, but there is also a great deal of information on the net that cannot be reached this way, like the part of an iceberg below the sea: the web spiders of classic search engines cannot download it. This part of the web is called the Deep Web.

Current research on the Deep Web focuses on the following aspects: the discovery of web databases, the extraction of query interface patterns, the classification of databases, the integration of query schemas, query translation, the extraction of query answers, the annotation of query answers, and so on. In short, there are three main modules:
1. the query interface integration module;
2. the query processing module;
3. the query answer processing module.

This paper focuses on the third part, the query answer processing module, which in turn contains three sub-modules: the extraction of the query answer, the annotation of the query answer, and the combination of the query answer.

First, we survey the techniques of information extraction: extraction based on the DOM tree, extraction based on natural language processing, extraction based on ontologies, and extraction based on induction. Although these techniques are helpful, after analyzing the answer pages of Deep Web queries we find that they are not well suited to this situation; each of them has flaws.

The answer pages of the Deep Web share two common properties:
1. The data block we are interested in is the largest block on the page, and it is the only such block.
2.
Within that largest block there are smaller blocks, and these small blocks are exactly what we want to extract from the page. However, the largest block contains not only the small blocks we want but also some other data blocks. After investigation we find that the small blocks we want always share the same structure, and often even the same CSS class or ID. This is a consequence of the advancement of web techniques: Ajax frameworks based on JavaScript can be seen everywhere, and because a piece of JavaScript code is bound to the ID of a DOM node, developers reuse the same structure to reduce their workload.

Based on these two common properties of the Deep Web, we define the notions of the generalized data block and the data region. The two definitions alone are not enough, so we adopt the VIPS algorithm to segment the pages and find the largest part of the page. VIPS makes full use of page layout features: it first extracts all suitable blocks from the HTML DOM tree, then tries to find the separators between these extracted blocks. Here, separators denote the horizontal or vertical lines in a web page that visually cross no blocks. Finally, based on these separators, the semantic structure of the web page is constructed. In this process, we use our generalized data block orientation algorithm to locate the generalized data block. The algorithm has three steps: first, we use VIPS to construct the blocks of the current level; second, we judge whether each block is a generalized data block; third, if we find a generalized data block the algorithm terminates; otherwise we return to the first step and continue with the next level. The judgment of whether a block is a generalized data block is the core of the whole algorithm. In the first step of this judgment, we examine the HTML tag of the root node of the current block; if its tag is
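The three-step loop described above can be sketched as follows. This is a minimal sketch, not the VIPS implementation: `segment_level` stands in for one level of VIPS segmentation, and `is_generalized_data_block` is a hypothetical, simplified version of the judging step that only checks whether a block contains repeated child blocks with the same root tag and CSS class (the repeated record structure noted above).

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Block:
    tag: str                      # HTML tag of the block's root node
    css_class: str = ""           # CSS class shared by repeated records
    area: int = 0                 # visual area of the block on the page
    children: List["Block"] = field(default_factory=list)

def segment_level(block: Block) -> List[Block]:
    """Stand-in for one round of VIPS: return the sub-blocks of `block`."""
    return block.children

def is_generalized_data_block(block: Block) -> bool:
    """Simplified judging step: a block qualifies when at least two of
    its children share the same root tag and CSS class, mirroring the
    repeated structure produced by templates and Ajax frameworks."""
    signatures = [(c.tag, c.css_class) for c in block.children]
    return any(signatures.count(sig) >= 2 for sig in signatures)

def locate_generalized_data_block(page: Block) -> Optional[Block]:
    """Step 1: segment the current level; step 2: judge the largest
    block; step 3: stop if found, otherwise descend to the next level."""
    current = page
    while True:
        blocks = segment_level(current)            # step 1: segmentation
        if not blocks:
            return None                            # no more levels
        largest = max(blocks, key=lambda b: b.area)
        if is_generalized_data_block(largest):     # step 2: judge
            return largest                         # step 3: found
        current = largest                          # recurse one level down

# Hypothetical page: a result region with three repeated records plus an ad,
# next to a navigation block.
records = [Block("div", "result", 100) for _ in range(3)]
region = Block("div", "results", 500, children=records + [Block("div", "ad", 50)])
page = Block("body", "", 1000, children=[region, Block("div", "nav", 80)])
found = locate_generalized_data_block(page)
```

In this example the algorithm descends to the largest block of the top level (`region`) and stops there, because its children repeat the `("div", "result")` structure; the ad block is carried along inside the region but does not match the repeated signature.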