| With the development of the Internet, information resources on the Web have grown explosively, and the bottleneck of general-purpose search engines has become increasingly obvious. To retrieve the information people want more quickly and accurately, vertical search engines, which are field-oriented search engines, have emerged in recent years. A vertical search engine provides more refined search results than a general search engine, which requires extracting field-related information from the relevant Web pages. This paper focuses on how to extract object information from Web pages. The specific research contents include the following aspects: (1) Web page analysis technology based on visual features. The VIPS algorithm, a Web page segmentation method based on visual features, is studied, and a VIPS prototype system is implemented. This system is used to segment Web pages, preparing the data for the subsequent extraction work. (2) Web object information extraction based on the Block Importance Model and 2D CRFs. A Web object extraction method based on the Block Importance Model and 2D CRFs is proposed. First, the Block Importance Model is used to calculate the importance value of each block, and the target block is located according to these values; second, 2D CRFs are used to extract the information in the target block; finally, experiments verify the feasibility of the method. (3) Web object information extraction based on improved HCRFs. The hierarchical conditional random field (HCRF) model is a statistical model that can be used for Web object extraction, but HCRFs do not fully describe the conditional dependency relationships between the elements of a Web object. To overcome this limitation, which degrades extraction performance, an improved statistical model called LL-HCRFs is proposed. 
A method for adding these dependency relationships is put forward, and the parameter estimation method is also improved. Finally, the proposed model is compared with existing linear-chain CRFs and HCRFs, and the results show that LL-HCRFs perform well on Web object extraction. (4) The "Soushiji" vertical search engine. A prototype vertical search engine for the food domain, called "Soushiji", is designed and implemented, and each module of the prototype system is introduced in detail. |