
Research And Implementation Of Vertical Search Engine Based On Characters Of Webpage Structure

Posted on: 2009-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: J Ren
GTID: 2178360275970374
Subject: Software engineering
Abstract:
With the rapid development of the Internet, people rely more and more on it to find the information they need. Information resources on the Internet are diverse, distributed, open, time-sensitive and heterogeneous. Information on the same subject is often scattered across different websites in different formats. A vertical search engine can extract information by subject and store it in a structured form.

This thesis presents a vertical search engine model based on webpage structure. Based on the characteristics of professional and industry websites, a standard is proposed that unifies the metadata representation for related themes. Using this metadata standard and the webpage structure, an information extraction template is built for each relevant website through webpage analysis. Driven by this template, the vertical search engine carries out webpage crawling, page conversion, data extraction, data separation and data storage. The information extraction template of a website is described in XML in accordance with the webpage structure. Because the template is standard XML saved as a file, it is easy to share among users who are interested in the information the website provides.

In addition to the model, an example vertical search engine is developed. Based on the metadata model, the system processes the webpage structure to obtain structured information from webpages. On top of this search engine, vertical search sites for freight matching and other domains are developed to verify the engine in practice.

The main work and achievements of this thesis are as follows:

1. The working principle and basic system structure of a vertical search engine based on webpage structure are explored. Through research on vertical search engine technology and related technologies, a working model of the vertical search engine based on webpage structure is presented. Its process is divided into web crawling, page conversion, data extraction and data separation. A multi-layer system architecture is introduced according to this working model.

2. A metadata model for webpage information on industry websites is presented. Through analysis of industry information, a common, standardized metadata model is presented and implemented using technologies such as XML. In accordance with the webpage characteristics of specific websites, a method is introduced that uses XSLT to transform website information into an information extraction template conforming to the metadata model.

3. A vertical search engine system based on webpage structure is implemented. Following the system model, the system is implemented with Microsoft .NET technology. It adopts interface-oriented programming and is highly configurable and flexible through the use of system configuration files. Multi-threading is used to make full use of computer and network resources so that the system works efficiently.

4. A prototype freight-matching system is established. Based on the vertical search engine system developed in this thesis and its metadata model, a metadata model and an information extraction template for freight matching are established for freight-matching websites. With this system, related information can be obtained through vertical search over freight-matching websites. A prototype website that provides a vertical search service for freight matching is established to demonstrate the feasibility and usability of the system.
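
The template-driven extraction step described above can be illustrated with a minimal sketch. The abstract does not give the template schema, and the thesis system itself is built on Microsoft .NET, so everything here is an assumption made for illustration: the element names (template, record, field), the path attributes, the sample freight rows, and the function name extract_records are all hypothetical. The sketch also assumes the page conversion step has already turned the HTML page into well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical extraction template: one <record> rule per repeated page block,
# one <field> rule per value inside that block. The schema is an assumption.
TEMPLATE_XML = """
<template site="example-freight-site">
  <record path=".//tr[@class='offer']">
    <field name="origin" path="td[@class='from']"/>
    <field name="destination" path="td[@class='to']"/>
    <field name="cargo" path="td[@class='cargo']"/>
  </record>
</template>
"""

# Made-up sample page, assumed to be already converted to well-formed XML.
PAGE_XML = """
<html><body><table>
  <tr class="offer">
    <td class="from">Shanghai</td><td class="to">Beijing</td><td class="cargo">Steel, 20 t</td>
  </tr>
  <tr class="offer">
    <td class="from">Wuhan</td><td class="to">Chengdu</td><td class="cargo">Grain, 15 t</td>
  </tr>
</table></body></html>
"""


def extract_records(template_xml, page_xml):
    """Apply each record rule of the template to the page and collect field values."""
    template = ET.fromstring(template_xml.strip())
    page = ET.fromstring(page_xml.strip())
    records = []
    for record_rule in template.findall("record"):
        # Each node matched by the record path becomes one structured record.
        for node in page.findall(record_rule.get("path")):
            record = {}
            for field_rule in record_rule.findall("field"):
                cell = node.find(field_rule.get("path"))
                # A field missing from the page is stored as None.
                value = (cell.text or "").strip() if cell is not None else None
                record[field_rule.get("name")] = value
            records.append(record)
    return records


records = extract_records(TEMPLATE_XML, PAGE_XML)
for record in records:
    print(record)
```

Because the rules live in a separate XML document rather than in code, a template of this kind could be saved as a file and shared between users, which is the property the abstract emphasizes. The sketch relies only on the limited path syntax supported by Python's standard library and does not reproduce the XSLT-based conversion mentioned in the abstract.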
Keywords: Vertical Search Engine, Information Extraction, Meta Data, XML, HTML