The Technology And Application On Web Information Extraction Based On Analyzing Webpage Content

Posted on:2011-11-10

Degree:Master

Type:Thesis

Country:China

Candidate:X L Yang

Full Text:PDF

GTID:2178360305452302

Subject:Computer technology

Abstract/Summary:


With the development of computer networks and the arrival of the information society, search engines play an increasingly important role. General-purpose search engines are known for being easy to use, fast, and covering vast amounts of information. However, when they are used to retrieve specific information or applied in a particular domain, the results are often neither accurate nor focused. Developing a specialized search engine therefore has considerable practical value. In such a search engine, the design and implementation of the Web spider that acquires Web pages strongly influences the performance of the whole system.

Since 1999, university enrollment in China has expanded year after year, placing heavy pressure on student education and employment. It is therefore important to manage education and employment information with scientific methods, which can also provide guidance for university recruitment and training. To meet these needs, this thesis designs and implements a Web spider for a specialized search engine that covers graduate employment and education information. While crawling, the spider analyzes page content according to a specific formula, extracts and analyzes the related information, filters out information unrelated to the topic, and stores the results in a web page database. This completes the data preparation for the search engine and lays the foundation for subsequent work.

The thesis describes the implementation of the Web spider in detail and presents its system architecture. Building on the proposed algorithm, it analyzes the spider's functions by comparing it with other Web spiders, and shows that the algorithm extracts information from related web pages more accurately. Finally, the thesis discusses the system's drawbacks and directions for future work.
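The abstract states that the spider scores page content "according to a specific formula" and discards off-topic pages, but it does not give the formula. The sketch below is one plausible reading under stated assumptions, not the thesis's method. It uses a weighted term frequency normalised by page length as the relevance score. It also uses a thread pool to stand in for the multithreaded spider. The term list `TOPIC_TERMS`, the cutoff `THRESHOLD`, and every function name are hypothetical. Pages are passed in as already-downloaded HTML strings, so no network fetching is shown.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser

# Hypothetical topic vocabulary (term -> weight); the thesis's real terms
# and weights are not given in the abstract.
TOPIC_TERMS = {
    "employment": 3.0,
    "graduate": 2.0,
    "recruitment": 2.0,
    "admission": 1.5,
    "education": 1.0,
}
THRESHOLD = 0.05  # hypothetical relevance cutoff


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def page_text(html):
    """Return the visible text of an HTML page, text nodes joined by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def relevance(text):
    """Weighted term frequency divided by the number of tokens on the page."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(TOPIC_TERMS.get(t, 0.0) for t in tokens)
    return hits / len(tokens)


def analyse(item):
    """Extract text from one (url, html) pair and score it."""
    url, html = item
    text = page_text(html)
    return url, relevance(text), text


def filter_pages(pages, workers=4):
    """Score pages concurrently and keep only those at or above THRESHOLD."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(analyse, pages.items()))
    return {url: text for url, score, text in results if score >= THRESHOLD}
```

For example, given one page about graduate recruitment and one about football scores, `filter_pages` keeps only the first. The kept text is what would then be written to the web page database. Normalising by page length stops long pages from scoring high merely by mentioning a topic term once. A production spider would also need URL scheduling, politeness delays, and robots.txt handling, all of which are omitted here.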

Keywords/Search Tags:

Search Engine, Web Spider, Information extraction, Information Analysis, Multithreading


Related items

1	The Vertical Search Engine Research And Design
2	The Design And Realization Of The Vertical Search Engine On The Basis Of Java
3	Design And Realize Of Spider In Vertical Search Engine
4	Research And Implementation Of Data Acquisition Technology Of Vertical Search Engine
5	Research And Practice On Automatic Information Extraction In Vertical Search
6	Intelligent Search Engine Based On Thematic Information Technology Research
7	Research And Implementation Of The Vertical Search Engine For Electronic Information
8	Research And Achievement Of The Search Strategic For The Topic Search Engine Spider
9	Design And Implementation Of A Spider For Topic-Specific Search Engine
10	The Study And Implementation Of Web Information Extraction Mechanism Based On Classification Semantics