The Technology And Application On Web Information Extraction Based On Analyzing Webpage Content

Posted on: 2011-11-10  Degree: Master  Type: Thesis
Country: China  Candidate: X L Yang
GTID: 2178360305452302  Subject: Computer technology
Abstract/Summary:
With the development of computer networks and the arrival of the information society, search engines play an increasingly important role. General-purpose search engines are known for being easy to use, fast, and broad in coverage, but when they are used to retrieve specific information or applied to a particular domain, the results are often neither accurate nor focused. Developing a specialized search engine therefore has considerable practical value. In such an engine, the design and implementation of the Web spider that acquires Web pages has a considerable influence on the overall performance of the search engine.

Since 1999, university enrollment in China has been expanding, which has put considerable pressure on student education and employment. It is therefore important to manage education and employment information scientifically; such management can also offer guidance for university recruitment and training. To meet these needs, this thesis designs and implements a Web spider for a specialized search engine covering graduate employment and education information. While capturing pages, the spider analyzes page content according to a formula, extracts and analyzes the relevant information, filters out information unrelated to the subject, and stores the results in the Web page database. This completes the data preparation for the search engine and lays the foundation for subsequent work.

The thesis discusses the implementation of the Web spider in detail and presents its system structure. Using this algorithm, it also analyzes the spider's functions in comparison with other Web spiders. The algorithm extracts information from relevant Web pages more accurately.
Finally, some drawbacks and directions for further work are presented.
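
The capture, analyze, filter, and store steps described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the relevance formula, so a simple topic-term frequency stands in for it, and the names TOPIC_TERMS, THRESHOLD, extract_text, relevance, analyze, and crawl_and_store, as well as the table layout, are assumptions made for illustration. Pages are passed in as (url, html) pairs rather than downloaded, so the sketch runs without network access.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser

# Hypothetical topic vocabulary; the abstract does not list the actual terms.
TOPIC_TERMS = {"employment", "graduate", "recruitment", "education", "job"}
THRESHOLD = 0.05  # assumed cut-off, not taken from the thesis


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def extract_text(html):
    """Return the page's visible text with whitespace normalized."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def relevance(text):
    """Fraction of words that are topic terms (a stand-in formula)."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in TOPIC_TERMS for w in words) / len(words)


def analyze(page):
    """Extract text from one (url, html) pair and score it."""
    url, html = page
    text = extract_text(html)
    return url, text, relevance(text)


def crawl_and_store(pages, conn, workers=4):
    """Analyze pages in worker threads, keep on-topic ones, store them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages "
        "(url TEXT PRIMARY KEY, text TEXT, score REAL)"
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(analyze, pages))
    # Filter out pages whose score falls below the assumed threshold.
    kept = [r for r in results if r[2] >= THRESHOLD]
    # Database writes stay on the calling thread.
    conn.executemany("INSERT OR REPLACE INTO pages VALUES (?, ?, ?)", kept)
    conn.commit()
    return [url for url, _, _ in kept]
```

Calling crawl_and_store with a list of (url, html) pairs and a connection from sqlite3.connect(":memory:") returns the URLs of the pages that were kept. Analysis runs in a thread pool while storage stays on one thread, which keeps the SQLite connection out of the workers; a real spider would also need fetching, link discovery, and politeness controls, none of which are shown here.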
Keywords/Search Tags: Search Engine, Web Spider, Information Extraction, Information Analysis, Multithreading