Web Subject Information Acquisition System Design And Realization

Posted on:2010-01-03

Degree:Master

Type:Thesis

Country:China

Candidate:Z A Zhang

GTID:2208360275983405

Subject:Software engineering

Abstract/Summary:

The Internet is developing rapidly and has increasingly become a center of information and media. Users anywhere can obtain all kinds of information from it, on topics such as nature, society, politics, history, science and technology, education, health, entertainment, policy decisions, finance, business, and weather forecasts. But how can a user obtain personalized information from the Internet? Many information search tools and methods are available, but they cannot correctly and automatically retrieve the information a user wants, which causes considerable inconvenience. The system proposed here addresses this problem. Users can customize the Web resources and information they want, receive regular updates of online news, and integrate information gathered from the network. Together, these functions make access to the appropriate resources simple and fast.

This thesis analyzes the characteristics of web pages and, based on those characteristics, proposes a method for obtaining the required content. A regular expression uses defined writing rules to match text strings and capture the needed content. We use this approach to filter Web content and extract the required content for further processing. For information collection from a website, the system can find the information on the pages of a fixed site, such as the title and the content, so that users do not have to query page by page to get all the information.

The system has three parts: customization of web pages, information fetching, and content management. In the first part, we customize the Internet addresses and the regular expression matching rules we need, and store them in the information database in preparation for extracting useful information. In the second part, we regularly update to the latest matched data; that is, the valid information obtained by applying our rules in our algorithm is stored in the database. In the third part, we designed an information management system that manages the stored data through add, delete, modify, and search operations. We also set up a dedicated page where users can view their own customized, integrated information.

In this thesis, we used a particular website to obtain news and information, demonstrating the advantages and convenience of this method. The method shows good prospects for further development and wide application.
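The abstract does not give the thesis's implementation, so the following is only a minimal sketch of the three parts it describes: a customized rule (a site address plus regular expressions), extraction of a page's title and content with those expressions, and storage of the result in a database. The `Rule` class, the `extract` and `store` functions, the table layout, the sample HTML, and the example URL are all hypothetical names chosen for illustration. Python and SQLite are assumptions, not the author's stated choices. Fetching the page over the network is left out so the sketch runs on a fixed sample.

```python
import re
import sqlite3
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical customization record: one site address plus its regex rules."""
    url: str
    title_pattern: str    # must contain one capture group for the title
    content_pattern: str  # must contain one capture group for the content


def clean(fragment):
    """Replace leftover HTML tags with spaces, then collapse whitespace."""
    return " ".join(re.sub(r"<[^>]+>", " ", fragment).split())


def extract(rule, html):
    """Apply the rule's regular expressions; return (title, content) or None."""
    flags = re.IGNORECASE | re.DOTALL
    title = re.search(rule.title_pattern, html, flags)
    content = re.search(rule.content_pattern, html, flags)
    if title is None or content is None:
        return None
    return clean(title.group(1)), clean(content.group(1))


def store(conn, rule, item):
    """Insert one extracted item; an item already stored for this URL is skipped."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "url TEXT, title TEXT, content TEXT, UNIQUE(url, title))"
    )
    conn.execute("INSERT OR IGNORE INTO items VALUES (?, ?, ?)", (rule.url, *item))
    conn.commit()


# Sample page standing in for a fetched document (assumed markup).
SAMPLE_HTML = """<html><head><title>Site</title></head><body>
<h1 class="headline">Local <b>News</b> Update</h1>
<div id="article">
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</div></body></html>"""

rule = Rule(
    url="http://example.com/news",  # placeholder address
    title_pattern=r'<h1 class="headline">(.*?)</h1>',
    content_pattern=r'<div id="article">(.*?)</div>',
)

conn = sqlite3.connect(":memory:")
item = extract(rule, SAMPLE_HTML)
if item is not None:
    store(conn, rule, item)
rows = conn.execute("SELECT title, content FROM items").fetchall()
```

Regular expressions of this kind depend on the exact markup of a fixed site, which matches the fixed-site scenario the abstract describes. A rule would need to be revised whenever the site changes its page layout.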

Keywords/Search Tags:

regular matching, information collection, Internet page analysis

Related items

1	The Application And Research Of Regular Expression In Webpage Extraction
2	Research On Web Page Classification And Information Collection
3	The Design And Implementation Of Regular Expression Engines Based On Deterministic Finite Automata
4	Design And Implementation Of An Information Auto Collection System In IPTV
5	Regular Expression Matching With Multi-step Speculation
6	Research And Implementation On Key Technology Of Web Text Collection And Analysis
7	Research On Multi-dimensional Regular Expression Matching Algorithm For Network Security
8	Gigabit IDS Quick Source Of Information Collection And Analysis Engine
9	The Research Of Web Information Extraction Technique And Application Based On NFA Regular Matching
10	Research And Implementation Of Agricultural Products Information Collection And Release Platform