The Application And Research Of Regular Expression In Webpage Extraction

Posted on:2015-01-23

Degree:Master

Type:Thesis

Country:China

Candidate:Z B Zuo

Full Text:PDF

GTID:2268330428982818

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development and spread of the Internet, people are increasingly accustomed to getting information online through a variety of terminals (PCs, tablets, phones, etc.). The Web is a huge repository containing all kinds of valuable information. Web information extraction studies how to accurately extract the required information from Web pages to meet users' needs and convert it into structured form, for example as database records that facilitate statistical analysis. Based on regular-expression techniques, this thesis presents a solution for the automatic extraction of information from websites, using two case studies: paper collection from Google Scholar and lottery analysis on Okooo.com. Beyond implementing Web page extraction with an NFA-based regular expression engine, the thesis also analyzes and compares two approaches: optimizing the NFA engine, and using the NFA engine in conjunction with a DFA engine. The solution has two steps. First, the regular expressions are debugged and optimized with the tool RegexBuddy 3. Second, on the .NET platform, code built on the tested regular expressions reads the Web page source, matches and extracts the fields, and stores them in an Oracle database. The method can automatically browse the target website and read pages in batches, extracts records and fields with high accuracy, and supports HTML tag filtering and the collection of a variety of data.
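
The extraction step described in the abstract (matching regular expressions against page source to pull out records and fields, then filtering HTML tags) can be illustrated with a minimal sketch. This is not the thesis's implementation, which targets .NET and Oracle; Python is used here only for brevity. The sample markup, the pattern, and the names `SAMPLE_HTML`, `RECORD_RE`, `strip_tags`, and `extract_records` are all hypothetical, since the abstract does not give the actual page structure or expressions.

```python
import re
from html import unescape

# Hypothetical page fragment; the real pages' markup is not given in the abstract.
SAMPLE_HTML = """
<div class="rec"><h3><a href="/p/1">Regex <b>Engines</b> Compared</a></h3><span class="yr">2012</span></div>
<div class="rec"><h3><a href="/p/2">Web Data &amp; Extraction</a></h3><span class="yr">2013</span></div>
"""

# One match per record; named groups correspond to the fields to extract.
RECORD_RE = re.compile(
    r'<div class="rec">\s*<h3><a href="(?P<url>[^"]*)">(?P<title>.*?)</a></h3>'
    r'\s*<span class="yr">(?P<year>\d{4})</span>',
    re.DOTALL,
)

# Matches any HTML tag so that inline markup can be filtered out of a field.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(fragment):
    """Remove HTML tags from a fragment and decode entities such as &amp;."""
    return unescape(TAG_RE.sub("", fragment)).strip()


def extract_records(html):
    """Return one dict per matched record, with tags filtered from the title."""
    return [
        {
            "url": m.group("url"),
            "title": strip_tags(m.group("title")),
            "year": int(m.group("year")),
        }
        for m in RECORD_RE.finditer(html)
    ]


if __name__ == "__main__":
    for record in extract_records(SAMPLE_HTML):
        print(record)
```

Each returned dict corresponds to one structured row that could then be inserted into a database table. Python's `re` module is a backtracking engine, so a pattern like this behaves roughly like one run on the NFA-style engines the abstract refers to.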

Keywords/Search Tags:

Regular Expressions, Web information collection, lottery, Scholar Google, Information Extraction

Related items

1	Research And Implementation Of Agricultural Products Information Collection And Release Platform
2	The Research And Application Of Web Information Extraction Technology
3	An Information Extraction System Used To Describe Scholar Portraits
4	The Design And Implementation Of Scholar Knowledge Graph
5	The Design And Implementation Of Navigation And Traffic Information Collection System Based On Google Map
6	Design And Implementation Of Literature Management System Based On Google Scholar
7	Design And Implementation Of Scholar Search And Scholar Community Discovery System
8	The Research And Implementation Of Web Information Extraction System Based On The Regular Expression
9	On Research Of Deep Search And Information Extraction For E-commerce Websites
10	The Research Of Web Information Extraction Technique And Application Based On NFA Regular Matching