
The Design And Implementation Of Web Information Extraction System

Posted on: 2013-10-05  Degree: Master  Type: Thesis
Country: China  Candidate: Q Y Ding
GTID: 2268330392969538  Subject: Software engineering
Abstract/Summary:
Nowadays the Web, which is built on the Internet, plays an increasingly important role in people's daily lives. The large amount of information it conveys makes it a significant information source, so finding a convenient way to extract the desired information from the vast amount of data on the Web is very important. Web information extraction is one useful solution. This project originated in the search platform department at Alibaba.

The thesis analyzes the Web extraction problem according to its application fields. It defines extraction problems in terms of the features of the extraction targets and of the Web pages, and proposes specific Web extraction solutions for them. How to design and implement a Web information extraction system based on those solutions is also an important topic. With this system, users can easily obtain the desired data and information from the Web.

During the project, the author analyzed the problems that Web information extraction solutions focus on and defined a data model to represent the structural information of Web pages. Based on the system's application fields, the author described business application scenarios, which were then summarized as the original system requirements. Following the software development life cycle, the thesis presents the system's requirements analysis, design and implementation, and testing. The requirements are expressed with a use case model, and the system's design and implementation with a functional model and a system architecture diagram. As the core parts of this topic, the design and implementation of the workflow engine and the HTTP service framework are described using class diagrams, sequence diagrams, activity diagrams and flowcharts. Last but not least, the thesis introduces several Web extraction algorithms, such as template extraction and automatic extraction based on the list-detail model, together with evaluations of these algorithms. Finally, system testing and algorithm evaluation showed that the system satisfies the predefined requirements.
Keywords/Search Tags: Web data mining, Web data extraction, template extraction, list extraction
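The abstract names template extraction but does not describe the template format the thesis uses. As a minimal sketch, assume that a template is simply a mapping from field names to element paths and that the page is well-formed XHTML. Under those assumptions, template extraction could look like the following. The field names, the paths, `PRODUCT_TEMPLATE` and `extract_with_template` are all hypothetical illustrations, not the thesis's implementation. Python's `xml.etree.ElementTree` supports only a limited XPath subset, and real-world HTML usually needs a more tolerant parser.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical template for a product detail page: each field maps to an
# ElementTree path expression (a limited XPath subset).
PRODUCT_TEMPLATE: Dict[str, str] = {
    "title": ".//h1[@class='title']",
    "price": ".//span[@class='price']",
    "seller": ".//div[@class='seller']/a",
}


def extract_with_template(page: str, template: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Apply a field -> path template to a well-formed XHTML page."""
    root = ET.fromstring(page)
    record: Dict[str, Optional[str]] = {}
    for field, path in template.items():
        node = root.find(path)  # first matching element, or None
        # Join all text inside the matched element; None marks a missing field.
        record[field] = "".join(node.itertext()).strip() if node is not None else None
    return record


PAGE = """<html><body>
  <h1 class="title">Wireless Mouse</h1>
  <span class="price">59.00</span>
  <div class="seller"><a href="/shop/42">Example Shop</a></div>
</body></html>"""

print(extract_with_template(PAGE, PRODUCT_TEMPLATE))
# {'title': 'Wireless Mouse', 'price': '59.00', 'seller': 'Example Shop'}
```

Because a template like this is tied to one site's page layout, it has to be rewritten whenever that layout changes. This maintenance cost is a known drawback of template-based extraction, and it is one that automatic approaches aim to reduce.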