The Research And Implementation Of Commodity Information Extraction And Fusion Based On Web

Posted on:2009-12-17

Degree:Master

Type:Thesis

Country:China

Candidate:L Wang

Full Text:PDF

GTID:2178360245455119

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet, the World Wide Web has become a huge distributed information space that offers users a massive and valuable information resource. However, when search engines are used for information retrieval, the returned result sets are so large that users often find it difficult to pick out consistent, useful information from the mass of complex data. Web information extraction and data fusion are among the important techniques for addressing this problem. Web information extraction locates and identifies the required information points in heterogeneous text and converts them into a unified, structured form. Data fusion automatically detects, associates, and combines data from multiple information sources in order to extend the temporal and spatial scope of observation and to increase confidence in the data.

This thesis studies commodity information extraction and fusion. It proposes a commodity information extraction method and a corresponding data fusion method that combine online extraction of commodity information with weight-correlation data fusion, taking into account the characteristics of commodity information on the web. The implementation uses the Google Web API, HtmlParser, regular expressions, and weight coefficients. The main content is as follows:

1) The thesis presents a web acquisition technique that integrates the Google Web API into a Java application to search for and collect web pages, and uses regular expressions to find the related links within those pages. The collected pages are then stored on the local disk for analysis in the next step.
2) A commodity parameter database is built, as complete as possible, from knowledge of commodity parameters. Commodity information is then extracted quickly and accurately using the open-source HtmlParser library and regular expressions matched against the parameter database. Based on the characteristics of commodity page source code, only table blocks and div blocks are parsed during extraction, which speeds up identification and analysis.
3) A weight coefficient table is obtained by analysing the extracted data set, and data fusion is carried out with this table using the weight coefficient method. Finally, the fused data are saved to the history database, and the system presents users with a fairly complete view of the information.
4) The thesis designs and implements, as a whole, a web-based system for commodity information extraction and data fusion. Testing and analysis on information about several kinds of mobile phones show that the system can extract hundreds of related commodity records online and then fuse them, which lays a foundation for developing more specialized and more wide-ranging systems in the future.
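
Steps 2 and 3 above can be illustrated with a minimal sketch. The abstract does not give the thesis's actual patterns, parameter database, or weight coefficient formula, so everything below is an assumption made for illustration: Python's standard `re` module stands in for Java and HtmlParser, `PARAMETER_PATTERNS` is a hypothetical two-entry parameter database, the page fragments and source weights are invented, and fusion is approximated by weighted voting (for each parameter, the value whose sources carry the largest total weight is kept).

```python
import re
from collections import defaultdict

# Hypothetical parameter database: canonical name -> regex for label variants.
PARAMETER_PATTERNS = {
    "screen_size": re.compile(r"screen\s*size", re.IGNORECASE),
    "weight": re.compile(r"\bweight\b", re.IGNORECASE),
}

# Matches one two-cell table row: <tr><td>label</td><td>value</td></tr>.
ROW_PATTERN = re.compile(
    r"<tr[^>]*>\s*<td[^>]*>(.*?)</td>\s*<td[^>]*>(.*?)</td>\s*</tr>",
    re.IGNORECASE | re.DOTALL,
)
TAG_PATTERN = re.compile(r"<[^>]+>")


def extract_parameters(html):
    """Return {canonical_name: value} for rows whose label is in the database."""
    found = {}
    for label, value in ROW_PATTERN.findall(html):
        label = TAG_PATTERN.sub("", label).strip()  # drop inline markup
        value = TAG_PATTERN.sub("", value).strip()
        for name, pattern in PARAMETER_PATTERNS.items():
            if value and pattern.search(label):
                found[name] = value
    return found


def fuse(records, source_weights):
    """For each parameter, keep the value with the highest summed source weight."""
    scores = defaultdict(lambda: defaultdict(float))
    for source, params in records.items():
        for name, value in params.items():
            scores[name][value] += source_weights.get(source, 0.0)
    return {name: max(votes, key=votes.get) for name, votes in scores.items()}


# Invented page fragments and weights, for illustration only.
pages = {
    "site_a": "<table><tr><td>Screen size</td><td>2.4 inch</td></tr>"
              "<tr><td>Weight</td><td>95 g</td></tr></table>",
    "site_b": "<table><tr><td>Screen Size:</td><td>2.4 inch</td></tr>"
              "<tr><td>Weight</td><td>98 g</td></tr></table>",
    "site_c": "<table><tr><td><b>Weight</b></td><td>98 g</td></tr></table>",
}
weights = {"site_a": 0.5, "site_b": 0.3, "site_c": 0.1}

records = {source: extract_parameters(html) for source, html in pages.items()}
fused = fuse(records, weights)
print(fused)
```

In this sketch a single heavily weighted source can outvote two lightly weighted ones, which is one plausible reading of a weight coefficient table. A real system would also need to normalize values (units, spacing) before comparing them.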

Keywords/Search Tags:

Information Extraction, Data Fusion, HtmlParser, Regular Expression, Weight Coefficient

Related items

1	The Research And Implementation Of Web Information Extraction System Based On The Regular Expression
2	Research Of Web Information Extraction Technique Based On REIE
3	Research On WEB Entity Information Extraction Algorithm And Its Application
4	The Research Of Web Information Extraction Technique And Application Based On NFA Regular Matching
5	The Application And Research Of Regular Expression In Webpage Extraction
6	Research On Multi-dimensional Regular Expression Matching Algorithm For Network Security
7	The Design And Implementation Of Regular Expression Engines Based On Deterministic Finite Automata
8	Research And System Realization Of Key Technology Of Information Extraction Optimization
9	Research And Implementation Of A Generic Web Information Extraction System
10	A Web-based News And Information Extraction System Design And Realization