Internet Product Information Extraction Technology Based On Structured Semantic Entropy

Posted on:2010-10-31

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Wu

Full Text:PDF

GTID:2208360275491686

Subject:Computer software and theory

Abstract/Summary:

Recent years have witnessed explosive growth in online commodities and trade volume, yet consumers' trust in the security of the Internet is decreasing. To address this contradiction, IOEB of Fudan University Software School has studied Internet merchandise monitoring technology and conducted in-depth research into its core problem: how to extract commodity information from the Internet. A number of methods have been proposed for web information extraction, but most of them require people to label the extracted results, so accuracy declines when manual intervention is reduced. In addition, many existing methods cannot adapt to changes in web sites: once the web pages are altered, the wrapper used for information extraction must be reconstructed. To address these issues, this paper proposes web page recognition and extraction algorithms based on structured semantic entropy, which exploit the structure of web pages and recognize their main parts by computing an aggregation metric of commodity information. We first investigate how commodity information is published on the Internet and what its characteristics are, and on that basis construct a semantic dictionary for commodity information extraction. The dictionary helps to locate the commodity information in which users are interested. By combining the features of web page structure with those of commodities, the structured semantic entropy based commodity extraction algorithm can recognize whether a page is a commodity sales page and extract information from web pages automatically. Combining the algorithm with meta-search technology and a web crawler, we present a framework that automatically discovers new e-business websites and extracts commodity information from them. Finally, an online drug monitoring system has been developed on the basis of this framework. By using the proposed algorithm, the system greatly expands information extraction coverage and raises the level of automation. It also demonstrates the technical feasibility of full-line monitoring of goods release information on the Internet, which helps to protect online transaction security.
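The abstract does not define structured semantic entropy or the aggregation metric, so the following is only a minimal illustrative sketch under stated assumptions, not the thesis's algorithm. It assumes that a page is split into its top-level `<div>` blocks, that dictionary hits are counted by plain substring matching, and that the Shannon entropy of the hit distribution over blocks serves as the aggregation measure (low entropy meaning the commodity terms cluster in one block). The term list `SEMANTIC_TERMS`, the threshold `max_entropy`, and all function names are hypothetical.

```python
import math
from html.parser import HTMLParser

# Hypothetical mini dictionary; the thesis builds its own semantic dictionary.
SEMANTIC_TERMS = ("price", "brand", "specification", "manufacturer")


class BlockCollector(HTMLParser):
    """Collects the lower-cased text found under each top-level <div>."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # current <div> nesting depth
        self.blocks = []    # one text string per top-level <div>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.depth == 0:
                self.blocks.append("")
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.blocks[-1] += " " + data.lower()


def term_hits(text):
    """Number of dictionary-term occurrences in one block (substring match)."""
    return sum(text.count(term) for term in SEMANTIC_TERMS)


def hit_entropy(counts):
    """Shannon entropy (bits) of how the hits spread over blocks, or None if no hits."""
    total = sum(counts)
    if total == 0:
        return None
    probs = [c / total for c in counts if c > 0]
    return sum(p * math.log2(1 / p) for p in probs)


def analyse(html, max_entropy=1.0):
    """Return (looks_like_sales_page, index_of_main_block, entropy)."""
    collector = BlockCollector()
    collector.feed(html)
    counts = [term_hits(block) for block in collector.blocks]
    entropy = hit_entropy(counts)
    if entropy is None:
        return False, None, None
    main_block = counts.index(max(counts))
    # Assumption: concentrated hits (low entropy) indicate a commodity block.
    return entropy <= max_entropy, main_block, entropy


page = (
    "<div>Home About</div>"
    "<div>Price: 10 Brand: Acme Manufacturer: X Specification: 5mg</div>"
    "<div>price match policy</div>"
)
is_sale, main_block, entropy = analyse(page)
print(is_sale, main_block, round(entropy, 3))
```

In this toy page four of the five dictionary hits fall in the second block, so the entropy stays below the assumed threshold and that block is reported as the main part. A real system would need a much richer dictionary, a proper DOM tree rather than top-level blocks, and a threshold tuned on labelled pages.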

Keywords/Search Tags:

Web Information Extraction, Structured-semantic Entropy, Aggregation Analysis, Meta-search Technology, Semantic Dictionary

Related items

1	Research Of Forest Products Trading Web Messages Extraction Based On Semantic
2	Based The Multidimensional Semantics Internet Drug Information Extraction Research Applications
3	Study On Ontology-based Micro-content Aggregation And Inquiry Technology
4	The Research On Semantic-Based Web Information Automatic Aggregation System And Key Technology
5	Chinese Information Extraction And The Method Of Summarization Generating Based On HowNet Semantic
6	The Research On The Principle And Implementation Of Semantic Search Based On Latent Semantic Analysis
7	Research On Construction Of The Field Semantic Dictionary Based On WordNet And FrameNet
8	Research On Online Drug Information Extraction Algorithm
9	Research Of A Postprocessing Method Of Meta Search Results Based On Information Distribution
10	The Study Of Latent Semantic-Based Personalized Search Key Technology