Research On Relationship Extraction Based On Semantic Pattern Matching In Web Environment

Posted on: 2010-04-07 | Degree: Master | Type: Thesis
Country: China | Candidate: S Y Zhou
GTID: 2218330371950006 | Subject: Computer system architecture
Abstract/Summary:
As more and more valuable information is published in the form of Web pages, the World Wide Web has become the world's largest knowledge base, and automated methods are urgently needed to make efficient use of these information resources. Against this background, Web Information Extraction has become a focus of research. Its main purpose is to convert unstructured text into structured or semi-structured information and store it in a database for further user querying, analysis and use. Web Information Extraction has three basic tasks: named entity recognition, relationship extraction and event discovery. Relationship extraction is of great significance because it is not only an important information extraction task in its own right but also the basis of event discovery and of a wide range of applications.

Pattern matching, one of the primary methods of information extraction, has attracted considerable attention in recent years. However, current pattern-matching techniques still have many limitations: most of them require manual intervention, are time-consuming to build and are difficult to maintain. These problems become increasingly severe as the number of Web data sources grows rapidly. We therefore aim to find a highly automated pattern-matching method that can be applied to different data models and integrated applications.

This paper analyzes existing entity relationship extraction technologies and proposes a semantic pattern matching based entity relationship extraction model for the Web environment (SPMREM). SPMREM combines a string semantic similarity algorithm with machine learning techniques.
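The abstract does not say how SPMREM computes string semantic similarity. As a minimal sketch, assuming a hypothetical word-pair similarity table (`WORD_SIM`) and a best-match averaging scheme, two relation phrases could be scored like this; the function names and scores are invented for illustration and are not the thesis's algorithm.

```python
from typing import Dict, FrozenSet, List

# HYPOTHETICAL word-pair similarity table, used only for illustration.
# A real system would derive these scores from a lexical resource or corpus.
WORD_SIM: Dict[FrozenSet[str], float] = {
    frozenset({"capital", "seat"}): 0.8,
    frozenset({"founded", "established"}): 0.9,
}


def word_similarity(a: str, b: str) -> float:
    """1.0 for identical words, the table score for listed pairs, else 0.0."""
    if a == b:
        return 1.0
    return WORD_SIM.get(frozenset({a, b}), 0.0)


def directed_similarity(src: List[str], dst: List[str]) -> float:
    """Average over words in src of their best-matching score against dst."""
    if not src or not dst:
        return 0.0
    return sum(max(word_similarity(w, v) for v in dst) for w in src) / len(src)


def string_similarity(s1: str, s2: str) -> float:
    """Symmetric similarity in [0, 1]: mean of the two directed scores."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    return (directed_similarity(t1, t2) + directed_similarity(t2, t1)) / 2


if __name__ == "__main__":
    # Two relation phrases that differ only in a near-synonym.
    print(string_similarity("is the capital of", "is the seat of"))  # ≈ 0.95
```

Averaging best matches in both directions keeps the score symmetric and stops a short phrase from scoring highly merely because all of its words appear inside a much longer one.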
SPMREM takes as its training set a limited group of entities whose relationships are known exactly, extracts relationship patterns from the Web pages that contain these training entities, and then uses the patterns to extract entity relationships from other Web pages, converting the information into structured data.

The experimental results show that SPMREM can effectively learn relationship patterns from the training set and then extract the relationships between unknown entities from Web pages by pattern matching. SPMREM achieved higher accuracy and recall than existing methods, makes fuller use of the information in Web pages, and can be applied to a wide range of Web Information Integration applications.
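The seed-based training and extraction loop described above can be sketched as follows. This is a hypothetical illustration, not SPMREM itself: entities are assumed to be single capitalized tokens, a pattern is assumed to be the words between the two seed entities, and `difflib.SequenceMatcher.ratio()` stands in for a semantic similarity measure. The names `learn_patterns` and `extract_relations`, the 0.8 threshold and the toy sentences are all invented for this example.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def tokens(sentence: str) -> List[str]:
    """Toy tokenizer: strip the final period and split on whitespace."""
    return sentence.rstrip(".").split()


def learn_patterns(seeds: Iterable[Pair], sentences: List[str]) -> Set[str]:
    """Collect the words between e1 and e2 in each sentence containing a seed pair."""
    patterns: Set[str] = set()
    for e1, e2 in seeds:
        for s in sentences:
            toks = tokens(s)
            if e1 in toks and e2 in toks:
                i, j = toks.index(e1), toks.index(e2)
                if i + 1 < j:  # e1 precedes e2 with at least one word between
                    patterns.add(" ".join(toks[i + 1 : j]))
    return patterns


def similarity(a: str, b: str) -> float:
    """Stand-in similarity: character-level ratio, not a semantic measure."""
    return SequenceMatcher(None, a, b).ratio()


def extract_relations(patterns: Set[str], sentences: List[str],
                      threshold: float = 0.8) -> List[Pair]:
    """Return (e1, e2) pairs whose in-between text matches a learned pattern."""
    found: List[Pair] = []
    for s in sentences:
        toks = tokens(s)
        # Assumption: candidate entities are capitalized single tokens.
        ents = [k for k, t in enumerate(toks) if t[:1].isupper()]
        for a in ents:
            for b in ents:
                if a + 1 < b:
                    middle = " ".join(toks[a + 1 : b])
                    best = max((similarity(middle, p) for p in patterns), default=0.0)
                    if best >= threshold:
                        found.append((toks[a], toks[b]))
    return found


if __name__ == "__main__":
    # Toy seed pairs and sentences, invented for illustration only.
    seeds = [("Paris", "France"), ("Tokyo", "Japan")]
    training = [
        "Paris is the capital of France.",
        "Tokyo is the capital city of Japan.",
        "Paris hosted a match against Japan.",
    ]
    patterns = learn_patterns(seeds, training)
    print(sorted(patterns))  # ['is the capital city of', 'is the capital of']
    unseen = ["Rome is the capital of Italy.", "Berlin played Madrid yesterday."]
    print(extract_relations(patterns, unseen))  # [('Rome', 'Italy')]
```

Character-level matching is used here only to keep the sketch dependency-free. In the approach the abstract describes, a semantic similarity measure would take its place, so that a paraphrase sharing few characters with a learned pattern could still be matched.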
Keywords/Search Tags: pattern matching, entity relationship extraction, semantic similarity