Semantic Based Scientific Literature Metadata Retrieval System

Posted on:2008-12-26

Degree:Master

Type:Thesis

Country:China

Candidate:F Chu

Full Text:PDF

GTID:2178360272969083

Subject:Computer software and theory

Abstract/Summary:

In resource retrieval, traditional statistical strategies use keyword-based algorithms efficiently, but because they lack semantic information, both search queries and results are prone to misinterpretation. Meanwhile, data from heterogeneous sources may have various quality problems, and retrieval results contain many duplicate records. There is a strong need for a cleansing process to improve data quality.

To overcome these disadvantages, we adopt a semantic approach and describe a metadata retrieval model for scientific literature. For semantic retrieval, we provide a semantic search portal and use semantic reasoning rules to improve search results. We also put forward semantic search over metadata, covering concepts, instances, and relationships. Relationships are further divided into three types: those between concepts, those between instances, and those between a concept and an instance.

We summarize and describe the theories, methods, evaluation standards, and basic workflow of data cleansing. Our research emphasis is on the techniques and algorithms of duplicate record cleansing, for which we put forward improved algorithms. We introduce its basic concepts and workflow, describe the main techniques and algorithms of each step in detail, and give improved algorithms that address the limitations of the original ones in each step. These mainly include the following. For sorting the dataset, we give an improved method that uses a sort key. For duplicate record detection, we put forward a field matching algorithm and an abbreviation discovery algorithm based on edit distance. For record matching, we propose an optimized method that uses valid weight values and length filtering to reduce the runtime of the original algorithm and improve its efficiency. For clustering duplicate records at the database level, we remedy two limitations of the traditional sorted neighborhood method and give an improved sorted neighborhood method.

Finally, based on the metadata management model framework and the preceding research on duplicate record cleansing, we apply the semantic retrieval strategies to the SemreX system.
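
The duplicate detection steps named in the abstract (sorting by a key, edit-distance field matching, length filtering, and the sorted neighborhood method) can be illustrated with a minimal sketch. It shows only the textbook forms of these techniques, not the improved variants proposed in the thesis, whose details the abstract does not give. The function names, the window size, and the distance threshold are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Set, Tuple


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, computed with a rolling one-row table."""
    if len(a) < len(b):
        a, b = b, a  # keep the row as short as the shorter string
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fields_match(a: str, b: str, max_dist: int = 2) -> bool:
    """Length filter first: the edit distance is never smaller than the
    length difference, so a large gap rules out a match cheaply."""
    if abs(len(a) - len(b)) > max_dist:
        return False
    return edit_distance(a, b) <= max_dist


def sorted_neighborhood(records: List[str],
                        key: Callable[[str], str],
                        window: int = 3,
                        max_dist: int = 2) -> Set[Tuple[int, int]]:
    """Sort records by key, then compare each record only with the
    next (window - 1) records in sorted order."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs: Set[Tuple[int, int]] = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1: pos + window]:
            if fields_match(key(records[i]), key(records[j]), max_dist):
                pairs.add((min(i, j), max(i, j)))
    return pairs


# Hypothetical sample titles; records 0 and 2 differ by a transposed "ie".
titles = [
    "Semantic Retrieval of Metadata",
    "Data Cleansing Survey",
    "Semantic Retreival of Metadata",
    "Ontology Reasoning",
]
print(sorted_neighborhood(titles, key=str.lower))  # {(0, 2)}
```

The sliding window is what keeps the number of comparisons roughly linear in the number of records after sorting. It is also the source of the method's known weakness: duplicates whose keys sort far apart are never compared.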

Keywords/Search Tags:

Scientific Literature, Metadata Retrieval, Semantic Association, Semantic Reasoning, Duplicate Records Cleansing

Related items

1	Research On Resource Semantic Space And Retrieval Of Scientific Literature
2	Duplicates Cleansing Based On Semantic Association
3	A Semantic Association Based Literature Information Browsing System
4	Research On Semantic Search And Related Technology
5	Research On The Representation Of Uncertain Semantic Relations And Inexact Reasoning In Ontology Knowledge
6	The Research And Implementation Of Semantic Annotation And Reasoning For Device Metadata Towards Semantic Web Of Things
7	Research And Implementation Of Metadata Construction And Visualized System For Movable Artifact Based On Semantic Association
8	Data Cleaning Algorithm And Applications
9	Research Of Data Cleansing Algorithms For Duplicate Records Detection Problem
10	Design And Implementation Of Scientific Literature Retrieval System Based On Graph Structure