Semantic Based Scientific Literature Metadata Retrieval System

Posted on: 2008-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: F Chu
Full Text: PDF
GTID: 2178360272969083
Subject: Computer software and theory
Abstract/Summary:
For resource retrieval, traditional statistical strategies use keyword-based algorithms efficiently, but because they lack semantic information, both search queries and results are often misinterpreted. Meanwhile, data from heterogeneous sources may have various quality problems, and the retrieval results contain many duplicate records. There is a strong need for a cleansing process to improve data quality.

To overcome these disadvantages, we take a semantic approach and describe a metadata retrieval model for scientific literature. For semantic retrieval, we provide a semantic search portal and use semantic reasoning rules to improve search results. We also propose semantic search over metadata covering concepts, instances, and relationships. Relationships are further divided into three types: between concepts, between instances, and between a concept and an instance.

We summarize and describe the theories, methods, evaluation standards, and basic workflow of data cleansing. Our research emphasis is on the techniques and algorithms for duplicate record cleansing, for which we propose improved algorithms. We introduce the basic knowledge and workflow of duplicate record cleansing, describe the main techniques and algorithms of each step in detail, and give improved algorithms that address the limitations of the original ones in each step. These mainly include the following. For sorting the dataset, we give an improved method that uses a sort key. For duplicate record detection, we propose a field matching algorithm and an abbreviation discovery algorithm based on edit distance. For record matching, we propose an optimized method that uses valid weight values and length filtering to reduce the runtime of the original algorithm and improve its efficiency. For clustering duplicate records at the database level, we amend two limitations of the traditional sorted neighborhood method and give an improved sorted neighborhood method.

Finally, based on the metadata management model framework and the preceding work on duplicate record cleansing, we apply the semantic retrieval strategies to the SemreX system.
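
As a rough illustration of the building blocks named in the abstract, the sketch below combines a sort key, edit-distance field matching with a length filter, and the baseline sorted neighborhood method. It is not the thesis's improved algorithm: the abstract does not specify the amendments, and the function names, the sort key definition, the window size, and the distance threshold are all assumptions chosen for the example.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fields_match(a: str, b: str, max_dist: int = 2) -> bool:
    """Length filter first: the edit distance is never smaller than the
    difference in lengths, so such pairs can be rejected without the DP."""
    if abs(len(a) - len(b)) > max_dist:
        return False
    return edit_distance(a, b) <= max_dist


def sort_key(title: str) -> str:
    """Hypothetical sort key: lowercase alphanumeric characters only."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def sorted_neighborhood(titles: List[str], window: int = 3,
                        max_dist: int = 2) -> List[Tuple[int, int]]:
    """Baseline sorted neighborhood: sort by key, then compare each record
    only with the records that follow it inside a sliding window."""
    order = sorted(range(len(titles)), key=lambda i: sort_key(titles[i]))
    keys = [sort_key(titles[i]) for i in order]
    pairs = []
    for pos in range(len(order)):
        for other in range(pos + 1, min(pos + window, len(order))):
            if fields_match(keys[pos], keys[other], max_dist):
                i, j = order[pos], order[other]
                pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)
```

Sorting brings near-identical titles next to each other, so the window keeps the number of comparisons roughly linear in the dataset size, and the length filter skips the quadratic edit-distance computation for pairs that cannot match.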
Keywords/Search Tags:Scientific Literature, Metadata Retrieval, Semantic Association, Semantic Reasoning, Duplicate Records Cleansing