Research On The Full Text Retrieval In Scientific Literature Sharing Platform

Posted on:2008-05-12

Degree:Master

Type:Thesis

Country:China

Candidate:L Y Tan

GTID:2178360272468720

Subject:Computer system architecture

Abstract/Summary:

As the web grows and the resources we can access multiply, finding the information we actually need takes more and more time and effort. Traditional literature retrieval systems treat a document as a bag of words (BOW) and rank the result list by the cosine similarity between the document vector and the query vector. This approach ignores the context information of an article, which is helpful for similarity evaluation. In SemreX, we adopt a new ranking algorithm that brings context information, such as the classification of the article and reference validation, into the similarity calculation. We evaluate the output of our method against that of the traditional method using the TREC_EVAL program. The experimental results indicate that the new method clearly improves retrieval precision relative to the traditional approach.

Another important function of SemreX is finding literature similar to a user's favorite paper. This function is quite common in other literature retrieval systems, such as Citeseer and CNKI. Finding similar literature also reveals semantic relationships between papers. Because finding similar literature is very time-consuming, we use term compression and a candidate literature set to improve retrieval efficiency. At the same time, we use information theory (IT) to evaluate the similarity of the literature. Because we use the candidate literature set, the system's computing time grows non-linearly with the size of the literature repository. Currently, SemreX can find similar documents for twenty thousand papers per day.

Generally speaking, a traditional retrieval system returns a very large number of results, and the relationships among the results are not visible to the user, who therefore spends a lot of time browsing the result list. SemreX first uses an online clustering algorithm to analyze the result list for the user: it groups the results and labels each cluster with a suitable string. It then presents the clustered results in a graphical interface, so that the user can browse them conveniently, which improves retrieval effectiveness...
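
The traditional baseline mentioned in the abstract, ranking by the cosine between bag-of-words vectors, can be sketched roughly as follows. This is only an illustration of that baseline, not SemreX's context-aware ranking, whose details the abstract does not give. The function names (`bow_vector`, `cosine_similarity`, `rank`) and the whitespace tokenization are assumptions made for the example.

```python
import math
from collections import Counter


def bow_vector(text):
    """Build a bag-of-words vector: lowercase, split on whitespace, count terms.

    Whitespace tokenization is an assumption made for this sketch.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, document_index) pairs, highest score first."""
    q = bow_vector(query)
    scored = [
        (cosine_similarity(q, bow_vector(doc)), index)
        for index, doc in enumerate(documents)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = [
        "full text retrieval of scientific literature",
        "fuzzy clustering of web objects",
    ]
    for score, index in rank("literature retrieval", docs):
        print(f"{score:.3f}  {docs[index]}")
```

Because the score depends only on term counts, two papers on the same topic that share few words score low. That is the gap the abstract says context information such as article classification and reference validation is meant to address.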

Keywords/Search Tags:

information retrieval, reference validation, similar literature measure, cluster online algorithm

Related items

1	On The Teaching Of Literature Retrieval Reference For Training Of Information Literacy Oriented College Students
2	Research Of Analysis And Quantization Algorithm For Reference Relationship Of Scientific Literature
3	Establishment And Update Of Similar Users' Cluster In Personalized Information Retrieval
4	Design And Implement Of Web-Based China Chemical Literature Retrieval System
5	Research On Cluster Algorithm For Web Object
6	Theoretical And Applied Research On Fuzzy C-means Clustering And Its Cluster Validation
7	Research And Improvement Of Pagerank Algorithm In Literature Retrieval Ranking
8	Research On The Sorting Algorithm Of Scientific Literature Retrieval Based On TF-IDF
9	A Study Of Online Local Literature Resources
10	Cluster-based Query Expansion Using Language Modeling for Biomedical Literature Retrieval