Enriching The Representative Of Document Using IRF

Posted on: 2011-01-29 Degree: Master Type: Thesis
Country: China Candidate: S M Cheng Full Text: PDF
GTID: 2198330332479276 Subject: Applied Mathematics
Abstract/Summary:
Web text mining is the process of discovering the content contained in documents. As Web information sources grow explosively, the data in e-book databases expands at an even faster rate. The main task of e-book management is to let users find satisfactory documents quickly and accurately. Although the words in a document indicate its subject with different weights, users naturally want the most relevant documents. Therefore, choosing the right words without losing document information will undoubtedly improve the performance of document retrieval and classification. Previous methods focused on optimizing the words that appear in a document and ignored the relevance between documents: when the words a user enters exactly match keywords appearing in a document, the retrieval system returns the documents containing those keywords. However, a document contains only a small number of such keywords, and they tend to be academic. A beginner finds it hard to come up with the exact query words and therefore cannot quickly find a satisfactory answer. To break through this bottleneck, this thesis introduces the IRF (Iterative Reinforcement Framework) model, applied in the environment of the Delicious site. The site makes full use of the core concept of Web2.0: its users annotate the sites and documents they are interested in with labels (tags), choosing semantically rich terms as tags. Like keywords, these tags can index documents. The difference is that tags are assigned not only by the author but by any user who freely joins in building the Web, so they enrich the semantic information of a document. Users tend to annotate documents with similar content using the same tag, so tags become a semantic bridge between related documents and change the situation in which documents are independent of each other.
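The idea that shared tags act as a bridge between documents can be illustrated with a small sketch. It assumes hypothetical Delicious-style bookmarks stored as `(user, document, tag)` triples and ranks related documents by the Jaccard similarity of their tag sets. The data, the function names `tags_by_document` and `related_documents`, and the choice of Jaccard similarity are all illustrative assumptions, not the method described in this thesis.

```python
from collections import defaultdict

# Hypothetical Delicious-style bookmarks: (user, document, tag) triples.
# All users, documents and tags here are illustrative only.
BOOKMARKS = [
    ("alice", "doc1", "datamining"),
    ("alice", "doc2", "datamining"),
    ("bob", "doc2", "web2.0"),
    ("bob", "doc3", "web2.0"),
    ("carol", "doc3", "library"),
    ("carol", "doc4", "library"),
]


def tags_by_document(bookmarks):
    """Collect the set of tags that any user attached to each document."""
    doc_tags = defaultdict(set)
    for _user, doc, tag in bookmarks:
        doc_tags[doc].add(tag)
    return dict(doc_tags)


def related_documents(doc, doc_tags):
    """Return documents sharing at least one tag with `doc`, ranked by the
    Jaccard similarity of their tag sets (an assumed measure); ties are
    broken by document id."""
    mine = doc_tags.get(doc, set())
    scores = {}
    for other, theirs in doc_tags.items():
        if other == doc:
            continue
        shared = mine & theirs
        if shared:
            scores[other] = len(shared) / len(mine | theirs)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, `related_documents("doc2", tags_by_document(BOOKMARKS))` ranks `doc1` (score 0.5) ahead of `doc3` (score 1/3), although `doc2` shares no words with either: the link comes only from users' tags.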
To begin with, the IRF model uses the TFIDF algorithm to compute the initial representation of each document, and then iteratively generates relevant terms that may not appear in the document. These terms greatly enrich the document's representation and widen the scope of document retrieval. To obtain better results, this thesis introduces the concept of Web2.0 technology into library management and explains the author's view based on this assumption. At present, a library search system only recommends several similar documents to users, which makes it a relatively static retrieval system. Users cannot interact well with each other, and a reader's experience of a document cannot be stored effectively or shared in time, which wastes a large amount of resources. This thesis introduces the core idea of Web2.0 into library management and builds an interactive platform for readers. The platform allows users not only to mark the sites they are interested in with tags but also to record their own reading experiences, so that other users can read these experiences to judge the value of an article, effectively saving time.
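The two steps named above, a TFIDF initial representation followed by iterative enrichment with terms the document may not contain, can be sketched as follows. TFIDF here uses one common variant, tf(t, d) · log(N / df(t)). The abstract does not give IRF's actual update rule, so `enrich` is an assumed stand-in: a simple propagation in which each document keeps a share `1 - alpha` of its own weights and absorbs the averaged weights of documents linked to it by shared tags. The corpus `DOCS`, the link table `NEIGHBOURS`, and the parameters `alpha` and `iterations` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: document id -> list of tokens.
DOCS = {
    "doc1": ["tfidf", "weight", "term"],
    "doc2": ["tag", "folksonomy", "term"],
    "doc3": ["tag", "library", "reader"],
}

# Hypothetical links: documents that share at least one user tag.
NEIGHBOURS = {"doc1": ["doc2"], "doc2": ["doc1", "doc3"], "doc3": ["doc2"]}


def tfidf(docs):
    """Initial representation: tf(t, d) * log(N / df(t)) for every term t in d."""
    n = len(docs)
    df = Counter(t for terms in docs.values() for t in set(terms))
    vectors = {}
    for doc, terms in docs.items():
        counts = Counter(terms)
        vectors[doc] = {
            t: (c / len(terms)) * math.log(n / df[t]) for t, c in counts.items()
        }
    return vectors


def enrich(initial, neighbours, alpha=0.3, iterations=10):
    """Assumed reinforcement rule (not the thesis's IRF update): each round,
    a document keeps (1 - alpha) of its initial weights and receives alpha
    times the average of its neighbours' current weights, so terms that
    never occur in the document can gain weight."""
    current = {d: dict(v) for d, v in initial.items()}
    for _ in range(iterations):
        updated = {}
        for doc, base in initial.items():
            nbrs = neighbours.get(doc, [])
            if not nbrs:
                # Isolated document: nothing to propagate, keep its TFIDF vector.
                updated[doc] = dict(base)
                continue
            vec = {t: (1 - alpha) * w for t, w in base.items()}
            for other in nbrs:
                for t, w in current[other].items():
                    vec[t] = vec.get(t, 0.0) + alpha * w / len(nbrs)
            updated[doc] = vec
        current = updated
    return current


if __name__ == "__main__":
    enriched = enrich(tfidf(DOCS), NEIGHBOURS)
    # Terms of doc1 ranked by enriched weight, including ones it never contained.
    for term, weight in sorted(enriched["doc1"].items(), key=lambda kv: -kv[1]):
        print(f"{term:12s} {weight:.3f}")
```

After enrichment, `doc1` carries weight for terms such as `folksonomy` and `library` that never appear in its text, which is the kind of widened retrieval scope the abstract describes. A fixed iteration count keeps the sketch simple; a real implementation might instead stop when the weights change by less than a tolerance.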
Keywords/Search Tags: data mining, document representation, library management, Web2.0