
Research On Temporal Profiles Analysis Of Document And Time-based Document Retrieval Based On Scientific Research Theme

Posted on: 2015-03-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Shen
GTID: 1368330461456706
Subject: Information Science
Abstract/Summary:
Time is an important dimension of the information space. Most documents contain temporal information that can be used for text timeline construction and for temporal fact and event annotation. Recently, integrating temporal expressions into the retrieval process has emerged as a new research problem in information retrieval. Research on temporal information mainly focuses on the time profiles implied in web documents, and especially on time-based document ranking and clustering. Temporal profiles have already been applied to web document analysis in several areas, such as topic detection and tracking and microblog hotspot finding. Consequently, integrating the temporal profiles of documents into Web search will not only improve the effectiveness of search results but also advance research on information retrieval.

At present, the indexing and retrieval of research resources by temporal profile relies only on the time information in document metadata, such as document creation time or publication time. The failure to recognize and use the time-aware information contained in document content restricts deeper semantic search for temporal knowledge in the text. Analyzing and using document temporal profiles can help to quickly find relevant results within a limited time interval in massive datasets. It can also help to construct timelines that combine relevant information from different resource platforms and represent the temporal evolution of document distribution based on the temporal word tokens and time expressions in document text.

The main research problems of this dissertation are how to use the temporal profiles of documents with different temporal expression distribution characteristics, and how to use these profiles to improve document retrieval and knowledge observation. The introduction elaborates on the background, significance, innovations, overall process, and framework of the dissertation. The literature review summarizes related research on temporal expression extraction and annotation, the organization of document temporal profiles, the temporal evolution of document distribution, and the integration of different sources through time-based document retrieval and observation.

For temporal expression extraction and annotation, we design an automatic algorithm that extracts temporal expressions and then links temporal facts and events together. We choose scientific literature and microblog resources as research objects and use the LDA and Labeled-LDA topic models to improve the annotation precision of the temporal profile of each document, based on a statistical analysis of the distribution of explicit, implicit, and relative temporal expressions in each document. Finally, we use the existing hashtags of microblogs as the gold standard for evaluating the performance of the Labeled-LDA topic model.

For temporal profile organization and retrieval, we improve research topic ranking using a temporal information retrieval model that calculates the Kullback-Leibler divergence between the temporal profiles of documents. We then efficiently identify insightful time points for the documents in each research topic that share a similar temporal evolution pattern, based on the time-aware Kullback-Leibler divergence. Finally, we carry out data correlation analysis using the topic model and the Kullback-Leibler divergence of documents. We integrate research topics with different temporal evolution patterns based on the temporal expression distribution of document sets, then construct a general timeline for each research topic and visualize it.
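As an illustration of comparing documents by their temporal profiles, the following is a minimal sketch and not the dissertation's actual model. It assumes a temporal profile is a smoothed histogram over the years mentioned in a document; the function names, the additive smoothing, and the sample year lists are all hypothetical.

```python
import math
from collections import Counter

def temporal_profile(years, support, smoothing=1.0):
    """Turn a list of years mentioned in a document into a probability
    distribution over `support` (assumed representation, with additive
    smoothing so that no bin has zero probability)."""
    allowed = set(support)
    counts = Counter(y for y in years if y in allowed)
    total = sum(counts.values()) + smoothing * len(support)
    return [(counts[y] + smoothing) / total for y in support]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.
    Assumes q has no zero entries where p is positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical data: years extracted from three documents.
support = list(range(2010, 2015))
profile_a = temporal_profile([2010, 2010, 2011, 2012], support)
profile_b = temporal_profile([2013, 2014, 2014, 2014], support)
profile_c = temporal_profile([2010, 2011, 2011, 2012], support)

# A smaller divergence means more similar temporal profiles.
divergence_ab = kl_divergence(profile_a, profile_b)
divergence_ac = kl_divergence(profile_a, profile_c)
```

In this sketch, documents A and C mention overlapping years, so their divergence is smaller than that between A and B. Kullback-Leibler divergence is asymmetric, so a ranking built on it has to fix which profile plays the role of the reference distribution.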
Keywords/Search Tags: temporal expression extraction and annotation, temporal knowledge and fact, temporal information retrieval, temporal database, conditional random fields, the integration of resources research