
Research On Personalized Information Dissemination And Conceptual Retrieval

Posted on: 2003-11-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 1118360185496993
Subject: Computer application technology
Abstract/Summary:
Information retrieval is concerned with selecting, from a collection, the documents that will interest a user with a stated information need or query. Helping users get the information they want is becoming increasingly important in wide-area information systems. This dissertation describes a new Concept Network - Views (CN-V) model for representing documents and users' interests. Combined with analysis of users' access patterns and ontology theory, the model can be used to overcome several limitations of traditional user interest modeling and information retrieval.

The CN-V model is the kernel of this dissertation. Building it is divided into two phases: (1) transform text from word space into concept space; (2) generate the CN-V model from the concepts. In the first phase, we combine statistical techniques with rule-based methods to carry out the transformation, and we use WordNet and HowNet to disambiguate word senses. An extended phrase mining algorithm is presented to extract semantic units from documents. In the second phase, we give the ConceptRank algorithm to extract the topics of documents and of users' interests. Concepts that have a high ConceptRank but do not contribute to the topic are identified as hub concepts, so that they do not degrade efficiency. Finally, we present two similarity measures based on CN-V: an energy-decreasing algorithm and the cosine measure of concept vectors.

Our research on personalized information dissemination includes three parts. (1) We give a new user interest modeling method based on the CN-V model and analyze the main factors in personalization. (2) By analyzing users' access patterns, we give an algorithm for mining potential interests. We first collect personal data; after preprocessing, these data are clustered and represented with the Concept Network - Views model, which can then be used for information recommendation. (3) Collaborative information filtering can also support personalized information dissemination. We use the ISODATA algorithm to cluster users' feedback, and the result can be used to discover groups of users with similar interests.

Document representation is also based on the CN-V model. Furthermore, the tags of HTML files can be used to represent documents efficiently. We also give efficient methods for dictionary access and inverted file indexing.

Traditional information retrieval methods are based on keyword matching, which faces two difficulties: polysemy (one word with several meanings) and synonymy (several words with the same meaning). At the same time, how to integrate semi-structured and structured information during retrieval is also an important problem. We represent texts at multiple levels and use a domain ontology to associate them with the data in a database, in order to achieve conceptual retrieval.
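
Of the two similarity measures named in the abstract, the cosine measure of concept vectors is a standard technique and can be illustrated briefly. The sketch below is not taken from the dissertation: it assumes that a document and a user profile are each reduced to a sparse mapping from concept to weight, and the function name, concept labels, and weights are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse concept vectors.

    Each vector is a dict mapping a concept label to a numeric weight
    (an assumed representation, not the dissertation's own).
    """
    # Only concepts present in both vectors contribute to the dot product.
    dot = sum(weight * b[concept] for concept, weight in a.items() if concept in b)
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # An empty or all-zero vector shares nothing with any other vector.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical concept weights for one document and one user profile.
document = {"information retrieval": 0.8, "ontology": 0.5, "clustering": 0.1}
user_profile = {"information retrieval": 0.6, "ontology": 0.7}

score = cosine_similarity(document, user_profile)
print(f"similarity = {score:.3f}")
```

With non-negative weights the score lies between 0 and 1. Because synonyms are mapped to the same concept before comparison, two texts can score highly even when they share few literal keywords, which is the motivation the abstract gives for working in concept space.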
Keywords/Search Tags: Personalization, Information Dissemination, Conceptual Retrieval