
Research On Key Technologies Of Context-Aware Web Search

Posted on: 2009-03-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Z Jiang
Full Text: PDF
GTID: 1118360272991208
Subject: Systems analysis and integration
Abstract/Summary:
With the explosive growth of the Internet, the WWW has developed into a dynamic information service network that offers many kinds of information resources across websites worldwide and provides users with an extremely valuable source of information. The aspiration of information sharing has become a reality. However, the "information overload" caused by this vast amount of information calls for efficient Web information retrieval technology. In September 2002, a workshop on the future challenges of information retrieval was held at the Center for Intelligent Information Retrieval (CIIR) at the University of Massachusetts Amherst, and contextual retrieval was identified as a particularly important long-term challenge of information retrieval. Since 2004, an international conference on information retrieval in context has been held every two years.

In information retrieval activities, both the users and the information they need exist in their own contexts. On the one hand, users are situated in the Task Context, User Context, Query Context and so on. On the other hand, the information that users need is situated in the Author Context, Link Context, Structural Context, Path Context and so on. To provide users with high-quality information, an information retrieval model must combine the contexts of both sides into a single framework, forming a context-aware information retrieval model.

In line with the strategic objectives of information retrieval and the current state of Web search, we carried out an in-depth study of contextual retrieval. A context-aware retrieval model is put forward to handle both user queries and similar-page search. The main characteristics of this model are as follows.

Firstly, the model is aware of the intent or theme of the user's query: a local sub-tree of a reference ontology is obtained by combining the user's query with its context. This sub-tree reflects the real intention or theme of the user's query.
This dissertation puts forward a series of algorithms to obtain this sub-tree.

Secondly, the model expands the theme sub-tree: starting from the sub-tree obtained in the first step, the leaf nodes are expanded along the ISA and non-ISA relations of the reference ontology. The result is a concept map centered on the user's query, called the user's personalized concept map. Web pages are represented as vectors over the keywords of the personalized concept map; that is, the content of a page is restricted to the information sub-space spanned by the concepts of the personalized concept map. The measurements between concepts of the personalized concept map are used to weight the links between pages. A series of measurement algorithms is put forward.

Thirdly, the model is aware of semantic information from the authors of pages: the authors of pages form part of the context of the information need. The topic of the authors' network and the topic of the link network are similar or identical, so the authors' network needs to be studied. The concept of a "simple document" is introduced, and a "compound document" is composed of simple documents. A data set consists of compound documents and is modeled as a tensor. Through tensor decomposition, the semantic similarity between members is defined and an algorithm for computing it is put forward.

Fourthly, the model is aware of the link structure context of the information need: pages connected by the links between them form a link network, which becomes the link structure context of the information need. The topology of the user's concept map from the first and second points is applied to this context. On the one hand, the concepts (keywords) of the user's concept map are taken as terms, each page is denoted as a vector, and each term's weight is calculated as CF-IDF, by analogy with TF-IDF. On the other hand, a weight is assigned to each link, calculated from the personalized semantic similarity.
From the weighted adjacency matrix, the authority scores of pages are calculated and the pages are sorted by authority score. A series of algorithms is proposed for this. Because the resulting page ranking changes with the user's concept map, it effectively mitigates the problems of "spamming", "topic drift" and "one size fits all".

Fifthly, the model is aware of the semantic information in the anchor text of links: anchor text is added as a third axis, or mode, on top of the weighted adjacency matrix of the fourth point, which establishes a tensor model of the data. However, the mathematical theory of tensors is still immature, so the tensor model is transformed into a three-matrix model to avoid relying on tensor tools.

This study is based on a research project of the Shanghai Science and Technology Committee (Grant No. 055115001), "for voice services of volunteers push information service platform," in which the author participated. The Expo MIA system for the 2010 Shanghai World Expo was built on this research project. The proposed algorithms have been verified in that system; the results show that they effectively solve the related problems and perform well. The research results of this dissertation therefore have great practical value for improving the accuracy of Web search.
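
As a rough illustration of the fourth point (CF-IDF page vectors, similarity-weighted links, and authority scores from the weighted adjacency matrix), the following minimal Python sketch may help. It is not the dissertation's algorithm. Everything in it is an assumption made for illustration: the function names `cf_idf`, `cosine` and `weighted_authority`, the smoothed IDF formula, the use of cosine similarity between page vectors as a stand-in for the personalized semantic similarity, and the HITS-style iteration used to obtain authority scores.

```python
import math
from collections import Counter


def cf_idf(pages, concepts):
    """Concept frequency times a smoothed inverse document frequency.

    pages: dict page_id -> list of tokens; concepts: list of concept keywords.
    Only tokens that are concepts of the (hypothetical) concept map count.
    """
    n = len(pages)
    concept_set = set(concepts)
    counts = {p: Counter(t for t in toks if t in concept_set)
              for p, toks in pages.items()}
    # Document frequency: number of pages that mention each concept.
    df = {c: sum(1 for cnt in counts.values() if cnt[c] > 0) for c in concepts}
    vectors = {}
    for p, cnt in counts.items():
        total = sum(cnt.values())
        vectors[p] = [
            (cnt[c] / total if total else 0.0)
            * (math.log((1 + n) / (1 + df[c])) + 1.0)  # assumed smoothing
            for c in concepts
        ]
    return vectors


def cosine(u, v):
    """Cosine similarity; zero if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def _normalize(scores):
    """Scale scores to unit Euclidean length (all zeros stay zero)."""
    norm = math.sqrt(sum(s * s for s in scores.values()))
    return {p: (s / norm if norm else 0.0) for p, s in scores.items()}


def weighted_authority(links, vectors, iterations=50):
    """HITS-style authority scores on a similarity-weighted link graph."""
    # Assumed link weight: cosine similarity of the two pages' CF-IDF vectors.
    weights = {(s, d): cosine(vectors[s], vectors[d]) for s, d in links}
    hub = {p: 1.0 for p in vectors}
    auth = {p: 1.0 for p in vectors}
    for _ in range(iterations):
        new_auth = {p: 0.0 for p in vectors}
        for (s, d), w in weights.items():
            new_auth[d] += w * hub[s]
        auth = _normalize(new_auth)
        new_hub = {p: 0.0 for p in vectors}
        for (s, d), w in weights.items():
            new_hub[s] += w * auth[d]
        hub = _normalize(new_hub)
    return auth


# Toy data: page "c" is off-topic but receives as many links as page "b".
pages = {
    "a": ["tensor", "retrieval", "link"],
    "b": ["tensor", "retrieval"],
    "c": ["cooking", "recipe"],
    "d": ["retrieval", "ontology"],
}
concepts = ["tensor", "retrieval", "ontology"]
links = [("a", "b"), ("d", "b"), ("a", "c"), ("d", "c")]

vectors = cf_idf(pages, concepts)
scores = weighted_authority(links, vectors)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the off-topic page "c" has an all-zero concept vector, so every link into it gets weight zero and it earns no authority despite its in-links. This is one simple way to see how link weights tied to a user's concept map could counter topic drift and link spam.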
Keywords/Search Tags:Information retrieval, Contextual retrieval, Personalized semantic similarity, Tensor, Link analysis, CF-IDF weight, Singular Value Decomposition