
Research Of Intelligent Information Retrieval Based On Web Logs Mining

Posted on: 2010-10-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K P Zhu
Full Text: PDF
GTID: 1118360332957773
Subject: Computer application technology
Abstract/Summary:
With the advent of the Internet era, the amount of Web log data has increased dramatically. How to obtain, manage, and make full use of this data has therefore become an urgent issue in information science. As one of the basic tools for addressing the problem, Web data mining has received extensive attention and developed tremendously over the past several decades.

Intelligent information retrieval based on Web log mining aims to analyze the log data of Web information retrieval, mine user retrieval patterns from those logs, and apply the patterns to improve current retrieval methods. It rests on the hypothesis that user logs capture characteristics of how users access the Web, that these characteristics appear as patterns, and that the patterns can be mined and exploited. The research in this dissertation is based on mining the query log of the Sogou search engine. It uses statistics, text mining, relation analysis, clustering, and language modeling to extract the valuable knowledge contained in the user log, and studies in depth the application of that knowledge to query expansion, retrieval recommendation, and user clustering. The experiments show that Web log mining can improve the performance of an information retrieval system. The main content of the dissertation consists of the following four parts.

First, we analyze the retrieval rules in query logs. The user query log is an important record of user behavior in a Web search engine. By analyzing log files and mining the relationships among log entries, we can summarize the general rules of how users access the Web and their retrieval features.
To better understand users' search behavior, the dissertation analyzes real Web logs using empirical statistics and examines user behavior in detail from three angles: queries, click-through information, and user sessions. The conclusions drawn from the experiments are useful for improving a search engine's retrieval algorithms and the accuracy of its results.

Second, we study adaptive query expansion based on correlation analysis, which can effectively reduce query ambiguity and improve the precision and recall of an information retrieval system. The dissertation mines related queries by analyzing the relationship between queries and their related documents, and proposes a new query expansion method that extracts expansion terms from the related queries. We also put forward a new method for measuring the ambiguity of user queries, which quantifies how fuzzy a user's search intention is and can estimate the performance of a search session in advance. We use this ambiguity measure to dynamically adjust the number of expansion terms, which improves the flexibility and adaptability of query expansion.

Third, we study information recommendation based on feature fusion in information retrieval. A Web page recommendation system based on query log mining can predict which results a user will click next during retrieval; this benefits applications ranging from intelligent recommendation to improving search engine effectiveness. The dissertation proposes a relevancy-based recommendation system that combines document relevancy calculation with a statistical language model. Both word frequency and the HowNet concept relevancy model are used to compute document relevancy, and the result guides the page recommendation process.
To improve the applicability of the recommendations, back-off smoothing and related queries are used to refine the model. The experiments show that the performance of the recommendation system improves greatly.

Fourth, we study user clustering by retrieval intention. The clustering is based on analysis of the user sessions in the query log; the goal is to find users with similar behavior or interests and place them in the same group. To address the shortcomings of current session-based similarity calculation, we propose an algorithm that uses related queries from the query log as compensating features when measuring user similarity. An improved affinity propagation algorithm is then used to cluster the user data; it dynamically adjusts its clustering parameters, detects and eliminates clustering oscillations, and scans the parameter space to obtain the best clustering result.
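
As a rough illustration of using related queries as compensating features for user similarity, the sketch below compares two users by the overlap of their query sets, before and after each set is enlarged with related queries. It is not the dissertation's algorithm, which the abstract does not specify: the Jaccard measure, the function names, and the toy data are all assumptions chosen for illustration.

```python
from typing import Dict, Set


def expand_with_related(queries: Set[str], related: Dict[str, Set[str]]) -> Set[str]:
    """Return the user's queries plus every query listed as related to them."""
    expanded = set(queries)
    for query in queries:
        expanded |= related.get(query, set())
    return expanded


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def user_similarity(u: Set[str], v: Set[str], related: Dict[str, Set[str]]) -> float:
    """Similarity of two users after compensating with related queries (assumed measure)."""
    return jaccard(expand_with_related(u, related), expand_with_related(v, related))


# Hypothetical toy data: a related-query map and the query sets of two users.
related_queries = {
    "cheap flights": {"airline tickets"},
    "airline tickets": {"cheap flights"},
}
user_a = {"cheap flights", "hotel deals"}
user_b = {"airline tickets", "hotel deals"}

plain = jaccard(user_a, user_b)
compensated = user_similarity(user_a, user_b, related_queries)
print(plain, compensated)
```

In this toy case the two users share only one literal query, so the plain overlap is low, while the compensated similarity is higher because their differing queries are related to each other. A similarity matrix built this way could then be passed to a clustering algorithm such as affinity propagation.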
Keywords/Search Tags:Web mining, User logs, Query expansion, Search recommendation, User clustering