Research And Application Of Abstract Technology And Query Behavior Analysis In Search Engine Of Universities

Posted on: 2011-07-03  Degree: Master  Type: Thesis
Country: China  Candidate: X R Xu  Full Text: PDF
GTID: 2178360302980138  Subject: Computer software and theory
Abstract/Summary:
The research background of this paper is the education resources search engine of Donghua University, which mainly searches the university's websites, college admissions information, and quality course information. It helps teachers and students of Donghua University find information about the school, and helps others find enrollment information and information on quality courses.

This paper studies two topics. The first is automatic summarization of Web pages, where the goal is to generate a summary that captures the theme of the page and takes the user's query words into account. The second is the analysis of user query logs in order to provide query suggestions, which help users describe their information needs more accurately.

Text summarization not only compresses text and reduces the burden on users browsing information, but also supports other text processing technologies, and it has become a research hotspot at home and abroad. This paper studies a statistics-based automatic summarization technique for Web pages, which involves text structure analysis, keyword extraction, sentence importance calculation, summary generation, and other key technologies. Because the summaries are generated for Web pages in a search engine, the query words entered by the user must be considered. The sentence importance calculation is therefore divided into two parts: one considers only the importance of the text itself, and the other considers the importance of the query words. The first part can be computed in advance to make summary generation more efficient. Finally, redundant content is removed from the summary before it is output.

The sentence importance calculation needs to consider various factors and computes a weighted average of these factors.
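The two-part sentence score described above might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the three factors (`position`, `keyword_density`, `title_overlap`), the term-matching rule, and the `query_weight` blend are all hypothetical, since the abstract does not name the factors or the combination formula. Plain substring matching stands in for the Chinese word segmentation the thesis relies on.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sentence:
    text: str
    # Hypothetical per-sentence factors, each assumed to be scaled to [0, 1].
    position: float
    keyword_density: float
    title_overlap: float


def static_score(s: Sentence, weights: Sequence[float]) -> float:
    """Query-independent part: weighted average of the text factors."""
    factors = (s.position, s.keyword_density, s.title_overlap)
    return sum(w * f for w, f in zip(weights, factors)) / sum(weights)


def query_score(s: Sentence, query_terms: Sequence[str]) -> float:
    """Query-dependent part: fraction of query terms found in the sentence."""
    if not query_terms:
        return 0.0
    text = s.text.lower()
    return sum(term.lower() in text for term in query_terms) / len(query_terms)


def summarize(sentences: Sequence[Sentence], cached: Sequence[float],
              query_terms: Sequence[str], k: int = 1,
              query_weight: float = 0.5) -> List[Sentence]:
    """Blend the cached static scores with the query score and keep the top k."""
    scored = [
        ((1 - query_weight) * cached[i] + query_weight * query_score(s, query_terms), i)
        for i, s in enumerate(sentences)
    ]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    keep = sorted(i for _, i in top)  # restore original sentence order
    return [sentences[i] for i in keep]


sentences = [
    Sentence("Donghua University admissions open in June", 1.0, 0.5, 0.5),
    Sentence("The campus has many trees", 0.5, 0.2, 0.1),
    Sentence("Quality courses are listed online", 0.2, 0.8, 0.4),
]
weights = (1.0, 2.0, 1.0)  # illustrative values; the thesis trains these
cached = [static_score(s, weights) for s in sentences]  # computed once, offline
best_for_query = summarize(sentences, cached, ["courses"], k=1)
```

Only `query_score` depends on the query, so `cached` can be built at indexing time and reused for every request, which is the efficiency gain the abstract mentions.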
In this paper, a genetic algorithm is used to train the weighting coefficients. Because the differences in fitness between individuals are small, the selection operator of the genetic algorithm is improved to increase the probability of selecting the best individuals. Genetic algorithms are also prone to premature convergence. When premature convergence occurs, the logistic equation is applied, with the current best individual as the initial value, to regenerate the population, so that population diversity is restored while the best individual is retained.

To improve the user experience and help users describe their information needs accurately, the user query log is mined to cluster related query terms. After the user submits a query, the search engine returns the results together with relevant query suggestions. This paper analyzes the K-Means clustering algorithm and its problems, improves the method to address them, and then uses the improved K-Means algorithm to cluster user query logs. The distance between data objects is calculated using not only the Euclidean distance between URLs but also the distance between the query terms. Given the query words submitted by the user, the system finds the clusters they belong to and returns the most relevant query words as query suggestions. To verify the effectiveness of the algorithm, the improved algorithm is applied to data that had been divided into classes in advance, and it is found to achieve good results.

Finally, the genetic algorithm is trained on texts from a corpus to obtain the weighting coefficients of the influencing factors, and these coefficients are then used to summarize Web pages. The resulting summaries are found to reflect the topic of the text well. In the experiment on user query logs, the query words are clustered with the improved clustering algorithm, and the results are found to be strongly correlated with the query words.
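As a rough illustration of the premature-convergence recovery step described above, the sketch below regenerates a population by iterating the logistic equation x(n+1) = mu * x(n) * (1 - x(n)) from the best individual. It rests on several assumptions not stated in the abstract: genes are weighting coefficients encoded in [0, 1], mu = 4 (the fully chaotic regime), and a tiny jitter is added so that genes do not start on degenerate values such as 0.25, 0.5, or 0.75. The function names are hypothetical.

```python
import random
from typing import List, Optional


def logistic_map(x: float, mu: float = 4.0) -> float:
    """One step of the logistic equation; maps [0, 1] into [0, 1] for mu <= 4."""
    return mu * x * (1.0 - x)


def regenerate_population(best: List[float], pop_size: int, mu: float = 4.0,
                          rng: Optional[random.Random] = None) -> List[List[float]]:
    """Rebuild a population from the best individual using chaotic iteration.

    The best individual is kept unchanged as the first member; every other
    member is the next logistic-map iterate of the previous one.
    """
    rng = rng or random.Random()
    eps = 1e-6
    # Assumption: small jitter plus clamping, to avoid fixed points of the map.
    current = [min(max(g + rng.uniform(-1e-4, 1e-4), eps), 1.0 - eps) for g in best]
    population = [list(best)]
    while len(population) < pop_size:
        current = [logistic_map(g, mu) for g in current]
        population.append(list(current))
    return population


best_individual = [0.2, 0.6, 0.9]  # illustrative weighting coefficients
new_population = regenerate_population(best_individual, pop_size=10,
                                        rng=random.Random(0))
```

Because nearby starting points diverge quickly under the chaotic map, the regenerated individuals spread across the search space even though they all derive from one seed, while the first slot preserves the best solution found so far.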
Keywords/Search Tags: Automatic Abstract, Genetic Algorithm, Chinese Segmentation, Clustering Algorithm