
Analysis Of User Search Log And Its Application In Retrieval

Posted on: 2021-03-26; Degree: Master; Type: Thesis
Country: China; Candidate: X P Liu
GTID: 2428330602980883; Subject: Software engineering
Abstract/Summary:
With the rise of the Internet and the rapid iteration of cloud computing technology, the amount of data generated and processed by every industry is growing exponentially. As a product of the current era, big data is affecting social production and daily life in diverse ways. In the field of retrieval, the search engine has become a turning point in the history of modern network development. A large search engine can generate and collect anywhere from tens of thousands to hundreds of millions of click logs every day. These click logs contain a large amount of user-related information, so the major search engine companies have begun to pay more and more attention to their own search logs. By filtering and analyzing these logs, we can mine information related to users and thereby improve the effectiveness of the search system.

This thesis carries out the following work on massive user search logs:

(1) It analyzes the technologies related to log cleaning and the data format of the original browsing logs. Based on the jump relations in browser records, it establishes the correspondence between user search queries and clicks, generates each user's daily click data stream with a sliding window, cleans and filters the user's clicked links and normalizes their parameters on Spark using data mining methods, and produces the data used by the subsequent algorithms.

(2) Using the vector propagation algorithm, it mines the relations between search terms and links. First, the click bipartite graph of search terms and links is constructed. Then the click bipartite graph is modeled with a random walk model, and pairs of search terms and links that users never clicked are mined out, establishing the implied relations between search terms and links. The same algorithm can also be used to obtain the internal relations among search terms and among links.

(3) The vector propagation algorithm can calculate the relationship between known search terms and links, but a search engine receives a continuous stream of new search terms every day. How to calculate the relationship between these new search terms and the known links is therefore a problem that must be solved. To solve it, an online training method is developed based on the data generated by the vector propagation algorithm. The resulting general model is used to calculate the degree of association between user search terms and website links in real time.

Through the analysis and processing of browser logs, not only can the relevant click features be obtained, but existing click features can also be used to derive new click information, and new search terms can be generalized from that information. These features can participate directly in the ranking of web pages and achieve a more human-oriented ranking.
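As a rough illustration of the click-graph idea in items (2) and (3), the sketch below propagates sparse term vectors back and forth across a small query-link click bipartite graph, then scores a query-link pair that was never clicked and a query that never appeared in the log. It is a minimal sketch under stated assumptions, not the thesis's implementation: the abstract does not give the exact update rule, so the click-weighted sum, the L2 normalization, the `top_k` pruning, the bag-of-words initialization, and all function names and sample data are hypothetical. In particular, `score_new_query` is a simple stand-in for the online-trained general model, which the abstract does not specify.

```python
import math
from collections import Counter, defaultdict


def normalize(vec):
    """Scale a sparse vector (term -> weight) to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}


def propagate(source_vecs, neighbours, top_k):
    """Give each node the click-weighted sum of its neighbours' vectors.

    neighbours maps a node to a list of (neighbour, click_count) pairs.
    Only the top_k heaviest terms are kept (assumed pruning rule).
    """
    out = {}
    for node, nbrs in neighbours.items():
        acc = defaultdict(float)
        for src, clicks in nbrs:
            for term, w in source_vecs.get(src, {}).items():
                acc[term] += clicks * w
        top = sorted(acc.items(), key=lambda kv: -kv[1])[:top_k]
        out[node] = normalize(dict(top))
    return out


def vector_propagation(click_log, iterations=2, top_k=20):
    """click_log: iterable of (query, url, click_count) triples."""
    q2u, u2q = defaultdict(list), defaultdict(list)
    for query, url, clicks in click_log:
        q2u[query].append((url, clicks))
        u2q[url].append((query, clicks))
    # Assumption: queries start as bag-of-words vectors over their own terms.
    query_vecs = {q: normalize(dict(Counter(q.split()))) for q in q2u}
    url_vecs = {}
    for _ in range(iterations):
        url_vecs = propagate(query_vecs, u2q, top_k)    # queries -> links
        query_vecs = propagate(url_vecs, q2u, top_k)    # links -> queries
    return query_vecs, url_vecs


def similarity(a, b):
    """Dot product; equals cosine similarity for unit-norm vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def score_new_query(text, url_vec):
    """Hypothetical stand-in for the generalization step: embed an
    unseen query as a bag of words in the same term space."""
    return similarity(normalize(dict(Counter(text.split()))), url_vec)


# Made-up sample data: "cheap flights" was never clicked together with url_b.
click_log = [
    ("cheap flights", "url_a", 5),
    ("airline tickets", "url_a", 3),
    ("airline tickets", "url_b", 4),
]
query_vecs, url_vecs = vector_propagation(click_log)
implied = similarity(query_vecs["cheap flights"], url_vecs["url_b"])
unseen = score_new_query("cheap airline deals", url_vecs["url_a"])
```

Because both sides end up in one shared term space, the same `similarity` function can also compare two search terms or two links, which matches the abstract's remark that the algorithm yields relations among search terms and among links. At the scale the thesis describes, each propagation step would more plausibly run as a distributed join and aggregation, for example on Spark, rather than in in-memory dictionaries.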
Keywords/Search Tags: Click Log, Data Mining, Spark, Vector Propagation, Click Generalization