
Automatic Mining Of Relation Between Headwords And Requirement Words From Queries

Posted on: 2009-06-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Z Liu
GTID: 2178360308479403
Subject: Computer software and theory
Abstract/Summary:
Web log data mining is a widely used Internet technology. Its purpose is to extract meaningful and valuable information from the massive data in web logs so that the Internet can be put to better use.

The analysis of natural language query expressions is often neglected in the information retrieval community. It is common practice to use all word segments in a user's request sentence directly as search terms. Without thorough query analysis, the particular demands of Internet users are poorly understood, so the back-end system, however complicated its algorithms, is not fed good inputs and does not work well. This thesis focuses on applying NLP to query analysis to better understand Internet users' demands. For example, a user who enters "mobile phone" implicitly has a demand for its "price". Here "mobile phone" is the headword of the input query, and "price" is the requirement word corresponding to the need associated with that headword. Analyzing the relationship between headwords and requirement words helps build a relationship network between words. This network can be used for judging query intent, query expansion, and similar tasks, and it can better guide indexing to meet users' demands.

The thesis aims to mine the relation between headwords and requirement words from large-scale query log data using web data mining technology. First, it applies different templates for different classes, such as commodities and software, to extract requirement words from each class. Second, it obtains the headwords and their requirement words using clustering methods. Finally, it filters the requirement words using statistical and collocation methods. The precision of the mined headword-requirement word relations can reach 90%, so the mining method has practical value.
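The extraction and filtering steps can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the single template "<headword> <candidate>", the function name mine_requirement_words, and the thresholds min_count and min_ratio are all assumptions made for illustration, and the clustering step is omitted entirely. The sketch only shows how a template can propose candidate requirement words and how a simple frequency statistic can filter them.

```python
from collections import Counter, defaultdict


def mine_requirement_words(queries, headwords, min_count=2, min_ratio=0.2):
    """Return {headword: [requirement words]} mined from raw query strings.

    Assumed template: "<headword> <candidate>". The thresholds are
    illustrative values, not figures taken from the thesis.
    """
    pair_counts = defaultdict(Counter)
    for query in queries:
        normalized = query.lower().strip()
        for head in headwords:
            # Template match: the text after the headword is the candidate.
            if normalized.startswith(head + " "):
                candidate = normalized[len(head) + 1:].strip()
                if candidate:
                    pair_counts[head][candidate] += 1

    result = {}
    for head, counts in pair_counts.items():
        total = sum(counts.values())
        # Statistical filter: keep candidates that are frequent both in
        # absolute terms and relative to all candidates for this headword.
        kept = [
            word
            for word, count in counts.most_common()
            if count >= min_count and count / total >= min_ratio
        ]
        if kept:
            result[head] = kept
    return result


# Hypothetical query log and headword list for two classes.
sample_queries = [
    "mobile phone price",
    "mobile phone price",
    "Mobile Phone price",
    "mobile phone review",
    "mobile phone review",
    "mobile phone wallpaper",
    "media player download",
    "media player download",
]
mined = mine_requirement_words(sample_queries, ["mobile phone", "media player"])
print(mined)
```

In this toy log, "wallpaper" occurs only once after "mobile phone" and is dropped by the count threshold, while "price" and "review" are kept. A real system would need many templates per class and a collocation measure in place of the raw frequency ratio used here.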
Keywords/Search Tags: data mining, collocation, clustering, query log mining, search engine