
Key Technologies Research on Content-Based Exploratory Search Guidance System for Mass Text Retrieval

Posted on: 2012-10-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y C Wang
GTID: 1488303356472574
Subject: Signal and Information Processing
Abstract/Summary:
This thesis studies key technologies for exploratory search systems (ESS). An ESS obtains pragmatic information from users as system input through interactive strategies. In this thesis, that input consists mainly of the keywords a user selects for search guidance, the user's marks on retrieved entries, an overlap parameter for those marks, and the percentage of entries the user has labeled according to his or her need. From this pragmatic information, the ESS produces semantic information as output, which better satisfies users' information retrieval needs and ultimately improves retrieval quality.

The work focuses on two kinds of user search need: learning to rank and "investigate". Chapter one introduces exploratory search and its related technologies. The main content begins in chapter two, which proposes a new learning-to-rank approach that relies on only a small amount of pragmatic information and marks provided by the user. Chapters three to six address a new kind of user need in information retrieval, "investigate". Building on the ESS framework proposed at the ACM SIGIR 2006 workshop, the thesis first proposes an evaluation approach to guide the design. Existing technologies are then combined around that evaluation approach to generate an ESS guidance structure that fulfills the "investigate" need. The remaining content studies and improves several key technologies used in this framework: the ESS evaluation approach, keyword extraction for specialized fields, topic clustering, the hierarchical guidance structure for ESS, and word-relationship classification. Chapter three introduces user information needs, the ESS evaluation approach, and the conceptual design of the ESS in this thesis. Chapter four first introduces the text base and the keyword extraction technology used in this thesis.
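The chapter two approach is said to be based on comprehensive information theory, which the abstract does not detail. To illustrate the general idea of learning a ranking from only a few user marks, here is a minimal sketch that swaps in a generic pairwise perceptron (not the author's algorithm): every entry the user marks relevant should score above every entry marked irrelevant, and the learned weights then rank all entries, including unmarked ones. The feature vectors, the binary mark encoding, and the names `train_pairwise`, `rank_entries`, and `dot` are hypothetical; the overlap parameter and label percentage mentioned above are not modeled.

```python
from itertools import product


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_pairwise(features, marks, epochs=20, lr=1.0):
    """Learn a linear scoring vector from a handful of user marks.

    Hypothetical inputs (not from the thesis):
      features: dict entry_id -> list[float] feature vector
      marks:    dict entry_id -> 1 (marked relevant) or 0 (marked irrelevant)
    Only marked entries are used for training.
    """
    dim = len(next(iter(features.values())))
    w = [0.0] * dim
    relevant = [e for e, m in marks.items() if m == 1]
    irrelevant = [e for e, m in marks.items() if m == 0]
    for _ in range(epochs):
        mistakes = 0
        # Every (relevant, irrelevant) pair is one preference constraint.
        for pos, neg in product(relevant, irrelevant):
            diff = [a - b for a, b in zip(features[pos], features[neg])]
            if dot(w, diff) <= 0:  # pair misordered or tied: perceptron update
                w = [wi + lr * di for wi, di in zip(w, diff)]
                mistakes += 1
        if mistakes == 0:  # all marked pairs already ordered correctly
            break
    return w


def rank_entries(features, w):
    """Rank all entries, marked or not, by descending linear score."""
    return sorted(features, key=lambda e: dot(w, features[e]), reverse=True)


if __name__ == "__main__":
    features = {
        "e1": [1.0, 0.1],
        "e2": [0.9, 0.2],
        "e3": [0.1, 0.9],
        "e4": [0.2, 0.8],
    }
    marks = {"e1": 1, "e3": 0}  # the user labeled only two of four entries
    w = train_pairwise(features, marks)
    print(rank_entries(features, w))  # ['e1', 'e2', 'e4', 'e3']
```

With a single relevant/irrelevant pair, one update already separates the marked entries, and the unmarked entries `e2` and `e4` are placed by their similarity to each side. A pairwise formulation is used here only because it needs no absolute relevance scores, just the user's marks.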
The remainder of chapter four proposes an ESS guidance structure based on topic clustering, which performs better than LSA clustering. Chapter five introduces two word-relationship classification approaches, one favoring precision and the other favoring recall. In chapter six, the author evaluates the proposed ESS; the results show that it outperforms current mainstream search expansion systems.

The thesis makes the following contributions. First, it proposes a new learning-to-rank algorithm based on comprehensive information theory, which makes better use of user labels in the learning-to-rank process by collecting a small amount of pragmatic information from the user, so that extensive training on a limited field to build a learning-to-rank classifier can be avoided. Second, it integrates several existing text processing technologies to realize a keyword-based ESS. Third, its ESS generates a semantic guidance structure by combining simple pragmatic and simple syntactic information. Fourth, it proposes and tests an ESS evaluation approach. Comparing the word expansion systems of Baidu and Google, LSA-based query expansion, and the ESS in this thesis, the results show that the evaluation index follows the expected trend. Furthermore, the proposed ESS overcomes the bottleneck of limited information capacity in the Google and Baidu expansion systems: in the experiments, Baidu and Google both score zero overall, while the proposed ESS scores 0.25, compared with 0.0095 for LSA query expansion. Finally, the thesis includes studies and improvements of several supporting technologies.
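The abstract does not specify the topic clustering method behind the chapter four guidance structure. To illustrate how extracted keywords could be turned into a two-level guide, here is a minimal sketch that swaps in simple leader clustering over document co-occurrence (not the author's method, and not LSA): keywords are visited from most to least frequent, each joins the first topic whose leading keyword co-occurs with it strongly enough (cosine similarity of their document sets), and otherwise it starts a new topic. The keyword list, the `threshold` value, and the names `doc_sets`, `set_cosine`, and `build_guide` are hypothetical.

```python
import math


def doc_sets(docs, keywords):
    """Map each keyword to the set of document indices that contain it."""
    tokenized = [set(doc.lower().split()) for doc in docs]
    return {kw: {i for i, toks in enumerate(tokenized) if kw in toks} for kw in keywords}


def set_cosine(a, b):
    """Cosine similarity of two binary occurrence vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def build_guide(docs, keywords, threshold=0.5):
    """Group keywords into topics; return {topic leader: [child keywords]}."""
    occ = doc_sets(docs, keywords)
    # Most frequent keywords first (ties broken alphabetically) so they lead topics.
    ordered = sorted(keywords, key=lambda kw: (-len(occ[kw]), kw))
    topics = []  # each topic is a list whose first element is its leader
    for kw in ordered:
        for topic in topics:
            if set_cosine(occ[kw], occ[topic[0]]) >= threshold:
                topic.append(kw)
                break
        else:  # no existing topic is close enough: start a new one
            topics.append([kw])
    return {topic[0]: topic[1:] for topic in topics}


if __name__ == "__main__":
    docs = [
        "apple banana fruit",
        "banana fruit salad",
        "python code compiler",
        "python compiler bug",
    ]
    keywords = ["fruit", "banana", "apple", "python", "compiler", "code"]
    print(build_guide(docs, keywords))
    # {'banana': ['fruit', 'apple'], 'compiler': ['python', 'code']}
```

A lower `threshold` merges keywords into fewer, broader topics; a higher one yields more, narrower topics. The top level of the returned dictionary plays the role of the guide's first layer and the child lists its second layer.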
Keywords/Search Tags: exploratory search system, information retrieval, search guidance system, evaluation of exploratory systems