
A Study Of User Goals Based On Dependency Relation Approach

Posted on: 2012-07-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R X Duan
Full Text: PDF
GTID: 1228330374499595
Subject: Signal and Information Processing
Abstract/Summary:
With the development of the Internet, an enormous number of web pages have appeared on the Web. These pages contain a great deal of information and are updated every day. To help users find the information they want efficiently, search engines, the most typical application of information retrieval, came into being. However, because queries are short and ambiguous, understanding users' goals well remains a problem. Understanding the user goals behind queries has become a critical technology for improving search engine performance, and it is a complicated problem: different queries carry different user goals, and even the same query can carry different goals for different users. To simplify the problem, researchers currently treat it mainly as query classification and assign each query to a predefined taxonomy. However, such a taxonomy is static and coarse, so only a small part of user goals can be understood. This dissertation focuses on understanding user goals. Building on the query classification taxonomy, we use a dependency relation method to mine more meaningful user goals and then cluster those goals into groups. The main research work and achievements are as follows:

Firstly, a novel dynamic hierarchical taxonomy for web queries is proposed, and user goal detection is implemented within it. To simplify the problem of understanding user goals, researchers have recently classified user goals into predefined taxonomies, and search engines can improve performance and relevance by applying a different ranking strategy to each kind of user goal. Although much work has focused on query classification, it is all based on a coarse three-category taxonomy. This dissertation proposes a refinement method in which queries are classified dynamically into a hierarchical taxonomy. Query-related snippets are then retrieved from a search engine and treated as the context of the query, and the dependency relation method is used to detect user goals in that context. Because direct dependency relations alone are too few, indirect dependency relations are built by making use of other relations to overcome the data sparseness problem. The experimental results show that our approach outperforms other methods in both precision and relevance.

Secondly, we propose using the Hierarchical Dirichlet Process model to cluster user goals. Researchers have recently paid most of their attention to finding better features for classifying user goals. Although some have proposed methods for detecting user goals, those methods only list the possible goals, which may have the same or similar meanings; that is, the user goals are not clustered. Because the number of clusters cannot be determined in advance for each query, we adopt the Hierarchical Dirichlet Process model, which has the advantage of determining the number of clusters from the corpus. User goals are clustered according to the topic models. Each user goal's verb is represented by a document composed of co-occurring nouns, dependency-related nouns, or both. The experimental results show that the Hierarchical Dirichlet Process and Dirichlet Process mixture models outperform the Latent Dirichlet Allocation model, and that the Hierarchical Dirichlet Process model, which incorporates a document-level topic layer, resolves the user goal clustering problem better.

Finally, queries in the form of short documents are classified using a short text classification method. When a search query is not composed of 2 or 3 words but is written as a short text, the user goals are always specific, and a topic model can be used to classify them into specific topics. However, because short texts contain little information and their structures are often incomplete, traditional classification methods cannot classify them well. This dissertation proposes a method that incorporates grammatical information to increase the importance of words that have stronger relationships with other words, which enriches the information available in the short text. The experiments show that incorporating syntactic information into short texts improves classification performance. A search engine can then apply a different strategy to each query according to its topic type. The method can also be applied to vertical search engines.
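The idea of increasing the importance of words that have stronger relationships with other words can be sketched as follows. The abstract does not give the weighting formula, so this is only an illustration under stated assumptions: a word's weight is taken to be 1 plus a boost for every dependency arc it takes part in, the function name dependency_weighted_counts and the boost parameter are hypothetical, and the arcs are written by hand rather than produced by a parser.

```python
from collections import Counter


def dependency_weighted_counts(tokens, arcs, boost=0.5):
    """Return a weighted bag of words for one short text.

    tokens: list of words in the short text.
    arcs:   list of (head_index, dependent_index) dependency arcs.
    boost:  assumed extra weight per arc a token participates in.
    """
    # Count how many arcs each token position takes part in,
    # either as head or as dependent.
    degree = Counter()
    for head, dependent in arcs:
        degree[head] += 1
        degree[dependent] += 1

    # Plain term frequency would add 1.0 per occurrence; here each
    # occurrence is boosted by how connected that position is.
    weights = Counter()
    for index, token in enumerate(tokens):
        weights[token] += 1.0 + boost * degree[index]
    return dict(weights)


# Toy short-text query with hand-annotated arcs (illustrative only).
tokens = ["cheap", "flights", "from", "beijing", "to", "london"]
arcs = [
    (1, 0),  # flights -> cheap
    (1, 3),  # flights -> beijing
    (3, 2),  # beijing -> from
    (1, 5),  # flights -> london
    (5, 4),  # london -> to
]

weighted = dependency_weighted_counts(tokens, arcs)
for word in sorted(weighted, key=weighted.get, reverse=True):
    print(word, weighted[word])
```

In this sketch the weighted counts stand in for raw term frequencies, so they could be passed to a topic model or classifier in their place; any other choice of weighting function would fit the same interface.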
Keywords/Search Tags: user goals, dependency relation, topic model, HDP