Research On Domain Classification Of Search Engine Queries

Posted on: 2018-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
Full Text: PDF
GTID: 2348330518498509
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, Web information is growing exponentially, which makes it difficult for users to retrieve useful information. Identifying the query intention of search engine users has therefore become one of the hot topics in Web information retrieval.

One key to identifying a user's query intention is constructing a query classification system. Existing query classification systems have drawbacks such as coarse classification granularity and ambiguous recognition of query intention. Recently, a variety of query features have been proposed and corresponding classification systems established, but these methods rely heavily on manual feature extraction and rarely consider the user information of the search engine, so they are not feasible for automatic query classification.

To solve these problems, this thesis first analyzes the structural characteristics of search engine query logs and extracts relational data about queries, then proposes an automatic feature generation method that considers both user information and clicked URLs, and finally builds an automatic query classification model. Two automatic query classification algorithms are proposed:

(1) The first algorithm is based on matrix decomposition and uses the probabilistic latent semantic analysis model to analyze binary relations. Experiments show that its classification performance is not ideal on its own, but improves greatly when prior knowledge is added through semi-supervised probabilistic latent semantic analysis.

(2) The second algorithm is based on tensor decomposition. It improves on the first by using a tensor decomposition model to analyze ternary relations, from which the classification features of queries are generated.

Finally, this thesis performs experiments using LIBSVM on the query log provided by the Sogou lab. The results show that the user information introduced in this thesis helps improve classification performance. Both proposed algorithms can effectively classify queries. The second algorithm outperforms the first and is more suitable for domain classification of queries.
Keywords/Search Tags: query log, feature extraction, matrix decomposition, tensor decomposition, support vector machine
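
The first algorithm in the abstract applies probabilistic latent semantic analysis (PLSA) to binary relations taken from the query log. The thesis text available here does not give the exact formulation, so the following is only a minimal sketch of standard, unsupervised PLSA fitted by EM. It assumes the binary relation is a query-by-URL click-count matrix; the function name `plsa`, the toy matrix `counts`, and all parameter values are hypothetical. The dense three-way posterior array is only practical for small examples.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit standard PLSA by EM on a (queries x urls) count matrix.

    Returns P(z|q) with shape (Q, K) and P(u|z) with shape (K, U).
    """
    rng = np.random.default_rng(seed)
    n_q, n_u = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random initialisation, normalised into probability distributions.
    p_z_q = rng.random((n_q, n_topics))
    p_z_q /= p_z_q.sum(axis=1, keepdims=True)
    p_u_z = rng.random((n_topics, n_u))
    p_u_z /= p_u_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(z|q,u) proportional to P(z|q) * P(u|z),
        # stored as an array of shape (Q, K, U).
        post = p_z_q[:, :, None] * p_u_z[None, :, :]
        post /= post.sum(axis=1, keepdims=True) + eps

        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * post
        p_u_z = weighted.sum(axis=0)
        p_u_z /= p_u_z.sum(axis=1, keepdims=True) + eps
        p_z_q = weighted.sum(axis=2)
        p_z_q /= p_z_q.sum(axis=1, keepdims=True) + eps

    return p_z_q, p_u_z


# Hypothetical toy log: queries 0-1 click only URLs 0-1,
# queries 2-3 click only URLs 2-3.
counts = np.array(
    [
        [5, 3, 0, 0],
        [4, 6, 0, 0],
        [0, 0, 7, 2],
        [0, 0, 3, 8],
    ],
    dtype=float,
)

p_z_q, p_u_z = plsa(counts, n_topics=2)
labels = p_z_q.argmax(axis=1)  # dominant latent topic per query
```

Each row of `p_z_q` is a low-dimensional description of one query and could serve as a feature vector for a downstream classifier such as an SVM. The semi-supervised variant and the tensor decomposition of ternary relations mentioned in the abstract are not shown here.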