Research On Key Technology Of Internet Search Keywords Classification

Posted on: 2012-09-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Lv
GTID: 2178330332475984
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the volume of digital information online is growing exponentially, and it is becoming increasingly difficult to find specific information in this ocean of data. Search engines, among the most popular tools for information retrieval, are indispensable for helping users retrieve large amounts of information. Internet users depend more and more heavily on search engines, and searching has become a common online behavior. The most important part of search behavior is the Internet search keywords (ISK) that users provide. These keywords directly or indirectly reflect users' potential interests and needs, which is fundamental for many personalized network applications, such as targeted advertising and other network services.

This thesis therefore proposed a novel process for ISK classification. We first summarized the background and definition of ISK. We then analyzed the characteristics of ISK and, on that basis, proposed a two-stage solution for ISK classification: first, each keyword is described by means of pseudo relevance feedback; second, text classification techniques are applied to that description. In this way, pseudo relevance feedback converts the unsolved ISK classification problem into the well-studied text classification problem.

Beyond the ISK classification process itself, this thesis also studied and compared several text classification techniques. We proposed an optimization method for further feature selection based on the concept of reconstruction. Drawing on the idea of column selection, the method defines an objective function for selecting a subset from the remaining features, and obtains the final feature subset using greedy and transductive experimental design. The experimental results showed its effectiveness.
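The abstract does not state the objective function, so the following is only a generic sketch of greedy column selection by reconstruction error, not the method of the thesis. It assumes each feature is a column of a document-by-feature matrix and greedily picks the column whose span best reconstructs all columns; the function name `greedy_column_selection`, the tolerance `tol`, and the toy matrix are hypothetical, and the transductive experimental design variant is not implemented.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def greedy_column_selection(columns, k, tol=1e-12):
    """Greedily pick up to k columns that best reconstruct all columns.

    columns: list of feature columns (each a list of per-document values).
    At each step, the column that removes the most squared reconstruction
    error is selected, and every column is deflated (projected away from it).
    """
    residual = [list(c) for c in columns]
    selected = []
    for _ in range(k):
        best_j, best_gain = None, tol
        for j, rj in enumerate(residual):
            norm_sq = dot(rj, rj)
            if j in selected or norm_sq <= tol:
                continue  # already chosen, or already fully reconstructed
            # Error reduction obtained by projecting every column onto rj.
            gain = sum(dot(rj, ri) ** 2 for ri in residual) / norm_sq
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break  # nothing left to explain
        pivot = list(residual[best_j])
        pivot_norm_sq = dot(pivot, pivot)
        deflated = []
        for ri in residual:
            coef = dot(pivot, ri) / pivot_norm_sq
            deflated.append([a - coef * b for a, b in zip(ri, pivot)])
        residual = deflated
        selected.append(best_j)
    return selected


# Hypothetical toy data: four features observed over four documents.
# Features 0, 1 and 3 are multiples of one another, so one of them suffices.
toy_columns = [[1, 1, 0, 0], [2, 2, 0, 0], [0, 0, 3, 3], [1, 1, 0, 0]]
print(greedy_column_selection(toy_columns, 3))
```

On the toy data, redundant features contribute nothing once one of them has been selected, which is the intuition behind selecting by reconstruction: the procedure stops early when the chosen columns already reproduce the rest.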
In addition, by comparing and analyzing the classification performance of different combinations of feature selection and classification methods, we chose the combination best suited to the ISK classification problem. Finally, we presented some possible further improvements to this work.
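The two-stage process summarized in the abstract can be illustrated with a minimal sketch, under several assumptions that are not from the thesis: the in-memory `TOY_INDEX` stands in for a real search engine, retrieval is a simple term-overlap ranking, and the text classifier is a small multinomial Naive Bayes with Laplace smoothing. All names and data here are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-in for search engine results.
TOY_INDEX = [
    "iphone smartphone review battery camera screen",
    "android smartphone apps battery",
    "football league match goal score",
    "basketball match score playoffs",
]

# Hypothetical labeled texts for the text classifier.
TRAINING = [
    ("smartphone battery camera screen apps", "technology"),
    ("match goal score league playoffs", "sports"),
]


def tokenize(text):
    return text.lower().split()


def retrieve(query, index, k=2):
    """Rank documents by how often they contain query terms; keep the top k."""
    terms = set(tokenize(query))
    scored = [(sum(1 for t in tokenize(doc) if t in terms), doc) for doc in index]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: -pair[0])
    return [doc for _, doc in scored[:k]]


def expand(query, index, k=2):
    """Stage 1: describe the keyword by the text of its top-k results."""
    return " ".join([query] + retrieve(query, index, k))


def train_nb(labeled_texts):
    """Collect per-class word counts for a multinomial Naive Bayes model."""
    word_counts, doc_counts, vocab = {}, Counter(), set()
    for text, label in labeled_texts:
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Stage 2: ordinary text classification on the expanded description."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(word_counts):
        counts = word_counts[label]
        total = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # words unseen in training are ignored
                score += math.log((counts[tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def classify_keyword(keyword, index, model):
    """Full pipeline: pseudo relevance feedback, then text classification."""
    return classify(expand(keyword, index), model)
```

In this toy setup the keyword "iphone" shares no word with the training texts, so the classifier has nothing to work with until the retrieved text supplies words such as "smartphone" and "battery". That is the sense in which pseudo relevance feedback turns a short keyword into a document that a standard text classifier can handle.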
Keywords/Search Tags: search keywords, pseudo relevance feedback, feature selection, text classification