Research And Application Of Short Text Classification In Search Engine

Posted on: 2016-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gao
GTID: 2308330464457719
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, information has become increasingly abundant, yet it has become increasingly difficult for people to find the information they need. Search engines, as platforms that help people obtain information, have become a very important part of the Internet. A user's search query can be mined for potential user intent: classifying the query reveals the domain in which the user expects results. Based on the user's intent and domain, Web applications that meet the user's needs can be recommended, which optimizes the search results. This thesis studies the classification of search queries, analyzing the characteristics of this kind of short text and the difficulties in classifying it. Because a search query contains too little information and its wording is not standardized, traditional classification methods such as exact matching, N-gram matching, and semantic dictionary expansion all have limitations. This thesis proposes a three-phase short text classification scheme to solve the query classification problem: a preprocessing phase based on pseudo-relevance feedback, a training phase, and a classification phase. Pseudo-relevance feedback is used to expand the short text, and the classification algorithm is realized by computing feature weights and voting, so that the short text classification problem is ultimately transformed into the mature long text classification problem. Extended experiments compare different classification methods on short text corpora and examine their use as a search ranking factor.
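The three phases described above can be illustrated with a small sketch. It is not the thesis's implementation: the abstract gives no retrieval backend, weighting formula, or voting rule, so the term-overlap retrieval, the relative-frequency weights, the toy data, and every function name here (`expand_query`, `train`, `classify`) are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def expand_query(query, corpus, k=2):
    """Pseudo-relevance feedback (assumed form): append the top-k documents
    that share the most terms with the query. The overlap score stands in
    for a real search backend."""
    q_terms = set(tokenize(query))
    scored = [(len(q_terms & set(tokenize(doc))), i) for i, doc in enumerate(corpus)]
    # Highest overlap first; ties are broken by corpus order.
    top = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))[:k]
    return tokenize(query) + [t for _, i in top for t in tokenize(corpus[i])]

def train(labeled_texts):
    """Per-class term weights (assumed form): relative frequency of the term
    within the class."""
    counts = defaultdict(Counter)
    for text, label in labeled_texts:
        counts[label].update(tokenize(text))
    return {
        label: {t: c / sum(ctr.values()) for t, c in ctr.items()}
        for label, ctr in counts.items()
    }

def classify(tokens, weights):
    """Each token votes for every class with its weight in that class;
    the class with the highest total wins."""
    votes = {label: sum(w.get(t, 0.0) for t in tokens) for label, w in weights.items()}
    return max(votes, key=votes.get), votes

# Toy data, invented for this sketch.
corpus = [
    "apple iphone smartphone review camera battery",
    "apple pie recipe baking sugar flour",
    "iphone price smartphone deals",
]
training = [
    ("smartphone camera battery review price deals", "electronics"),
    ("recipe baking sugar flour pie", "food"),
]

weights = train(training)
raw_label, raw_votes = classify(tokenize("iphone"), weights)
label, votes = classify(expand_query("iphone", corpus), weights)
```

In this toy setup the bare query "iphone" matches no trained feature, so every class receives zero votes; after expansion with the retrieved documents, the query carries enough terms for the vote to separate the classes. That is the sense in which expansion turns a short text problem into a long text one.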
In solving this problem, feature weighting is studied thoroughly. The traditional TF-IDF method does not take category information into account; this thesis improves TF-IDF feature weighting by incorporating intra-class concentration and inter-class dispersion, and verifies the effectiveness of the method experimentally. The thesis applies short text classification technology to a search engine and designs the overall system architecture, the short text classification module, and the Web application system architecture. The processes of the short text classification module are designed and implemented in detail, and a feedback learning classification algorithm is used to optimize the classification model.
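The abstract does not state the formulas for the improved weighting, so the following is only one plausible reading, given as a sketch. It assumes the class-aware weight is the product of within-class term frequency, a smoothed IDF, an intra-class factor (the share of the class's documents that contain the term), and an inter-class factor (the share of the term's documents that belong to the class). The function name `class_aware_tfidf` and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def class_aware_tfidf(docs):
    """docs: list of (tokens, label). Returns {label: {term: weight}}.
    The factor definitions below are assumptions, not the thesis's formulas."""
    n_docs = len(docs)
    df = Counter()                      # documents containing the term, any class
    df_in_class = defaultdict(Counter)  # documents containing the term, per class
    tf_in_class = defaultdict(Counter)  # raw term occurrences, per class
    docs_per_class = Counter()
    for tokens, label in docs:
        docs_per_class[label] += 1
        tf_in_class[label].update(tokens)
        for term in set(tokens):
            df[term] += 1
            df_in_class[label][term] += 1

    weights = {}
    for label, tf in tf_in_class.items():
        total = sum(tf.values())
        weights[label] = {}
        for term, count in tf.items():
            idf = math.log((n_docs + 1) / df[term])  # +1 keeps idf strictly positive
            # Intra-class factor: how widely the term is used inside this class.
            intra = df_in_class[label][term] / docs_per_class[label]
            # Inter-class factor: how much of the term's usage falls in this class.
            inter = df_in_class[label][term] / df[term]
            weights[label][term] = (count / total) * idf * intra * inter
    return weights

# Toy documents, invented for this sketch.
docs = [
    (["phone", "battery", "review"], "electronics"),
    (["phone", "camera", "review"], "electronics"),
    (["cake", "recipe", "review"], "food"),
    (["cake", "sugar", "recipe"], "food"),
]
w = class_aware_tfidf(docs)
```

Under these assumptions, a term such as "review" that appears in both classes is penalized by the inter-class factor, while a term such as "phone" that appears in every document of one class and nowhere else keeps its full TF-IDF weight.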
Keywords/Search Tags: Short Text Classification, Search Engine, Feature Weighting, TF-IDF, Pseudo-Relevance Feedback