Analysis And Research On User Intention Of Short Text Based On Transfer Learning

Posted on: 2021-04-13 | Degree: Master | Type: Thesis
Country: China | Candidate: X S Wu | Full Text: PDF
GTID: 2428330605967341 | Subject: Integrated circuit engineering
Abstract/Summary:
Artificial intelligence has developed rapidly in recent years, and natural language processing in particular has changed dramatically over the past two years. Dialogue systems, machine reading comprehension, search recommendation, and text classification are now applied in many areas of daily life. In information retrieval and search engine applications, fast and accurate analysis of user intention is one of the focal points for next-generation search engines.

This thesis studies user intention in short-text search queries in an entertainment search scenario. Based on the short queries that users enter in an entertainment app, it proposes a fusion method for the user query semantic space: the current search query and the entities are mapped into one semantic space, the user's intention is classified there, and the result provides a data basis for downstream recall and ranking in search.

During the research, a short-text data set was built, consisting of two user-behavior query data sets and one Item information data set. Analysis of the search queries yielded 13 primary tags and 244 secondary tags, along with nearly 30,000 entity Items. The data were analyzed and modeled, then preprocessed by cleaning, filtering, and long-tail handling.

The approach combines Embeddings trained without supervision using Word2Vec, Doc2Vec, and BERT with supervised training of MLP, fastText, and TextCNN models on labeled data. The result is a fast and accurate user intention classification model that converts a query into an Embedding vector and locates its label category. fastText is used for Embedding-based multi-label classification, and Embedding+MLP/CNN is used for multi-class classification of Items. By adjusting the feature dimensions of the data set and the fastText model parameters, and by training different word vector representations, the word vector model is optimized step by step to find a good representation of vertical-search queries in the semantic space, and the intention recognition classifier is iterated accordingly.

Experiments were run on Linux with the TensorFlow and fastText frameworks. The Embeddings were fused, a multi-class experiment over labels was run on the query-dimension data (that is, user intention classification), and a multi-class experiment was run on the Itemid-dimension data. In the fastText experiment on query-dimension data, classification accuracy reached 85.80%. In the Itemid-dimension experiment, the MLP model reached 85% accuracy on the 142 head Itemid classes, and the MLP+CNN model reached 88%. When the fused Embedding was used in downstream recall and ranking, the CTR conversion rate increased by 5.16%.

User intention analysis based on pre-trained Embeddings therefore improves how Items are represented in the semantic space of vertical search: when a user enters a query, semantically related recall is enriched, the search experience improves, and downstream recall and ranking tasks gain data support.
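The idea of placing a query and the label categories in one semantic space can be illustrated with a minimal sketch. It is not the thesis's model: the abstract describes trained fastText, MLP, and CNN classifiers, whereas this example swaps in a plain nearest-label cosine comparison over averaged word vectors. The vocabulary, the 3-dimensional vectors, the two intent labels, and the function names `embed_query`, `cosine`, and `classify_intent` are all made up for illustration.

```python
import math

# Toy word vectors (hypothetical values). In the thesis these would be
# learned with Word2Vec, Doc2Vec, or BERT on real query logs.
WORD_VECTORS = {
    "watch":   [0.9, 0.1, 0.0],
    "movie":   [0.8, 0.2, 0.1],
    "trailer": [0.7, 0.3, 0.0],
    "play":    [0.1, 0.9, 0.1],
    "song":    [0.0, 0.8, 0.2],
    "album":   [0.1, 0.7, 0.3],
}

# Hypothetical intent labels, each given a vector in the same space.
LABEL_VECTORS = {
    "video": [1.0, 0.0, 0.0],
    "music": [0.0, 1.0, 0.0],
}


def embed_query(query):
    """Average the vectors of known tokens; return None if none are known."""
    tokens = [t for t in query.lower().split() if t in WORD_VECTORS]
    if not tokens:
        return None
    dim = len(next(iter(WORD_VECTORS.values())))
    return [
        sum(WORD_VECTORS[t][i] for t in tokens) / len(tokens)
        for i in range(dim)
    ]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_intent(query):
    """Return the label whose vector is closest to the query embedding."""
    vec = embed_query(query)
    if vec is None:
        return None
    return max(LABEL_VECTORS, key=lambda label: cosine(vec, LABEL_VECTORS[label]))


if __name__ == "__main__":
    for q in ["watch movie trailer", "play song", "unknown words"]:
        print(q, "->", classify_intent(q))
```

Averaging token vectors is the same bag-of-words composition that fastText applies before its linear classifier, which is why it serves as a stand-in here; a trained classifier would replace the fixed label vectors with learned weights.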
Keywords/Search Tags:natural language processing, short text query, word vector, semantic space, intention analysis