Research On Tag Generation For Chinese Short Text Based On Community Question Answering System

Posted on: 2018-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: W G Li
GTID: 2348330515492633
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of the Internet, the number of Internet users is growing dramatically, producing more and more user-generated content. Managing this massive volume of content effectively has become increasingly important. The community question answering (CQA) system is a popular user-generated content application, and the tag system, an important tool for managing such content, is one of its key subsystems. Question text in a CQA system is typical social short text: it carries little information and has a sparse feature space, so traditional text processing methods cannot be applied to it well. Based on a study of CQA systems and tag systems, this thesis proposes three tag generation methods for CQA systems. The first exploits an external knowledge base to construct a graph model for tag generation. The second generates tags through text clustering and phrase discovery. The third recommends tags based on content similarity.

A CQA system develops as users gradually gather. In the early stage, users ask many questions but provide few answers. As the number of users grows, the answers on each question page accumulate. When the system is thriving, it has a large number of users and a rich variety of question content. At this point the tag system tends to be stable: the proportion of new tags is low and the proportion of reusable tags is high.

The first method exploits an external knowledge base to construct a graph model for tag generation. It uses Wikipedia as the external knowledge base to build a similarity matrix between words, and then introduces this matrix into the TextRank model to improve it. The method regards each Wikipedia entry as a topic and builds the similarity matrix by measuring the similarity between words according to their distribution over these topics. Because it needs only the question text to generate tags, it is well suited to the early stage of a CQA system, when question pages are not yet rich in content.

The second method uses text clustering and phrase discovery to generate tags. It assumes that the content of a question is highly correlated with the content of its answers, and it applies when question pages are rich in content. The method improves tag generation by expanding the coverage of the generated tags. First, it extracts tag rules as prior knowledge. Next, it groups the question text and its answer texts together as a single document. Finally, it uses the tag rules, pointwise mutual information and contextual entropy to mine phrases for tag generation. Because it generates phrases as well as single words, the coverage of the generated tags improves. The method requires both the question text and the answer texts, so it suits the scenario in which question pages are rich in content.

The third method generates tags through similarity-based recommendation. Once the tag system has gradually stabilized, the proportion of new tags is low and the proportion of reusable tags is high, so this method recommends existing tags for each document. It also groups the question text and its answer texts together as a single document. After finding the top-m documents most similar to the target document, it collects the tags of those documents into a candidate tag set and then re-ranks the candidates by the similarity between each tag and the document. A topic model is introduced to reduce the feature space, so that the similarity between documents is measured at the topic level. Because the method recommends tags by reusing existing ones, it does not generate new tags. It requires both the question text and the answer texts, and it is well suited to the scenario in which question pages are rich in content and the tag system tends to be stable.
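As an illustration of the phrase-mining step in the second method, the following is a minimal Python sketch of scoring adjacent word pairs with pointwise mutual information (how strongly two words stick together) and left/right contextual entropy (how freely the pair combines with its neighbours). It is not the thesis's implementation. The function name `score_bigrams` and the toy sentences are hypothetical. The sketch assumes the text has already been segmented into words (Chinese text would first need a word segmenter), handles only two-word phrases, and leaves out the tag rules and any score thresholds, which the abstract does not specify.

```python
import math
from collections import Counter, defaultdict


def score_bigrams(sentences):
    """Return {(w1, w2): (pmi, left_entropy, right_entropy)} for adjacent word pairs."""
    unigrams = Counter()
    bigrams = Counter()
    left_ctx = defaultdict(Counter)   # words seen immediately before each pair
    right_ctx = defaultdict(Counter)  # words seen immediately after each pair

    for tokens in sentences:
        unigrams.update(tokens)
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            bigrams[pair] += 1
            if i > 0:
                left_ctx[pair][tokens[i - 1]] += 1
            if i + 2 < len(tokens):
                right_ctx[pair][tokens[i + 2]] += 1

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def entropy(counter):
        # Shannon entropy (in bits) of the neighbouring-word distribution.
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return -sum((c / total) * math.log2(c / total) for c in counter.values())

    scores = {}
    for pair, count in bigrams.items():
        w1, w2 = pair
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        scores[pair] = (pmi, entropy(left_ctx[pair]), entropy(right_ctx[pair]))
    return scores


# Toy, pre-tokenized input (hypothetical data for illustration only).
sentences = [
    ["how", "train", "topic", "model", "fast"],
    ["does", "topic", "model", "need", "labels"],
    ["best", "topic", "model", "for", "tags"],
]
scores = score_bigrams(sentences)

# Rank candidates: high PMI and high entropy on both sides suggest a phrase.
ranked = sorted(scores.items(), key=lambda kv: (min(kv[1][1], kv[1][2]), kv[1][0]), reverse=True)
for pair, (pmi, left_h, right_h) in ranked[:3]:
    print(pair, round(pmi, 3), round(left_h, 3), round(right_h, 3))
```

In this sketch a pair is a good phrase candidate when its PMI is high and both entropies are high, meaning the two words co-occur more often than chance and the pair appears in varied contexts. How the thesis combines these scores with its tag rules is not stated in the abstract, so the ranking key above is only one plausible choice.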
Keywords/Search Tags: Community Question Answering System, Short Text, Tag Generation, Graph Model, Text Clustering, Phrase Discovery, Topic Model