Tag Recommendation For Dialogue Corpus

Posted on: 2013-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: G N Fang
GTID: 2248330371966323
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of online information, it is desirable to label large volumes of text with appropriate tags, that is, to describe the content of a text with one or a few words. Such tags can greatly speed up browsing. Furthermore, high-quality tags can improve the performance of Natural Language Processing tasks such as text classification and information retrieval. For these reasons, much research has focused on automatic tag generation (tag recommendation). At the same time, with the rapid development of social networks and tools such as instant messaging, Twitter, and microblogs, people increasingly use these tools to express and exchange their views. This kind of data differs greatly from web pages: it has the characteristics of dialogue, and it is usually short and loosely structured. These characteristics make tag recommendation for dialogue corpora more difficult. At present, research that directly tackles this kind of data is still rare, and it remains unknown whether tag recommendation methods that perform well on web pages are also suitable for dialogues.

This paper focuses on data with the characteristics of dialogue. We study tag recommendation, related words, and dialogue characteristics in depth, and propose an unsupervised method for generating informative tags for multi-party dialogue in an open domain. Our model first extracts keywords from the text through a multi-weighting framework that combines frequency weighting, sentence weighting, speaker weighting, and position weighting. It then obtains bigrams of these keywords through frequent pattern matching. To generate more flexible and socialized tags, we expand the keywords and their bigrams using tag associations mined from the well-known social bookmarking website Delicious.
Finally, we rank the three groups of tag candidates (keywords, bigrams, and expanded social tags) under a uniform metric.

The main research contents are as follows:
1) We conduct an in-depth study of the characteristics of dialogue data and analyze such data from five aspects: dialogue format, discourse mode, discourse style, discourse field, and turn-taking.
2) Based on these characteristics, we propose a multi-weighting framework for the keyword extraction module that considers four weightings: frequency weighting, sentence weighting, speaker weighting, and position weighting. On the basis of the extracted keywords, we obtain bigrams through POS pattern matching. Experimental results on two dialogue datasets indicate that the algorithm is effective.
3) In the social tag expansion module, we apply Apriori, a classic association rule mining algorithm, to obtain social tags that are highly related to the existing keywords and bigrams. The results show that the method is feasible.
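The multi-weighting keyword extraction described in this abstract can be illustrated with a minimal Python sketch. The abstract names the four weightings but not their formulas, so every definition here is an assumption: frequency weighting is the raw count of a word, sentence weighting treats each turn as one sentence and uses the fraction of turns containing the word, speaker weighting is the fraction of participants who use it, and position weighting favors words introduced early in the dialogue. The four weights are simply multiplied. The tokenizer, the stopword list, and the names `tokenize` and `extract_keywords` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"the", "a", "an", "is", "are", "i", "you", "we", "it", "to",
             "of", "and", "in", "on", "for", "that", "this", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords (assumed tokenizer)."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extract_keywords(turns, top_k=3):
    """Rank words in a dialogue by the product of four ASSUMED weights.

    turns: list of (speaker, utterance) pairs in the order they were posted.
    Returns up to top_k (word, score) pairs, highest score first.
    """
    if not turns:
        return []
    n_turns = len(turns)
    n_speakers = len({speaker for speaker, _ in turns})
    freq = Counter()                   # word -> total occurrences
    turns_with = defaultdict(set)      # word -> indices of turns containing it
    speakers_with = defaultdict(set)   # word -> speakers who used it
    first_turn = {}                    # word -> index of the first turn using it

    for i, (speaker, utterance) in enumerate(turns):
        for word in tokenize(utterance):
            freq[word] += 1
            turns_with[word].add(i)
            speakers_with[word].add(speaker)
            first_turn.setdefault(word, i)

    scores = {}
    for word, count in freq.items():
        frequency_w = count                                  # raw frequency
        sentence_w = len(turns_with[word]) / n_turns         # spread across turns
        speaker_w = len(speakers_with[word]) / n_speakers    # shared by participants
        position_w = 1.0 - first_turn[word] / n_turns        # introduced early
        scores[word] = frequency_w * sentence_w * speaker_w * position_w

    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    turns = [
        ("A", "Anyone tried the new python release?"),
        ("B", "Yes, the python typing changes are great."),
        ("C", "I prefer rust for typing, honestly."),
        ("A", "Python packaging still hurts though."),
    ]
    print([word for word, _ in extract_keywords(turns)])  # ['python', 'typing', 'anyone']
```

Multiplying the weights means a word must do reasonably well on all four criteria to rank highly; a weighted sum would be an equally plausible design and is not ruled out by the abstract.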
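Bigram extraction through POS pattern matching might look like the following sketch. It assumes the utterances are already POS-tagged with Penn Treebank tags (running a tagger is out of scope here). The pattern set `PATTERNS`, the function `keyword_bigrams`, and the `min_count` threshold are hypothetical illustrations, not the thesis's configuration: an adjacent word pair is kept when its tag sequence matches a noun-phrase-like pattern, at least one of its words is an extracted keyword, and the pair occurs at least `min_count` times.

```python
from collections import Counter

# Hypothetical noun-phrase-like tag patterns (Penn Treebank tags).
PATTERNS = {("JJ", "NN"), ("JJ", "NNS"), ("NN", "NN"), ("NN", "NNS"), ("NNP", "NN")}


def keyword_bigrams(tagged_utterances, keywords, min_count=1):
    """Collect adjacent word pairs that match PATTERNS and contain a keyword.

    tagged_utterances: list of utterances, each a list of (word, tag) pairs.
    keywords: set of previously extracted keywords.
    Returns ("w1 w2", count) pairs with count >= min_count, most frequent first.
    """
    counts = Counter()
    for tagged in tagged_utterances:
        # Slide over adjacent (word, tag) pairs within one utterance only.
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
            if (t1, t2) in PATTERNS and (w1 in keywords or w2 in keywords):
                counts[(w1, w2)] += 1
    return [(" ".join(pair), c) for pair, c in counts.most_common() if c >= min_count]
```

For example, with the keyword set `{"python"}`, the tagged fragments "new/JJ python/NN release/NN" and "python/NN release/NN notes/NNS" yield "python release" twice and "new python" once, while "release notes" is skipped because it contains no keyword.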
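The social tag expansion step relies on Apriori. The sketch below is a small, self-contained implementation of textbook Apriori (level-wise candidate generation with subset pruning) followed by rule generation by confidence. The toy transactions stand in for the tag sets attached to individual bookmarks; they are invented, not Delicious data, and the thresholds and the names `apriori`, `association_rules`, and `expand_tags` are hypothetical. A candidate tag set is expanded with the consequents of every rule whose antecedent it fully contains.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Start with all single items as size-1 candidates.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while current:
        # Support = fraction of transactions that contain the candidate.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate with an infrequent k-subset (the Apriori property).
        current = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(union, k)
            ):
                current.add(union)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples above the threshold."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so it is in the dict.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


def expand_tags(candidates, rules):
    """Suggest new tags implied by rules whose antecedent the candidates cover."""
    candidates = set(candidates)
    suggestions = {}
    for antecedent, consequent, confidence in rules:
        if antecedent <= candidates:
            for tag in consequent - candidates:
                suggestions[tag] = max(confidence, suggestions.get(tag, 0.0))
    return sorted(suggestions.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Invented tag sets, one per bookmark (NOT real Delicious data).
    bookmarks = [
        {"python", "programming", "tutorial"},
        {"python", "programming"},
        {"python", "django", "web"},
        {"rust", "programming"},
        {"python", "programming", "web"},
    ]
    rules = association_rules(apriori(bookmarks, min_support=0.4), min_confidence=0.6)
    print([(tag, round(c, 2)) for tag, c in expand_tags({"python"}, rules)])
    # [('programming', 0.75)]
```

Here the rule python → web is discarded because its confidence (0.5) falls below the threshold, so only "programming" is proposed for a dialogue whose extracted keyword is "python".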
Keywords/Search Tags: dialogue, tag recommendation, multi-weighting, association rule, social tag