
Research On Topic Detection In Cross-network Platforms

Posted on: 2016-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: F Bei
Full Text: PDF
GTID: 2308330503950646
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, various network platforms, including news sites, micro-blogs, and social networks, have become the main information sources through which the public understands society, so there is a social need for a way to obtain core content from multiple network platforms. Topic detection techniques can help people find valuable clues in huge amounts of network data. Unlike in traditional network transmission, topic clues are now distributed across different network platforms. In practice, on the one hand, the characteristics of these clues differ from platform to platform, so traditional news topic detection methods cannot be applied directly to other network platforms. On the other hand, the carrier of a topic is the story, and over time stories describe the topic at different levels, so helping people understand how topic events develop has become a hotspot in topic detection research.

One of the key problems in topic detection research is the relationship between parts of speech and topic detection performance. Based on two representative corpora, one of news and one of micro-blog posts, this paper reports an experimental study of how different parts of speech and their combinations affect network topic detection. The results show that when a single part of speech is chosen as the feature, nouns give the best results, and named entities greatly reduce the dimensionality of the clustering features while maintaining accuracy. When a combination of parts of speech is chosen as the feature, nouns or named entities together with numerals, time phrases, adjectives, and quantifiers improve the accuracy of topic detection on news, while nouns or named entities together with adjectives, quantifiers, numerals, and the combination of special symbols and sites achieve good results on the micro-blog corpus.

Sub-topic detection is the basis of topic evolution research. Existing work targets a single network platform and rarely addresses multiple platforms. To deal with this issue, this paper proposes a cross-platform topic detection method based on word semantics and time features. First, a word weight calculation method is proposed that uses the semantic and time characteristics of word features. Then, a multi-vector model is created for sub-topic detection. Finally, sub-topics are detected with the Layer-Partition clustering algorithm. Experimental results show that the proposed approach outperforms other algorithms at sub-topic detection within a single topic and effectively improves accuracy on mixed topics.
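The part-of-speech feature selection described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the tiny pre-tagged corpus, the tag names (`NOUN`, `ADJ`, and so on), and the helper names `pos_filtered_counts`, `build_vectors`, and `cosine` are all assumptions made for illustration. It only shows the general idea of keeping words of selected parts of speech as clustering features and comparing documents on those features.

```python
import math
from collections import Counter

# Hypothetical pre-tagged documents: lists of (token, POS tag) pairs.
# In practice the tags would come from a POS tagger; they are hard-coded here.
DOCS = {
    "news_1": [("earthquake", "NOUN"), ("strikes", "VERB"),
               ("coastal", "ADJ"), ("city", "NOUN")],
    "blog_1": [("terrible", "ADJ"), ("earthquake", "NOUN"),
               ("in", "ADP"), ("city", "NOUN")],
    "news_2": [("team", "NOUN"), ("wins", "VERB"),
               ("final", "ADJ"), ("match", "NOUN")],
}


def pos_filtered_counts(tagged_tokens, keep_tags):
    """Count only the tokens whose POS tag is in keep_tags."""
    return Counter(tok.lower() for tok, tag in tagged_tokens if tag in keep_tags)


def build_vectors(docs, keep_tags):
    """Build one sparse term-frequency vector per document."""
    return {doc_id: pos_filtered_counts(tokens, keep_tags)
            for doc_id, tokens in docs.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Nouns only: the single part of speech the abstract reports as working best.
noun_vectors = build_vectors(DOCS, {"NOUN"})
sim_same = cosine(noun_vectors["news_1"], noun_vectors["blog_1"])
sim_diff = cosine(noun_vectors["news_1"], noun_vectors["news_2"])

# A combination of parts of speech, here nouns plus adjectives.
combo_vectors = build_vectors(DOCS, {"NOUN", "ADJ"})
sim_combo = cosine(combo_vectors["news_1"], combo_vectors["blog_1"])

print(sim_same, sim_diff, sim_combo)
```

Changing the `keep_tags` set is the only step needed to compare different part-of-speech combinations on the same corpus; a clustering algorithm would then operate on the resulting vectors.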
Keywords/Search Tags: network platforms, topic detection, sub-topic detection, feature selection, parts of speech