
Micro-blog Feature Discovery And Topic Keyphrase Extraction Based On Language Network

Posted on: 2015-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: H W Ma
GTID: 2268330428964458
Subject: Computer technology
Abstract/Summary:
Microblogging is a new kind of Internet media that has appeared in recent years. Its advantages are that it spreads rapidly and is convenient to use. With the rapid development of Internet technology, and especially the quickly growing number of mobile Internet users, more and more microblog content is produced every day. Research on microblog content is therefore becoming increasingly important. Based on a large-scale microblog content corpus, this thesis first constructs word co-occurrence language networks to find the stylistic features of microblogs, and then, with a topic-related microblog content corpus, constructs a network for topic keyphrase extraction. Through the analysis of the constructed language networks, a microblog text research method and a topic keyphrase extraction method are proposed. The experimental results show that the proposed methods are feasible.

Firstly, we review existing studies on language networks and microblog text. Following an analysis of the state-of-the-art techniques in language network research, two aspects of microblog content research are addressed: (1) from the view of linguistics, stylistic feature analysis; (2) from the view of text mining, microblog information mining.

Secondly, a method based on the language network for microblog content analysis is proposed. Generally, language network analysis methods are applied to recognize and understand the topology of common language networks and the general rules of their evolution through the quantitative study of linguistic forms. In this thesis, we apply language network analysis to microblogs as a form of Internet language. By analyzing the complex network characteristics of the constructed microblog content language network, we can discover the linguistic characteristics of microblog content as a whole.

Thirdly, a keyphrase extraction method based on a topical language network is presented, after summarizing the advantages and disadvantages of existing microblog keyphrase extraction methods. First, language networks of topical microblog content are built. Then two centrality parameters, betweenness centrality and closeness centrality, combined with degree centrality, in the context of the small-world characteristics of complex networks, are used as the feature weights of the words, and these parameter values are calculated for every node in the language network. Finally, the topical keyphrases are selected according to the parameter values of the word nodes.

Finally, experiments on the proposed methods are performed on the large-scale microblog content corpus and the topical microblog content corpus. The experimental results show that our methods are feasible for microblog content research and topical keyphrase extraction. A summary of the main work is then presented. The unsolved problems in microblog language networks and topical keyphrase extraction are also analyzed and are considered as future research work.
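
The keyphrase extraction steps described above (build a word co-occurrence network, compute degree, closeness, and betweenness centrality for every word node, then rank the nodes) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The abstract does not state the co-occurrence window, the normalization, or how the three centralities are combined, so the window of adjacent words, the standard normalizations, and the equal-weight sum used here are assumptions, as are all function names. The input is assumed to be already segmented into word tokens, and the sketch ranks single words rather than assembling multi-word phrases.

```python
from collections import defaultdict, deque


def build_cooccurrence_network(docs, window=2):
    """Link words that occur within `window` positions of each other.

    window=2 (an assumed default) links only adjacent words.
    """
    graph = defaultdict(set)
    for tokens in docs:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + window]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    return dict(graph)


def centralities(graph):
    """Return (degree, closeness, betweenness) dicts for an undirected graph.

    Betweenness uses Brandes' algorithm for unweighted graphs; closeness
    reuses the shortest-path distances from the same breadth-first search.
    """
    n = len(graph)
    norm = max(n - 1, 1)
    degree = {v: len(nbrs) / norm for v, nbrs in graph.items()}
    closeness = {}
    between = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Closeness, scaled by the fraction of nodes reachable from s.
        reachable = len(dist) - 1
        total = sum(dist.values())
        closeness[s] = (reachable / norm) * (reachable / total) if total else 0.0
        # Back-propagate path dependencies to accumulate betweenness.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                between[w] += delta[w]
    # Each unordered pair is counted twice, so dividing by (n-1)(n-2)
    # yields values in [0, 1] for an undirected graph.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    betweenness = {v: b * scale for v, b in between.items()}
    return degree, closeness, betweenness


def score_words(docs, window=2):
    """Score each word by an equal-weight sum of the three centralities.

    The equal weighting is an assumption made for illustration only.
    """
    graph = build_cooccurrence_network(docs, window)
    degree, closeness, betweenness = centralities(graph)
    return {v: degree[v] + closeness[v] + betweenness[v] for v in graph}


def extract_keyphrases(docs, top_k=5, window=2):
    """Return the top_k highest-scoring words (ties broken alphabetically)."""
    scores = score_words(docs, window)
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

For example, `extract_keyphrases([["a", "hub"], ["b", "hub"], ["c", "hub"]], top_k=1)` ranks `hub` first, because it is the only node adjacent to every other word and lies on every shortest path between them.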
Keywords/Search Tags: microblog, complex network, language network, keyphrase extraction, Chinese information processing