Character Relationship Extraction in Microblogs Based on Semantic Role Labeling

Posted on: 2014-11-15  Degree: Master  Type: Thesis
Country: China  Candidate: Y Lian  Full Text: PDF
GTID: 2268330392469079  Subject: Computer Science and Technology
Abstract/Summary:
Extracting social relations from social network sites is an important part of analyzing people's online behavior. This thesis focuses on how to extract large-scale character relationship information from a social network efficiently and accurately. It proposes an improved character relationship extraction algorithm based on feature extraction and applies it to a corpus from Sina Weibo. Statuses and comments in Sina Weibo that involve a relationship between two people are first analyzed and filtered. Semantic role labeling and syntactic analysis are then used to extract character relationship features, and these features are used to train character relationship templates with which the relationships between characters are classified. The main work is as follows:

Firstly, an improved character relationship extraction algorithm based on feature extraction is proposed. Through careful analysis of the structure of Chinese sentences, the constituents that the two named entities fill in the sentence, together with the sentence structure type, were found to be characteristic of the relationship between the characters. The core feature words selected by this analysis not only improve accuracy considerably but also enrich the types of features.

Secondly, the thesis addresses character relationship extraction on a Sina Weibo corpus. The Sina Weibo API is used to obtain part of the Sina Weibo corpus, which is then filtered so that the text meets the sentence structure requirements. The improved algorithm is applied to this corpus to classify the relationships it implies between characters, yielding the virtual relationship circles in Sina Weibo. Some analysis of user activity in Sina Weibo is also included, such as gender distribution and geographical distribution.

Thirdly, Flex technology is used to visualize the relationships between the characters. BirdEye, an open-source visualization framework based on Flex, is used to build the character relationship network topology, and histograms and pie charts are used to analyze users' gender and regional distribution. Java performs the logical computation in the background and passes the data to the Flex page, where ActionScript parses the data and converts it to the data structure expected by the BirdEye components, producing the visualization of relationships in Sina Weibo.

In the experiments, 3000 sentences from the People's Daily corpus and 3000 Sina Weibo statuses, each containing two named entities, were selected as the corpus. In the experiment with the Sogou news corpus, the proposed method achieved precision and recall of 81.17% and 81.00%, while the method that selects only context words as features achieved 72.32% and 72.35%. In the experiment with the Sina Weibo status corpus, the proposed method achieved precision and recall of 71.65% and 71.70%, while the context-word method achieved 62.67% and 62.60%. These results show that character relationship extraction based on semantic role labeling features achieves better results on both the news corpus and the Sina Weibo corpus.
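The difference between the two feature sets compared in the abstract (context words only versus semantic role labeling features) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the `Token` structure, the role labels (`A0`, `A1`, `PRED`, `AM-TMP`), the feature string formats, and the English example sentence are all assumptions made for illustration, and the role labels are assumed to come from an external semantic role labeler.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Token:
    text: str
    role: str                # assumed SRL label from an external labeler
    is_person: bool = False  # True for the two person named entities


def context_word_features(tokens: List[Token], window: int = 2) -> Set[str]:
    """Baseline: words within `window` tokens of either person entity."""
    feats = set()
    for i, tok in enumerate(tokens):
        if not tok.is_person:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for neighbor in tokens[lo:hi]:
            if not neighbor.is_person:
                feats.add("ctx=" + neighbor.text)
    return feats


def srl_features(tokens: List[Token]) -> Set[str]:
    """Illustrative SRL features: each entity's role plus the predicate word."""
    persons = [t for t in tokens if t.is_person]
    if len(persons) != 2:
        raise ValueError("expected exactly two person entities")
    feats = {"e1_role=" + persons[0].role, "e2_role=" + persons[1].role}
    for tok in tokens:
        if tok.role == "PRED":
            feats.add("pred=" + tok.text)
    return feats


# Hypothetical labeled sentence: "Alice married Bob yesterday"
sentence = [
    Token("Alice", "A0", is_person=True),
    Token("married", "PRED"),
    Token("Bob", "A1", is_person=True),
    Token("yesterday", "AM-TMP"),
]

print(sorted(context_word_features(sentence, window=1)))
print(sorted(srl_features(sentence)))
```

In this sketch the baseline records only which words appear near the two names, while the role-based features also record which entity is the agent and which is the patient of the predicate. Either feature set could then be passed to a classifier; the abstract does not specify which one was used.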
Keywords/Search Tags: character relationship extraction, microblog, semantic role labeling, feature extraction