Research On Some Key Technologies Of Multi-element Social Network Extraction And Analysis Based On The Web And The Email

Posted on: 2013-03-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M J Yin
Full Text: PDF
GTID: 1228330395480618
Subject: Computer software and theory
Abstract/Summary:
With the development of information technology and network communication technology, the number of illegal activities and incidents carried out over the Internet keeps growing. It has therefore become an important research topic to accurately extract person attributes and social relations from multiple types of network data, and then to mine potential key persons and community organizations. Although many mature technologies exist for social network extraction and analysis on a single type of network data, they cannot solve the problem of extracting and analyzing social networks from a variety of network data. This thesis first reviews the related work and applications of the key technologies for social network extraction and analysis based on network data. Then, taking Web pages and Email data as examples, it studies in depth several key issues in social network extraction and analysis based on a variety of network data: social network models, person attribute extraction, social relation evaluation, and community detection. The primary work and contributions are as follows:

(1) Social network models. Because existing social network models cannot represent the full information about person attributes and social relations contained in large volumes of network data, the concept and model of the multi-element social network are proposed, and a description of a multi-element social network instance based on the Web and Email is presented. The model provides a basis for research on social network extraction and analysis over various network data, such as person attribute extraction, social relation evaluation, and community detection. Based on this model, a framework for multi-element social network extraction and analysis is put forward, together with a brief analysis of the key techniques in the framework. The framework offers guidance for research and system design related to multi-element social networks.

(2) Person attribute extraction from Web pages. Because the existing concept of and approaches to Web person attribute extraction cannot automatically extract person attributes from Web pages given different types of known person attributes, the concept and formal description of generalized Web person attribute extraction are proposed. To solve the generalized problem, a novel Web person attribute extraction method named MFAR is put forward, which extracts person attributes through multi-feature automated reasoning. To define the attribute association rules of MFAR, multiple broadly applicable association features are introduced; each rule is defined on one or several of these features, and logical representations of the features and rules are given. Automated training of and reasoning over the association rules in MFAR are handled with Markov Logic Networks, and a framework for automated training and reasoning based on Markov Logic Networks is put forward. The experimental results show that, across different kinds of Web person attribute extraction problems, the proposed approach extracts person attributes from Web pages automatically and more accurately than some existing methods based on a single rule.

(3) Person attribute extraction from Email data. A framework for Email-based person attribute extraction is proposed to solve the problem of extracting person attributes from Email data. For one problem in the framework, extracting candidate name attributes from salutations and signatures in Email bodies, a block locating algorithm based on statistics and rules is proposed. For the other problem, ranking the reliability of the candidate names, a candidate name reliability evaluation algorithm based on clustering and communication importance is put forward. This algorithm evaluates candidate name reliability by clustering candidate names and analyzing the importance of names in Email communications, and then extracts each person's credible names and aliases according to the reliability scores. The experimental results on the Enron Email datasets show that the proposed block locating algorithm can locate and extract salutation and signature text in Email bodies relatively accurately, and that the candidate name reliability evaluation algorithm can precisely extract persons' formal names and aliases.

(4) Social relation evaluation based on Web pages. Because the existing Web social relation evaluation method does not yield accurate and stable results, a Web social relation evaluation model named SETARM, based on search engines and text analysis, is proposed. With this model, two types of relation evaluation functions are designed and the corresponding evaluation methods are presented. The experimental results demonstrate that the SETARM-based relation evaluation methods achieve relatively high accuracy and stability, and that the model performs better when its two primary relation evaluation approaches are combined linearly, with the text-analysis-based method making the larger contribution.

(5) Community detection algorithms. Because existing community detection algorithms cannot adequately solve the problem of community detection in multi-element social networks, a basic idea for community detection in multi-element social networks is proposed. Following this idea, a relation closeness evaluation method based on multi-element information, named MICE, is put forward to transform multi-element social networks into weighted networks. To discover communities in the weighted networks, a two-stage local greedy expansion algorithm named TSLGE is proposed, which improves on key issues such as seed selection, the definition of the expansion evaluation function, and the merging of similar communities. The experimental results on the Enron Email datasets show that the relation closeness evaluated by the MICE method reflects the real relationships among people. The experimental results on synthetic benchmarks and empirical networks show that TSLGE has good run-time performance and that, compared with some typical community detection algorithms based on local expansion, it detects good-quality communities on both unweighted and weighted networks.

Finally, the research work of this thesis is summarized, and future directions for multi-element social network extraction and analysis are indicated.
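The abstract names the ingredients of local expansion (seed selection, an expansion evaluation function, merging of similar communities) but does not give TSLGE's actual definitions. As a rough illustration only, the sketch below shows generic single-seed greedy expansion on a weighted graph. It is not the TSLGE algorithm: the function names (`build_graph`, `fitness`, `expand`), the fitness form k_in / (k_in + k_out)^alpha, the `alpha` parameter, and the toy edge list are all assumptions made for this example, and seed selection and community merging are left out.

```python
from collections import defaultdict

# Hypothetical toy data: two triangles joined by one weak edge.
EXAMPLE_EDGES = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("c", "d", 0.1),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
]


def build_graph(edges):
    """Build an undirected weighted adjacency map from (u, v, weight) triples."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return graph


def fitness(graph, community, alpha=1.0):
    """Assumed stand-in evaluation function: k_in / (k_in + k_out) ** alpha.

    k_in sums edge weight between members (each internal edge is seen from
    both endpoints); k_out sums edge weight leaving the community.
    """
    k_in = 0.0
    k_out = 0.0
    for u in community:
        for v, w in graph[u].items():
            if v in community:
                k_in += w
            else:
                k_out += w
    total = k_in + k_out
    return k_in / total ** alpha if total > 0 else 0.0


def expand(graph, seed, alpha=1.0):
    """Greedily grow a community from one seed until no neighbor improves fitness."""
    community = {seed}
    while True:
        current = fitness(graph, community, alpha)
        frontier = {v for u in community for v in graph[u]} - community
        best_node, best_gain = None, 0.0
        for v in sorted(frontier):  # sorted for deterministic tie-breaking
            gain = fitness(graph, community | {v}, alpha) - current
            if gain > best_gain:
                best_node, best_gain = v, gain
        if best_node is None:
            return community
        community.add(best_node)


graph = build_graph(EXAMPLE_EDGES)
print(sorted(expand(graph, "a")))  # prints ['a', 'b', 'c']
print(sorted(expand(graph, "d")))  # prints ['d', 'e', 'f']
```

On the toy graph, expansion stops at each triangle because crossing the weak edge lowers the fitness. In a weighted network of the kind the abstract describes, the edge weights would presumably come from a closeness measure such as MICE; here they are arbitrary numbers.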
Keywords/Search Tags: Social Network, Social Network Analysis, Multi-element Social Network, Person Attribute Extraction, Social Relation Evaluation, Community Detection