| Character attribute recognition is the process of extracting character features from text, social relations, and other data in multi-source networks, and then using machine learning, deep learning, and related techniques to identify character attributes such as position, workplace, and research field. It plays an important role in persona construction and user recommendation. Internet data is an important source of character attributes, but its openness makes the attributes it represents highly complex. It is therefore necessary to analyze multi-source network data jointly in order to extract complete and accurate attribute information about a target character. Although many methods for extracting character attributes exist, they still struggle to meet the needs of practical applications for two reasons. First, the character attribute information contained in single-source network data is scattered and one-sided, yet effective approaches for linking data across multiple network sources are lacking, which prevents further character attribute recognition. Second, unstructured text is the main input for character attribute recognition, but existing methods do not adequately summarize the features through which attributes are expressed in text, which results in low recognition accuracy. This thesis aims to improve the completeness and accuracy of character attribute recognition by linking multi-source network data and by summarizing the features of how character attributes are expressed in text. The main contributions are as follows:
(1) To link data from multiple social networks and provide complete data support for character attribute recognition, this thesis proposes a hypergraph-based account matching method. Social networks are an important source of character attribute information, but the attributes they represent are scattered. This thesis therefore introduces a hypergraph model to represent complex data from multiple social networks in a unified way and learns representations of the data from all networks in the same embedding space, avoiding the noise introduced by existing topology-based graph representation methods. An unsupervised matching model is then trained using user profile similarity and cross-network node proximity, and character attributes from multiple networks are integrated by matching accounts across social networks. Experiments show that this method substantially improves account matching accuracy and effectively links multi-source social network data.
(2) To extract character attributes accurately from text, this thesis proposes a syntactic feature-based character attribute recognition method that builds on the integrated multi-source network data. The method uses dependency relations to reconstruct the context in which a character attribute appears in the text, allowing the model to capture, at the syntactic level, lexical relationships of character attributes that span multiple words. Further syntactic analysis of character attributes identifies syntactic features that traditional methods struggle to recognize in text; these features are represented by constructing feature functions, which addresses the insufficient use of syntactic features in character attribute recognition. Experiments show that this method makes effective use of the syntactic features of character attributes and improves the accuracy of character attribute recognition in text. |
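The hypergraph representation in contribution (1) can be illustrated with a minimal sketch. It assumes that accounts from all social networks are nodes of one hypergraph and that each hyperedge groups accounts sharing some context (for example, the same group, topic, or event). The function `hypergraph_propagate` is a hypothetical, unlearned smoothing step in the style of hypergraph neural networks: node features are averaged into each hyperedge and then back into each node. It is not the thesis's representation learning model, which the abstract does not specify.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def hypergraph_propagate(
    features: Dict[str, Vector],
    hyperedges: Sequence[Sequence[str]],
) -> Dict[str, Vector]:
    """One round of mean node -> hyperedge -> node message passing.

    Hypothetical sketch: equals the row-normalised product
    D_v^{-1} H D_e^{-1} H^T X with unit hyperedge weights and no learned
    parameters. Nodes that belong to no hyperedge keep their input features.
    """
    if not features:
        return {}
    dim = len(next(iter(features.values())))
    edges = [list(e) for e in hyperedges if e]  # drop empty hyperedges

    # Step 1: each hyperedge summarises its members by averaging their features.
    edge_msgs: List[Vector] = []
    for edge in edges:
        total = [0.0] * dim
        for node in edge:
            for k, value in enumerate(features[node]):
                total[k] += value
        edge_msgs.append([v / len(edge) for v in total])

    # Step 2: each node averages the messages of the hyperedges it belongs to.
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for edge, msg in zip(edges, edge_msgs):
        for node in edge:
            acc = sums.setdefault(node, [0.0] * dim)
            for k, value in enumerate(msg):
                acc[k] += value
            counts[node] = counts.get(node, 0) + 1

    return {
        node: [v / counts[node] for v in sums[node]] if node in counts else list(vec)
        for node, vec in features.items()
    }
```

Because a hyperedge may contain accounts from different networks, a single propagation step already pulls accounts that share a context toward each other in one common space, which is the property a cross-network matcher relies on.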
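The unsupervised matching step in contribution (1) combines user profile similarity with node proximity. The sketch below is an assumption-laden stand-in, not the thesis's model. It scores each cross-network account pair as `alpha * profile_similarity + (1 - alpha) * cosine(embedding)`, using the standard-library `difflib.SequenceMatcher` for profile text and assuming the embeddings already share one space. It then keeps pairs that are each other's best match and clear a `threshold`. The names `match_accounts`, `alpha`, and `threshold` are hypothetical, and nothing is trained here.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def match_accounts(
    profiles_a: Dict[str, str], emb_a: Dict[str, List[float]],
    profiles_b: Dict[str, str], emb_b: Dict[str, List[float]],
    alpha: float = 0.5, threshold: float = 0.6,
) -> List[Tuple[str, str, float]]:
    """Hypothetical unsupervised matcher: mutual best pairs above a threshold."""
    if not profiles_a or not profiles_b:
        return []

    def score(x: str, y: str) -> float:
        # Profile similarity on lower-cased profile text.
        prof = SequenceMatcher(None, profiles_a[x].lower(), profiles_b[y].lower()).ratio()
        # Node proximity, assuming both embeddings live in the same space.
        prox = cosine(emb_a[x], emb_b[y])
        return alpha * prof + (1 - alpha) * prox

    scores = {(x, y): score(x, y) for x in profiles_a for y in profiles_b}
    best_b = {x: max(profiles_b, key=lambda y: scores[(x, y)]) for x in profiles_a}
    best_a = {y: max(profiles_a, key=lambda x: scores[(x, y)]) for y in profiles_b}

    # Keep only one-to-one (mutually best) pairs that clear the threshold.
    return sorted(
        (x, y, scores[(x, y)])
        for x, y in best_b.items()
        if best_a[y] == x and scores[(x, y)] >= threshold
    )
```

Requiring a mutual best match is a simple way to enforce a one-to-one alignment without labels; in a trained matcher, the fixed weighted sum would be replaced by learned parameters.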
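Contribution (2) relies on dependency relations to rebuild the context of a character attribute and on feature functions to encode syntactic cues. A minimal sketch follows. It assumes the sentence has already been parsed into CoNLL-style tokens with 1-based head indices and Universal Dependencies-style labels. `dependency_context` collects the tokens within `max_hops` arcs of a target word. `workplace_nmod_feature` is a hypothetical binary state feature function of the kind used in linear-chain CRFs. The label `WORKPLACE` and the position-word list are illustrative assumptions, not the thesis's feature set.

```python
from collections import deque
from typing import FrozenSet, List, NamedTuple, Set


class Token(NamedTuple):
    text: str
    head: int    # 1-based index of the head token; 0 marks the root
    deprel: str  # Universal Dependencies-style relation label


def dependency_context(tokens: List[Token], target: int, max_hops: int = 2) -> List[int]:
    """Return 0-based indices of tokens within max_hops arcs of tokens[target].

    The dependency tree is treated as undirected; results are in sentence order.
    """
    adj: List[Set[int]] = [set() for _ in tokens]
    for i, tok in enumerate(tokens):
        if tok.head > 0:
            h = tok.head - 1
            adj[i].add(h)
            adj[h].add(i)

    # Breadth-first search from the target, stopping at max_hops.
    dist = {target: 0}
    queue = deque([target])
    while queue:
        cur = queue.popleft()
        if dist[cur] == max_hops:
            continue
        for nb in adj[cur]:
            if nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    return sorted(i for i in dist if i != target)


def workplace_nmod_feature(
    tokens: List[Token], i: int, label: str,
    position_words: FrozenSet[str] = frozenset({"professor", "engineer"}),
) -> int:
    """Hypothetical state feature f(y_i, x, i): 1 if token i is labelled
    WORKPLACE and attaches via nmod to a position word, else 0."""
    tok = tokens[i]
    if label != "WORKPLACE" or tok.head == 0 or tok.deprel != "nmod":
        return 0
    return int(tokens[tok.head - 1].text.lower() in position_words)
```

For a parse of "Zhang is a professor at PKU", the workplace "PKU" reaches both the position "professor" and the person "Zhang" within two arcs. In longer sentences, a fixed linear window around the attribute can miss this kind of syntactic link.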