With the rapid development of Internet technology, people generate more and more behavior logs on the network. These logs come from rich data sources and can be used to extract basic and advanced user features at various levels. These features are collectively known as user profiles. Multi-granular user profiles represent user features accurately and enable personalized recommendations. They also help expand user influence and support precise marketing. Building accurate user profiles from massive user behavior logs has therefore become one of the major challenges in Natural Language Processing. This study investigates log parsing and log application on the basis of log analysis, and on the application side it focuses on user profiling. User profiling should capture not only basic user features, such as gender and age, but also advanced user features, such as user relations. To address these issues, this study proposes algorithms and models for log analysis, basic user profile extraction, and relation extraction. The details are as follows.

1. For the task of log analysis, a log analysis framework is proposed. It consists of four main modules: log recording, log storage, log parsing, and log application. Log parsing is the main challenge in log analysis, and existing log parsing algorithms perform poorly in terms of accuracy and robustness. A keyword-based log parsing algorithm is therefore designed. It identifies keywords in a log message to match the corresponding log template. It then calculates the distance between the log message and the log template to complete log parsing. Experiments on several public log datasets demonstrate the effectiveness and robustness of the algorithm. The parsed logs are then applied in practice. User profiles are extracted through statistical analysis of the system's logs of frequently visited web page sequences and its user action logs. Finally, person nodes and family trees are recommended to users according to their profiles.

2. For the task of extracting basic user features for user profiles, a joint user profiling model with hierarchical attention networks, called JUHA, is proposed. The model uses a user's behavior logs to predict that user's age and gender. Existing user profiling models consider this problem from only one perspective. In JUHA, user behavior data is divided into user behavior bags according to behavior type. JUHA then extracts the features of each bag using Convolutional Neural Networks together with word-level and sentence-level attention mechanisms. These features are referred to as user-inner features. Users with similar behaviors are connected to one another to build a user-user graph. User-inter features are extracted from this graph using a Graph Convolutional Network. Finally, user-inter and user-inner features are fused to jointly learn comprehensive user features, and age and gender are predicted from the final user representation. Experiments on two real-world datasets show that JUHA outperforms the baseline models.

3. For the task of extracting advanced user features for user profiles, a distantly supervised relation extraction model with keyword and hierarchical attention mechanisms, called PCKA, is proposed. The model extracts the relation between an entity pair from bags of textual data. Distant supervision has known shortcomings. As a result, the corpus contains a large amount of noisy data, and some entity-pair bags contain too little data to yield effective bag features. PCKA uses a hierarchical attention mechanism to mitigate the effects of noisy data and low-information bags. First, an approximate relation vector is obtained from the entity pair vectors. Then the similarity between each word in a sentence and this approximate relation vector is calculated. Words with high similarity are selected as keywords. These keywords serve as the context representation for word-level attention, which calculates the weight of each word in the sentence. Sentence features are obtained from both the word-level attention model and the PCNN model. In the sentence module, the keywords are also used to calculate the attention weight of each sentence, producing bag features. In the bag module, bag features with high similarity are fused into the current bag. A bag-level attention model then calculates the weight of each bag to obtain the final bag features for relation extraction. Experiments on the widely used NYT dataset show that PCKA outperforms existing distantly supervised relation extraction models in terms of both AUC and P@N.
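The keyword-based log parsing step in item 1 has two stages: match templates by keyword, then pick a template by distance. It can be illustrated with a minimal sketch. The abstract does not say how keywords are identified or which distance is used. This sketch therefore assumes two things, both hypothetical: keywords are tokens that contain no digits, and the distance is a token-level edit distance in which the wildcard `<*>` matches any single token. The names `parse`, `keywords`, `token_distance`, and `TEMPLATES` are illustrative only.

```python
import re

WILDCARD = "<*>"

def keywords(tokens):
    # Hypothetical keyword rule: constant-looking tokens (no digits, not a wildcard).
    return {t for t in tokens if t != WILDCARD and not re.search(r"\d", t)}

def token_distance(msg, tmpl):
    # Levenshtein distance over tokens; WILDCARD matches any single token at zero cost.
    m, n = len(msg), len(tmpl)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            same = tmpl[j - 1] == WILDCARD or msg[i - 1] == tmpl[j - 1]
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (0 if same else 1))
        prev = cur
    return prev[n]

def parse(message, templates):
    tokens = message.split()
    kws = keywords(tokens)
    # Stage 1: candidate templates share at least one keyword with the message.
    candidates = [t for t in templates if kws & keywords(t.split())]
    if not candidates:
        return None
    # Stage 2: choose the candidate with the smallest token distance.
    return min(candidates, key=lambda t: token_distance(tokens, t.split()))

TEMPLATES = ["User <*> logged in from <*>", "User <*> logged out", "Disk <*> is full"]
print(parse("User alice logged in from 10.0.0.1", TEMPLATES))  # → User <*> logged in from <*>
```

Filtering by keywords first keeps the more expensive distance computation limited to a few candidate templates.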
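The word-level and sentence-level attention that JUHA applies to each behavior bag (item 2) follows the usual hierarchical pattern. Word vectors are pooled into a sentence vector, and sentence vectors are pooled into a bag vector. The sketch below is simplified and rests on stated assumptions. The per-word vectors are given directly instead of coming from a CNN. The learned projection before scoring is omitted. The context vectors `word_ctx` and `sent_ctx` are fixed inputs rather than trained parameters. All function names are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(vectors, context):
    # Score each vector against the context vector, normalize, and take the weighted sum.
    weights = softmax([dot(v, context) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]
    return pooled, weights

def encode_bag(bag, word_ctx, sent_ctx):
    # bag: list of sentences, each a list of word vectors (assumed precomputed features).
    sentence_vecs = [attend(sentence, word_ctx)[0] for sentence in bag]  # word level
    bag_vec, _ = attend(sentence_vecs, sent_ctx)                        # sentence level
    return bag_vec
```

In a trained model the context vectors are learned, so the weights come to emphasize the words and sentences most informative for age and gender.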
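The user-inter branch of JUHA (item 2) connects users with similar behaviors and runs a Graph Convolutional Network over the resulting graph. The sketch below makes several hypothetical choices that the abstract leaves open. Similarity is the Jaccard index over behavior sets, and an edge is added when it reaches a threshold. The graph goes through one GCN layer with self-loops, symmetric normalization, and ReLU. Fusion is plain per-user concatenation; the thesis only says the features are "fused". Function names are illustrative.

```python
import math

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_user_graph(behaviors, threshold):
    # Hypothetical rule: connect two users when their behavior sets are similar enough.
    n = len(behaviors)
    return [[1.0 if i != j and jaccard(behaviors[i], behaviors[j]) >= threshold else 0.0
             for j in range(n)] for i in range(n)]

def gcn_layer(adj, feats, weight):
    # One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W).
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    in_dim = len(feats[0])
    agg = [[sum(norm[i][k] * feats[k][d] for k in range(n)) for d in range(in_dim)]
           for i in range(n)]
    out_dim = len(weight[0])
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(in_dim)))
             for o in range(out_dim)] for i in range(n)]

def fuse(inner_feats, inter_feats):
    # Assumed fusion: concatenate each user's inner and inter feature vectors.
    return [inner + inter for inner, inter in zip(inner_feats, inter_feats)]
```

Users 0 and 1 below share most behaviors, so they end up averaging each other's features, while the isolated user 2 keeps its own: with behaviors `[{"a","b"}, {"a","b","c"}, {"x"}]`, threshold `0.5`, features `[[1.0],[3.0],[5.0]]` and an identity weight, the layer yields `[[2.0],[2.0],[5.0]]`.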
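The keyword step of PCKA (item 3) has three parts: build an approximate relation vector, rank words by similarity to it, and use the chosen keywords as the word-level attention context. The abstract says only that the relation vector is "obtained by the entity pair vectors". This sketch assumes a TransE-style difference, tail minus head, and cosine similarity; both choices are hypothetical. It also assumes the keyword context is the mean of the keyword vectors. It covers the attention path only, not the PCNN features that are combined with it, and all function names are illustrative.

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def approx_relation(head, tail):
    # Assumption (TransE-style): relation vector ≈ tail - head.
    return [t - h for h, t in zip(head, tail)]

def select_keywords(word_vecs, relation, k):
    # Rank words by cosine similarity to the approximate relation; keep top k, in sentence order.
    ranked = sorted(range(len(word_vecs)),
                    key=lambda i: cosine(word_vecs[i], relation), reverse=True)
    return sorted(ranked[:k])

def keyword_attention(word_vecs, keyword_idx):
    # Assumed context: mean of keyword vectors; every word is scored against it.
    dim = len(word_vecs[0])
    ctx = [sum(word_vecs[i][d] for i in keyword_idx) / len(keyword_idx) for d in range(dim)]
    weights = softmax([sum(x * y for x, y in zip(v, ctx)) for v in word_vecs])
    sentence = [sum(w * v[d] for w, v in zip(weights, word_vecs)) for d in range(dim)]
    return sentence, weights
```

For example, with head `[1.0, 0.0]`, tail `[1.0, 1.0]`, and toy vectors for "Obama born in Hawaii" (`[1,0]`, `[0.1,0.9]`, `[0.7,0.1]`, `[0.8,0.6]`), the top two keywords are "born" and "Hawaii". Those two words then dominate the attention weights.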
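The bag module of PCKA (item 3) pulls highly similar bags into the current bag and weights them with bag-level attention. This is what helps low-information bags borrow evidence. The sketch below makes the following hypothetical choices. "High similarity" means cosine similarity at or above a threshold. Attention scores are dot products with a relation query vector, which would be learned in practice and is a plain input here. The final feature is the attention-weighted sum over the current bag and its similar bags. The function name `final_bag_feature` is illustrative.

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def final_bag_feature(bags, idx, relation_query, threshold=0.9):
    cur = bags[idx]
    # Fuse: gather the current bag plus every other bag similar enough to it.
    group = [cur] + [b for j, b in enumerate(bags)
                     if j != idx and cosine(cur, b) >= threshold]
    # Bag-level attention: score each bag against the relation query, then weight-sum.
    weights = softmax([sum(x * y for x, y in zip(b, relation_query)) for b in group])
    feature = [sum(w * b[d] for w, b in zip(weights, group)) for d in range(len(cur))]
    return feature, len(group)
```

With bags `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]`, the first bag absorbs only the second, whose cosine similarity is about 0.99. The unrelated third bag is left out.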