
Research On User Features Based Data Mining In Social Networks

Posted on: 2015-03-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Lian
GTID: 1268330425989213
Subject: Information networks and security
Abstract/Summary:
Data are among the most valuable resources on the internet: they usually contain important information and can be used in many ways. Data mining is therefore significant for e-commerce, enterprise strategy and promotion, and information diffusion and prediction. With the development of Web 2.0 techniques and mobile terminals, social network services (SNS) are becoming increasingly popular and widely used in people's daily life. Compared with traditional networks, social networks place more emphasis on the user's proactive role, diversity of network features, huge volumes of information, user interaction and reciprocity, and fast message propagation. Traditional approaches and models are inadequate to describe user behavior features, so current methods are insufficient for data analysis and mining in SNS networks. In view of this, this dissertation uses interdisciplinary methods and theories to study data mining in social networks, including data retrieval and preprocessing, network analysis, user influence and behavior, personalized recommendation, and machine learning based prediction methods, in order to enhance the efficiency and effectiveness of existing algorithms and models.

The work of the dissertation is supported by the National Natural Science Foundation of China (No. 61172072, 61271308), Beijing Natural Science Foundation (No. 4112045), and the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20100009110002). The main contributions of the dissertation are as follows:

1. We study information retrieval and data preprocessing techniques. Because the models place specific requirements on data accuracy and computational performance, we propose an integrated data retrieval framework. First, we optimize the distributed web crawler based on the Nutch system with a synchronous operational architecture, which improves the crawler's efficiency.
Then we study information extraction methods and propose two webpage extraction models, one based on rules and one based on wrappers. The rule-based extraction model is widely applicable and has low computational complexity, which makes it suitable for processing massive amounts of information on the internet. The wrapper-based extraction model achieves highly accurate data retrieval and structured information extraction within the same domain. In addition, we study rapid webpage de-duplication algorithms and automatic summarization algorithms in order to reduce data magnitude and dimensionality and to enhance the quality of the information.

2. We empirically analyze the network features of the SINA Microblog social network, including user characteristics, tweet features, network evolution, etc., and discuss the dominant factors that affect user influence and tweet-oriented information dissemination. Motivated by these statistical results and conclusions, we establish a user weight model for Microblog networks. The model is composed of the user activity degree and a HITS-based user influence factor. We improve the HITS algorithm by eliminating the iteration in the node authority calculation. Interaction among users is one of the most important characteristics of social networks, so the analysis of user authority can play an active role in research on information recommendation and diffusion.

3. We study personalized recommendation algorithms in social networks. Because existing recommendation models can hardly describe user preferences in social networks, we introduce a tweet recommendation algorithm based on statistical features for Microblog networks. The algorithm combines content similarity, author influence and user reciprocity. It has low computational complexity and is suitable for real online Microblog systems.
In order to enhance recommendation accuracy, we introduce a bipartite network based model, named NBI, into our tweet-oriented recommendation research. We improve the traditional NBI model through the original matrix and the link weights in the resource allocation process. We combine the improved NBI model with user features to obtain the final model, which addresses the distinct characteristics of social network recommendation. The experimental results show that our proposed model is more effective than either the traditional NBI model or a single-preference based recommendation model.

4. We propose information prediction approaches based on machine learning algorithms. Based on the empirical analysis of the SINA Microblog network, we identify and quantify the features that make up the feature vector of the data samples. We then establish a user link prediction model based on logistic regression and a user re-tweet model based on SVM. In order to improve classification accuracy and enable a robust implementation for big data, we make a preliminary exploration of parallel computing patterns for the parameter training of these machine learning models, and optimize the coefficient weight of the slack variable in the SVM model. Finally, we use the output of the user re-tweet model as the prior probability for an arbitrary node, and then use the Monte Carlo method to simulate the tweet propagation process in SINA Microblog. This method is based on the microscopic user model and integrates it with a macroscopic simulation of information dissemination. It can therefore not only predict the general trend of a web topic but also discover the key users along the information diffusion path. These approaches could provide useful ideas for research on information dissemination and prediction.
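
The Monte Carlo propagation step in contribution 4 can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract does not specify the propagation rule, so the sketch assumes a simple cascade in which each follower of a newly active user re-tweets with that follower's own probability. The names `followers`, `retweet_prob`, `simulate_cascade` and `monte_carlo_spread` are hypothetical, and the probabilities stand in for the output of a trained re-tweet classifier.

```python
import random
from collections import defaultdict


def simulate_cascade(followers, retweet_prob, seed_user, rng):
    """Run one simulated cascade and return the set of users who posted or re-tweeted.

    followers:    dict mapping a user to the list of users who follow them (assumed input)
    retweet_prob: dict mapping a user to their re-tweet probability (assumed to come
                  from a per-user model such as the SVM re-tweet model)
    """
    active = {seed_user}
    frontier = [seed_user]
    while frontier:
        next_frontier = []
        for user in frontier:
            for follower in followers.get(user, ()):
                if follower in active:
                    continue
                # Assumption: one independent trial per exposure to a newly active followee.
                if rng.random() < retweet_prob.get(follower, 0.0):
                    active.add(follower)
                    next_frontier.append(follower)
        frontier = next_frontier
    return active


def monte_carlo_spread(followers, retweet_prob, seed_user, runs=1000, seed=0):
    """Estimate the mean cascade size and how often each user takes part."""
    rng = random.Random(seed)
    counts = defaultdict(int)
    total = 0
    for _ in range(runs):
        active = simulate_cascade(followers, retweet_prob, seed_user, rng)
        total += len(active)
        for user in active:
            counts[user] += 1
    mean_size = total / runs
    participation = {user: c / runs for user, c in counts.items()}
    return mean_size, participation


if __name__ == "__main__":
    # Toy follower graph and made-up probabilities, for illustration only.
    followers = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    retweet_prob = {"b": 0.5, "c": 0.2, "d": 0.3}
    mean_size, participation = monte_carlo_spread(followers, retweet_prob, "a")
    print(mean_size)
    print(sorted(participation.items(), key=lambda kv: -kv[1]))
```

In a sketch like this, the mean cascade size plays the role of the macroscopic trend estimate, while users with high participation frequency are candidates for the key users on the diffusion path.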
Keywords/Search Tags: Data Mining, Machine Learning, Personalized Recommendation, Information Prediction, Complex Network, User Behavior Analysis