The Extraction And Analysis Of The Social Network Data

Posted on: 2017-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ren
Full Text: PDF
GTID: 2348330518496228
Subject: Computer technology
Abstract/Summary:
In recent years the Internet has entered a golden age of development, and the number of Internet users has grown rapidly. Users take part in online life in a double role, as both "producers" and "consumers" of content, which has greatly promoted the development of social networks. Scholars have therefore argued that, compared with the earlier Internet environment, we have entered the Web 2.0 era. Meanwhile, after continuous iteration and development, Chinese social networking services such as Sina micro-blog and the WeChat "circle of friends" have been widely adopted and praised by users and have become the best-known social networks. Scholars have done a great deal of research on social networks (micro-blogs), especially on sentiment analysis of text, but some problems remain in this research. This thesis builds on the work of previous scholars and tries to solve the problems in that work. The main content is as follows.

Based on the characteristics of Sina micro-blog and the limits it places on web data collection, we design a distributed web crawler architecture and implement it with the Python programming language and a MySQL database. We collect a certain number of micro-blog posts and their comments and manually label their sentiment to form the experimental data. Using the Java programming language, we implement and improve two approaches, one based on a traditional text classification algorithm (SVM) and one based on a sentiment dictionary, by adding special handling for dual-sentiment emoticons and special punctuation and by optimizing the dictionary of Internet slang. Because micro-blog comments are "conversational", we also implement in Java a hierarchical analysis algorithm based on the structure of social network comments.

We compare and analyze the accuracy of the improved algorithms experimentally. The experimental results show that the distributed web crawler, as designed and implemented, can collect Sina micro-blog data stably and quickly. The results for the improved algorithms show that special handling of dual-sentiment emoticons, punctuation, and the Internet-slang dictionary, together with hierarchical analysis of "conversational" micro-blog comments, clearly improves the accuracy of sentiment analysis of micro-blog comments.
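
To make the dictionary-based approach described in the abstract more concrete, here is a minimal sketch in Python (the thesis itself reports a Java implementation, which is not reproduced here). Everything in it is an assumption made for illustration: the toy word lists, the bracketed emoticon names, the weights, and in particular the rule that a "dual-sentiment" emoticon carries no polarity of its own and only amplifies whatever polarity the surrounding text already has. The abstract does not specify how the thesis treats these cases.

```python
import re

# Hypothetical toy dictionaries; the thesis's real dictionaries are not given.
SENTIMENT_WORDS = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}
NETWORK_WORDS = {"yyds": 2.0, "meh": -0.5}      # assumed Internet-slang entries
LEXICON = {**SENTIMENT_WORDS, **NETWORK_WORDS}
EMOTICONS = {"[smile]": 1.0, "[angry]": -2.0}   # assumed fixed-polarity emoticons
DUAL_EMOTICONS = {"[tears]"}                    # assumed: amplify, no own polarity
NEGATIONS = {"not", "never", "no"}

# Tokens: bracketed emoticons, words, or runs of "!" / "?".
TOKEN_RE = re.compile(r"\[[a-z]+\]|[a-z']+|[!?]+")


def score(text):
    """Return a signed sentiment score for one comment (assumed scoring rules)."""
    total = 0.0
    negate = False
    dual_count = 0
    exclamations = 0
    for tok in TOKEN_RE.findall(text.lower()):
        if tok[0] in "!?":
            exclamations += tok.count("!")
        elif tok in DUAL_EMOTICONS:
            dual_count += 1
        elif tok in EMOTICONS:
            total += EMOTICONS[tok]
        elif tok in NEGATIONS:
            negate = True            # flips the next dictionary word only
        elif tok in LEXICON:
            total += -LEXICON[tok] if negate else LEXICON[tok]
            negate = False
    # Dual-sentiment emoticons and exclamation marks scale the existing polarity.
    total *= 1.0 + 0.5 * dual_count
    total *= 1.0 + 0.25 * min(exclamations, 3)
    return total


def classify(text):
    """Map the score to a coarse label."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"
```

For example, under these assumed rules `classify("not good")` gives "negative", and a comment consisting only of a dual-sentiment emoticon stays "neutral" because there is no text polarity for it to amplify.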
Keywords/Search Tags:social network, distributed web crawler, comments of micro blog, sentiment analysis, SVM