
Social Network Data Acquisition Method Research And System Implementation

Posted on: 2019-04-26 | Degree: Master | Type: Thesis
Country: China | Candidate: J Yang | Full Text: PDF
GTID: 2348330569987725 | Subject: Information and Communication Engineering
Abstract/Summary:
With the advent of the Internet era, social networking sites have emerged to meet users' needs. Each site has its own structure, and its data is hidden within that structure. A site's page structure is like a tree, and the data is the fruit on its branches; how to pick that fruit quickly and easily has become an active research topic. At the same time, massive data does not mean massive amounts of important information. Most data obtained from social networks is of no value to a given user, and turning massive data into valuable information requires processing, querying, and analysis. How to retrieve relevant information quickly and accurately is therefore equally important.

The information contained in social networks is of great value, but collecting it usually suffers from narrow applicability, a large amount of repetitive work, and the need for specialist knowledge on the part of the people doing the collection. Querying the collected data for information that meets users' needs has problems of its own. This thesis studies existing social network data collection and query methods and, in response to growing collection and query requirements, designs a social network data collection and query system to meet them. The main contributions and completed work are as follows:

(1) A social network data collection and query system is designed and implemented. To collect data from social networking sites efficiently, stably, and reliably, the system is designed with convenience and broad applicability in mind. It consists of three parts: a server, a client, and data storage. The client gives the user a visual interface that makes the system easy to use. The data storage provides a stable and secure storage environment while improving query speed and the user experience. The server is the core of the system and contains its two core methods, data collection and data query. Building on research into collection and query methods, and taking into account the structural diversity of social networks and the need to monitor and manage collected data, this thesis proposes a self-adaptive data collection method for social networks and a query expansion method based on weights and semantics, which improve the system's data acquisition and query performance. Tests and practical use show that the system can collect data from most social networking sites and can monitor and query the collected data in real time.

(2) In view of the diversity of social networks and the large volume of user demand, a self-adaptive data acquisition method for social networks is proposed. The method consists of three steps: reconstructing the DOM tree, generating data collection code, and expanding links to isomorphic web pages. Reconstructing the DOM tree means parsing the page source with a breadth-first algorithm, extracting the required data, and combining it with marker information to build a new DOM tree. Generating data collection code means producing adaptive collection code automatically; to make this code more adaptive to the pages being collected, the thesis proposes a collection path generation method based on both relative and absolute paths. Expanding links to isomorphic web pages means comparing page similarity to obtain extended links that meet a given requirement, then using those links to generate link extension rules, so that the number of isomorphic page links can be expanded rapidly. Testing and analysis show that the link extension rules are generally applicable and effective.

(3) To support querying and monitoring of the collected data, a query expansion method based on weights and semantics is proposed. The method improves on query expansion by automatic relevance feedback. On top of the original term-frequency-based expansion, it introduces a semantic similarity judgment based on decomposing and recombining words, which addresses the word mismatch problem. Building on local context analysis, it then computes different weights for expansion terms and original query terms and integrates these weights into the original query model. Testing and analysis show that the new method improves the accuracy of query results, and that users can also monitor the collected data to check whether it meets their needs.
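
The weighting idea in (3) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not give its formulas, so the function names (`char_similarity`, `expand_query`, `score`), the character-overlap similarity, the 0.5 scaling factor, and the threshold are all assumptions chosen for illustration. The sketch takes tokenized feedback documents, keeps only candidate terms that resemble an original query term, and gives them a lower weight than the original terms before scoring.

```python
from collections import Counter


def char_similarity(a, b):
    """Dice coefficient over character sets.

    Assumption: a crude stand-in for the decomposition-based semantic
    similarity described in the abstract, whose details are not given.
    """
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def expand_query(query_terms, feedback_docs, max_new_terms=3, sim_threshold=0.5):
    """Return {term: weight} for original and expansion terms.

    Original terms keep weight 1.0. Expansion terms come from the
    feedback documents and get a weight of at most 0.5 (hypothetical
    formula: relative frequency times similarity, halved).
    """
    weights = {term: 1.0 for term in query_terms}
    counts = Counter(
        tok for doc in feedback_docs for tok in doc if tok not in weights
    )
    if not counts:
        return weights
    max_count = max(counts.values())
    candidates = []
    for term, count in counts.items():
        sim = max((char_similarity(term, q) for q in query_terms), default=0.0)
        if sim >= sim_threshold:
            candidates.append((0.5 * (count / max_count) * sim, term))
    # Keep the highest-weighted candidates only.
    for weight, term in sorted(candidates, reverse=True)[:max_new_terms]:
        weights[term] = weight
    return weights


def score(doc_tokens, weights):
    """Weighted term-frequency score of one tokenized document."""
    tf = Counter(doc_tokens)
    return sum(w * tf[t] for t, w in weights.items())


# Usage: expand a one-word query from two feedback documents, then rank.
feedback = [["social", "network", "networks", "data"], ["networks", "graph"]]
weights = expand_query(["network"], feedback)
ranked = sorted(
    [["graph", "data"], ["networks", "data"]],
    key=lambda doc: score(doc, weights),
    reverse=True,
)
```

Keeping expansion weights strictly below 1.0 means a document matching only an expansion term never outranks one matching the original term the same number of times, which is one simple way to limit query drift.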
Keywords/Search Tags: social network, data acquisition, data query, system