The Key Technology Research and Implementation of the Software Big Data Continuous Aggregation Platform for Open Source Community

Posted on: 2019-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: W Lin
GTID: 2428330572995083
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, open source software has developed rapidly and has been applied in a growing number of areas. Its success has also attracted a large number of developers to open source development. GitHub alone hosts more than 60 million repositories, and more than 20 million users are involved in developing and maintaining them. Open source communities have accumulated a large amount of data, such as development data and behavior data, and this valuable data has gradually attracted the attention of researchers. A series of studies on open source software, covering topics such as collaborative development mechanisms and quality assurance mechanisms, has been conducted. Efficient and reliable data acquisition is an important prerequisite for all of this research. To better support it, this thesis proposes a software big data acquisition platform for the GitHub community. The main contributions are as follows.

First, for raw data collection, this thesis proposes an efficient and easily scalable data acquisition system. Following the business logic of the system, it is divided into two modules: task generation and task execution. The two modules are connected and interact through task queues and data storage. With this decoupled design, the relatively time-consuming and resource-intensive task execution module can be parallelized, which improves the system's ability to scale in real time and makes it easier to meet users' demand for high-speed data acquisition.

Second, for structured data extraction, this thesis proposes a multi-source data extraction system. To handle the wide variety of data types in the open source community, it adopts a template-based extraction strategy: the extraction logic is first separated from the data format, and an extraction template is then designed for each data type. The extraction module invokes the appropriate template to parse and extract each type of data. This approach improves the usability and flexibility of the extraction code and better accommodates the acquisition needs of multiple data types.

Third, for data visualization, this thesis designs an intuitive and interactive data visualization system. It displays the data flow through the system and the state of each module of the acquisition system. This subsystem enhances the controllability of the system and facilitates interaction between the user and the system. Through it, users can intuitively view the data processed by the system and gain a clearer understanding of the system's state. In addition, users can easily operate each module, which enables real-time control of the system.
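
The first two ideas in the abstract, decoupling task generation from task execution through a queue and driving extraction from per-type templates, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the template layout, the function names (`extract`, `fake_fetch`, `generate_tasks`, `worker`, `run`), and the canned sample records are all hypothetical, and a real collector would replace `fake_fetch` with network requests.

```python
import queue
import threading

# Hypothetical extraction templates: output field -> path of keys in the raw record.
# Supporting a new data type means adding a template, not changing extract().
TEMPLATES = {
    "issue": {"id": ("number",), "title": ("title",), "author": ("user", "login")},
    "commit": {"sha": ("sha",), "author": ("commit", "author", "name")},
}

# Canned raw records standing in for responses from a remote source (assumption).
SAMPLE_RAW = {
    ("issue", 1): {"number": 1, "title": "Crash on start", "user": {"login": "alice"}},
    ("commit", "abc123"): {"sha": "abc123", "commit": {"author": {"name": "bob"}}},
}


def extract(data_type, raw):
    """Generic extraction logic: follow each key path named by the template."""
    record = {}
    for field, path in TEMPLATES[data_type].items():
        value = raw
        for key in path:
            # A missing or non-dict intermediate value yields None.
            value = value.get(key) if isinstance(value, dict) else None
        record[field] = value
    return record


def fake_fetch(task):
    """Stand-in for the slow, resource-intensive download step."""
    return SAMPLE_RAW[(task["type"], task["key"])]


def generate_tasks(task_queue, tasks):
    """Task generation module: only enqueues work, never executes it."""
    for task in tasks:
        task_queue.put(task)


def worker(task_queue, results, lock):
    """Task execution module: fetch, extract, store; stops on a None sentinel."""
    while True:
        task = task_queue.get()
        if task is None:
            break
        record = extract(task["type"], fake_fetch(task))
        with lock:
            results.append(record)


def run(tasks, num_workers=2):
    """Wire the two modules together; num_workers controls parallelism."""
    task_queue = queue.Queue()
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(task_queue, results, lock))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    generate_tasks(task_queue, tasks)
    for _ in threads:
        task_queue.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results
```

Because the generator and the workers share only the queue and the result store, the number of workers can be changed without touching task generation, which is the scaling property the abstract attributes to the decoupled design. The order of the returned records depends on thread scheduling.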
Keywords/Search Tags: open source community, data acquisition, data extraction