The Key Technology Research And Implementation Of The Software Big Data Continuous Aggregation Platform For Open Source Community

Posted on:2019-11-15

Degree:Master

Type:Thesis

Country:China

Candidate:W Lin

GTID:2428330572995083

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, open source software has developed rapidly and has been adopted in an increasing number of application areas. Its success has also attracted a large number of developers to participate in open source development. The GitHub community alone hosts more than 60 million repositories, and more than 20 million users are involved in developing and maintaining them. Open source communities have accumulated a large amount of data, such as development data and behavior data. These valuable data have gradually attracted the attention of researchers, and a series of studies on open source software, covering topics such as collaborative development mechanisms and quality assurance mechanisms, have been conducted. Efficient and reliable data acquisition is an important prerequisite for all such research. To better support this type of research, this thesis proposes a software big data acquisition platform for the GitHub community. The main contributions are as follows.

First, for raw data collection, this thesis proposes an easily scalable and efficient data acquisition system. Following the system's business logic, the system is divided into two modules: task generation and task execution. The two modules are connected and interact through task queues and data storage. With this decoupled design, the relatively time-consuming and resource-intensive task execution module can be parallelized, which improves the system's ability to scale in real time and makes it easier to meet users' demand for high-speed data acquisition.

Second, for structured data extraction, this thesis proposes a multi-source data extraction system based on a template-based extraction strategy that addresses the wide variety of data types in the open source community. The extraction logic is first separated from the data format, and an extraction template is then designed for each data type. This allows the extraction module to invoke a different template to parse and extract each record according to its data type. The approach improves the usability and flexibility of the extraction code and better accommodates the acquisition of multiple data types.

Third, for data visualization, this thesis designs an intuitive and interactive data visualization system that presents the data flow through the system and the state of each module of the acquisition system. This subsystem enhances the controllability of the system and facilitates interaction between the user and the system: users can intuitively view the data the system has processed, gain a clearer understanding of the system's state, and easily operate each module to control the system in real time.
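The decoupling of task generation from task execution through a task queue, as described in the first contribution, follows a producer/consumer pattern. The sketch below is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes an in-process `queue.Queue`, threads as parallel executors, an in-memory dict as the data store, and a caller-supplied `fetch` function standing in for a real GitHub request. All names (`run_pipeline`, `generate_tasks`, `execute_tasks`, `fetch`) are hypothetical.

```python
import queue
import threading
from typing import Any, Callable, Dict, List

_STOP = object()  # hypothetical sentinel telling an executor there is no more work


def generate_tasks(targets: List[str], task_queue: queue.Queue, num_workers: int) -> None:
    """Task generation module: turns targets (e.g. repository names) into queued tasks."""
    for target in targets:
        task_queue.put(target)  # blocks if the queue is full, giving natural backpressure
    for _ in range(num_workers):
        task_queue.put(_STOP)  # one sentinel per executor so every worker exits


def execute_tasks(task_queue: queue.Queue, storage: Dict[str, Any],
                  lock: threading.Lock, fetch: Callable[[str], Any]) -> None:
    """Task execution module: consumes tasks independently of how they were produced."""
    while True:
        target = task_queue.get()
        if target is _STOP:
            break
        try:
            result = fetch(target)
        except Exception as exc:  # record the failure instead of killing the worker
            result = {"error": repr(exc)}
        with lock:
            storage[target] = result  # shared data store written by all executors


def run_pipeline(targets: List[str], fetch: Callable[[str], Any],
                 num_workers: int = 4, max_queue: int = 100) -> Dict[str, Any]:
    """Run one producer and `num_workers` executors connected only by the queue."""
    task_queue: queue.Queue = queue.Queue(maxsize=max_queue)
    storage: Dict[str, Any] = {}
    lock = threading.Lock()
    producer = threading.Thread(target=generate_tasks, args=(targets, task_queue, num_workers))
    workers = [threading.Thread(target=execute_tasks, args=(task_queue, storage, lock, fetch))
               for _ in range(num_workers)]
    producer.start()
    for worker in workers:
        worker.start()
    producer.join()
    for worker in workers:
        worker.join()
    return storage
```

Because the two modules share only the queue and the store, scaling execution here amounts to raising `num_workers`; a multi-machine deployment would presumably replace the in-process queue and dict with a networked queue and database, which this sketch does not attempt.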
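The template-based extraction strategy in the second contribution separates one generic extraction routine from per-type templates. Below is a minimal sketch of that idea, assuming JSON-like records and templates that map each output field to a dotted path in the raw record. The template contents are illustrative assumptions loosely modeled on GitHub issue and commit payloads, and the names `TEMPLATES`, `extract`, and `register_template` are hypothetical.

```python
from typing import Any, Dict

# Hypothetical templates: output field -> dotted path inside the raw record.
TEMPLATES: Dict[str, Dict[str, str]] = {
    "issue": {
        "id": "number",
        "title": "title",
        "author": "user.login",
        "state": "state",
    },
    "commit": {
        "id": "sha",
        "author": "commit.author.name",
        "message": "commit.message",
    },
}


def get_path(record: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'user.login'; return None if any key is missing."""
    value: Any = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def register_template(data_type: str, template: Dict[str, str]) -> None:
    """Support a new data type by adding a template, without touching extract()."""
    TEMPLATES[data_type] = dict(template)


def extract(data_type: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Generic extraction logic: the same code serves every registered data type."""
    template = TEMPLATES.get(data_type)
    if template is None:
        raise KeyError(f"no extraction template registered for {data_type!r}")
    return {field: get_path(record, path) for field, path in template.items()}
```

For example, `extract("issue", {"number": 7, "title": "Crash", "user": {"login": "alice"}, "state": "open"})` yields `{"id": 7, "title": "Crash", "author": "alice", "state": "open"}`. The design choice is that format knowledge lives in data (the templates), so a new source type is handled by registration rather than new parsing code.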
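The third contribution describes showing each module's state and letting users operate modules in real time. One way to back such a dashboard is a small status registry that modules update and the UI polls and commands. This is a hypothetical sketch, not the thesis's design: it assumes in-process modules, per-module processed/failed counters, and pause/resume implemented with `threading.Event`; the class names and module names are illustrative only.

```python
import threading
from typing import Dict, List


class ModuleStatus:
    """Hypothetical per-module status record that a visualization front end could poll."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.processed = 0
        self.failed = 0
        self._running = threading.Event()
        self._running.set()  # modules start in the running state
        self._lock = threading.Lock()

    def record(self, ok: bool) -> None:
        """Called by the module after each item it handles."""
        with self._lock:
            if ok:
                self.processed += 1
            else:
                self.failed += 1

    def pause(self) -> None:
        self._running.clear()

    def resume(self) -> None:
        self._running.set()

    def wait_until_running(self, timeout: float = None) -> bool:
        """A module's worker loop would call this before each item; True once running."""
        return self._running.wait(timeout)

    def snapshot(self) -> Dict[str, object]:
        with self._lock:
            return {"name": self.name, "running": self._running.is_set(),
                    "processed": self.processed, "failed": self.failed}


class StatusBoard:
    """Aggregates module states for display and routes user commands to modules."""

    def __init__(self, names: List[str]) -> None:
        self.modules = {name: ModuleStatus(name) for name in names}

    def snapshot(self) -> List[Dict[str, object]]:
        return [module.snapshot() for module in self.modules.values()]

    def control(self, name: str, action: str) -> None:
        module = self.modules[name]
        if action == "pause":
            module.pause()
        elif action == "resume":
            module.resume()
        else:
            raise ValueError(f"unknown action {action!r}")
```

A front end would periodically render `StatusBoard.snapshot()` and call `control()` when the user clicks a module, while each module's worker loop checks `wait_until_running()` before taking the next item.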

Keywords/Search Tags:

open source community, data acquisition, data extraction

Related items

1	Research On Key Technologies Of Web Data Extraction And Mining On Open Source Community
2	Research And Design On Open Source Community Data Mining Key Technologies
3	Research On Technologies Of Web Data Extraction On Open Source Community
4	A Comparative Study Of Sustainability Of SOURCEFORGE.net Open Source Community Projects
5	An Approach Of Automatic Fork Summary Generation In Open Source Community Based On Feature Extraction
6	Measuring The Contribution Of Developers In Open Source Software Community
7	The Key Technical Research Of Open Source Community Oriented Developer Discovery And Guidance
8	Research On Software Recommendation Method Based On Open Source Community And User Behavior
9	Research And Implementation Of Open Source Community Architecture Based On SCA
10	Design And Implementation Of Open Source Community Content Recommendation System Based On User Information Fusion