The arrival of the big data era poses significant challenges for real-time processing. Against this background, the D-Stream stream processing system provides a general, reliable, efficient, and scalable distributed computing framework for applications based on real-time processing of massive data, and serves as the task engine of D-Ocean, an unstructured data management system. The implementation of D-Stream derives from the design of a common stream processing framework and incorporates several advanced ideas from open source stream processing platforms such as S4 and Storm. Its functional structure consists of three main parts. The first is a simple and open task model, which can be used to build flexible task topologies according to requirements. The second is a reliable and stable stream processing engine, which guarantees rapid and transparent data transmission among processing elements. The third is a highly available and scalable scheduling framework, which makes full use of the whole cluster by scheduling computing resources efficiently. Around these three parts, we present the D-Stream modeling method for real applications. For the implementation of the D-Stream system, this article introduces the component architecture and the symmetric scheduling framework, focusing on the relevant algorithms and design patterns used in the implementation of the stream processing engine. Finally, we give a comprehensive assessment of the D-Stream stream processing system and use the D-Ocean CBIR application as a case study to verify the high utility of D-Stream in real-time processing applications.