To support the rate of growth of data streams and stored data, high performance platforms need fundamental advances to meet application requirements. Such Big Data systems are dominated by the size of data, and the speed at which it must be processed. Additionally, Big Data applications often have Real-Time constraints that need to be met in order to provide applications with timely responses to requests for data – involving anything from simple database lookup and input data stream processing, to more complex data-mining or intelligent data processing.
Often, real-time performance is sought by increasing the overall performance of the platform, hence increasing the raw speed of response to any request. However, such an approach is unlikely to be able to offer real-time guarantees regarding the speed of response, and is often expensive in terms of cost and power, as more and more resources are included in the platform in the hope of it being sufficiently fast to offer the illusion of real-time.
The JUNIPER project intends to construct the platform from real-time technologies, using real-time principles, so that appropriate guarantees can then be given with respect to Big Data processing.
Real-Time Big Data
Fundamentally, Big Data is typically processed by two main components, as shown in Figure 1. Firstly, a data generator can produce large (e.g. Gb/s) streams of data that need to be filtered prior to storage to reduce the volume of data. For example, the Hadron Collider outputs a 300 Gb/s stream that is filtered to 300 Mb/s for storage and later processing.
Secondly, an application handles end-user requests for analytics or mining of the stored data and produces a reply back to the end user, e.g. financial transactions seeking authorisation from a banking database. Generalising this view of Big Data processing, a number of separate (but related) applications can utilise the same stored data. This can be seen in Figure 2, where a number of related sub-applications make up the application, all using the same data.
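To make the scale of the filtering step concrete, the short sketch below works through the arithmetic for the two rates quoted above. The function and variable names are illustrative only and are not part of any JUNIPER interface.

```python
# Rates quoted in the text, in bits per second.
INPUT_RATE_BPS = 300e9    # 300 Gb/s produced by the data generator
STORAGE_RATE_BPS = 300e6  # 300 Mb/s written to storage after filtering


def reduction_factor(input_bps: float, output_bps: float) -> float:
    """Return how many times smaller the filtered stream is than the raw stream."""
    if input_bps <= 0 or output_bps <= 0:
        raise ValueError("rates must be positive")
    return input_bps / output_bps


factor = reduction_factor(INPUT_RATE_BPS, STORAGE_RATE_BPS)
kept_fraction = 1.0 / factor  # share of the raw stream that reaches storage

print(f"reduction factor: {factor:.0f}x")
print(f"fraction of raw data kept: {kept_fraction:.3%}")
```

In other words, the filter has to discard all but roughly one part in a thousand of the incoming data, and it has to do so at the rate the data arrives.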
Real-time constraints are often placed on Big Data streams and on processing. If the filtering of an input data stream into storage does not occur within given time bounds, data may be lost. Likewise, if the reply to a given request does not arrive within given time bounds, end-user applications will be disadvantaged. In this domain, real-time constraints take the form of Quality-of-Service constraints, where different applications are allotted different proportions of the available bandwidth of any resource (e.g. I/O, processor time, filesystem). We note that current platforms do not provide any real-time guarantees.
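The following is a minimal sketch of the two ideas in this paragraph: dividing a resource's bandwidth between applications in fixed proportions, and estimating how long a stream can run before data is lost if its share is too small. The weights, rates and buffer size are hypothetical values chosen for illustration, and the simple weighted-share model is an assumption rather than a description of how any particular platform allocates resources.

```python
from typing import Dict, Optional


def allocate_shares(capacity: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Split a resource's capacity between applications in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: capacity * w / total for name, w in weights.items()}


def seconds_until_loss(arrival_rate: float, service_rate: float,
                       buffer_size: float) -> Optional[float]:
    """Time until a buffer overflows when data arrives faster than it is filtered.

    Returns None when the service rate keeps up with the arrival rate,
    i.e. the buffer never fills and no data is lost.
    """
    if service_rate >= arrival_rate:
        return None
    return buffer_size / (arrival_rate - service_rate)


# Hypothetical example: 1000 MB/s of I/O bandwidth shared 3:1.
shares = allocate_shares(1000.0, {"stream_filter": 3.0, "analytics": 1.0})

# Hypothetical stream arriving at 800 MB/s with a 4000 MB buffer.
time_to_loss = seconds_until_loss(arrival_rate=800.0,
                                  service_rate=shares["stream_filter"],
                                  buffer_size=4000.0)
print(shares)
print(time_to_loss)
```

With these numbers the stream filter is given 750 MB/s, which is less than the 800 MB/s arrival rate, so the buffer fills after 80 seconds. Raising the filter's weight until its share reaches the arrival rate removes the loss, at the cost of bandwidth for the other application.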
The JUNIPER Approach
The JUNIPER project proposes a Real-Time Java based platform, built from real-time technologies. The project vision is to:
The intuition behind the approach is the observation that traditional real-time systems enable real-time guarantees to be afforded to applications executing upon a resource-constrained system. Even though resources are limited, sufficient resources (e.g. processor, I/O) are allocated to application processes such that real-time constraints can be met. Big Data systems are resource constrained too: only a proportion of the available resources can be given to any particular incoming request or data stream, and there are strict time constraints.
When constructing a system using real-time technologies, developers can know before the system runs that its real-time constraints will be met. This is in contrast to conventional system design, where the effect of adding processing power to increase performance is not fully understood until the system runs. When using real-time technologies, the scalability of a system is better managed, as the effect of adding resources (e.g. processing power) can be understood before the system is changed. The key challenges for any platform for real-time Big Data applications therefore include providing sufficient Performance, whilst providing Real-Time Guarantees and allowing System Scalability.
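As an illustration of what knowing before the system runs can look like, the sketch below applies a classic offline schedulability check from real-time systems theory: the Liu and Layland utilisation bound for rate-monotonic scheduling. This is a textbook technique chosen for illustration, not a statement of the analysis JUNIPER uses. It assumes independent periodic tasks on a single processor with deadlines equal to their periods, and the task parameters are hypothetical.

```python
from typing import List, Tuple

# A task is (worst-case execution time, period), both in the same time unit.
Task = Tuple[float, float]


def utilization(tasks: List[Task]) -> float:
    """Fraction of processor time the task set needs in the worst case."""
    return sum(wcet / period for wcet, period in tasks)


def rm_bound(n: int) -> float:
    """Liu and Layland utilisation bound for n tasks under rate-monotonic scheduling."""
    if n <= 0:
        raise ValueError("need at least one task")
    return n * (2 ** (1.0 / n) - 1)


def passes_rm_test(tasks: List[Task]) -> bool:
    """Sufficient (not necessary) test: True means all deadlines are guaranteed.

    False only means this simple test cannot give the guarantee; a more
    precise analysis might still show the task set to be schedulable.
    """
    return utilization(tasks) <= rm_bound(len(tasks))


# Hypothetical task set, times in milliseconds.
current = [(1.0, 4.0), (2.0, 8.0), (1.0, 10.0)]
print(utilization(current), rm_bound(len(current)), passes_rm_test(current))

# Check the effect of adding a new request stream before deploying it.
proposed = current + [(3.0, 10.0)]
print(utilization(proposed), rm_bound(len(proposed)), passes_rm_test(proposed))
```

The current task set uses 60% of the processor and passes the test, so its deadlines are guaranteed under the stated assumptions. The proposed set would use 90%, above the bound for four tasks, so the guarantee can no longer be given by this test. The point is that the answer is available before the system is changed rather than after.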