Here’s a response to this post from Marc, who wonders how streaming SQL vendors like Coral8, StreamBase or Aleri could work with SPlus/R/Matlab.

Marc wonders whether one of those languages would be a good candidate for a streaming language. I looked at implementing streaming extensions to R and I did not see a way to truly (without klugery) integrate streaming into the language. That doesn’t mean there’s not a way, just that it’s not as straightforward as it might seem.

Take the function aggregateSeries in S+, which has analogs in R and Matlab. This function works as follows. The return value is a vector/array, which is the default data structure in these languages (a scalar variable is an array of size 1). We pass in an array, and aggregateSeries takes the sub-array corresponding to the first window and applies the function FUN to it. FUN must return a scalar, and that scalar is stored in the first entry of the return value. The function then applies FUN to each subsequent window (sub-array) in turn, storing each resulting scalar in the next entry of the return value. So the return value is an array of the values that FUN returned for each window.
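To make the windowing behavior concrete, here is a minimal sketch in Python. It is not the S+ signature: the name aggregate_series, the fixed window size, and the choice of non-overlapping windows are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def aggregate_series(
    values: Sequence[float],
    window: int,
    fun: Callable[[Sequence[float]], float],
) -> List[float]:
    """Apply fun to consecutive, non-overlapping windows of values.

    Hypothetical stand-in for aggregateSeries: one scalar per window,
    collected in order into the returned list.
    """
    result = []
    for start in range(0, len(values), window):
        # Each slice is one window (sub-array); fun reduces it to a scalar.
        result.append(fun(values[start:start + window]))
    return result


prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
window_means = aggregate_series(prices, 3, lambda w: sum(w) / len(w))
print(window_means)  # [11.0, 14.0]
```

Note that the whole input already exists as a finite array before the function is called.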

This is not really something that works in a streaming world. Notice that the input to aggregateSeries is already an array. There is no declaration to take an infinite stream and break it into windows. That concept of an infinite stream, to which one can subscribe, is the part that requires the kluging.
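For contrast, this is roughly what "take an infinite stream and break it into windows" looks like when the language gives you lazy sequences. It is a sketch in Python, not something S+, R or Matlab provide; the function name sliding_windows and the count-based stand-in for a tick feed are assumptions.

```python
from collections import deque
from itertools import count, islice
from typing import Iterable, Iterator, Tuple


def sliding_windows(stream: Iterable[float], size: int) -> Iterator[Tuple[float, ...]]:
    """Lazily yield each full sliding window of an unbounded stream."""
    window = deque(maxlen=size)  # oldest event drops out automatically
    for event in stream:
        window.append(event)
        if len(window) == size:
            yield tuple(window)


ticks = count(1)  # stand-in for an endless feed: 1, 2, 3, ...
first_three = list(islice(sliding_windows(ticks, 3), 3))
print(first_three)  # [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
```

The input is never materialized as a complete array, which is exactly the piece the array-based function has no way to express.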

So here is the kluge: you have some external process/thread pushing data into the engine. It pushes in a data structure and calls an S or Matlab function on that data structure. If that data structure is a single value, like a tick, then you process single events. And if that data structure is a sliding window… you process sliding windows.
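A rough sketch of that kluge in Python, with an ordinary function standing in for the S or Matlab function. The names push_ticks and push_windows are hypothetical, and a real setup would call into the analytics engine rather than a Python callable.

```python
from collections import deque
from typing import Callable, Iterable, List


def push_ticks(source: Iterable[float], handler: Callable[[float], float]) -> List[float]:
    """Push one event at a time and call the handler on each tick."""
    return [handler(tick) for tick in source]


def push_windows(
    source: Iterable[float],
    size: int,
    handler: Callable[[List[float]], float],
) -> List[float]:
    """Push the current sliding window and call the handler on each change."""
    window = deque(maxlen=size)
    results = []
    for tick in source:
        window.append(tick)
        # The handler sees a plain array; it never sees the stream itself.
        results.append(handler(list(window)))
    return results


doubled = push_ticks([1.0, 2.0, 3.0], lambda t: t * 2)
maxes = push_windows([3.0, 1.0, 2.0, 5.0], 2, max)
print(doubled)  # [2.0, 4.0, 6.0]
print(maxes)    # [3.0, 3.0, 2.0, 5.0]
```

All of the streaming logic lives in the pusher; the called function stays an ordinary array function.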

This brings me to how simple it should be to integrate S+/R/Matlab with streaming SQL. I suggested this about two years ago: Streaming SQL has the concept of a window. So I just want to be able to push the contents of the window into S+/R/Matlab via the above kluge, every time the contents change. This should be dead simple because (1) the window already exists and (2) the method to push in the data and call a function already exists.

Of course, there would also be a library provided to push a value back to streaming SQL (as in, to insert a value into a stream). Again, this is very easy.
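Put together, the integration might look something like the sketch below. The Window and OutputStream classes are hypothetical stand-ins for what a streaming SQL engine would supply, not any vendor's API, and mean_to_stream stands in for the S+/R/Matlab function.

```python
from collections import deque
from typing import Callable, List


class OutputStream:
    """Hypothetical stream that the analytics function can insert into."""

    def __init__(self) -> None:
        self.rows: List[float] = []

    def insert(self, value: float) -> None:
        self.rows.append(value)


class Window:
    """Hypothetical engine window that pushes its contents on every change."""

    def __init__(self, size: int, on_change: Callable[[List[float]], None]) -> None:
        self._rows = deque(maxlen=size)
        self._on_change = on_change

    def insert(self, value: float) -> None:
        self._rows.append(value)
        self._on_change(list(self._rows))  # push the whole window contents


averages = OutputStream()


def mean_to_stream(rows: List[float]) -> None:
    # Stand-in for the analytics function: compute, then push the result back.
    averages.insert(sum(rows) / len(rows))


window = Window(2, mean_to_stream)
for price in [10.0, 20.0, 40.0]:
    window.insert(price)
print(averages.rows)  # [10.0, 15.0, 30.0]
```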

I don’t know much about Coral8, but this is probably not too hard to implement yourself as a plug-in to the engine. If you’re using R, you can use JRI to interface with the R engine (to push in a data structure and call a function). The other languages come with their own interfaces.

However, at this point I am at a total loss to understand why this stuff doesn’t come built in from the streaming SQL vendors.
