Making progress on streaming SQL languages
I was happy to see this post by StreamBase on a feature of the latest version of their streaming SQL language. I hope it begins a move back to interesting technical blogging in the CEP space. Coral8 (prior to being bought) had also done some technical blogging. And while Aleri has an interesting blog, they could (IMO) make it better by adding more technical content about their product and use cases. Edit: Apama is also doing some technical blogging that I missed because they changed their feed to Feedburner late last year.
The biggest complaint about streaming SQL languages is that, while they simplify many tasks in processing network and streaming data, they make certain other tasks mind-numbingly difficult. For example, maybe I would like to build an arbitrary-length list of numbers in one component and pass it along streams to other components. This task would be dead simple in many (most?) programming languages, but is nearly impossible with most streaming SQL products.
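To make the "dead simple" claim concrete, here is a minimal Python sketch of one component building variable-length lists and handing them to another. Everything in it is hypothetical and chosen only for illustration: the `Batch` event type, the `collect_runs` and `summarize` components, and the convention that a negative reading ends a list.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Batch:
    """Hypothetical event type: one stream event carrying a variable-length list."""
    key: str
    values: List[float] = field(default_factory=list)


def collect_runs(readings: Iterable[float]) -> Iterator[Batch]:
    """Upstream component: group consecutive readings into one list per event.

    Assumption for this sketch only: a negative reading is a delimiter, so each
    emitted list can have any length, including zero.
    """
    current: List[float] = []
    run = 0
    for reading in readings:
        if reading < 0:
            yield Batch(key=f"run-{run}", values=current)
            current = []
            run += 1
        else:
            current.append(reading)
    if current:
        # Flush whatever is left when the input ends without a delimiter.
        yield Batch(key=f"run-{run}", values=current)


def summarize(batches: Iterable[Batch]) -> Iterator[Tuple[str, int, float]]:
    """Downstream component: consume each list without knowing its size up front."""
    for batch in batches:
        yield (batch.key, len(batch.values), sum(batch.values))


if __name__ == "__main__":
    readings = [1.0, 2.0, -1, 3.0, 4.0, 5.0, -1, -1, 6.0]
    for key, count, total in summarize(collect_runs(readings)):
        print(key, count, total)
```

The point is only that the list travels between components as an ordinary value. Nothing about the stream's shape has to be declared in terms of a fixed number of columns.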
Part of the problem comes from the database roots of streaming SQL languages. Many products that implement streaming SQL languages use database-like structures under the covers, and those structures do not seem to like arbitrary-size collections being passed around streams.
I am still enthused about Esper which, other than being an impressive example of what one motivated programmer can do, mitigates many of the more annoying problems of streaming SQL. Esper is not bound by database-like data structures: its streaming SQL language interacts with streams of POJOs. The result is much more flexible than other streaming SQL implementations. Of course Esper pays a price for this flexibility, including the potential for garbage collector pauses.
So I am also enthused to see that database-rooted streaming SQL vendors are taking steps to make their languages more programmer-friendly. And blogging about it, no less. I think that the last time there was a blog post about such topics was about Aleri’s SPLASH last year.
I do not know where SQLstream fits into this language usability issue, but I will be interested to find out.

I am chief architect of SQLstream and I do a bit of technical blogging at http://julianhyde.blogspot.com about what SQLstream is capable of. Not enough, I admit – sometimes it’s a choice between blogging and ‘real work’.
I want to improve the usability of streaming SQL languages, but I think that if we stray too far from relational semantics we will end up with something less declarative, more proprietary (and therefore more difficult to understand by the many folks who have a SQL background and would like to process data in flight), and less maintainable.
I actually read that particular StreamBase post with some horror. The problems solved by that post are already solved, much better, by standard SQL and implemented in a few database systems. StreamBase have introduced concepts similar to standard SQL concepts but have given them different, and misleading, names. Where they use CREATE SCHEMA, the rest of the world would use CREATE TYPE (standard SQL has a SCHEMA but it means something completely different). What they call TUPLE, standard SQL calls ROW. Wildcard attributes might be a quick win to deploy a project quickly, but you will end up with a project that is brittle: you can’t even add a column without the risk that it will be captured by a wildcard rule somewhere in your application.
I’m not one of those relational bigots who believe that we should remain faithful to every word E.F. Codd wrote in 1970. I believe that SQL systems have been effective because they have a small number of basic operations that can be combined in powerful ways, because they allow structures and operations to be specified declaratively so that the system can optimize, because there are standards that allow SQL systems to interoperate, and because there are a lot of IT professionals who understand SQL deeply.
Those principles are as important, if not more so, for problems of streaming data. We may need to add one or two new operators, but the basic operations are applicable to streams and can achieve a lot of power. The SQL standard has some newer elements, such as moving totals, nested relations, XML support, user-defined transforms, and SQL/MED, that are perfect for streaming systems, but I have not seen any other streaming SQL vendors exploiting them. At SQLstream we have started with these fundamentals, then added a few key extensions for streaming data.
I can’t describe it all here, but if you are interested in finding out more about what SQLstream is capable of, I will be happy to fill you in.
Julian Hyde
Hans~
Foremost, thank you for the interest. It is always better knowing that someone is reading your work.
My only small correction is that the feature I wrote about was added in StreamBase 6.0; the latest version is 6.3. Actually, 6.3 introduces a list data type, so expect a post about that shortly.
Thanks,
Matt
Hans~
Oh, I forgot to mention earlier that 6.3 (which adds a list data type) has an `aggregatelist` function to transform the contents of an aggregate window into a list. It operates in amortized linear time (worst case), but has the smarts to deal with the common case of repeatedly issuing lists of the same size. A small example is:
CREATE INPUT STREAM in (x double, y double);
CREATE OUTPUT STREAM out;
SELECT
aggregatelist(in) AS points
FROM in[SIZE 5 TUPLES] INTO out;
Sorry for the comment spam,
Matt
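For readers without StreamBase at hand, the general idea of turning a tuple-count window into a list can be sketched in Python. This is not StreamBase's implementation of `aggregatelist` and makes no claim about its amortized behavior. The function name `window_lists` and its `advance` parameter are hypothetical, and because the example above does not say how far the window moves between emissions, the sketch assumes non-overlapping windows by default.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

Point = Tuple[float, float]


def window_lists(
    stream: Iterable[Point], size: int = 5, advance: int = 5
) -> Iterator[List[Point]]:
    """Emit the contents of a count-based window as a list.

    Assumptions for this sketch: the window holds the last `size` tuples and an
    emission happens once it first fills, then every `advance` tuples after
    that. advance == size gives non-overlapping windows; advance == 1 gives a
    fully sliding window. Partial windows at end of input are not emitted.
    """
    if size <= 0 or advance <= 0:
        raise ValueError("size and advance must be positive")
    window: Deque[Point] = deque(maxlen=size)  # oldest tuples fall off automatically
    seen = 0
    for point in stream:
        window.append(point)
        seen += 1
        if seen >= size and (seen - size) % advance == 0:
            # Copy so downstream consumers never see later mutations.
            yield list(window)


if __name__ == "__main__":
    points = [(float(i), float(i * i)) for i in range(12)]
    for points_list in window_lists(points):
        print(points_list)
```

Each emission here copies the window, which costs time proportional to the window size. A real engine could avoid some of that work when the list size stays constant.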
Oh, FINALLY you guys got to this feature. I am just a little worried that this feature will somehow not work with some other good features; this kind of thing has happened in the past. That would be bad.
Maybe I don’t know enough about marketing, but I think that this kind of discussion is just what’s needed to bring more interest and understanding of a product like StreamBase.
Hans~
I don’t do marketing. I just try to discuss things that I find interesting.
You are right to point out that sometimes new features aren’t entirely integrated. Sadly, that is the nature of projects and deadlines. We try to fix such shortcomings in follow-on releases.
As for lists, we took a great deal of effort to bring them to trunk early in the development effort so that we could ensure things like feedsim, the debugger, and sbtest all work with them. Hopefully, we didn’t miss anything.
Matt
[...] Sorry Julian, didn’t see it until today. It was in response to my previous post about making progress on streaming SQL. I am reprinting almost his whole comment: I am chief architect of SQLstream and I do a bit of [...]
[...] criticism of streaming SQL systems, Hans Gilde says on his blog that under the hood they are still too closely tied to relational databases as a storage system, and cites Esper as a system for processing streams of POJOs that is not bound to any RDBMS. [...]