Hans Gilde’s weblog

Fair and unfair criticism of an SQL EP approach

Posted in eventprocessing by Hans on November 6, 2007

Recently we have seen various criticisms of the SQL approach to EP. I find it interesting that I agree with some of the general premise of the criticism, yet I find many of the particular arguments to be flawed.

Apama has been a long-time critic of the SQL-esque approach and I think that they are highly credible in this area. Apama could absolutely implement an SQL language in their EP solution. They have very significant resources and an extremely flexible EP product which could almost certainly allow SQL and MonitorScript to coexist. So when they say that they don’t like SQL, it’s because they’ve thought a great deal about the topic and genuinely feel that implementing an SQL language is not in the best interest of their customers. If a market leader takes a strong position like this, it’s only wise to pay attention.

And yet, this post from Louis draws inferences about an SQL EP approach based on various issues with SQL databases, without looking deeply enough into the topic to support those inferences. I find this category of argument to be misleading because some of these inferences are incorrect, and that throws an unnecessary shadow over the whole issue.

As an example of a “category 1” flaw,

This is, of course, not to say that a nested data structure is necessarily inappropriate. But as we can see, it is certainly incorrect to assume, without significantly more analysis, that a nested structure is better than a flat one. This leads to the argument about O/R mappers. The claim that the existence of O/R mappers proves that an SQL approach is bad is illogical. If a nested object view of data were so much better than a flat table approach, object databases would be much more popular than they are. The fact is that flat table structures are so common not because of some limitation of database vendors or of users’ imagination, but because they are often found to be much more flexible in the long run than a nested structure.
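To make the flexibility point concrete, here is a minimal sketch in plain Python rather than in any EP language. The account and order data, and the function names `qty_by_symbol_nested` and `total_by`, are invented for illustration and do not come from the post under discussion. The sketch stores the same orders once as a nested structure and once as flat rows, then asks for totals along two different dimensions.

```python
from collections import defaultdict

# Hypothetical data, invented for illustration only.
# Nested view: each account owns a list of its orders.
nested = {
    "acct-1": [{"symbol": "IBM", "qty": 100}, {"symbol": "MSFT", "qty": 50}],
    "acct-2": [{"symbol": "IBM", "qty": 25}],
}

# Flat view: one row per order, with the account as an ordinary column.
flat = [
    ("acct-1", "IBM", 100),
    ("acct-1", "MSFT", 50),
    ("acct-2", "IBM", 25),
]


def qty_by_symbol_nested(accounts):
    """Total quantity per symbol; has to walk down through every account."""
    totals = defaultdict(int)
    for orders in accounts.values():
        for order in orders:
            totals[order["symbol"]] += order["qty"]
    return dict(totals)


def total_by(rows, column):
    """Total quantity grouped by any column index of the flat rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[column]] += row[2]
    return dict(totals)


by_account = total_by(flat, 0)  # group by the account column
by_symbol = total_by(flat, 1)   # group by the symbol column, same code path
```

In the nested form the account is the privileged access path, so a question keyed on symbol needs its own traversal. In the flat form both questions go through the same function. This shows only one side of the trade-off; a nested layout can still be the better fit when a single access path dominates.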

Having now disagreed with Louis’ nested-data-structure argument, I find myself thinking “ok, but if I determine that a nested structure truly would be best, I would like to be able to use it.” This leads me into “category 2” flaws.

Category 2 flaws are inferences about an SQL EP approach based on problems that are common with an SQL database. After much consideration, I think that while there is a core of a good point in this idea, the general comparison between SQL databases and an SQL EP approach is being abused. As we have heard recently, not all SQL-like languages prevent nested data structures. As far as I know, nothing in SQL-like syntax prevents addressing a nested structure. This leaves it up to each vendor to implement the capability for nested structures (or not). Let’s not get caught in the trap of assuming that just because relational databases don’t allow for nested data, an SQL language approach to EP will always have the same “problem”. We can see the example of Esper, which allows for an SQL-like approach while retaining the ability to use nested objects.

Apama is an extremely professional organization, so I know that they will take this post for what it is: a comment on the discussion of SQL-or-not. I criticise one of their recent posts, but not their opinion about SQL. They have a much broader view of the EP market than I do, and when they say that they don’t see a need for an SQL-like approach, I imagine that they know what they’re talking about. Indeed, I have seen several things about existing SQL languages that I wish were better. But still, I have not yet seen an argument that condemns an SQL approach to eternal uselessness. At the same time, I have seen several arguments that show why an SQL approach might be appropriate. So I look forward to Apama demonstrating their reasons over time. In the meantime, I continue to believe that the best thing would be for each user to analyze both SQL and SQL-less approaches in the context of their problem.

P.S. I believe that Louis provides an unnecessarily inefficient bit of code in his article. I’m no expert on StreamSQL, but as far as I recall it’s possible to implement a counter like this as a select statement from the input stream without using a memory table.

Update: Apologies to Louis for the “inefficient bit of code” comment; I didn’t realize that it comes from a vendor. Indeed, that bit of code demonstrates an annoying “workaround” in SQL.


8 Responses


  1. Hans said, on November 7, 2007 at 3:08 pm

    For some reason, people seem to prefer to email me privately rather than comment on the blog or, even better, in the CEP Interest group.

    I got one comment that makes a good point:

    My “category 2” flaw comment was related only to the potential for an SQL-like language to support nested data structures. Just as we know that skills used to tune an SQL database are less likely to apply to an SQL-like EP product, we should assume that limitations with an SQL database might not translate directly to EP.

    However, it’s completely fair to say that one product does support a nested data structure while another does not. I didn’t mean to imply that anyone might sit around waiting for a particular product or class of products to support the data structure that they want to use.

  2. [...] side to an SQL EP approach Posted by Hans under eventprocessing   Having mentioned in a previous post that a flat data structure can be more flexible to query than a hierarchical one, I also have a [...]

  3. [...] latest contribution to the community, Taking Aim, does a great job responding to a earlier rebuttal to his post on SQL and its suitability as an EPL for CEP.  I agree with Louis and look [...]

  4. Don Cohen said, on November 14, 2007 at 3:19 pm

    There is plenty I don’t like about SQL, but I think that the “flat” relational model is not the problem. I argue that the nested vs flat arguments are the result of confusing what I think of as “specification” and “implementation”. Clearly nested structures can be modeled in flat ones, since computer memory is flat (a table of integer addresses related to contents) and that is used to represent all the nested structures people ever use. Further, the flat relational model has a clear advantage of being “representation neutral” – when you change your mind about how you want to access the data you don’t have to change all your definitions.

    The real complaint, I think, is that in traditional databases, there is not much ability to control the “implementation” – perhaps “representation” would be a better term – of the flat tables used in the specification. Even in normal database usage this can be a problem for performance, but it seems to be a bigger problem in event processing.

    The way to get the best of both worlds is to “connect” the (still flat) specificational level to the idiosyncratic representations that people design in order to achieve their performance goals. The specifications can then still be written in the representation neutral style, and compiled into efficient implementations. (There’s always a limitation on what the compiler can do. In some cases the user is not satisfied and wants to hand code some parts. This is possible, but it amounts to a tradeoff between the local efficiency and global maintenance cost. As long as the compiler is writing an appreciable part of the code you’re better off writing at the higher specificational level than at the lower level.)

    It’s interesting that this comment interface has a space for website, but that seems not to appear in the replies.
    So I’ll end by mentioning that this approach is developed much further at http://ap5.com

  5. Hans said, on November 14, 2007 at 7:23 pm

    Hi Don, so your concept about expressing the relations and having the compiler do as much of the rest as possible sounds great!

    I couldn’t figure out how one would add a relation that uses a sliding time window into AP5. For example, suppose I wanted to take a stream of numbers and output the mean of every group of 10 (as in: get a number, open a window, get the next 9, close the window and output the average; note that a new window opens every time a number comes in).

    I think we’ve talked about this before but I forget exactly what we discussed.

  6. [...] devoted several blog posts (here and here) to defending streaming SQL, so you can be sure that I like many of the features that this [...]

  7. [...] choir started singing with Louis which motivated a counter voice the choir with the post, Fair and unfair criticism of an SQL EP approach only to have the same author counter that post with, One down side to an SQL [...]

  8. Taking Aim | Capital Markets Blog said, on July 13, 2012 at 10:04 am

    [...] using SQL for Complex Event Processing. I was expecting some measure of response, as shown by this rebuttal given the somewhat polarizing nature of using the SQL language for CEP applications. For every [...]
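
The windowing behaviour asked about in comment 5 (a new window opens on every arriving number, and each window closes and emits its mean once it holds 10 values) can be sketched in plain Python. This is not AP5 and not any EP product’s language; the function name `sliding_means` and its `size` parameter are assumptions made for illustration.

```python
from collections import deque


def sliding_means(stream, size=10):
    """Yield the mean of each run of `size` consecutive numbers."""
    window = deque(maxlen=size)  # appending to a full deque drops the oldest value
    total = 0.0
    for value in stream:
        if len(window) == size:
            total -= window[0]  # this value is about to be evicted
        window.append(value)
        total += value
        if len(window) == size:
            # The window opened `size` arrivals ago closes now.
            yield total / size


# The numbers 1..12 contain three complete windows: 1-10, 2-11 and 3-12.
means = list(sliding_means(range(1, 13)))
```

Because every arrival opens a window, the overlapping windows collapse into one buffer of the last 10 values, so nothing is emitted until the tenth number arrives and one mean is emitted per number after that.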

