EP and analytics

June 25, 2008

I got an email this morning asking me what I think of the idea that CEP (complex event processing) must be combined with “analytics” to be useful to most customers. I suspect that this is coming from the very strong opinions expressed by Tim over at The CEP Blog. For example, see this post about Tibco acquiring Insightful, where he states that “any software company that discusses CEP and does not support or advocate advanced analytics are selling snake oil.” So here I’m posting my answer, in three parts.

First part: about the term CEP

There is an argument that because CEP is often described as being focussed on “detecting previously unknown patterns in events” then it must by definition involve analytics that are designed for such detection. Here I agree that the idea of detecting previously unknown patterns is mostly wishful thinking that is used to attract interest in CEP as a “strategic” technology. I can think of one exception: the case where you simply have too much data to work effectively with a traditional database. On a very large data set, CEP technology could help to summarize data or to test assumptions and screen for patterns on data sets that would be unmanageable with other technology (for example, data sets where you don’t even have the disk space to store a massive incoming data feed).

On a similar note, maybe CEP could lower the barrier to entry for certain kinds of pattern detection on very large data streams. Storing and using large amounts of data with a database can add significant cost in the form of storage and processing overhead. But one could attach a CEP application to a stream of network data and test for patterns without ever even storing the data to disk.
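To make the "never store the data" point concrete, here is a minimal sketch in plain Python (not any vendor's EPL). It reads a stream of packet sizes once and keeps only a running summary plus a count of oversized packets. The names `RunningSummary` and `summarize`, and the idea of watching packet sizes, are my own assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class RunningSummary:
    """Constant-memory summary of a numeric stream (Welford's method)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def add(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; reported as 0.0 when fewer than two values were seen.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def summarize(packet_sizes: Iterable[float],
              threshold: float) -> Tuple[RunningSummary, int]:
    """Consume the stream once, keeping a summary and a count of large packets."""
    summary = RunningSummary()
    oversized = 0
    for size in packet_sizes:
        summary.add(size)
        if size > threshold:
            oversized += 1
    return summary, oversized
```

Because `summarize` accepts any iterable, it can be fed from a generator that reads a socket or a feed handler, and nothing is ever written to disk.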

But people should remember that the goal of CEP has always been to lower the barrier to entry for real-time applications. When you start to talk about advanced analytics, you have two options:

  • You might require users to have a strong background in that kind of analytics. For example, see my post about using EP as real-time data mining. Certainly S+ contains lots of great stuff for data mining and other analytics, but most of that stuff is not for the faint of heart and requires at least the equivalent of a master’s degree to use effectively.
  • You might design a solution that applies advanced analytics to a particular and very focussed problem, like detecting likely signs of a network intrusion. In this case, you can put the power of analytics in the hands of a less trained user, but they can only use that software for the very focussed problem domain.

Neither of these scenarios meets the goal of reducing the barrier to entry. There are already terms for applying analytics, for example applied math, some of the domains of statistics, or even detection theory or data analysis. If CEP were any of these things, it would not be called CEP – it would already have a name.

Second part: the utility of CEP solutions for advanced problems

I have heard a lot about how existing CEP products can’t be used for advanced problems because of how they process data. This is a very short-sighted view. It’s like saying that C++ is not useful for advanced problems because it only comes with a few built-in storage types. CEP products are platforms on which you build larger applications. They reduce the barrier to entry for real-time processing. But in the end, they provide a way for you to build larger applications out of smaller components, and this is exactly what is needed to solve advanced problems. If I have a new model for some kind of advanced pattern detection, I might code that model using a CEP engine, thus saving myself all kinds of time and pain related to dealing with real-time data. When a product comes with certain built-in data structures, the whole point is that you can combine them to provide more advanced functionality. That’s how advanced software gets built.

Third part: the market for current CEP products

Some have expressed the opinion that most customers that process real-time data need “advanced analytics.” This then leads to the idea that existing CEP vendors are not providing the tools that most customers need. So is this a statement confirmed by a market study, or the subjective opinion of one person? Matlab has market data connectivity and about as much in the way of analytics as you can buy. So why all the use of CEP in algorithmic trading? How did CEP even get started in this area that already has well developed analytics solutions? The answer is, AFAIK, that most users actually don’t require advanced analytics. Some do. Most don’t. Most businesses need advanced real-time analytics in a few (possibly vital) areas of their business, but to run the rest of the day-to-day operation, they just need to lower the barrier to entry for developing relatively simple real-time applications and then to manage the growth of those simple applications into large rule sets. It’s the same with data processing. Many static reports use advanced analytics, and in many cases those are key to the business strategy. But most reports are made of very simple components, assembled in meaningful ways. I have no market study to back that up, but I think that the evidence is stronger for this view.

I was talking yesterday with a colleague about application monitoring, which led me to think about how an EP engine could help. There are many monitoring products on the market that use rules or even a rules engine to help interpret and respond to events. What about embedding an SQL EPL (event processing language) in there?

An SQL EPL could provide some functionality that is tough using traditional rules. This could complement a rules-based approach, to provide more flexible monitoring.

The first feature that comes to mind is event pattern matching. Some monitoring products already support certain patterns, but AFAIK, SQL EPLs support more. All of the major SQL EPLs now support detecting several kinds of patterns in incoming events, a simple example being “event A followed by event B within X seconds”. For reference, see StreamBase SELECT FROM PATTERN, Coral8 SELECT FROM MATCHING, Esper SELECT FROM PATTERN.
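For readers who have not seen this kind of pattern, here is a rough sketch of “event A followed by event B within X seconds” in plain Python. It is not the syntax of StreamBase, Coral8 or Esper; the `Event` type and the `followed_by` function are names I made up, and I am assuming that events arrive in timestamp order.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    kind: str         # for example "A" or "B"
    timestamp: float  # seconds


def followed_by(events: Iterable[Event], first: str, second: str,
                within: float) -> Iterator[Tuple[Event, Event]]:
    """Yield (first, second) pairs where `second` arrives within `within` seconds.

    Only the most recent unmatched `first` event is remembered. A real EPL
    offers more matching policies than this sketch does.
    """
    pending: Optional[Event] = None
    for event in events:
        if pending is not None and event.timestamp - pending.timestamp > within:
            pending = None  # the window expired without a match
        if event.kind == first:
            pending = event
        elif event.kind == second and pending is not None:
            yield pending, event
            pending = None
```

The point of the EPLs is that you declare this in a line or two of SQL-like text instead of writing and maintaining the state handling yourself.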

This kind of pattern matching could allow for more flexible scripted monitoring. Let’s say I want to send a request to a message bus and expect a response from an application on that bus. I have seen solutions that allow for this, but configuring an advanced scenario can be tough (although I have not done much work in this area, so there are many products that I don’t know). But suppose we have a monitoring component that sends the message to the bus and a component that gets messages and sends them to an SQL EPL engine. Now we can declare patterns like “a send, not followed by a bus error, not followed by a matching response within 30 seconds”. Or a more complicated version involving several patterns:

  • a send followed by a bus error results in an internal bus-down alert
  • a bus-down alert, followed by a send, not followed by a bus error results in a bus-up alert
  • a send, not followed by a bus error, not followed by a response within 30 seconds results in a service-down alert
  • a send, not followed by a bus error, followed by a response within any time-frame results in an internal service-timing event containing the time between the send and the response
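The four rules above can be sketched as a small state machine in plain Python. This is only an illustration under my own assumptions, not how any EPL engine works: the event kinds (`send`, `bus_error`, `response`), the request ids and the `BusMonitor` class are hypothetical, and I decide “not followed by a bus error” at the moment a send is resolved by a response or by the 30 second timeout. Timeouts are only checked when an event arrives or when `expire` is called.

```python
from typing import Dict, List, Set, Tuple

TIMEOUT = 30.0  # seconds, taken from the rules above


class BusMonitor:
    """Hypothetical monitor; event kinds and alert names are illustrative."""

    def __init__(self) -> None:
        self.sent_at: Dict[str, float] = {}  # request id -> send time
        self.timed_out: Set[str] = set()     # sends already reported as service-down
        self.bus_down = False
        self.alerts: List[Tuple[str, str, float]] = []  # (alert, request id, value)

    def _bus_seems_up(self, request_id: str, now: float) -> None:
        # Rule 2: after a bus-down alert, a send with no bus error means bus-up.
        if self.bus_down:
            self.bus_down = False
            self.alerts.append(("bus-up", request_id, now))

    def expire(self, now: float) -> None:
        # Rule 3: no bus error and no response within TIMEOUT means service-down.
        for request_id, sent in self.sent_at.items():
            if now - sent > TIMEOUT and request_id not in self.timed_out:
                self.timed_out.add(request_id)
                self._bus_seems_up(request_id, now)
                self.alerts.append(("service-down", request_id, now))

    def on_event(self, kind: str, request_id: str, now: float) -> None:
        self.expire(now)
        if kind == "send":
            self.sent_at[request_id] = now
        elif kind == "bus_error" and request_id in self.sent_at:
            # Rule 1: a send followed by a bus error means bus-down.
            del self.sent_at[request_id]
            self.timed_out.discard(request_id)
            if not self.bus_down:
                self.bus_down = True
                self.alerts.append(("bus-down", request_id, now))
        elif kind == "response" and request_id in self.sent_at:
            # Rule 4: a response at any time yields a service-timing event.
            sent = self.sent_at.pop(request_id)
            self.timed_out.discard(request_id)
            self._bus_seems_up(request_id, now)
            self.alerts.append(("service-timing", request_id, now - sent))
```

Note that a send which never gets an answer stays in `sent_at` forever in this sketch, since rule 4 allows a response within any time-frame. A real deployment would need some eviction policy.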

Now, add in SQL aggregation (see StreamBase GROUP BY, Coral8 GROUP BY, Esper aggregation in SELECT statements):

  • First a pattern: a bus send, not followed by a bus error, followed by a response within any time-frame results in an internal service-timing event containing the time between the send and the response as well as the service-ID.
  • Now some aggregation: select service-timing events where response-time is greater than 10 seconds, group by service-ID and count the events within the last minute. This would produce one service-delay-count event per minute, per service-ID
  • And then a basic selection rule: select service-delay-count events where the count is greater than 3 and produce an internal service-delayed event.
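Here is roughly what the aggregation and selection steps amount to, again as a plain Python sketch rather than real EPL code. I am assuming fixed one-minute buckets (a tumbling window) for “within the last minute”, and the `ServiceTiming` type and both function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class ServiceTiming(NamedTuple):
    service_id: str
    timestamp: float      # seconds since some epoch
    response_time: float  # seconds


def delay_counts(timings: Iterable[ServiceTiming],
                 slow_after: float = 10.0) -> Counter:
    """Count slow responses per (minute, service_id), like a GROUP BY."""
    counts: Counter = Counter()
    for t in timings:
        if t.response_time > slow_after:
            minute = int(t.timestamp // 60)
            counts[(minute, t.service_id)] += 1
    return counts


def delayed_services(counts: Counter, min_count: int = 3) -> List[Tuple[int, str]]:
    """Select the (minute, service_id) groups whose count exceeds min_count."""
    return sorted(key for key, n in counts.items() if n > min_count)
```

Unlike the description above, this sketch produces no zero-count event for a quiet minute. It only reports groups that saw at least one slow response.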

And then do some simple statistics with aggregation and windows (see StreamBase CREATE WINDOW, Coral8 CREATE WINDOW, Esper CREATE WINDOW):

  • Using aggregation as above, create events containing the mean and variance of service-timing over the past hour, updated every minute.
  • Keep the last one of these events in a window of size one. Now we have an up-to-the-minute record of service timing, per service, for the past hour.
  • Also, use aggregation to produce events containing the per-service timing for the past ten minutes.
  • Join the hourly timing with the ten minute timing (see the JOIN statements from previously mentioned products), and use simple statistical methods to detect the case where the service timing is slowing down to a significant degree.
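As a sketch of that last step, here is one possible “simple statistical method” in plain Python: compare the ten minute mean with the hourly mean, scaled by the hourly variance. The choice of a z-style score and the threshold of 3 are my assumptions, not something the products prescribe, and `window` and `is_slowing` are hypothetical names. Samples are `(timestamp, response_time)` pairs for a single service.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def window(samples: Sequence[Tuple[float, float]], now: float,
           length: float) -> List[float]:
    """Response times whose timestamp falls within the last `length` seconds."""
    return [rt for ts, rt in samples if now - length < ts <= now]


def is_slowing(samples: Sequence[Tuple[float, float]], now: float,
               z_threshold: float = 3.0) -> bool:
    """True when the ten minute mean sits well above the hourly mean."""
    hour = window(samples, now, 3600.0)
    recent = window(samples, now, 600.0)
    if len(hour) < 2 or not recent:
        return False  # not enough data to judge
    # Standard error of a mean of len(recent) samples, using the hourly variance.
    spread = math.sqrt(variance(hour) / len(recent))
    if spread == 0.0:
        return mean(recent) > mean(hour)
    return (mean(recent) - mean(hour)) / spread > z_threshold
```

The hourly window includes the most recent ten minutes, as in the description above, which makes the check somewhat conservative.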

I think that it’s exciting that all of these scenarios could be detected using an SQL scripting language. The events produced by the detection scripts could be fed into an existing event management solution, a rules engine or whatever.

Some of these techniques are already used with these SQL products to detect stock market patterns, so we are using a known approach to event detection. I’ve seen an SQL system used to detect service issues in an electronic trading scenario, but it was created as a custom project and had to be manually integrated with existing monitoring solutions. I wonder what would happen if an SQL EPL were integrated tightly with a monitoring product, which would make it more friendly to the people who are already involved in monitoring.

This sounds like an exciting prospect for SQL EPLs. At the moment, SQL EPLs are used mostly in custom projects or high volume, low latency applications like electronic trading applications, custom monitoring (see StreamBase’s case studies in monitoring) or RFID. But if this technology could provide value to more general monitoring, and if it could be tightly integrated with a larger monitoring product, then it would move from a niche solution to a central component in the enterprise architecture.

P.S. I would have linked to the equivalent Aleri features in this post, but I could not find linkable online documentation.

P.P.S. SQL EPLs are not the only possibility here. My understanding is that other specialized event processing products like RuleCore support much of this functionality as well.

I think that most frequent readers know that I’m a fan of StreamBase. Having used version 5 very extensively, I am mostly impressed with the direction of the announced features for version 6 (which I have yet to use). My feeling is that the value of an EP product is exactly to increase developer productivity and reduce the skill set needed for developing soft real-time applications. So it makes sense (to me) to focus a release on improving that value.

I do want to point out one thing that really bugs me: some of the new features are not supported for hierarchical data types. Meaning that if you use those data types, you are SOL in terms of some key developer productivity features. That’s not cool for two reasons: first the same projects that benefit from hierarchical data types have wide tuples and would also benefit from developer productivity improvements and second, you can start out using these data types and put in a lot of work on the code before you find that you can’t unit test it. They should not have released hierarchical data types if they were not going to fully support them.

SB has a couple of smaller developer productivity features (from previous releases) that I really appreciate:

  • Their IDE is Eclipse, so you can edit your StreamSQL code (visually or in text) and your Java code in the same project. Then you can test and deploy the whole thing as one package. I find that this is a great time saver and is also nice when bringing new developers up to speed. Rather than two pages of instructions (or a script maintained by you) for building, assembling and deploying the project, you just open the project in the studio and click the Run button. This is one of many reasons that I love Eclipse.
  • Also being Eclipse, you get proper version control support from Eclipse plug-ins.
  • You can include and organize test data sets in the project as well, which you can open and play into the running application from the IDE. And now I imagine that you can combine this with the visual debugger. I find this to be easier than keeping a directory of data files separate from the project, and of course this also means that test data is integrated with version control. Some people insist that only code should be checked in to version control. For many reasons, I prefer to check in the entire current state of the project.

The addition of unit tests, though, is key and IMO shows a real step up in terms of thinking. I can’t imagine doing a big project these days without a million unit tests and I had to write my own testing framework for SB 5.

So keep up the good work StreamBase and all the vendors who have recently announced new features in their products.

So if you use a piece of software and you believe that, although that software is stable and working as documented, it could do more in a future version… does that make the software immature?

I find some of the arguments about CEP (Complex Event Processing) and maturity to be very strange. For example, it is very true that many people want CEP to make detecting complex causality available to untrained users. And many people also want a nicer car. But does the fact that I want a Bentley make my Mercedes immature? I want my car to fly, too. Is it immature because it can’t fly? I have read that flying cars are real, so should I be upset at my car dealership for selling me a model that is stuck on the ground?

The fact that there is more work to be done in CEP does not make CEP immature. Honestly, I can’t agree with positions like the one taken by Tim at The CEP Blog. In this post about product maturity, he says that no CEP product has ever existed. He says that in the ten years since David Luckham coined the phrase, no one has ever developed a CEP product. Even David Luckham apparently did not envision a product that meets the “true” criteria of being CEP. Well then vendors are not trying to suppress a technology, they are simply offering to sell you technology that actually exists. Can anyone seriously say that CEP will be immature until it provides features that have not materialized despite 10 years of research? Is modern energy production immature because we should all be waiting for cold fusion?

Of course anyone would agree that providing a better interface to causality detection algorithms is a noble goal. But everyone should remember that any major advancement in this area will have very significant impact on the business world, and the use of that technology in EP will be but a part. Detecting causality is a huge deal and it is just as useful for static or stored data as for real-time events. So just remember that by asking for an easy interface to causality detection, you are not only asking for something that has never existed in the field of EP, but something that has never existed at all. All the advancements in data mining from the 50’s through today have yet to produce something that makes accurately detecting causality in complicated scenarios an easy task.

So yes, more improvements can be made to EP products. And honestly, there are plenty of simple and incremental improvements that can be made to every EP (CEP) product that I have used. For example, more adapters and language improvements. On this matter, I have to agree with Opher, who distinguishes between several kinds of maturity.

There are plenty of improvements to be made in the area of CEP (which I would call EP for Event Processing). But accusing vendors of hyping immature products? Come on. I mean seriously, we are all supposed to wait with bated breath for a class of products that has never existed? And that is what will be the real CEP? Remind me again, who is falling for hype here?

So I was reading some posts about whether CEP (Complex Event Processing) is mature, when Mark Palmer’s excellent post linked to this blog entry on WS&T. And while reading that post, I noticed a link to this article, also on WS&T. The article describes how hard it is to find programmers who can write highly parallel code that would take advantage of new multicore architectures. The first comment on the article calls it the biggest crisis facing computing since… something.

Whether or not “CEP is mature”, let’s look at the software available right now, and I’ll call it EP for Event Processing so as not to even get near the question of what Complex means. You have here an ecosystem of software from StreamBase, Progress Apama, Coral8, Tibco, Aleri and more, much of which has been battle tested for years now. They are mature in the sense that they do what they claim and they don’t crash or hit you with lots of bugs.

And what do they do? They each provide a high level language that helps a programmer write high performance applications without spending much time thinking about (or even knowing about) the performance plumbing. Here you have a bunch of players that are all in the game of bringing down the skill set needed to build high performance applications. Some of them automatically thread and provide a framework that guides the user to write an application that is inherently highly scalable, without ever knowing what it is that makes it scalable.
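To give a feel for what that framework does for you, here is a very rough Python sketch of the underlying idea: partition the events by a key so that each partition can be processed independently on its own worker. The trade tuples and the `total_by_symbol` function are assumptions of mine, and an EP engine does this partitioning for you rather than making you write it.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def total_by_symbol(trades: Iterable[Tuple[str, float]],
                    workers: int = 4) -> Dict[str, float]:
    """Partition trades by symbol, then sum each partition on a worker thread."""
    partitions: Dict[str, List[float]] = defaultdict(list)
    for symbol, amount in trades:
        partitions[symbol].append(amount)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as its input.
        totals = pool.map(sum, partitions.values())
        return dict(zip(partitions.keys(), totals))
```

In CPython, threads will not speed up CPU-bound work like this because of the global interpreter lock, so treat it as an illustration of the shape of the solution, not of its performance.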

So if we have an impending crisis brought on by the growing number of cores, we also have a potential solution available today! Several potential solutions, even. And are these solutions mature enough to be deployed in banks? YES! They are already deployed in banks. I’ve seen StreamBase churn through literally all the electronic trading messages in a firm, in real-time, and stay up for full weeks without a hiccup, just as it’s supposed to. All the major vendors have such stories. Their software works. It could be used to make many applications more scalable right now (see my previous post on the subject).

In the future, I hope that the people worried about the impending core crisis go talk to the EP people (on the CEP forum) about how these two ideas can result in not just a problem but a solution. And I’d hope that any organization looking to evaluate CEP understands that if they need to lower latency, then this can be a place to start, before investing in the next generation of hardware. Again, see this previous post on creating a uniform architecture that ensures scalability.

I’ll write about maturity and such in a later post. Suffice it to say that whether or not modern “CEP” products meet everyone’s needs or could be improved…they work well for what they do, and that is to bring down the cost and the skill set required to build scalable and high performance applications. Even if they don’t have the one feature that this or that person believes will make them officially “CEP”, they do have many features that were driven by customer demand, integrated into stable platforms that can drop the TCO and time to market for low latency, scalable software. And that is something, IMO, to pay attention to.