On the term CEP

October 30, 2008

Paul Vincent of TIBCO BusinessEvents (BE) had some good comments on my last couple of posts about the term Complex Event Processing (CEP).

Paul writes:

Probably what you mean to say is that the technologies wrapped into CEP are in themselves nothing new – rules, queries, distributed agents, data grids, dashboards, business model layers, etc. But their combination with the continuous event processing idea is what makes them “interesting” and “useful”. But indeed it is “just another iteration” in IT technology.

I’ll agree with this statement, but it’s not the direction I was going. In fact, I think that anyone looking at a CEP product for the first time will find features that look more like a leap than an iteration of something that they have seen before. Surely all of these features are based on a technology or technique that someone, somewhere has seen before, but that is true of pretty much everything.

My point was that the term CEP can’t stand alone; in practice it always ends up meaning the use of one of these products.

Contrast the term CEP with the term SOA. You can look at an existing enterprise architecture and describe how it is different from an SOA. You will find weaknesses (of this existing architecture, compared to an SOA) that can be addressed through SOA concepts, and you can find strengths that SOA concepts have trouble with. Can you look at something and tell me how it is different from CEP? Other, of course, than showing how a CEP product could be used. No, I don’t think you can. If I ask how what I have now is different from CEP, you will show me how I can use your product to build something cool. You will not be able to point me to some fundamental properties that differentiate what I have now from CEP.

In fact, you use the term CEP because you’re hoping that I will recognize it from my IT reading. Not because your product is built to support a pre-existing idea called CEP. And I’m not faulting you for this.

Over time, I expect the EP-TS to produce some very interesting work that will lead to stand-alone theories, architectures and techniques. But the term CEP does not currently describe any of those things.

Paul also asks:

“If you have done some research on CEP, you will notice that descriptions follow one of two patterns: They are vague and leave most of the substance to your imagination. Or they are copied from some other field, and provide no information on what makes CEP new.”

Sounds like a challenge to vendor marketing departments everywhere! Any particular examples?

Sure, no problem.

Example one: the TIBCO CEP page. To quote:

CEP gives businesses insight into which events will have the greatest operational impact so they can focus their resources to seize opportunities and mitigate risks.

Oh yeah? Name one feature of CEP that does this. I don’t mean a feature of BE, I mean a feature of CEP itself that justifies the above statement.

Here is how I would rewrite this sentence:

TIBCO BusinessEvents dramatically improves the process of creating systems that give businesses insight into which events will have the greatest operational impact, so they can focus their resources to seize opportunities and mitigate risks.

Maybe not as sexy, but at least it doesn’t set my teeth on edge.

Example two: the description of CEP on The CEP Blog.

Complex event processing (CEP) is an emerging network technology that creates actionable, situational knowledge from distributed message-based systems, databases and applications in real time or near real time.

Again, name one feature of CEP that does this. Not a feature of a CEP product, but a feature of CEP itself.

These posts are titled “What is Complex Event Processing?” but all of the content rehashes the JDL model for multisensor data fusion and shows how it can be used in a business context. Surely a good idea and a very informative set of articles. But remind me again how this is a description of CEP. If all references to CEP were removed from these articles, would anyone read them and think “this is describing CEP”? No way. They are good articles, but they are not a description of anything that you could independently identify as CEP.

Here’s a follow-up on my last post about how Complex Event Processing (CEP) is like a brand for some new-ish software products and not a technology, an architecture, a technique or a strategy.

If you have done some research on CEP, you will notice that descriptions follow one of two patterns:

  • They are vague and leave most of the substance to your imagination.
  • Or they are copied from some other field, and provide no information on what makes CEP new.

So what is new? The software products are new.

The folks selling these products need some rallying point in order to gain traction. They need some phrase to invoke that makes people take notice. And they have chosen Complex Event Processing.

There is plenty to be excited about here, because the CEP products are innovative and useful. They recognize that databases, rules engines and application servers on the market today, while good for many tasks, have weaknesses. CEP products help process structured data quickly. They take on some of the burden that has traditionally fallen on guru programmers: the software framework that enables threaded, distributed and high-performance processing. Some CEP products even make an effort to let the business user have input on logic that might otherwise be buried in a pile of high-performance code. They are utilities, like a grid or a cache, that help us write more business logic and less plumbing.

But what can CEP do? Nothing. CEP is not a usable thing. It’s the name that these software vendors use to differentiate themselves from more established products.

On JBOCE

October 29, 2008

I like the term Just a Bunch Of Complex Events. I would go a step further and say Just a Bunch Of Events.

Complex Event Processing is, quite literally, a group of software vendors that call themselves CEP vendors. There is very little usable about CEP other than these software products. And you only know that those products are “CEP” because that term is strewn about their web sites and press releases.

There is a book about CEP, and that book introduced some fun and interesting ideas. But I defy you to use the book in practice.

An example of an incorrect use of the term: “CEP is a technique.” Oh yeah? Name one element of that technique. Or rather, name one element that isn’t copied verbatim from some literature that never mentions the term CEP. Yeah, didn’t think so. Try again. CEP isn’t a technique, it’s a bunch of products.

So it’s true, EDA is more than CEP. But honestly guys, you don’t have to worry about customers getting confused. Because if someone can’t tell the difference between an EDA and a software product that “does CEP”, then whatever purchase they make will be the wrong one anyway.

The software products that constitute “CEP” are interesting because they fill important and growing needs. They demonstrate weaknesses in older (and by older, I don’t mean “antiquated”, I mean “have been around longer”) products, which need to be addressed by the market. And they are beginning to present a vision of the future that, while not yet complete, is intriguing. That’s all folks, there’s nothing more to see.

Event A followed by event B

October 14, 2008

The title of this post was inspired by a recent post by Opher on temporal semantics of event processing. If you’ve not been following Opher’s posts on this topic, then I suggest that you fully read the linked post before you continue on with my post.

In his post, Opher presents a scenario involving a temporal sequence, in other words “event A followed by event B”. His scenario includes two of these sequences, and he illustrates how the use of a temporal sequence might not be as straightforward as it seems.

I think that the temporal sequence is even more complicated than Opher’s illustration shows. In fact, I think that using a temporal sequence will, for many event processing applications, be a time bomb.

The temporal sequence pattern is very tempting, but is inherently fragile. And there are some simple alternatives that can do better. These alternatives are appropriate not only for a scenario where we are tracking individual events but also where we are modeling and inferring about masses of events using a probabilistic approach.

Let’s think about events A and B, where we know the process by which these events are created. Now we know that events A and B each represent an action or a condition where the action or condition of event A must occur before the action or condition of event B. We want to detect the scenario where event A occurs and event B occurs (or where event A occurs and event B does not, or where event B occurs and event A does not).

Since we “know” some underlying facts about events A and B, we might be tempted to model this scenario as “event A followed by event B.” But here are a few things that we might not know, or which might change as time goes on:

  • We don’t know whether event A will be delivered to us before event B. And even if we do know this now, it might change in the future as applications are moved among servers or as the logic that generates A and B is refactored.
  • Even though the action for A will happen before the action for B, we do not know that the time stamps on events A and B will reflect this. A simple code change can cause event B to acquire a time stamp earlier than that of event A.

In other words, for many applications, even though we know that event A should happen before event B, we can’t count on either the time stamps on these events or on the delivery times to reflect this ordering.
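A minimal Python sketch makes this fragility concrete. It assumes a hypothetical event shape (a type, a transaction ID and a generation time stamp) and a naive detector that interprets “A followed by B” as “B arrives after A”. Keying the same logic on time stamp order instead of arrival order would break in the same way as soon as a code change gives B an earlier time stamp.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str    # "A" or "B" (hypothetical event types)
    txn_id: str  # hypothetical linking identifier
    ts: float    # time stamp assigned when the event was generated


def naive_followed_by(stream):
    """Strict temporal sequence based on delivery order only:
    report txn_ids where a B arrives after its matching A."""
    seen_a = set()
    matches = []
    for ev in stream:
        if ev.kind == "A":
            seen_a.add(ev.txn_id)
        elif ev.kind == "B" and ev.txn_id in seen_a:
            matches.append(ev.txn_id)
    return matches


# Same business facts, two different delivery orders.
in_order = [Event("A", "t1", 10.0), Event("B", "t1", 12.0)]
reordered = [Event("B", "t1", 12.0), Event("A", "t1", 10.0)]
print(naive_followed_by(in_order))   # ['t1']
print(naive_followed_by(reordered))  # []
```

Nothing about the underlying business process changed between the two runs. Only the delivery order changed, and the match silently disappears.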

We may know something about the underlying meanings of events A and B, but we don’t know whether the software or the network is cooperating with our understanding. Even worse, we can look into the software and the network and see a pattern that exists now, but we can’t guarantee that this pattern will remain constant over time (let’s say 4 or 5 years, the minimum life of our event processing solution).

Now let’s look at what we do know that will probably remain constant:

  • We may have some way to match an event B to a particular event A. Maybe there is a transaction ID or some other linking identifier included in the events.
  • We know that there is an underlying meaning to these events. Even if we see an event B before an event A, we know that in fact they represent the reverse order.

So here is a less fragile way of looking at these events:

  • We are looking for events A and B where the events have some way of matching to each other (a transaction ID or similar). If we don’t have this identifier, then we recognize that the pattern detection will be more fragile, but we continue with the remaining assumptions.
  • The events occurred within a certain time window of each other. In other words, we can be sure that “|timestamp(A) - timestamp(B)| < X” for some X, where timestamp() is the time stamp added to each event when it is generated.
  • We assume that there is a bound on the delay for the arrival of events. In other words, the network may be slow or the message bus may be down, but there is some maximum delay for events. This maximum delay may be a few seconds for some applications and a few days or more for others. We would prefer not to have this bound, but in practice we can’t wait forever.

The above assumptions can be turned into an event detection pattern with very simple semantics. It works just like a temporal sequence (it is a partial ordering relation), but it’s less fragile. And these assumptions are as appropriate for a probabilistic approach to event detection as they are to a track-and-trace approach.
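Here is a rough Python sketch of how those three assumptions might become a detector. It illustrates the idea only and does not describe any product feature. The `Event` shape, the `PairDetector` class and its `window` (the X above) and `max_delay` parameters are hypothetical names. The sketch also assumes every event carries a linking ID; without one, the matching step would need something weaker. The match tests only |ts(A) - ts(B)| < window, so it does not care which event arrives first or which carries the earlier time stamp. The delay bound tells it when to stop waiting for a missing partner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # "A" or "B" (hypothetical event types)
    txn_id: str  # linking identifier, e.g. a transaction ID (assumed present)
    ts: float    # time stamp added when the event was generated


class PairDetector:
    """Hypothetical order-insensitive correlator for A/B pairs.

    Assumes: events share a txn_id, |ts(A) - ts(B)| < window, and every
    event arrives no later than max_delay after its own time stamp.
    """

    def __init__(self, window: float, max_delay: float):
        self.window = window
        self.max_delay = max_delay
        self.pending = {}    # (txn_id, kind) -> event still waiting for a partner
        self.matched = []    # (txn_id, ts_a, ts_b) for each detected pair
        self.unmatched = []  # events whose partner never arrived in time

    def on_event(self, ev: Event, now: float) -> None:
        partner_kind = "B" if ev.kind == "A" else "A"
        partner = self.pending.pop((ev.txn_id, partner_kind), None)
        if partner is not None and abs(ev.ts - partner.ts) < self.window:
            # Neither arrival order nor time stamp order matters here.
            a, b = (ev, partner) if ev.kind == "A" else (partner, ev)
            self.matched.append((ev.txn_id, a.ts, b.ts))
        else:
            if partner is not None:
                # Same txn_id but outside the window: give up on the pending one.
                self.unmatched.append(partner)
            # A repeated (txn_id, kind) simply replaces the earlier entry.
            self.pending[(ev.txn_id, ev.kind)] = ev
        self.expire(now)

    def expire(self, now: float) -> None:
        # A partner must be stamped within `window` and arrive within
        # `max_delay`, so after ts + window + max_delay it cannot show up.
        horizon = self.window + self.max_delay
        for key, ev in list(self.pending.items()):
            if now >= ev.ts + horizon:
                del self.pending[key]
                self.unmatched.append(ev)


# Usage: B is delivered first AND carries an earlier time stamp than A.
det = PairDetector(window=5.0, max_delay=2.0)
det.on_event(Event("B", "t1", 10.0), now=10.5)
det.on_event(Event("A", "t1", 11.0), now=11.2)
# An A whose B never arrives.
det.on_event(Event("A", "t2", 20.0), now=20.1)
det.expire(now=30.0)
print(det.matched)                        # [('t1', 11.0, 10.0)]
print([e.txn_id for e in det.unmatched])  # ['t2']
```

The expiry rule follows directly from the assumptions. A partner must be stamped within `window` of the pending event and must arrive within `max_delay` of its own stamp, so after `ts + window + max_delay` it can no longer show up. This is why the bound on delivery delay matters: without it, the “A occurred and B did not” case could never be reported.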

My alternative pattern is not always appropriate and sometimes the strict temporal sequence is better. But I think that both options should be available from a truly complete event pattern matching product.

Focus is key in CEP

October 12, 2008

This post follows up on the recent posts about Orange and how it relates to Event Processing. Jack talks about features of Orange appearing in EP products. Marc seems to like the idea of Orange and also wonders when real-time OLAP will become a commodity.

I wanted to add that currently some CEP languages are good for doing queries and some are good for coding logic. But, and CEP users will know what I’m saying here, no language is yet a good combination of querying and logic coding. At the moment, each vendor seems to be hugging its original concept very tightly. With the exception of TIBCO, the vendors with good languages for coding logic refuse to incorporate the great querying features of the SQL concept. And the SQL vendors recognize that they are stronger in querying than in coding logic, but their approaches to adding the logic-coding features have, to date, been less than spectacular. No offense meant there; I know that everyone is working on improvements.

My opinion is that vendors need to have a complete product in one area, not half a product in two areas.

If they expect us to develop a full application with this product, then they need both querying and logic-coding to be excellent. A few prepackaged AI algorithms will not make up for a hard-to-use language or substandard query capabilities.

If they expect this product to focus on real-time OLAP, then they need to go all out on those features and make the product’s integration with .NET and Java top-notch. Basic APIs for subscribing to streams are not enough if this is an RT-OLAP product.

I think that a CEP product that is half a so-so EP language and half a so-so real-time OLAP solution is doomed. Vendors should focus on a limited set of strengths until they have more compelling products, rather than branching out before their base product is totally solid.