EP and predictive analytics
I really like this post by, I think, Paul Haley, discussing a talk on predictive analytics. One good quote from the post that applies to EP (CEP) as to any field that might involve predictive analytics: “Being able to override the predictive analytic model with rules is a critical improvement…” Also his final conclusion is “Predictive analytics may help make good decisions but adapting makes decisions better.”
As Opher frequently mentions, EP, as with any field involving data analysis, is not monolithic. To be sure, there are needs for specialized inference engines. But as Haley’s article highlights, there is also a need for technology to tie those engines together, and to make decisions about which model applies and which kind of inference to use.
My guess about the future of EP would be a core of products that allow for a wide variety of real-time data processing and logic, but do not get into specialized decision making techniques. Rather, they will allow the user to code their specialized logic if desired. And these engines will, in turn, tie into a variety of more specialized decision and analytics products.
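As a rough illustration of the "override the model with rules" idea from Haley's quote, here is a minimal sketch. Everything in it is hypothetical: the event fields, the stand-in `model_score` function, the `trusted_counterparty` rule and the 0.5 threshold are assumptions made up for the example, not anything taken from Haley's post or from a particular product.

```python
def model_score(event):
    # Stand-in for a trained predictive model (hypothetical):
    # larger amounts get a higher risk score, capped at 1.0.
    return min(event["amount"] / 10000.0, 1.0)


def trusted_counterparty(event):
    # Hypothetical override rule: known counterparties always pass.
    # Returning None means "this rule has no opinion".
    if event.get("counterparty") in {"ACME"}:
        return "pass"
    return None


def decide(event, score, rules, threshold=0.5):
    # Rules are consulted first; the first rule with an opinion wins.
    for rule in rules:
        verdict = rule(event)
        if verdict is not None:
            return verdict
    # No rule applied, so fall back to the predictive model.
    return "flag" if score(event) >= threshold else "pass"


rules = [trusted_counterparty]
overridden = decide({"amount": 9000, "counterparty": "ACME"}, model_score, rules)
flagged = decide({"amount": 9000, "counterparty": "XYZ"}, model_score, rules)
passed = decide({"amount": 100, "counterparty": "XYZ"}, model_score, rules)
```

The point of the design is only that the rule layer sits in front of the model, so a user can adapt decisions without retraining anything.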
Causal inference, causality operators and actionable information
Following on a comment by David on the CEP forum, I think that it’s important to recognize the various ideas that are being discussed here.
We have been discussing the idea of a “causality operator,” by which I mean the kind discussed by David Luckham in The Power of Events (see this post). If you have a set of events and you can form a relation where you know exactly which events “caused” which other events, then you can write Y<X for “Y is caused by X” and will have created a POSET or DAG structure on those events.
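To make the POSET/DAG claim concrete, here is a small sketch, assuming we are simply handed a set of known "is caused by" pairs. The event names are invented for illustration. It computes the transitive closure of the relation and checks that no event ends up causing itself, which is what makes the closure a strict partial order (and the graph acyclic).

```python
from itertools import product


def transitive_closure(pairs):
    """Return the transitive closure of a set of (effect, cause) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # If a < b and b < d, then a < d must also be in the closure.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def is_strict_partial_order(pairs):
    """True if no event is (transitively) caused by itself."""
    return all(a != b for a, b in transitive_closure(pairs))


# Hypothetical events; (y, x) reads "y is caused by x", i.e. y < x.
caused_by = {
    ("order_filled", "order_sent"),
    ("order_sent", "price_tick"),
    ("alert", "price_tick"),
}
closure = transitive_closure(caused_by)
acyclic = is_strict_partial_order(caused_by)
cyclic = is_strict_partial_order({("a", "b"), ("b", "a")})
```

Note that "order_filled" and "alert" are not related to each other in either direction, which is exactly what makes the ordering partial rather than total.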
But forming a graph of causality is not the same as causal inference and certainly not the same as recognizing a pattern that provides actionable information. Most pattern detection and even causal inference stops well short of forming a causality graph for all data. Many methods explicitly focus on summarizing all the relationships in a sample rather than on the relationship between any two specific observations. But nonetheless, these methods can produce actionable information. For example, no one would argue that good grades cause a student to be a better driver, but still most automobile insurance in the US gives students a discount for good grades.
To make forming the graph even harder, many causal relationships come down to a statement that two events were probably caused by the same external event, but we can’t really be sure what that external event was. We can’t even find a definition of a causality operator that everyone will agree on. Let’s say that both events X and Y “caused” event Z. We could write this as “Z<X and Z<Y”. But do we mean that Z would absolutely not have happened without both X and Y? Or that Z might have happened given only X, but then Y made Z certain to happen? Or that X and Y are the only two events that we know of, which contributed to Z, but that there are probably other factors?
And then sometimes we do have an identifiable and explicit causality relationship (like the ones that fill The Power of Events).
Given the broad nature of the topic here, I think that it’s easy to see why attempts to summarize the capability of particular pieces of software using single sentence descriptions don’t go over well. These applications are trying to solve many different problems at once and most of them do it in ways that defy a simple description.
How the term POSET is used in The Power of Events
I think that some of the confusion that people may feel about the term POSET could be alleviated by discussing how that term is used in David Luckham’s book “The Power of Events”.
In this book, David constructs an operator for causality, as Y is caused by X. It turns out that once you have constructed this operator for any set, the causality operator combined with the set forms a POSET. For example, Y “is caused by” X can be replaced algebraically by Y “is less than or equal to” X, with either resulting in the same algebraic structure on the set containing X and Y.
None of this has anything to do with an a priori ordering on the input set. Meaning that it has nothing to do with whether the input set is sorted or even sortable. The POSET is with respect to the causality operator only. If I take a set of integers and construct a causality relation between them, I have created a POSET with respect to causality. Even though I could also consider the integers to be a TOSET with respect to the normal less than operator, that has nothing to do with the newly created causality operator.
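Here is a tiny sketch of that last point, with a made-up causality relation on four integers (the specific pairs are assumptions chosen for the example). The same set is totally ordered under the usual less-than, but only partially ordered under causality, and the two relations need not agree.

```python
from itertools import combinations

events = [1, 2, 3, 4]

# Hypothetical causality relation, already transitively closed.
# (y, x) reads "y is caused by x": here 4 is caused by 1, and 3 by 2.
caused_by = {(4, 1), (3, 2)}

# The ordinary numeric ordering on the same set.
numeric_less_than = {(a, b) for a in events for b in events if a < b}


def comparable(a, b, relation):
    return (a, b) in relation or (b, a) in relation


def is_total(elements, relation):
    # A total order relates every pair of distinct elements.
    return all(comparable(a, b, relation) for a, b in combinations(elements, 2))


numeric_is_total = is_total(events, numeric_less_than)
causality_is_total = is_total(events, caused_by)
```

In this example 1 and 2 are comparable numerically but unrelated causally, and "4 is caused by 1" even runs against the numeric order.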
So when David talked about working with POSETs, he didn’t mean that you can’t work with ordered sets. He didn’t mean that you’re not allowed to use ordering if it’s available. He meant that in the end, you need to construct a causality relation and this relation, combined with your set of events, forms a POSET. So if I use ordering of the events, or time windows or whatever technique I want when I construct my causality operator, well that causality operator still forms a POSET on my events. Of course, if I manage to create my causality operator without using the ordering of the events, well then all the better because I probably have something more general and thus more resistant to problems in the ordering.
This is why it makes no sense to talk about “processing a cloud” versus “processing a stream.” The book is not talking about processing a POSET or a TOSET, it’s talking about forming a POSET by constructing a causality operator. If two techniques each result in a causality operator or a DAG on a set of events, they have both constructed a POSET.
Now IMO, it’s not even necessary to construct a causality operator in order to be useful. If I have a stream of market data or sensor data and I fit a curve to that data, and that curve is the most accurate method that I know for predicting future readings… well I’m better off using that curve to predict than using nothing at all. Even though I may or may not be able to (or even want to try to) construct a causality operator from that curve fitting algorithm.
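As a minimal sketch of the curve-fitting idea, here is an ordinary least-squares straight line fitted to a few invented sensor readings and then used to predict the next one. The data and the choice of a straight line are assumptions for illustration; a real feed would likely call for a different curve.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    return slope * x + intercept


# Hypothetical readings taken at times 0..4.
timestamps = [0, 1, 2, 3, 4]
readings = [10.0, 12.1, 13.9, 16.0, 18.0]

slope, intercept = fit_line(timestamps, readings)
next_reading = predict(slope, intercept, 5)
```

Nothing here says anything about which reading "caused" which. The fit is useful for prediction all the same.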
For another example, take a spatial clustering problem. Say we have a bunch of RFID sensors and are looking for clusters of spatial data. Let’s say that a toy store detects that lots of people with a particular toy stop in the restroom. Maybe they want to call security, maybe they want to look at the placement of that toy display with respect to restrooms. No causality has yet been inferred, but we have still gotten useful information by finding the clustering pattern.
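One very simple way to look for such clusters is to bucket reads into grid cells and count distinct tags per cell. The sketch below assumes a hypothetical read format of (tag id, x, y) and made-up cell size and threshold. Real clustering methods are more sophisticated, but the flavor is the same.

```python
from collections import defaultdict


def dense_cells(reads, cell_size, min_tags):
    """Return {cell: distinct tag count} for grid cells with enough tags."""
    tags_by_cell = defaultdict(set)
    for tag_id, x, y in reads:
        # Map each position to the grid cell that contains it.
        cell = (int(x // cell_size), int(y // cell_size))
        tags_by_cell[cell].add(tag_id)
    return {
        cell: len(tags)
        for cell, tags in tags_by_cell.items()
        if len(tags) >= min_tags
    }


# Hypothetical RFID reads: (tag id, x position, y position).
reads = [
    ("toy-1", 1.0, 1.5),
    ("toy-2", 1.2, 1.1),
    ("toy-3", 1.9, 1.8),
    ("toy-4", 8.0, 3.0),
    ("toy-1", 1.1, 1.4),  # repeat read of the same tag
]
hot = dense_cells(reads, cell_size=2.0, min_tags=3)
```

The repeat read of "toy-1" is counted once, since the cell tracks distinct tags rather than raw reads.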
Manufacturing a debate about POSETs
It’s a little funny to watch Tim Bass digging in his heels on a subject that he is clearly unfamiliar with. It’s funny because he’d probably come to an agreement if only he would stop trying to win any and every argument and just listen for a minute. The people he’s debating with aren’t trying to mask some huge deficiency in the state of the art of EP products. The problem is that Tim is confusing the question of being able to process a POSET with the question of being the best solution to every problem.
In a hilarious turn of events, another EP customer has discovered that the stream versus cloud argument does not really hold much water (ha ha) and so now must have been hoodwinked! Just read Tim’s comment on this post. It couldn’t be that there’s a point that Tim is missing – nope, those vendors hyping their products have hoodwinked an otherwise smart and knowledgeable customer.
Tim’s argument always proceeds like this:
- Tim: The current crop of EP products can’t handle POSETs. This means that they are not truly CEP because CEP is defined as such-and-so-involving-POSETs.
- Vendor: Well that’s not quite true, we can process POSETs as follows…
- Tim: Dear Vendor, how dare you claim that your product can solve every class of EP problem? It doesn’t even have a neural network and do backward chaining, so how can you possibly solve every problem?
Tim starts with a bold (and incorrect, as I will demonstrate below) statement about processing POSETs and then twists the argument so that the vendor is claiming to be able to do everything. And of course, Tim can’t lose that argument because no product does everything. So if you’re casually browsing his blog, it reads as though Tim is in some struggle against these vendors who are all claiming to be able to solve every problem in EP.
But you’ll notice that no vendor has ever claimed to be able to do everything. This “claim” is simply invented by Tim to make sure he’s got an argument that he can’t lose. If you don’t believe me, go and try to find a single sentence on any EP vendor’s site, blog or community postings claiming to be the best solution to every EP problem.
The next time you read one of Tim’s posts about the POSET/cloud discussion, look closely and you’ll see that he’s misinterpreting every one of his references to make himself the advocate of a broad approach to processing events, while the other guy wants to reduce CEP to detecting simple patterns. That debate is invented by Tim. It’s a figment of his imagination. Read the material that he references and see how much of it claims to have a solution to every EP problem. Try to find anyone arguing that neural networks or Bayesian classifiers have no place in EP (or CEP). Or that EP (CEP) should be limited to detecting simple patterns over sorted data.
The reason that no vendor agrees with Tim’s “definition” of CEP is that it is arbitrary. He has found a few fancy sounding terms for which he feels that he has a basic understanding and he claims that all these terms define CEP. But the reality is more complicated. A basic Bayesian classifier doesn’t do backward chaining. Uh-oh, so is it CEP or not? And you can certainly use a neural network in a way that produces different results depending on the ordering and the timing of your data. Uh-oh again… now we have a neural network that needs ordering… is this CEP or not? And if you process sliding time windows of events through a neural network, well, you get the picture. Of course the same thing goes for rules with fuzzy logic.
The simple fact is that you can’t sum up a product’s capabilities by determining whether it processes POSETs or not. POSETs are not really relevant to the discussion, as I describe here. Incidentally, and not that it makes any difference to this discussion, the fact is that every POSET can absolutely be split into TOSETs. Yup, that’s right. You can split the POSET into subsets where every element is related to every other element via the transitive property of the ordering relation. Each of these subsets is a TOSET and their union is the original POSET (meaning that every element of the original POSET is contained in at least one of these subsets). And so every “cloud” can be decomposed into “streams”. Tim’s claims otherwise are just misinformed.
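To make the decomposition concrete, here is a minimal sketch of one way to split a finite POSET into totally ordered subsets (chains). It assumes the ordering is given as a transitively closed set of (a, b) pairs meaning a < b. The greedy strategy and the element names are my own choices for illustration, and the result is a valid split but not necessarily the one with the fewest chains.

```python
def chain_partition(elements, less_than):
    """Greedily split a finite poset into chains (totally ordered subsets).

    less_than must be a transitively closed strict order, given as a set
    of (a, b) pairs meaning a < b.
    """
    def n_below(x):
        # If a < b, then a has strictly fewer elements below it than b,
        # so sorting on this count visits smaller elements first.
        return sum(1 for (_, b) in less_than if b == x)

    chains = []
    for x in sorted(elements, key=n_below):
        for chain in chains:
            # Transitivity: if the chain's top is below x, so is the
            # rest of the chain, and the chain stays totally ordered.
            if (chain[-1], x) in less_than:
                chain.append(x)
                break
        else:
            chains.append([x])
    return chains


# A "diamond" poset: a is below b and c, which are both below d.
elements = ["a", "b", "c", "d"]
less_than = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("a", "d")}
chains = chain_partition(elements, less_than)
```

Here b and c are unrelated, so they have to land in different chains, and every element of the original set appears in exactly one of them.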
But just because I’m not in agreement with Tim about using POSET processing to classify EP products, doesn’t mean that I think one particular product is the solution to all classes of EP problem. It doesn’t mean that I only want to think linearly and that I’m only interested in detecting simple patterns using time windows. Tim will probably try to twist my words to make it sound as if this is what I’m saying, but that couldn’t be farther from the truth.
To get a better picture of the real issues behind classifying EP products, I suggest Opher’s blog. Unfortunately, you will have to do some reading because Opher doesn’t offer simple criteria the way Tim wants to. Instead, he has many posts on the various capabilities that different kinds of products have or could have in the future. But at least he’s not manufacturing a debate or classifying EP products based on arbitrary and logically inconsistent criteria.
And if you want to detect patterns using statistics, I’m afraid that you’ll still have to read a book on Data Mining or some such. Unfortunately, no one yet has found a way to simplify all pattern detection into a few bullet points that fit nicely on a blog page. Locate a pattern detection technique that you like and then see if there’s an EP product that supports this technique. There’s really no way to evaluate the pattern detection capabilities of any product (be it EP or data mining) other than to see if it supports the kinds of detection that you want to do.
In summary, the only hype currently going around about processing POSETs is the hype coming from Tim about how much hype there is. Go and look for yourself and see if you can find any of this supposed hype on any site other than The CEP Blog. Find one, single claim about how any EP product or class of products is the best solution to every EP problem and prove me wrong.
EP is real-time data mining?
Inspired again by a post from Marc that explores various topics in “practical EP” or EP as applied to a real situation.
The first thing that jumps to mind is that Marc is using many terms that are often collected together under data mining. For example, he lists some ways to analyze events. Compare those techniques to the table of contents of this book, which many consider to be a reasonable survey of data mining techniques.
This brings up several interesting questions that have again been lingering in my mind, waiting to crystallize into words.
First, if a big part of EP value comes from data mining, then how much of that value really requires real-time analysis? In other words, if the goal here is advanced real-time data mining, how much value from an “EP solution” could be derived simply from better static data mining techniques? With the follow up question, is EP a vector for introducing better data mining techniques into an organization?
Clearly, not all of the value from EP could also come from analysis of static data. Real-time analysis allows real-time response, so right there we have an advantage for EP. Also, analysis of static data requires storage of all that data, which may be costly or impractical.
But still, one would think that if we are going to derive so much value from real-time data mining using EP, then we should at least wonder whether the value comes from “real-time” or from “data mining” or if the combination of the two is the killer application.
I have personally run into this question on an EP project at a big bank. A very significant amount of information can be derived by batch processing of logs and databases using data mining techniques. If that data is not going to provide any real-time value, well then why would I want to move that processing into an EP engine? After all, generating this data in real-time adds risk, so it had better also provide additional value.
Second question, will everyone need to hire a statistician to get the maximum value from EP?
I notice that Marc has hired a statistician (or at least someone familiar with various applied math techniques). He says that this is because he’s got no in-house expertise in many kinds of data mining techniques. So will this become the trend? I’m happy to see more jobs in statistics and applied math, but I have a vested interest here.
Is this similar to the time when data mining was, itself, the buzzword… or is this something new, with new opportunities for people trained in applied math? At the very least, we have one new statistician job in the finance industry (but outside the traditional role of financial statistics?) I know many people who will be interested to see if these jobs pop up more frequently.
POSETs and EP: a red herring
Update: for a better explanation of the use of POSET in David Luckham’s book The Power of Events, see this post.
Following up on a post by Marc referencing POSETs and the “event cloud”, I wanted to point something out.
AFAIK, when the term POSET is used in blogs discussing EP (CEP), it is a red herring. It’s of little practical use unless you are a mathematician. If you look at everything written by the POSET-loving portion of the EP community, what they do is discuss a particular data set that could be a TOSET, then make some vague reference to another way of processing that works on POSETs. But they never get as far as to discuss anything specific about this other way of processing, other than the fact that it does not use the total ordering.
They-who-talk-about-POSETs seem most often to be parroting some stuff said a while ago by David Luckham. He used the term POSET to try and differentiate between a system that requires ordering of events and one that doesn’t. You will notice that David hasn’t posted anything about POSETs in a while, and that all the examples of using a POSET from his book simply define another relation (like “caused by”) that has nothing at all to do with ordering (and so, could be used equally on a POSET or TOSET). So in David’s past writing, defining a set of events as a POSET is equivalent to stating that we will choose not to use their order when determining causality. It has nothing to do with using the set as a POSET.
A POSET is just an algebraic term. It is a little less restrictive than a TOSET, so if you can prove something for a POSET rather than a TOSET, then you have made a more general, and thus better, statement. But beyond algebraic proofs, POSETs are not useful and here’s why:
A POSET is a set on which some pairs of elements don’t relate via ordering. Fine, that’s understandable. But every method of detecting patterns requires at least one relation between the elements (events). There are uncountably many relations that don’t exist on every set. But to do anything with your data, you need to find relations that do apply.
So here is the one example of a useful way to use a POSET: If you have a POSET (some people might call this a “cloud”) then some of your events do relate via some ordering. So you can partition your events into sets that are ordered by this ordering (and you could call these “streams” but I think this is misleading). Now you have just found a relation that you might be able to use in detecting patterns. Of course, there are probably several ways to order events.
Other than this, you can’t model a POSET because there is specifically nothing to model. To start modeling, you have to locate the relations that do apply between events, in addition to recognizing the ones that don’t.
Does EP change the nature of data analysis?
I wonder if EP (or CEP, whatever the difference may be) will change the nature of data analysis. If it does, this is a really big deal. But I’m skeptical. I’m not talking about the fact that EP software helps implement detection rules over real-time data. I’m talking about the theories and methods that we use to develop detection rules. Will EP usher in new ways of locating patterns in data and, if so, will those new methods or theories then shape new ideas in data analysis?
This post was, in some way, prompted by a recent post from Jack at Aleri. But the question has been lingering in my mind since Opher posted about the possibility of EP ushering in the widespread adoption of CoDA. I began wondering if EP is something other than the inevitable productizing of best practices and frameworks for real-time processing.
Data analysis is a huge field. We have mathematics (e.g. central limit theorems and much more) and plenty of applied techniques for analyzing different kinds of data. We also have visualization techniques and plenty of research going on there. And the list goes on. So what will EP contribute?
Visualizing data in real-time is useful in many cases, and EP software can be used here to slice and dice the data in real time. Jack points out a good idea in this area, applying a data-dicing UI to real-time data. But I would not exactly call this new. Even though it would happen in real time, it boils down to breaking data into windows. And windows are useful but not new (although the ease of declaring these windows, as provided by SQL-like EP solutions, is a clear advance). So EP contributes in this area by making these techniques available for real-time data, but has not yet produced a change in how people analyze data.
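For readers who have not worked with windows, here is a minimal hand-rolled sketch of a sliding time window that keeps a running mean. The class name, the window width and the data are all hypothetical, and it assumes events arrive with non-decreasing timestamps. SQL-like EP products let you declare roughly this in a line, which is the advance mentioned above.

```python
from collections import deque


class SlidingWindow:
    """Keep only the events from the last `width` time units."""

    def __init__(self, width):
        self.width = width
        self.events = deque()

    def add(self, timestamp, value):
        # Assumes timestamps arrive in non-decreasing order.
        self.events.append((timestamp, value))
        # Evict everything that has fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.width:
            self.events.popleft()

    def mean(self):
        return sum(value for _, value in self.events) / len(self.events)


window = SlidingWindow(width=10)
window.add(0, 1.0)
window.add(5, 3.0)
window.add(12, 5.0)  # the event at time 0 is now older than 10 units
current_mean = window.mean()
```

After the third event, only the readings at times 5 and 12 remain in the window.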
I have also noticed that the folks over at The CEP Blog apparently want CEP to provide some new (and possibly magical) pattern detection techniques. And I wonder: where are these techniques supposed to come from? As far as I know, they will come from research in mathematics and statistics. Or they will come from research in applying math to particular problem domains (e.g. network security, trading) or to visualization. So the question becomes: what will EP contribute to these established fields? Is EP’s role to contribute original ideas to data analysis, or to provide a convenient way to apply techniques developed by other research, to real-time data?
