Event A followed by event B
The title of this post was inspired by a recent post by Opher on temporal semantics of event processing. If you’ve not been following Opher’s posts on this topic, then I suggest that you fully read the linked post before you continue on with my post.
In his post, Opher presents a scenario involving a temporal sequence, in other words “event A followed by event B”. His scenario includes two of these sequences, and he illustrates how the use of a temporal sequence might not be as straightforward as it seems.
I think that the temporal sequence is even more complicated than Opher’s illustration shows. In fact, I think that using a temporal sequence will, for many event processing applications, be a time bomb.
The temporal sequence pattern is very tempting, but is inherently fragile. And there are some simple alternatives that can do better. These alternatives are appropriate not only for a scenario where we are tracking individual events but also where we are modeling and inferring about masses of events using a probabilistic approach.
Let’s think about events A and B, where we know the process by which these events are created. Now we know that events A and B each represent an action or a condition where the action or condition of event A must occur before the action or condition of event B. We want to detect the scenario where event A occurs and event B occurs (or where event A occurs and event B does not, or where event B occurs and event A does not).
Since we “know” some underlying facts about events A and B, we might be tempted to model this scenario as “event A followed by event B.” But here are a few things that we might not know, or which might change as time goes on:
- We don’t know whether event A will be delivered to us before event B. And even if we do know this now, it might change in the future as applications are moved among servers or as the logic that generates A and B is refactored.
- Even though the action for A will happen before the action for B, we do not know that the time stamps on events A and B will reflect this. A simple code change can cause event B to acquire a time stamp earlier than that of event A.
In other words, for many applications, even though we know that event A should happen before event B, we can’t count on either the time stamps on these events or on the delivery times to reflect this ordering.
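To make the two failure modes concrete, here is a minimal sketch in Python. The `Event` type and the `a_followed_by_b` function are hypothetical names of my own, not taken from any event processing product. The check simply asks whether an A appears before a B in whatever order it is handed, so the same function can be fed events in delivery order or sorted by time stamp.

```python
from typing import NamedTuple


class Event(NamedTuple):
    kind: str         # "A" or "B"
    timestamp: float  # stamped by the producer when the event is generated


def a_followed_by_b(events):
    """Return True if some A appears before some B in the given order."""
    seen_a = False
    for ev in events:
        if ev.kind == "A":
            seen_a = True
        elif ev.kind == "B" and seen_a:
            return True
    return False


a = Event("A", 10.0)
b = Event("B", 10.5)
b_skewed = Event("B", 9.9)  # assumed: a code change stamps B too early

# Delivery order matches the real-world order: the pattern fires.
by_arrival_ok = a_followed_by_b([a, b])

# Same two events, but B is delivered first: the pattern is missed.
by_arrival_swapped = a_followed_by_b([b, a])

# Ordering by time stamp does not help if the stamps are themselves wrong.
by_timestamp_skewed = a_followed_by_b(
    sorted([a, b_skewed], key=lambda e: e.timestamp)
)
```

In all three cases the action behind A really did happen before the action behind B, yet only the first input satisfies the strict sequence.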
We may know something about the underlying meanings of events A and B, but we don’t know whether the software or the network is cooperating with our understanding. Even worse, we can look into the software and the network and see a pattern that exists now, but we can’t guarantee that this pattern will remain constant over time (let’s say 4 or 5 years, the minimum life of our event processing solution).
Now let’s look at what we do know that will probably remain constant:
- We may have some way to match an event B to a particular event A. Maybe there is a transaction ID or some other linking identifier included in the events.
- We know that there is an underlying meaning to these events. Even if we see an event B before an event A, we know that in fact they represent the reverse order.
So here is a less fragile way of looking at these events:
- We are looking for events A and B where the events have some way of matching to each other (transaction id or similar). If we don’t have this identifier, then we recognize that the pattern detection will be more fragile but we continue on with subsequent assumptions.
- The events occurred within a certain time window of each other. In other words, we can be sure that “|timestamp(A) - timestamp(B)| < X” for some X, where timestamp() is the time stamp added to each event when it is generated.
- We assume that there is a bound on the delay for the arrival of events. In other words, the network may be slow or the message bus may be down, but there is some maximum delay for events. This maximum delay may be a few seconds for some applications and a few days or more for others. We would prefer not to have this bound, but in practice we can’t wait forever.
The above assumptions can be turned into an event detection pattern with very simple semantics. It works just like a temporal sequence (it is a partial ordering relation), but it’s less fragile. And these assumptions are as appropriate for a probabilistic approach to event detection as they are to a track-and-trace approach.
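As a rough illustration of how those three assumptions might become a detector, here is a minimal sketch in Python. Everything in it is hypothetical: the `PairDetector` class, the `txn_id` field, and the choice to give up on an unmatched event after `window + max_delay` seconds (which assumes the detector’s clock is roughly in line with the producers’ clocks). It holds at most one unmatched event per identifier, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str         # "A" or "B"
    txn_id: str       # linking identifier shared by a matching A and B
    timestamp: float  # stamped when the event is generated, in seconds


class PairDetector:
    """Pairs an A with a B by identifier, ignoring arrival order."""

    def __init__(self, window, max_delay):
        self.window = window        # X in |timestamp(A) - timestamp(B)| < X
        self.max_delay = max_delay  # assumed bound on delivery delay
        self.pending = {}           # txn_id -> (event, arrival time)

    def on_event(self, ev, now):
        """Feed one event; return (A, B) if it completes a pair, else None."""
        held = self.pending.get(ev.txn_id)
        if held is not None:
            other = held[0]
            close = abs(ev.timestamp - other.timestamp) < self.window
            if other.kind != ev.kind and close:
                del self.pending[ev.txn_id]
                # Report in semantic order, whichever one arrived first.
                return (ev, other) if ev.kind == "A" else (other, ev)
        # Nothing to pair with: hold this event (replacing any older one).
        self.pending[ev.txn_id] = (ev, now)
        return None

    def expire(self, now):
        """Return held events whose partner can no longer be expected."""
        limit = self.window + self.max_delay
        expired = [ev for ev, arrived in self.pending.values()
                   if now - arrived > limit]
        for ev in expired:
            del self.pending[ev.txn_id]
        return expired


detector = PairDetector(window=5.0, max_delay=60.0)

# B is delivered before A, and the pair is still detected.
first = detector.on_event(Event("B", "t1", 100.2), now=101.0)
pair = detector.on_event(Event("A", "t1", 100.0), now=103.0)

# An A whose B never shows up is reported once the delay bound has passed.
detector.on_event(Event("A", "t2", 200.0), now=201.0)
unmatched = detector.expire(now=300.0)
```

The `expire` step is what covers the “event A occurs and event B does not” case (and its mirror image): without the bound on delay, there would be no point at which the detector could declare the partner missing.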
My alternative pattern is not always appropriate, and sometimes the strict temporal sequence is better. But I think that a truly complete event pattern matching product should offer both options.