EP and analytics
June 25, 2008
I got an email this morning asking what I think of the idea that CEP (complex event processing) must be combined with “analytics” to be useful to most customers. I suspect the question was prompted by the very strong opinions Tim has expressed over at The CEP Blog. See, for example, his post about Tibco acquiring Insightful, where he states that “any software company that discusses CEP and does not support or advocate advanced analytics are selling snake oil.” So here is my answer, in three parts.
First part: about the term CEP
There is an argument that because CEP is often described as being focussed on “detecting previously unknown patterns in events,” it must by definition involve analytics designed for that kind of detection. On this point, I agree that the idea of detecting previously unknown patterns is mostly wishful thinking, used to attract interest in CEP as a “strategic” technology. I can think of one exception: the case where you simply have too much data to work with effectively in a traditional database. On a very large data set, CEP technology could help to summarize the data, test assumptions, and screen for patterns in data sets that would be unmanageable with other technology (for example, data sets where you don’t even have the disk space to store a massive incoming data feed).
On a similar note, CEP could perhaps lower the barrier to entry for certain kinds of pattern detection on very large data streams. Storing and querying large amounts of data in a database can add significant cost in the form of storage and processing overhead. But one could attach a CEP application to a stream of network data and test for patterns without ever storing the data to disk.
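As a rough illustration of testing for a pattern without storing the stream, here is a minimal Python sketch. It is not the API of any CEP product; the function `detect_bursts`, its parameters, and the sample addresses are hypothetical. It flags a key, such as a source address in network data, that produces more than `threshold` events within a sliding time window, and per key it keeps only the timestamps that are still inside that window.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple


def detect_bursts(
    events: Iterable[Tuple[float, str]], window: float, threshold: int
) -> Iterator[Tuple[float, str, int]]:
    """Yield (timestamp, key, count) whenever `key` has more than `threshold`
    events in the last `window` seconds.

    Assumes events arrive in timestamp order. Nothing is written to disk:
    per key, only timestamps inside the window are retained. (Idle keys are
    not evicted in this sketch; a real deployment would need to handle that.)
    """
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    for ts, key in events:
        q = recent[key]
        q.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while q and q[0] <= ts - window:
            q.popleft()
        if len(q) > threshold:
            yield ts, key, len(q)


# Usage: a tiny in-memory "stream" of (timestamp, source address) pairs.
stream = [
    (0.0, "10.0.0.5"),
    (0.5, "10.0.0.5"),
    (1.0, "10.0.0.7"),
    (1.2, "10.0.0.5"),
    (5.0, "10.0.0.5"),
]
alerts = list(detect_bursts(stream, window=2.0, threshold=2))
```

Here `alerts` holds a single entry, `(1.2, "10.0.0.5", 3)`: three events from the same address within two seconds. Because `detect_bursts` is a generator, the same code could consume a live, unbounded feed instead of a list.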
But people should remember that the goal of CEP has always been to lower the barrier to entry for real-time applications. When you start to talk about advanced analytics, you have two options:
- You might require users to have a strong background in that kind of analytics. For example, see my post about using EP as real-time data mining. Certainly S+ (Insightful’s statistics package) contains lots of great stuff for data mining and other analytics, but most of it is not for the faint of heart and requires at least the equivalent of a master’s degree to use effectively.
- You might design a solution that applies advanced analytics to a particular, narrowly focussed problem, like detecting likely signs of a network intrusion. In this case, you can put the power of analytics in the hands of a less-trained user, but they can only use that software within that narrow problem domain.
Neither of these scenarios meets the goal of lowering the barrier to entry. There are already terms for applying analytics: applied math, some areas of statistics, detection theory, or data analysis. If CEP were any of these things, it would not be called CEP – it would already have a name.
Second part: the utility of CEP solutions for advanced problems
I have heard a lot about how existing CEP products can’t be used for advanced problems because of how they process data. This is a very short-sighted view. It’s like saying that C++ is not useful for advanced problems because it comes with only a few built-in storage types. CEP products are platforms on which you build larger applications. They reduce the barrier to entry for real-time processing, and in the end they provide a way for you to build larger applications out of smaller components, which is exactly what is needed to solve advanced problems. If I have a new model for some kind of advanced pattern detection, I might code that model using a CEP engine, saving myself all kinds of time and pain related to handling real-time data. When a product comes with certain built-in data structures, the whole point is that you can combine them to provide more advanced functionality. That’s how advanced software gets built.
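To make the building-blocks argument concrete, here is a minimal Python sketch. It is not modeled on any particular CEP product’s API; the classes `MovingAverage` and `CrossoverDetector` are hypothetical. A simple component (a fixed-size moving average) is combined into a slightly more advanced one: a detector that reports when a short moving average crosses above or below a long one.

```python
from collections import deque
from typing import Deque, Optional


class MovingAverage:
    """Simple building block: the average of the last `n` values."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.values: Deque[float] = deque(maxlen=n)
        self.total = 0.0

    def update(self, x: float) -> Optional[float]:
        if len(self.values) == self.n:
            # The oldest value is about to be evicted by the bounded deque.
            self.total -= self.values[0]
        self.values.append(x)
        self.total += x
        # No average is reported until the window is full.
        return self.total / self.n if len(self.values) == self.n else None


class CrossoverDetector:
    """Composite component built from two MovingAverage blocks."""

    def __init__(self, short_n: int, long_n: int) -> None:
        self.short = MovingAverage(short_n)
        self.long = MovingAverage(long_n)
        self.prev_sign = 0  # +1: short above long, -1: short below long

    def update(self, price: float) -> Optional[str]:
        short_avg = self.short.update(price)
        long_avg = self.long.update(price)
        if short_avg is None or long_avg is None:
            return None
        sign = (short_avg > long_avg) - (short_avg < long_avg)
        signal = None
        if sign != 0 and self.prev_sign != 0 and sign != self.prev_sign:
            signal = "up" if sign > 0 else "down"
        if sign != 0:
            self.prev_sign = sign
        return signal


# Usage: feed a short price series through the composite detector.
detector = CrossoverDetector(short_n=2, long_n=4)
prices = [10, 10, 10, 9, 8, 9, 11, 12]
signals = [detector.update(p) for p in prices]
```

For this series the only signal is `"up"` at the seventh price, when the short average (10.0) moves above the long one (9.25). Neither class knows anything about the other’s internals, which is the point: the more advanced behavior comes from wiring simple pieces together.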
Third part: the market for current CEP products
Some have expressed the opinion that most customers who process real-time data need “advanced analytics.” This then leads to the idea that existing CEP vendors are not providing the tools that most customers need. So is this a statement confirmed by a market study, or the subjective opinion of one person? Matlab has market data connectivity and about as much in the way of analytics as you can buy. So why is CEP so widely used in algorithmic trading? How did CEP even get started in an area that already had well-developed analytics solutions? The answer, AFAIK, is that most users don’t actually require advanced analytics. Some do. Most don’t. Most businesses need advanced real-time analytics in a few (possibly vital) areas, but to run the rest of their day-to-day operations, they just need a lower barrier to entry for developing relatively simple real-time applications, and then a way to manage the growth of those simple applications into large rule sets. It’s the same with data processing. Many static reports use advanced analytics, and in many cases those are key to the business strategy. But most reports are made of very simple components, assembled in meaningful ways. I have no market study to back that up, but I think the evidence is stronger for this view.