EP is real-time data mining?
April 14, 2008
I am inspired again by a post from Marc that explores various topics in “practical EP,” or EP as applied to a real situation.
The first thing that jumps to mind is that Marc is using many terms that are often collected together under data mining. For example, he lists some ways to analyze events. Compare those techniques to the table of contents of this book, which many consider to be a reasonable survey of data mining techniques.
This brings up several interesting questions that have again been lingering in my mind, waiting to crystallize into words.
First, if a big part of EP value comes from data mining, then how much of that value really requires real-time analysis? In other words, if the goal here is advanced real-time data mining, how much value from an “EP solution” could be derived simply from better static data mining techniques? And a follow-up question: is EP a vector for introducing better data mining techniques into an organization?
Clearly, not all of the value from EP could also come from analysis of static data. Real-time analysis allows real-time response, so right there we have an advantage for EP. Also, analysis of static data requires storage of all that data, which may be costly or impractical.
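To make the storage point concrete, here is a minimal sketch of a streaming statistic, assuming a stream of plain numeric events. It uses Welford's online algorithm to keep a running mean and variance in constant memory, so no event has to be stored after it is seen. The names `RunningStats`, `is_outlier`, and the `threshold` parameter are hypothetical and chosen only for illustration; they do not come from Marc's post or from any EP product.

```python
import math


class RunningStats:
    """Running mean and sample variance via Welford's online algorithm.

    Only three numbers are kept, no matter how many events arrive.
    """

    def __init__(self):
        self.n = 0        # number of events seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new event value into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance of the events seen so far (0.0 if fewer than two)."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def is_outlier(stats, x, threshold=3.0):
    """Flag x if it lies more than `threshold` standard deviations from the mean.

    The 3.0 default is an arbitrary illustrative choice, not a recommendation.
    """
    std = math.sqrt(stats.variance())
    return stats.n > 1 and std > 0 and abs(x - stats.mean) > threshold * std


if __name__ == "__main__":
    stats = RunningStats()
    for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # made-up event values
        if is_outlier(stats, value):
            print("unusual event:", value)
        stats.update(value)
    print(stats.mean, stats.variance())
```

A batch job over a stored log would compute the same mean and variance. What the streaming version changes is when the answer is available and how much has to be kept around to get it.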
But still, one would think that if we are going to derive so much value from real-time data mining using EP, then we should at least wonder whether the value comes from “real-time” or from “data mining” or if the combination of the two is the killer application.
I have personally run into this question on an EP project at a big bank. A very significant amount of information can be derived by batch processing of logs and databases using data mining techniques. If that data is not going to provide any real-time value, well then why would I want to move that processing into an EP engine? After all, generating this data in real-time adds risk, so it had better also provide additional value.
Second question: will everyone need to hire a statistician to get the maximum value from EP?
I notice that Marc has hired a statistician (or at least someone familiar with various applied math techniques). He says that this is because he’s got no in-house expertise in many kinds of data mining techniques. So will this become the trend? I’m happy to see more jobs in statistics and applied math, but I have a vested interest here.
Is this similar to the time when data mining was, itself, the buzzword… or is this something new, with new opportunities for people trained in applied math? At the very least, we have one new statistician job in the finance industry (though perhaps outside the traditional role of financial statistics). I know many people who will be interested to see if these jobs pop up more frequently.
April 24, 2008 at 2:19 pm
I usually explain EP as a conventional data warehouse turned upside down — queries coming before data, rather than the other way around.
In any case, it’s fairly clear from the research activity in this area that many of the interesting new algorithms will come from statistical techniques that were originally developed for data mining, especially as regards sampling with provable deltas and epsilons.
June 11, 2008 at 9:22 am
Please consider defining your acronyms the first time you use them in a post. Not everyone is conversant with everything out there, especially those who come to your blog from a link on another blog. I could not find a definition of EP with Google, and eventually pieced together that it seems to mean “event processing”.
Otherwise, this topic is very, very interesting.
Thank you for your blog.
June 17, 2008 at 12:46 am
Hi Patrick, thanks for the suggestion. I added a bit to a text box on the upper right of each page. Hope that helps.