Hans Gilde’s weblog

On EPLs

Posted in eventprocessing by Hans on December 19, 2007

All snarkiness aside, now. On the discussion of EPL capabilities.

I start with a question: If programs like neural nets and Bayesian classifiers will make CEP truly useful to the vast majority of applications and thus cause an order of magnitude increase in revenue for the vendors… why do vendors of these types of inference engines not dominate the EP market? There are several well-established implementations of these kinds of inference technologies, and if there are a few hundred million dollars to be made using these engines for EP, why don’t those vendors take advantage? After all, it would seem that the only thing standing between an inference engine and an EP engine is the connection to the source of events… a straightforward, if time-consuming, bit of code to write.

Similarly, if customers were willing to throw wads of cash at some particular kind of statistical inference engine, why would a company like Tibco or Progress not simply include it as an add-on?

The answer: EP engines today are popular not because they provide revolutionary ways to detect patterns or make statistical inferences, but because they make straightforward tasks even more straightforward. They offer cost savings by making straightforward business rules more accessible to the less-technical user, or by removing the need to write and maintain certain kinds of code.

Unfortunately, once you start down the road of probability, tasks are no longer simple. No amount of GUI tooling or simple syntax will make statistical methods accessible to the untrained. An untrained user will not build a better security application using a neural network than using C++. The untrained user may fool themselves into thinking that they have a better security solution, but since they don’t understand the principles on which their system operates, they can’t really understand how effective it is.

When statistics come into play, the best that an untrained user can hope for is a piece of software that has been customized by experts to solve their problem. For example, a Bayesian spam blocker. This kind of product is only useful to a majority of system administrators because it has already been set up to solve the problem of spam detection. If you just provided the administrator with an un-configured Bayesian classifier and inference engine, it would be useless.

The barrier to using an inference engine is not the availability of that engine, but the training and research needed to apply the engine to a particular problem. Customers are not clamoring for these inference engines because, in large part, they are not willing to put money into projects that are so likely to fail. They would rather wait for a pre-configured network security or fraud detection system than put in the extensive time and research needed to build one. Of course, there are plenty of R&D departments working on these problems, but they represent the minority of customers, not the majority. The majority of customers want something that already works, not a big R&D project with a pretty good chance of failure. That chance of failure comes from the fact that, once you start in on statistics-based research, you are never guaranteed to find the revolutionary pattern or correlation in the data that you were hoping for.

David Luckham’s book talks about a language that can express any kind of set relation and so can be used to solve a very wide range of problems. Claudi has a similar vision. No one would dispute that being able to do more with a language is better, as long as simple tasks remain simple. In this sense, it is clear why Claudi says that CEP != ESP. He sees that modern EP products haven’t reached their peak of usefulness and he wants a name for software that he feels will be more useful. This is totally understandable.

Regardless of whether names like CEP or ESP are appropriate, the fact is that it’s possible to imagine improvements to modern EP technology. But does this mean that the next great revolution in CEP is just around the corner? It may be, but who can say for sure where it will come from? At the moment, grid computing seems to be gaining steam, and there are many grid applications that do very simple work, but do it very quickly or distribute the workload very efficiently. This could be a revolution in event processing, yet still not provide any more accessible statistical inference or the ability to use any broader set relations.

Some of the problems that are now being considered in the EP area have been worked over for a very long time. For example, if you have a piece of functionality that is known to require exponential or greater time to complete or is known to require maintaining a huge set of data… well sure it would be nice to have this feature, but there is a very real limit to its usefulness. Not to say that research isn’t worthwhile, but it’s important to point out that the research may have to overcome some well known and very tough problems before it provides a revolution.
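To make the "exponential time" point concrete, here is a toy sketch (hypothetical, not taken from any EP product): a naive rule that looks for any combination of events in a window whose quantities add up to a target has to consider every subset of the window, and the number of subsets doubles with each event added. Smarter algorithms exist for special cases, but in general this is the kind of well-known hard problem the paragraph above refers to.

```python
from itertools import combinations


def matching_subsets(window, target):
    """Brute-force search for every subset of events in the window whose
    quantities sum to target. Returns (matches, subsets_examined)."""
    matches = []
    examined = 0
    for size in range(len(window) + 1):
        for combo in combinations(window, size):
            examined += 1
            if sum(combo) == target:
                matches.append(combo)
    return matches, examined


# An unreachable target forces a full search; the work is 2**n subsets.
for n in (5, 10, 15):
    _, examined = matching_subsets(list(range(1, n + 1)), target=-1)
    print(n, examined)  # prints 5 32, then 10 1024, then 15 32768
```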

At the same time, a product that makes simple tasks even simpler and allows those simple tasks to be built up into complex solutions… this product may, in the end, turn out to provide the best solutions to complex problems. Take C++, or any programming language. You start with very simple ideas, the language provides some features that make your life a little easier… and look at what can be built with it. This is why I root not only for revolutionary research but also for incremental improvements to existing languages. We don’t know where the next revolution will come from, but we do know that we have useful tools today. And incrementally making those useful tools more useful could very well be what provides the next big thing in EP, just as it did for other programming languages. Who knows?

And if you don’t know which method will prove to be most useful, how do you know which one to call complex? After all, you wouldn’t call C++ complex (at least not in the sense of providing statistical inference or set relations), but it is used to solve some of the most complex problems in the world today.

More on Magical CEP

Posted in eventprocessing by Hans on December 18, 2007

I missed one point in my post about the future-revolution that will be CEP. Here, I illustrate another lesson that I have learned recently:

You can spot someone with really good event processing experience by how much they support the vision of future-CEP.

After all, future-CEP will be that magic piece that will tie mathematical models to your events. Will you need to develop these models rigorously or even understand the math? No way! So what if you don’t have a model that works for your problem? Future-CEP will come preconfigured with a model that works for you! And you can bet that the stuff that comes with future-CEP will be way better than those crappy old rules that you use now. Guaranteed. Are your current rules the result of 20 years of research in your industry? So what?!? Future-CEP is even better! Haven’t you ever heard of statistics? Duh… if you know anything, you know that statistics is way better than whatever you have now. After all, statistics is brand new – it just came out last year. Honestly, I can’t believe you’re not using it yet. Seriously, you’d better get on the ball with this statistics thing before you get left behind.

If you have ever really worked on a truly huge event processing project, and I mean one of galactic proportions, you know one thing: You can absolutely count on miracles from new technology. Think about it: when was the last time you planned a strategy around a miraculous new technology, which then failed to live up to your expectations? Never, that’s when!

So, to summarize: Anyone with a shred of decent event processing experience is building a strategy based on miracles. If you see anyone trying to match current EP product features to current project requirements: GOTCHA!! You know they have no valid experience. Because who in their right mind would not base their strategy on expecting a miracle?

The Bottom Line

Yes, advanced problems are being solved with advanced techniques. And if someone has taken the time to research and design a technique that works very well for their advanced problem, then hooking that software up to their event flow is the least of the challenges that they have faced.

There are already neural network systems that can hook up to a message bus. There are already statistical packages that can be used with modern EP engines. None of that will do you any good if you don’t know how to use it to solve your problem. Imagining that CEP will help you implement an advanced detection algorithm is like waiting for a miracle. The key is having people who know how to work with advanced detection algorithms, not investing in some magical software.

Arguing that industry can benefit from advanced research on pattern recognition and applied statistics is like arguing that the world needs better medicines. Right, we all knew that. NO ONE IS ARGUING THIS POINT. The discussion has moved past that.

CEP is Magic

Posted in eventprocessing by Hans on December 17, 2007

This post is meant in jest. It is a part of this discussion on the CEP forum about different types of Event Processing.

Have you heard about this mathematics thing?

Did you know that mathematics can be used to detect patterns in events? I nearly fell out of my chair when I read this. Imagine, using something like mathematics for something like pattern detection! This never occurred to me, but now that I think about it, it makes total sense! How could I have failed to notice this? Why, when this idea gets out, it could spawn whole fields of research.

CEP is Magic

As if this idea of mathematics weren’t enough, I have learned an even more important lesson today: True CEP is magic. Anything less than magic is not CEP, because it is simply not complex enough. The world has never seen a CEP engine, but when we do… watch out!! Oh boy, will it be great!!!!!

You may be thinking that this sounds too good to be true. If so, you need to wake up and realize that only through CEP can we achieve the following scenario:

EVENTS + MATH = MIRACLE

It seems so simple, but thousands of years of mathematics research has failed to find this. You see, the piece that everyone has been missing is CEP! It is the magic piece that will make the whole puzzle complete. Do not doubt the power of future-CEP!

If you want to be a visionary, you must help prepare the world for the mighty revolution that will be CEP, in the future. We must all be ready for a future of magical CEP software that will revolutionize our businesses and maybe even the very fabric of our lives. Imagine a world where even the most complex relationships among events can be summarized with one simple term: MATH. With CEP software, we will be able to take even the most complex set of EVENTS and feed it through the MATH processor, and out will come… a MIRACLE. Now you are beginning to see the power that will be CEP: EVENTS + MATH = MIRACLE !!!!

With the advent of CEP, math will no longer be some esoteric and complicated jumble of ideas. Instead, it will be as simple as a potato chip. CEP will turn events into miracles just like potatoes are turned into chips!!! Do you need a miracle? Why then just get some events and some CEP and just click the Math Button. Oh man, I can’t wait!!!!

I’m so happy to know that in the future, CEP will hand me the miracle that I so desperately need. I’ve already started hoarding events. After all, the more events I have, the more miracles future-CEP will produce for me. I am also holding my breath. Because I just know that it won’t be long until CEP produces my first miracle.

The Bottom Line

Many of the positions that are being argued on the CEP forum today come not from a lack of perspective or vision but from experience. There is no unifying mathematical theory that works for every pattern detection problem. So the important thing is not how many mathematical models come with an EP engine, but how well the engine lets us implement models (mathematical or otherwise). If an engine truly provides the features that are needed to implement a broad range of models, then the implementation of those models will follow. It doesn’t work the other way around.

It is fun to envision a future where any organization can react to events in real time and in a highly effective manner using complex mathematical models. But the most complex model is built from simple pieces. Many, many pieces, each of which can be surprisingly simple. It is the simple pieces that form the foundation. Without a foundation of effective simple pieces, no complex model will work.
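As a hypothetical illustration of simple pieces assembled into a model, the sketch below composes three trivial parts (a sliding window, a mean and a standard deviation) into a z-score outlier detector. The detector, window size and threshold are assumptions chosen for the example, not a recommendation; the point is only that each piece is simple and independently testable.

```python
from collections import deque
from math import sqrt


class SlidingWindow:
    """Piece 1: keep only the last `size` values."""

    def __init__(self, size):
        self.values = deque(maxlen=size)

    def add(self, x):
        self.values.append(x)

    def full(self):
        return len(self.values) == self.values.maxlen


def mean(values):
    """Piece 2: arithmetic mean."""
    return sum(values) / len(values)


def stddev(values):
    """Piece 3: population standard deviation."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def zscore_alerts(stream, size=5, threshold=3.0):
    """Composite model: flag a value lying more than `threshold` standard
    deviations from the mean of the preceding `size` values."""
    window = SlidingWindow(size)
    alerts = []
    for i, x in enumerate(stream):
        if window.full():
            sd = stddev(window.values)
            if sd > 0 and abs(x - mean(window.values)) / sd > threshold:
                alerts.append((i, x))
        window.add(x)
    return alerts


prices = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 14.0, 10.0]
print(zscore_alerts(prices))  # [(6, 14.0)]
```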

Arguing that the future of CEP lies in mathematical models is missing the point. Everyone knows about mathematical models; there are whole schools devoted to them. The question is how well the simple pieces can be used to solve simple problems and how well those solutions can be assembled into layers of complexity. Problems in EP range from trivially simple to toweringly complex. The true test of a generic EP product will be how it handles this range of complexity.

inversion of large matrices and streaming SQL

Posted in eventprocessing by Hans on December 10, 2007

I’m working on a few probability topics that involve inverting large matrices. Some of this work might be interesting to implement in streaming SQL or another EPL, since it would be useful to certain problems involving streaming data or windows of data. I’m wondering if there are examples in streaming SQL that involve matrix inversion. Particularly, it would be great to find examples that involve decomposing matrices so that the inversion can be done on many simultaneous threads. I think that this is a very tractable problem in streaming SQL, so I’m hoping that someone out there has already looked into it.
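One well-known decomposition is blockwise inversion via the Schur complement: split M into blocks A, B, C, D, invert A and S = D − C·A⁻¹·B, and assemble the inverse from those pieces. The sketch below is plain Python with exact fractions, not streaming SQL, and is only an assumption about how the work might be split; once A⁻¹ and S⁻¹ exist, the four output blocks are independent of each other and are the natural candidates for separate threads (or separate streams).

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def matadd(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def matsub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def neg(X):
    return [[-a for a in row] for row in X]


def invert_small(M):
    """Gauss-Jordan inversion with partial pivoting, used for the small blocks."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if A[pivot][col] == 0:
            raise ValueError("matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


def block_invert(M, k):
    """Invert M using its top-left k x k block A and the Schur complement of A.
    Requires A and S = D - C A^-1 B to be invertible."""
    A = [r[:k] for r in M[:k]]
    B = [r[k:] for r in M[:k]]
    C = [r[:k] for r in M[k:]]
    D = [r[k:] for r in M[k:]]
    Ainv = invert_small(A)
    AinvB, CAinv = matmul(Ainv, B), matmul(C, Ainv)
    Sinv = invert_small(matsub(D, matmul(C, AinvB)))
    # The four blocks below depend only on Ainv, Sinv and the products above,
    # so each could be computed on its own thread.
    top_left = matadd(Ainv, matmul(matmul(AinvB, Sinv), CAinv))
    top_right = neg(matmul(AinvB, Sinv))
    bottom_left = neg(matmul(Sinv, CAinv))
    bottom_right = Sinv
    return ([l + r for l, r in zip(top_left, top_right)]
            + [l + r for l, r in zip(bottom_left, bottom_right)])


M = [[2, 1, 0, 0], [1, 2, 1, 0], [0, 1, 2, 1], [0, 0, 1, 2]]
identity = [[int(i == j) for j in range(4)] for i in range(4)]
print(matmul(M, block_invert(M, 2)) == identity)  # True
```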

P.S. This is academic work, so my results will, at minimum, be published informally on this blog. Also, examples that even involve inverting small matrices would be very welcome.

Advice on monitoring trade flow with an EP engine

Posted in eventprocessing by Hans on December 2, 2007

I am inspired by Marc Adler’s recent posts on EP evaluations. I very much agree that the community at large benefits greatly from blog posts, as long as they don’t give away competitive advantage. I did a project exactly like his and I thought I’d offer some general advice. I don’t work for that company anymore and it’s been long enough since the project was in production that hopefully I won’t rekindle any old resentment at my blogging habits. This is a very hastily written post and won’t make sense to people who don’t work for a bank, sorry.

So let’s say you want to monitor FIX traffic in the firm for whatever reason. Marc wants to detect abnormal activity in a sector, but I’m sure that they hope to expand the project over time to encompass lots of kinds of monitoring.

Here is some advice, in no particular order:

FIX messages are not always as standard as they look. There can be small differences in the use of the protocol (various fields used to specify trading parameters, different ways to specify the instrument to be traded) between different customers, internal systems and exchanges. If there’s not already something in place that normalizes messages, code in the EP engine will have to do it. This can lead to estimates that are too low if project complexity is judged by coding a few simple detection rules against only a subset of the data (maybe from one big customer or one exchange). Also, during the EP product evaluation phase, make sure you think about more than the simplest use cases. I don’t mean thinking about detecting advanced scenarios; I mean ensuring that the use cases appropriately represent the complexity of the FIX traffic. Annoying little edge cases in the use of FIX can complicate code a lot.
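A minimal sketch of the kind of normalization step described above, assuming one counterparty identifies instruments by Symbol (tag 55) and others by SecurityID (tag 48) qualified by SecurityIDSource (tag 22). The securities-master dictionary and its identifiers are hypothetical placeholders, and repeating groups are deliberately ignored to keep it short.

```python
SOH = "\x01"  # standard FIX field delimiter

# Hypothetical securities-master lookup keyed by (id source, id).
SECURITY_MASTER = {
    ("ISIN", "XX0000000001"): "ACME",
    ("RIC", "ACME.X"): "ACME",
    ("SYMBOL", "ACME"): "ACME",
}

# Tag 22 (SecurityIDSource) codes handled by this sketch: 4 = ISIN, 5 = RIC code.
ID_SOURCES = {"4": "ISIN", "5": "RIC"}


def parse_fix(raw):
    """Split a raw tag=value FIX string into {tag: value}.
    Repeating groups are not handled: later duplicates overwrite earlier ones."""
    fields = {}
    for pair in raw.strip(SOH).split(SOH):
        tag, _, value = pair.partition("=")
        fields[int(tag)] = value
    return fields


def instrument_key(fields):
    """Resolve the instrument to one canonical key, however it was specified."""
    if 48 in fields and fields.get(22) in ID_SOURCES:
        key = (ID_SOURCES[fields[22]], fields[48])
    elif 55 in fields:
        key = ("SYMBOL", fields[55])
    else:
        return None
    return SECURITY_MASTER.get(key)


def normalize(raw):
    """Produce one flat record, the same shape for every source of orders."""
    fields = parse_fix(raw)
    return {
        "cl_ord_id": fields.get(11),   # ClOrdID
        "instrument": instrument_key(fields),
        "side": fields.get(54),        # Side
        "qty": int(fields[38]) if 38 in fields else None,  # OrderQty
    }


by_symbol = SOH.join(["35=D", "11=A1", "55=ACME", "54=1", "38=100"]) + SOH
by_isin = SOH.join(["35=D", "11=B7", "48=XX0000000001", "22=4", "54=1", "38=100"]) + SOH
print(normalize(by_symbol)["instrument"], normalize(by_isin)["instrument"])  # ACME ACME
```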

Don’t forget about list orders. This may be a small thing and may also be handled by the FIX adapter in the EP engine. But list orders from basket trades are on the rise and they might not be convenient to represent in tuple form. Thus, one list order may have to be broken down into its individual orders before going into the engine, and a list order can easily contain 100 or more orders.
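A minimal sketch of breaking a list order down into flat tuples before it reaches the engine. It assumes the NewOrderList (MsgType E) has already been parsed into a simple structure; that structure and its field names are hypothetical simplifications. Keeping the list id on every child order lets basket-level rules regroup them later.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Order:
    """One flat tuple, the shape a streaming engine handles comfortably."""
    list_id: str     # ListID (tag 66) of the parent list order
    position: int    # position within the list (FIX carries ListSeqNo, tag 67)
    cl_ord_id: str   # ClOrdID (tag 11)
    symbol: str
    side: str
    qty: int


@dataclass
class ListOrder:
    """Hypothetical parsed form of a basket: a list id plus one dict per leg."""
    list_id: str
    legs: List[dict]


def flatten(list_order):
    """Explode one list order into one Order per leg."""
    return [
        Order(list_order.list_id, i + 1, leg["cl_ord_id"],
              leg["symbol"], leg["side"], leg["qty"])
        for i, leg in enumerate(list_order.legs)
    ]


basket = ListOrder("B1", [
    {"cl_ord_id": f"B1-{n}", "symbol": f"SYM{n}", "side": "1", "qty": 100}
    for n in range(150)
])
orders = flatten(basket)
print(len(orders))  # 150
```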

For lookup tables that are stored inside the EP engine, make sure that it’s easy to reload them on the fly and that there’s a good procedure in place to get the data when it changes at 9:20 EST. Last-minute securities-master changes are common in all regions of the world, and not just for botched corporate actions. Also, if the firm keeps a good securities master database (which it probably does), load that in directly. Try not to apply corporate actions to your lookup tables; just clear them out and reload them. Obviously, if you have positions or historical pricing data, these may need to be updated for corporate actions.
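The "clear them out and reload them" approach can be sketched as a table that is rebuilt completely and then swapped in with a single reference assignment, so readers never see a half-updated table. The loader callable stands in for whatever reads the securities master; it is an assumption, as is the in-memory `source` used for the example.

```python
import threading


class ReloadableLookup:
    """Reference table replaced wholesale on reload, never patched in place."""

    def __init__(self, loader):
        self._loader = loader          # callable returning a fresh dict
        self._lock = threading.Lock()  # serializes concurrent reloads
        self._table = loader()

    def get(self, key, default=None):
        # Readers use whichever table is current; in CPython the swap in
        # reload() is a single, atomic reference assignment.
        return self._table.get(key, default)

    def reload(self):
        with self._lock:
            new_table = self._loader()  # build the complete table first
            self._table = new_table     # then swap it in
        return len(new_table)


# Hypothetical upstream source; the loader copies it so later upstream
# edits are invisible until the next reload.
source = {"ACME": {"lot_size": 100}}
lookup = ReloadableLookup(lambda: {k: dict(v) for k, v in source.items()})
source["ACME"] = {"lot_size": 10}      # last-minute change upstream
print(lookup.get("ACME")["lot_size"])  # 100
lookup.reload()
print(lookup.get("ACME")["lot_size"])  # 10
```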

Watch out for WAN network hiccups. Marc’s project may only be sensitive to the data rate in order to keep the data current. But if you start matching things like customer orders/fills to exchange orders/fills, then delays in the network can cause matches to time out and result in lots of alerts. I wrote a post on risks of a project like this.
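One possible way (an assumed design, not the one used on the project) to keep a WAN stall from turning into a flood of match timeouts: measure timeouts against the exchange feed's own clock rather than wall-clock time. A customer fill is only declared unmatched once the exchange feed has shown activity past its deadline, so a stalled link delays alerts instead of producing false ones. It assumes both feeds carry comparable timestamps.

```python
class FillMatcher:
    """Match customer fills to exchange fills by a shared key (hypothetical)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.pending = {}          # key -> timestamp of unmatched customer fill
        self.unclaimed = set()     # exchange fills that arrived first
        self.exchange_clock = 0.0  # latest timestamp seen on the exchange feed

    def on_customer_fill(self, key, ts):
        if key in self.unclaimed:
            self.unclaimed.discard(key)
        else:
            self.pending[key] = ts

    def on_exchange_fill(self, key, ts):
        self.exchange_clock = max(self.exchange_clock, ts)
        if key in self.pending:
            del self.pending[key]
        else:
            self.unclaimed.add(key)

    def on_exchange_heartbeat(self, ts):
        self.exchange_clock = max(self.exchange_clock, ts)

    def expired(self):
        """Return keys whose deadline the exchange feed has provably passed."""
        late = [k for k, ts in self.pending.items()
                if self.exchange_clock > ts + self.timeout]
        for k in late:
            del self.pending[k]
        return late


m = FillMatcher(timeout=5)
m.on_customer_fill("k1", 0)
m.on_customer_fill("k2", 1)
m.on_exchange_heartbeat(3)
print(m.expired())  # [] : the exchange feed has not reached any deadline
# A WAN stall here produces no exchange events, so nothing can expire.
m.on_exchange_fill("k1", 4)  # delayed, but still matches
m.on_exchange_heartbeat(7)
print(m.expired())  # ['k2']
```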

Limit the number of alerts that can be sent, ideally per alert type and per detected situation. Lots of things can go wrong (from code bugs to misinterpreted business rules) that might cause the application to spew out alert spam. Try to design in a way to catch this.
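A minimal sketch of such a limit, assuming each alert carries a type and a "situation" key (for example an order id; both are hypothetical here). Once the cap is reached, one notice is emitted and further alerts are counted but suppressed, so a runaway rule stays visible without flooding anyone.

```python
from collections import defaultdict


class AlertThrottle:
    """Cap alerts per (alert type, situation)."""

    def __init__(self, max_per_situation=3):
        self.max = max_per_situation
        self.counts = defaultdict(int)

    def offer(self, alert_type, situation, message):
        """Return the text to send, or None if the alert is suppressed."""
        key = (alert_type, situation)
        self.counts[key] += 1
        n = self.counts[key]
        if n <= self.max:
            return message
        if n == self.max + 1:
            return f"[suppressing further {alert_type} alerts for {situation}]"
        return None

    def suppressed(self):
        """How many alerts were dropped for each (type, situation)."""
        return {k: n - self.max for k, n in self.counts.items() if n > self.max}


t = AlertThrottle(max_per_situation=3)
sent = [t.offer("late_fill", "ORD42", f"late fill #{i}") for i in range(5)]
print(sent[3])          # [suppressing further late_fill alerts for ORD42]
print(t.suppressed())   # {('late_fill', 'ORD42'): 2}
```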

Before the project really gets going, make sure you understand how all the developers will work as a team with the EP product. Will they all have to do their dev testing on a central dev instance of the EP product? If so, how will this work? Also with version control – SQL languages often allow you to break up the application into as many or as few text files as you want. Make sure it’s broken up in such a way that (a) developers will be easily able to trace the logic of related streams and (b) everyone won’t always need the same 5 files checked out.

Think about unit testing. Don’t just think about it: try out the unit testing procedure early in the project, and maybe include it as part of the evaluation. Unit testing with EP languages can be tricky. For example, some of these products rely on a constant stream of data to keep things moving through the engine (they rely on tuples to trigger timeouts in windows, joins, etc). A unit test might not have this constant stream, so figure out in advance how it will work.
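The timeout problem can be sketched outside any real engine. The hypothetical `TimeoutWindow` below mimics a product that only advances time when a tuple arrives; the tests show that a lone event never times out on its own, and that injecting a synthetic heartbeat tuple is one way to drive the clock in a test. Whether a given EP product accepts such heartbeats is something to check during the evaluation. Run it with `python -m unittest`.

```python
import unittest


class TimeoutWindow:
    """Toy stand-in for an engine window that only advances time when a
    tuple (a real event or a heartbeat) arrives."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.open = {}       # key -> timestamp when it was opened
        self.timeouts = []   # keys that expired, in order

    def _advance(self, now):
        for key, ts in list(self.open.items()):
            if now - ts >= self.timeout:
                del self.open[key]
                self.timeouts.append(key)

    def on_event(self, key, ts, closes=False):
        self._advance(ts)
        if closes:
            self.open.pop(key, None)
        else:
            self.open[key] = ts

    def on_heartbeat(self, ts):
        """Synthetic tuple that carries only a timestamp."""
        self._advance(ts)


class TimeoutWindowTest(unittest.TestCase):
    def test_timeout_needs_a_later_tuple(self):
        w = TimeoutWindow(timeout=10)
        w.on_event("order-1", ts=0)
        # No further data: nothing drives the clock, so no timeout yet.
        self.assertEqual(w.timeouts, [])
        # A heartbeat past the deadline drives the clock forward.
        w.on_heartbeat(ts=10)
        self.assertEqual(w.timeouts, ["order-1"])

    def test_closed_order_does_not_time_out(self):
        w = TimeoutWindow(timeout=10)
        w.on_event("order-2", ts=0)
        w.on_event("order-2", ts=5, closes=True)
        w.on_heartbeat(ts=100)
        self.assertEqual(w.timeouts, [])
```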

That’s all for now, hopefully this is somewhat useful to someone.
