Hans Gilde’s weblog

Event Processing in Action review, part 2

Posted in eventprocessing, financial services, programming by Hans on December 8, 2009

Per my previous post, I’m currently reading a preview copy of Event Processing in Action (EPIA) from Manning. I’ll write a short summary mixed with review, composed of several posts.

Unfortunately, the name of Manning’s “In Action” series was ruined for me by a book from another publisher; I just can’t read In Action without thinking Inaction. I will try to put this prejudice aside.

I only have time to go into Chapter 1 right now. I doubt that I’ll have time for one post per chapter, but this one turned out longer than I thought.

The book starts slow, because the authors are very thorough about capturing the basics. Chapter 1 is mostly an introduction to terminology. It also goes into many examples of Event Processing in use today. Finally, it introduces an example Event Processing application (a flower delivery service) that will be used throughout the book.

As I’m already familiar with event driven systems, some of the pages on terminology were a little boring. At the same time, the level of detail and the real world examples are great for a less experienced reader. This book has an academic flavor, and I’m the type to read through the boring introductory chapter of a textbook before getting started on the subject. That’s because I do learn a little (maybe more than I realize at the time) and it gets me on the same page (so to speak) as the author for the rest of the book.

So from Chapter 1, I see that Event Processing in Action is about events and the processing thereof. These “events” are the same ones that drive an Event Driven Architecture (EDA). The point of the book is to look beyond the basic pattern of EDA logic that says “send an event and interested parties will eventually receive it”. Beyond this simple pattern lie many patterns of processing logic that are common to most or all event driven systems.

For example (and I’m extrapolating a little here), as an Event Driven Architecture implementation grows and matures, there’s a natural desire to extract more and more information and value from the event flow. Where each event type may have started with just one interested consumer, others find uses for existing events. Architects start devising applications that combine and rehash events in new ways. Often an EDA is adopted with grand visions of squeezing ever increasing value from events.

Over time, we see common patterns in the logic used to extract information from events in the EDA. We can use those patterns to design the logic at a higher level than code and code-level design patterns. We can use our logic designs to compare the goals and the logic of our EDA to another EDA, and to learn lessons and develop best practices for exactly how we will extract more value from events. Those logic patterns and their use are the core of … Event Processing.

One might not be interested in an EDA per se, but still want to extract information and value from events. The most common example seems to be automatic (electronic) trading (other examples are listed in Chapter 1). Trading systems get events (mostly feeds of quotes, news and order executions) and extract information from them. While the media seems to focus on the mathematics, the fact is that most of the logic of trading is exactly the same as logic that extracts information from events in any EDA. The former deals with feeds and trading connectivity, the latter with message buses and shared event definitions. Trading may require maximum performance, an EDA may require guaranteed delivery. But mostly they share patterns in the logic, because they are both doing Event Processing.

Interesting enhancements to MapReduce

Posted in eventprocessing, financial services by Hans on October 19, 2009

Interesting paper about changes to M-R, in part to enable online processing. Ideas like pipelining and better inter-job data flow have been on the radar for a while.

This kind of thing will likely be useful on the Amazon cloud. Rather than uploading data and then running M-R, it might be possible to begin the job as the data is uploading, thus getting results back sooner.

Fun with color maps: visualizing financial time series

Posted in decision making, financial services, R, statistics by Hans on October 2, 2009

Here’s an interesting visualization of daily stock returns for 50 components of the S&P 500. I used the same kind of heat map plot from my previous post.

Again, this plot conveys a lot about the multiple series, but you have to look for a minute to see why. Once you start to look, you see a surprising amount of information come out – to me, more information than from plotting all these series in the usual chart formats.

The plot shows the percent change in price for 50 random components of the S&P 500 (on the Y-axis, one stock per row) for the 250 periods (time is the x-axis, left to right) prior to October 1, 2009. The 250 periods correspond to just short of one year of trading, so here we see summarized a year of trading in 50 stocks.

Note: The returns are capped at the lower 20% quantile and the upper 80% quantile. So while the legend shows -3% to 3%, any return above or below that range is really clamped to the nearest limit. This ensures that really high and low values don’t throw off the colors. A better approach would be a custom coloring scheme designed for this kind of data.

[Figure: stock_heat, heat map of daily returns for 50 S&P 500 components]

At first, yes, this looks like colorful noise.

But I see many patterns (admittedly, I have good eyesight):

  • October 2008 was a bad period for these stocks, as shown by all the red to the left.
  • In this bad period, we also see plenty of big price movements, as shown by rows having alternating red and blue. That represents sequences of plus or minus almost 3% on alternating days.
  • Also during the bad period, some stocks fared better than others. We see several rows with red on the left, but becoming green much faster than the others.
  • October 2009 is much more calm, as shown by all the green, representing small changes, on the right.
  • Yet there are some stocks that see much more volatility than others. Just pick out rows of alternating reds and blues from the fields of green. These “colorful” rows are more volatile stocks.

I’m sure there are plenty of other meaningful patterns in here.

So overall, an interesting technique for visualizing many time series together.

The data comes from here. The R code to process it is below, and you’ll need to install the Heatplus library per my previous post (not from CRAN).

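library(Heatplus)  # provides heatmap_2

# Daily price history, one row per symbol per day; the file has no header row
stocks.raw=read.csv("sp500hst.txt", header=FALSE)
names(stocks.raw)=c("date", "symbol", "open", "high", "low", "close", "volume")

# Closing prices grouped by symbol; keep only symbols with all 251 days of data
stock.close=tapply(stocks.raw$close, stocks.raw$symbol, function(x){x})
stock.close.cleaned=stock.close[sapply(stock.close, length)==251]

# Pick 50 symbols at random, with a fixed seed so the sample is repeatable
set.seed(1234567)
stock.names=sample(names(stock.close.cleaned), 50)

stock.1=stock.close.cleaned[stock.names]

# 250 day-over-day fractional changes per stock, one stock per row
# (assumes each symbol's rows are already in date order)
stock.returns=t(sapply(stock.1, function(d) {(d[2:251]-d[1:250])/d[1:250]}, simplify=TRUE))

# No clustering or dendrograms, so rows and columns keep their original order
heatmap_2(stock.returns, col=rainbow(ncol(stock.returns), end=4/6), Rowv=NA, Colv=NA,
do.dendro=c(FALSE,FALSE), scale="none", legend=2,
main="Stock returns", trim=.8)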
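
The capping described in the note above can also be done by hand before plotting. Here is a minimal sketch, assuming a returns matrix shaped like stock.returns. The helper name cap_returns and the simulated data are made up for illustration, and this is not necessarily how heatmap_2 applies its trim argument internally.

```r
# Hypothetical helper: clamp every value in a numeric matrix to the
# [lower, upper] quantile range computed over the whole matrix.
cap_returns <- function(x, lower = 0.2, upper = 0.8) {
  limits <- quantile(x, probs = c(lower, upper), na.rm = TRUE)
  # pmin caps the high values, pmax raises the low ones
  pmax(pmin(x, limits[[2]]), limits[[1]])
}

# Simulated stand-in for stock.returns: 50 stocks by 250 days
set.seed(42)
fake.returns <- matrix(rnorm(50 * 250, mean = 0, sd = 0.02), nrow = 50)
capped <- cap_returns(fake.returns)
range(capped)  # bounded by the 20% and 80% quantiles of fake.returns
```

Capping by hand like this would also make it easy to try other cut-offs, or to swap in a custom color scale.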

Cool new idea for cloud-based scientific computing

Posted in financial services, MATLAB, R, statistics by Hans on August 19, 2009

Just looked at a very cool company: Monkey Analytics

They provide an AJAX interface to an Octave (MATLAB language) session, running on EC2. You upload your data and your scripts, then you run them in interactive mode via the remote interface. This is an alternative to keeping a dedicated high end box at your desk, or running a session on a shared server. Of course uploading big data is annoying, but that’s a known trade-off.

They now offer interactive Octave sessions, an editor for your .m files (not a great editor yet), and an interface to run Python scripts. They intend to add R in the near future.

This is a really creative idea – the interactive session is very nice, much better than running a remote script. Plus the session persists over time, so you can use it from multiple computers (home, work, the web terminal at your boring vacation resort).

Hopefully, they can provide a convenient way for folks to run their big analytics without the hassle of maintaining additional hardware. Good luck Monkey Analytics!

Caches, BusinessEvents and trading infrastructure

Posted in eventprocessing, financial services by Hans on April 1, 2009

Lots of front office shops are moving (or have long since moved) to build their order management and position keeping (and other stuff) over a distributed cache. Caches are much better than they used to be, as is our understanding of what caches can and should do. For example, modern “caches” should deploy, distribute and partition processing just as well as they do data.

One interesting idea is to add a rules engine to this mix. Conceptually, this makes sense because both inference and ECA rules are good at a lot (not all) of the logic that we want to build over a cache. For example, logic like “when XYZ state exists” or “when ABC event happens” then update another object, send an event or trigger another activity. The hard part is making the cache and the rules engine work together efficiently.
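
To make the “when ABC event happens, then update another object” idea concrete, here is a toy sketch in R. Everything in it is an assumption for illustration: the cache is just an in-memory environment, and the event fields, rule structure and function names are invented. It says nothing about how any real rules engine or distributed cache works.

```r
# Hypothetical in-memory "cache": an environment keyed by object name
cache <- new.env()
assign("position.XYZ", 0, envir = cache)

# Each rule pairs a condition on the incoming event with an action on the cache
rules <- list(
  list(
    name      = "apply fill to position",
    condition = function(event, cache) event$type == "fill",
    action    = function(event, cache) {
      key <- paste0("position.", event$symbol)
      assign(key, get(key, envir = cache) + event$quantity, envir = cache)
    }
  )
)

# Event-condition-action loop: run every rule whose condition matches
process_event <- function(event, cache, rules) {
  for (rule in rules) {
    if (rule$condition(event, cache)) rule$action(event, cache)
  }
  invisible(cache)
}

process_event(list(type = "fill", symbol = "XYZ", quantity = 100), cache, rules)
process_event(list(type = "quote", symbol = "XYZ", price = 10.5), cache, rules)
get("position.XYZ", envir = cache)  # 100: only the fill matched a rule
```

The hard part mentioned above is exactly what this toy skips: here the rules and the data live in one process, so there is nothing to partition or distribute.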

TIBCO seems to be approaching this issue from another direction with BusinessEvents, per this blog post. They’ve built a distributed cache under their rules engine, to help with distributed rules processing.

If they can incorporate the features of a modern distributed cache with a good rules engine and framework for distributed processing, they could wind up with a killer product for front office infrastructure. Something to keep an eye on at least.

Cargill: food and … intelligence gathering

Posted in decision making, eventprocessing, financial services by Hans on January 13, 2009

The WSJ has an article “Cargill’s Inside View Helps It Buck Downturn” with good insight into Cargill’s commodities trading. They collect and use an unprecedented amount of information on global events that will affect food prices, leading to success in commodities trading even in poor market conditions. A great example of a modern intelligence gathering operation in action. And a reminder that fancy analytics can be beaten with good old knowledge.

The Modelers’ Hippocratic Oath

Posted in decision making, financial services, statistics by Hans on January 8, 2009

An insightful modeler’s manifesto on Wilmott.com that can be applied well beyond finance models:

MODELERS OF ALL MARKETS, UNITE! You have nothing to lose but your illusions.

The Modelers’ Hippocratic Oath

~ I will remember that I didn’t make the world, and it doesn’t satisfy my equations.

~ Though I will use models boldly to estimate value, I will not be overly impressed by mathematics.

~ I will never sacrifice reality for elegance without explaining why I have done so.

~ Nor will I give the people who use my model false comfort about its accuracy.
Instead, I will make explicit its assumptions and oversights.

~ I understand that my work may have enormous effects on society and the economy,
many of them beyond my comprehension.

This Manifesto can also be downloaded here (for logged in members of wilmott.com)

Lessons on probability from the credit crisis

Posted in decision making, financial services, statistics by Hans on December 11, 2008

I began to compose a reply to a comment on this blog by Will Dwinnell but it turned into this post. Here, I summarize how the credit crisis results from an age old problem faced by statisticians: properly mixing gut instinct with statistical methods.

Our current situation results from (at least) two kinds of predictions that went bad. First there came the predictions about the percents of default on certain debts. These predictions were used to calculate fair values of securities based on that debt, meaning that if you are genuinely sure about the percent of defaults, you can pretty much predict how much will be paid back over the life of the loans.

Those models of the rate of default had big flaws, but if you look at the data, you see that even the unprecedented rise in mortgage defaults could not possibly explain the fall in prices of all mortgage debt securities. There is plenty of debt that is in little danger of default, but has dropped in value by vast amounts. Sure, some debt securities will not be paid back because there were more defaults than predicted. But how come the whole debt market has gone crazy? I mean, this happens all the time with stocks – some stocks that once looked good turn out to have been far overvalued, and that can lead to big price swings, but generally not total collapse.

So we turn to the next part of the story: borrowing against debt securities. Once you own a security, you can use it as collateral to borrow money. This is called margin in the stock world. If you have a brokerage account with margin, you will notice that if you have $1 in stock, you will not be able to borrow $1 using this stock as collateral. Rather, you will be able to borrow much less than $1. Why is this? It’s because the lender wants to be absolutely sure that in the worst case, they will be able to recover all their money by forcing you to sell the stock. And if they lent you $1 but the stock is now worth $.80, and then you fail to pay, they will not have enough collateral to recover their money. So they have models that tell them how much the stock is likely to drop in the worst case. And they will lend you just enough so that if the stock drops according to their model, and then you default, they still have enough collateral to recover all their money.

If you have borrowed against a stock and that stock drops far enough, the lender will come around and force you to immediately pay back the difference between what you owe and what their model shows your stock to be worth in the worst case. This is called a margin call.
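
The lending rule and the margin call fit in a few lines of R. This is only a sketch: it assumes the lender's model boils down to a single worst-case drop, and the function names and the 20% figure are made up for illustration.

```r
# Hypothetical lender model: the stock is assumed to lose at most
# `worst.drop` (a fraction) of its current market value.
max_loan <- function(stock.value, worst.drop) {
  stock.value * (1 - worst.drop)
}

# Amount due on a margin call: what is owed minus the worst-case value of
# the collateral, or zero if the loan is still covered.
margin_call <- function(loan, stock.value, worst.drop) {
  max(0, loan - max_loan(stock.value, worst.drop))
}

max_loan(1.00, worst.drop = 0.20)           # most the lender advances on $1 of stock
margin_call(0.80, 1.00, worst.drop = 0.20)  # loan still covered, nothing due
margin_call(0.80, 0.90, worst.drop = 0.20)  # stock fell to $.90, part of the loan is called
```

In this toy model a lender who assumes a worst case of 20% lends at most $.80 on $1 of stock, and calls $.08 back once the stock slips to $.90.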

Institutions made a similar arrangement in terms of borrowing against debt securities. They own the security, so they can use it as collateral for a loan. Then they use that borrowed money… to buy more securities. Then they borrow against those. And so on – this is what they call leverage. Now the lenders will require that the borrower has, as collateral, a certain percent of the value of the loan. When the value of their collateral drops below this percent, the borrower will get a margin call and be forced to pay back some of this loan.

Now let’s say that I’ve got $1 of a stock and I borrow $.80 against it. I then buy another $.80 of that stock and then I borrow $.64 against that. And I keep going until I can’t borrow any more. Here I’m borrowing 80% of the value at each step, so I first borrow $.80, then $.64, then $.512 and a little figuring will show that from my original $1 I can borrow over $3.50. But I own $4.50 in collateral (the stock), so even if it drops in value by 20% I will still have just enough collateral to pay back my $3.50 in loans.

Now what if I borrow 90% at each step? Hmm, then I can borrow over $8. And if I borrow 95% of the value at each step, then I’m at $17 in loans from my original $1. Now to start with, I have enough collateral to cover these loans totaling 17 times my original money. But if this stock drops in value by as little as 5.5%, then my collateral is worth less than what I borrowed (if I borrowed 95% of the value at every step). Now if I default on that loan – then the lenders are out money. And that money can never be recovered – the underlying assets can’t be sold to recover the money and it has simply evaporated. Unless the lender is willing to take the stock in payment and wait for it to go back up in value, a risky proposition at best.
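
The “little figuring” is a geometric series, and it is easy to check in R. This is a sketch with made-up function names. The limits it prints ($4, $9 and $19 borrowed) sit a little above the figures quoted in the text, which correspond to stopping after a finite number of rounds.

```r
# Repeatedly borrow a fixed fraction of the latest purchase and buy more
# stock with the loan. Each loan is `fraction` times the previous one, so
# the totals are partial sums of a geometric series.
leverage <- function(capital, fraction, steps) {
  loans <- capital * fraction^(1:steps)   # $.80, $.64, $.512, ... at 80%
  c(borrowed = sum(loans), collateral = capital + sum(loans))
}

leverage(1, 0.80, steps = 3)    # totals after the first three loans

# With unlimited steps the totals approach these limits
limit.borrowed   <- function(capital, fraction) capital * fraction / (1 - fraction)
limit.collateral <- function(capital, fraction) capital / (1 - fraction)

fractions <- c(0.80, 0.90, 0.95)
data.frame(fraction   = fractions,
           borrowed   = limit.borrowed(1, fractions),
           collateral = limit.collateral(1, fractions))
```

Note that in the limit the ratio of loans to collateral is exactly the borrowing fraction, so the cushion against a price drop is only one minus that fraction.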

And so, you can see that there is a balancing act. On one hand, there is a level of risk that the lender should be willing to take. Maybe they are confident that my stock won’t drop 20%, 30%, 50% – but there is some level at which they are willing to make the loan. Actually, that’s not true – lenders often stop offering margin on the most volatile stocks – but let’s say we’re talking about a stock with a solid record of performance.

On the other hand, it would be insane to allow me to do this at 95%. Even if I show that, for the past 20 years, my stock has never dropped by 5.5% in a given year, any normal person would still know, in the back of their mind, that a 5.5% drop can happen with just a little market hiccup. No matter how much I show you that I have found a stock that is historically rock solid, you would still be an idiot to rely very heavily on the prediction that my stock will not drop 5.5% in the future.

But that is exactly what the finance firms did. They predicted the values of many debt securities to be so stable that they could count them essentially as cash in terms of risk (the chance of decreasing in value). And if I have $1 in cash, well you’d feel pretty safe lending me $.98 using that $1 as collateral. Now, we do the above calculation with me borrowing 98% of the value at every step. I can borrow $44 based on that $1 initial capital. If the securities keep going up, everything is just great – money is flowing like water. But a drop of 5% and now I am underwater with not enough collateral to pay back a default. And after a drop of 15%, it’s hopeless and if I default, the lenders stand to lose 13% of their money. And let’s say that instead of $1 in starting capital, I had $1 billion. After a 15% drop in price, my default on debt evaporates over $5 billion in borrowed money (in addition to my original capital).
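
The loss figures can be checked the same way. This is a sketch with a made-up function name. It takes the $44 in loans from the text as given and assumes the collateral is the original $1 plus everything bought with those loans.

```r
# Lender's loss if the borrower defaults after the collateral falls by `drop`.
# `borrowed` and `collateral` are dollar amounts; `drop` is a fraction.
default_loss <- function(borrowed, collateral, drop) {
  max(0, borrowed - collateral * (1 - drop))
}

borrowed   <- 44        # loans against $1 of capital at 98% per step (figure from the text)
collateral <- 1 + 44    # original capital plus everything bought with the loans

# Smallest drop that leaves the loans under-collateralised
1 - borrowed / collateral      # about 0.022

loss <- default_loss(borrowed, collateral, drop = 0.15)
loss                 # 5.75 per $1 of starting capital
loss / borrowed      # about 0.13 of the money lent
```

Scaled up to $1 billion in starting capital, the same 15% drop works out to $5.75 billion of borrowed money gone.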

So then the chain begins: Once one kind of debt security started to go down (sub-prime), the selling started. And when the selling started, prices went down and firms began to get margin calls – and that resulted in more selling, and more margin calls, until firms were forced to sell debt that wasn’t in danger of default problems. And that is what no one expected. All this “safe” debt was suddenly falling in value simply because there were many sellers and zero buyers in the market. Firms began to get margin calls on debt that had no reason to drop in value – debt that is still perfectly safe to this day. The margin calls began to multiply, driving the debt value further down. Since the market was on its way down, this selling to meet margin calls was losing money by the bucketful.

And then it gets really bad, since many institutions rely on daily, weekly and monthly borrowing to operate, and it became uncertain whether some would have the collateral to back up that borrowing (especially since the stock prices were tanking as well, so the firms couldn’t even use their own stock for collateral). All it takes is one week where they can’t borrow money and they are out of business and defaulting on their loans, destroying money now by the bale. And that prospect makes everyone very nervous to lend to anyone.

But it was a house of cards, ready to fall, ready for that chain reaction to begin – because of this fundamental assumption that some debt securities could not drop in value. Even though the price of those securities is determined just like any other price – by what others are willing to pay. None of the models predicting the value of those debt securities took into account the fact that, at a fundamental level, all it takes to make the securities fall in value is a market with more sellers than buyers. It doesn’t matter what the “fair value” or long term return of the securities is – today they are worth exactly what you can sell them for.

Will says “They should have hired a better class of statistician.” But honestly, what would a statistician have done? Everyone knew that, in theory, the debt securities could drop in value. The executives don’t like to talk about it, but anyone in the industry who (a) understands the models and (b) is being honest, will tell you that the idea crossed their mind at some point in the past few years. The folks making the decisions made the classic and human mistake of misinterpreting a probabilistic statement (that the securities were unlikely to drop in value) as a fact.

Imagine the skeptical statistician in this scenario, speaking up in a big meeting: “hey everyone, you know that just because these securities have a stable value now, doesn’t mean they can’t go down.” So the room grows silent, all eyes are on this one realistic person and some executive asks: “Ok… tell me the probability that the securities will drop in value this year.” Remembering his (or her) training, the statistician says honestly: “Well it’s never happened, so we have no evidence with which to predict. So no one really knows. Even if we had a model, without any evidence we have no way to know how good it is.”

Now what choices does the executive have here? The statistician can’t quantify the risk, but it’s definitely there. They are back to that gut feeling – should we buy into this unquantifiable risk or not? Some said no. Many said yes, especially after they saw how well it was working for the folks who jumped on the bandwagon early. In order to prevent the situation, many executives would have had to make that gut call to say “I don’t care what history says, it’s just not possible that these debt securities are as safe as cash. And I’m willing to forgo a whole lot of potential profit based on this belief.” The statistician, no matter how professional they are, can’t make a call like that (unless they’re also the executive, I suppose).

What’s required is a fundamental shift in thinking about probability in business. Maybe as part of bailouts, firms should be forced to introduce mandatory training on ethical and logical interpretations of probability and chance.

Seminar on Computational Finance with R at Columbia

Posted in financial services, statistics by Hans on November 18, 2008

See announcement below for an interesting seminar. Inconvenient time, though.

Computational Finance with R
http://www.stat.columbia.edu/pages/ComputationalFinance/index.html

Department of Statistics organizes a workshop about using statistical
computing with R in finance. The conference would like to bring
together both academics and practitioners, and it is open to public.
Admission is free, however we require that the participants register
in advance.

Registration Link
http://www.stat.columbia.edu/pages/ComputationalFinance/register.html

The conference is co-sponsored by REvolution Computing
(http://www.revolution-computing.com/home)

Schedule:

1:45 – 2:00PM   Refreshments

2:00 – 2:05PM   Opening Remarks

2:05 – 2:40PM   Whit Armstrong – Discount Curve Construction with fts,
RLIM, and RFincad (KLS Diversified Asset Management)

2:40 – 3:15PM   Anthony Brockwell – Quantitative Trading in Practice
(Horton Point LLC)

3:15 – 3:50PM   Bryan Lewis – High Performance R with Rpro (REvolution
Computing)

3:50 – 4:05PM   Coffee Break

4:05 – 4:40PM   Scott Payesur – Comparing Multivariate GARCH models
using Realized Covariance (UBS Asset Management)

4:40 – 5:15PM   Peter Carl and Brian Peterson – Performance Analysis
in R (PerformanceAnalytics)

5:15 – 5:50PM   Jeff Ryan – Quantmod Package (Quantmod)

6:00 – 6:30PM   Closing Reception

EP can help the “core crisis”?

Posted in eventprocessing, financial services by Hans on June 1, 2008

So I was reading some posts about whether CEP (Complex Event Processing) is mature, when Mark Palmer’s excellent post linked to this blog entry on WS&T. And while reading that post, I noticed a link to this article, also on WS&T. The article describes how hard it is to find programmers who can write highly parallel code that would take advantage of new multicore architectures. The first comment on the article calls it the biggest crisis facing computing since… something.

Whether or not “CEP is mature”, let’s look at the software available right now, and I’ll call it EP for Event Processing so as not to even get near the question of what Complex means. You have here an ecosystem of software from StreamBase, Progress Apama, Coral8, Tibco, Aleri and more, much of which has been battle tested for years now. They are mature in the sense that they do what they claim and they don’t crash or hit you with lots of bugs.

And what do they do? They each provide a high level language that helps a programmer write high performance applications without spending much time thinking about (or even knowing about) the performance plumbing. Here you have a bunch of players that are all in the game of bringing down the skill set needed to build high performance applications. Some of them automatically thread and provide a framework that guides the user to write an application that is inherently highly scalable, without ever knowing what it is that makes it scalable.

So if we have an impending crisis brought on by the growing number of cores, we also have a potential solution available today! Several potential solutions, even. And are these solutions mature enough to be deployed in banks? YES! They are already deployed in banks. I’ve seen StreamBase churn through literally all the electronic trading messages in a firm, in real-time, and stay up for full weeks without a hiccup, just as it’s supposed to. All the major vendors have such stories. Their software works. It could be used to make many applications more scalable right now (see my previous post on the subject).

In the future, I hope that the people worried about the impending core crisis go talk to the EP people (on the CEP forum) about how these two ideas can result in not just a problem but a solution. And I’d hope that any organization looking to evaluate CEP understands that if they need to lower latency, then this can be a place to start, before investing in the next generation of hardware. Again, see this previous post on creating a uniform architecture that ensures scalability.

I’ll write about maturity and such in a later post. Suffice it to say that whether or not modern “CEP” products meet everyone’s needs or could be improved… they work well for what they do, and that is to bring down the cost and the skill set required to build scalable and high performance applications. Even if they don’t have the one feature that this or that person believes will make them officially “CEP”, they do have many features that were driven by customer demand, integrated into stable platforms that can drop the TCO and time to market for low latency, scalable software. And that is something, IMO, to pay attention to.
