The authorities have recently been insinuating that the “flash crash” five years ago was caused by a guy in London who was trading on his own account from his parents’ house, trying to manipulate the market. Of course they have been careful not to say this explicitly, since it can’t really be true. There is an old jibe that “if people built houses the same way that programmers write programs, the first woodpecker that came along would destroy civilization”. That’s unfair to careful programmers, but it’d be fair to apply it to the programmers of a stock market where trillion-dollar swings in valuation could be caused by the misdeeds of someone playing with mere millions of dollars. Not that there was serious reason to believe that even that happened; the announced link to the flash crash seems more like an attempt to grab headlines for a minor arrest than a real attempt to explain the flash crash. And the coverage from people who know finance (see, for example, columns by Matt Levine and by Michael Lewis) has indeed been appropriately skeptical.

But it’s not like they have a compelling alternative theory of the cause of the flash crash. Lewis hints at high-frequency traders being to blame, but that’s a little too easy to really be satisfying. With a fast crash, one can expect that people who trade fast had a lot to do with it. But that does not say who exactly was at fault nor what exactly they did wrong.

Lewis does, though, link to a report by Nanex which offers much more decided opinions. Nanex reports seem to mostly be by Eric Hunsader, although they are unattributed, so perhaps others are involved. In any case, Nanex is a small company whose business is collecting, analyzing, and disseminating market data. Nanex noticed that at the start of the flash crash:

“… quotes from NYSE [the New York Stock Exchange] began to queue, but because they were time stamped after exiting the queue, the delay was undetectable to systems processing those quotes.”

Perhaps one has to have spent many hours pondering the mysteries of feedback and stability for those words to leap off the page proclaiming “this was the cause!”. But perhaps I can explain.

Pretty much any feedback control system can be deranged by adding delay to the feedback; and the algorithms for trading on the market constitute, collectively, one huge feedback control system. Each algorithm sends orders into the exchange while monitoring the price quotes that come back from the exchange (the feedback). Large parts of the control laws of the system are kept secret (people don’t reveal their trading strategies, nor the algorithms that implement them), and the whole system is immensely complicated, but one can get an idea of how devastating added delay is by looking at simpler disciplines.
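To make that concrete, here is a toy simulation in Python (emphatically not a model of any market): a proportional controller nudging a value toward a setpoint. With prompt feedback it settles quickly; feed the same controller measurements that are a few steps stale and it overshoots, oscillates, and eventually diverges.

```python
# Toy feedback loop: the controller corrects toward a setpoint based
# on a measurement that may be several steps out of date.

def simulate(delay, gain=0.6, steps=40):
    history = [0.0]                  # past values of the controlled quantity
    setpoint = 1.0
    for _ in range(steps):
        # The controller sees a measurement that is `delay` steps old.
        measured = history[max(0, len(history) - 1 - delay)]
        history.append(history[-1] + gain * (setpoint - measured))
    return history

for delay in (0, 2, 5):
    tail = ", ".join(f"{x:+.2f}" for x in simulate(delay)[-5:])
    print(f"delay={delay}: last values {tail}")
```

With delay 0 the value sits quietly at the setpoint; with delay 2 it rings; with delay 5 the oscillations grow without bound. Nothing about the controller changed except the age of its information.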

Electrical engineers, for instance, often get bitten when they try to design a system in which an op amp drives a capacitive load. Op amps are generally used with feedback, and a capacitive load delays that feedback, which often turns what would otherwise be a well-behaved circuit into one that oscillates madly. Solving this might involve removing the capacitive load (or at least moving it outside the feedback loop), but might also involve increasing it (to where the added capacitance forms the “dominant pole”).

In aircraft design, there is something known as a “pilot-induced oscillation”, PIO for short. An example can be seen in this video of a prototype F-22 aircraft crashing. But by convention, using the term “PIO” does not imply that the pilot was to blame in any moral or legal sense. He might be; but the usual idea of “PIO” is that the plane confused the pilot by how it behaved – in particular, by the delay in its responses to the controls. If an aircraft responds instantly to the controls, PIOs seldom occur. At the other extreme, if an aircraft responds quite slowly to the controls, as in a large airliner, there is again seldom a problem. (This parallels the electrical engineering situation in which increasing the capacitance can tame an unruly circuit.) PIO problems occur mostly when the aircraft’s response time is similar to the pilot’s reaction time.

PIOs are generally cases of pilots getting confused by a constant delay; if an aircraft were to introduce additional delays into its control response at random, the confusion could only be worse. We can only imagine how much worse, since no sane aircraft designer would ever do such a thing; and introducing the additional delays right when things got hairy would be the worst possible scenario. But that’s apparently what happened with the stock market.

The undetectability of the delay in stock market quotes is what makes the flash crash comparable to the above two examples. Both in an electrical circuit and in piloting, no information is available to the control system about how long the delay in feedback is. The feedback voltage just takes longer to change, or the airplane takes longer to respond. It does not tell anyone that it has been delayed; it just is delayed. If quotes from the stock market had correct timestamps, that would be a considerably more benign situation: the quotes would be delayed, but they’d be saying how much they were delayed, so algorithms could adapt accordingly. (They might not adapt well, since the code for dealing with long delays would be rarely exercised and thus likely to be buggy, but at least they could try to adapt rather than being left in the dark.)
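In code, “adapting” could be as simple as checking a quote’s age before trusting it. A minimal sketch, assuming honest exchange-side timestamps (the threshold and function names here are inventions of mine, not anyone’s actual trading logic):

```python
import time

STALE_AFTER = 0.050  # seconds; an arbitrary threshold, assumed for illustration

def trade_on(quote):
    print("acting on", quote["symbol"])   # stand-in for real trading logic

def stand_aside(symbol, age):
    print(f"ignoring {symbol}: quote is {age * 1000:.0f} ms old")

def handle_quote(quote):
    # Only works if the timestamp was applied at the exchange, before
    # any queuing -- the opposite of what happened in the flash crash.
    age = time.time() - quote["exchange_timestamp"]
    if age > STALE_AFTER:
        stand_aside(quote["symbol"], age)
    else:
        trade_on(quote)

handle_quote({"symbol": "XYZ", "exchange_timestamp": time.time() - 0.2})
```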

As a third example of feedback delay causing misbehavior, somewhat closer to the matter at hand (indeed, it might even have been involved), there is the problem of bufferbloat. That is a term for the problems that occur when networking hardware and software have buffers that are too large. Computer memory, in recent years, has become so cheap that adding oversized buffers can be done at very little cost, even in cheap consumer devices. It is common for network devices these days to have enough buffering capacity to store several seconds’ worth of data packets before forwarding them.

At first glance the increased buffer size seems innocuous: instead of packets getting discarded due to lack of buffer space, they are stored and later forwarded correctly. The problem is that TCP, the Internet’s main connection protocol, was not designed to deal with this. Packets are supposed to get discarded; that is how the computer sending the packets figures out that the link is congested, whereupon it throttles down the data rate. But for the TCP control algorithm to work smoothly, it must get this feedback (telling it that packets have gone missing) in a timely fashion. If the feedback is delayed, the algorithm overcompensates, oscillating between sending data too fast and sending it too slowly. This wreaks havoc not just on that connection but on any others which happen to share the communications channel. With TCP, delays in feedback are somewhat detectable (due to the timestamps defined in RFC 1323), but the system somehow still manages to misbehave.
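The dynamic is easy to reproduce in miniature. Below is a toy additive-increase/multiplicative-decrease sender, loosely TCP-flavored but not real TCP: the “network” signals a loss whenever the send rate exceeds capacity, but the signal reaches the sender only after a delay. The longer the delay, the wilder the swings in the send rate.

```python
from collections import deque

def run(delay_steps, capacity=100.0, steps=400):
    rate = 10.0
    signals = deque([False] * delay_steps)   # loss notifications in flight
    seen = []
    for _ in range(steps):
        signals.append(rate > capacity)      # the loss happens now...
        if signals.popleft():                # ...but is seen this much later
            rate /= 2                        # multiplicative decrease
        else:
            rate += 5                        # additive increase
        seen.append(rate)
    tail = seen[steps // 2:]                 # ignore the initial ramp-up
    return min(tail), max(tail)

for d in (1, 5, 20):
    low, high = run(d)
    print(f"feedback delay {d:2d} steps: rate swings between {low:.0f} and {high:.0f}")
```

With a one-step delay the rate hovers near capacity; with a twenty-step delay it caroms between nearly zero and double the capacity, which is the bufferbloat pathology in cartoon form.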

Undetectable delay is such a potent destabilizing influence that one might wonder why there haven’t been even more flash crashes. The answer seems to be that not all participants were victimized by the delay. This will take a bit of explanation of the stock market system as it exists today.

The authorities have decreed that there can be multiple stock exchanges all trading the same stocks, but that they have to be bound together to all more or less have the same prices, to be part of a “national market system”. Rather than just letting each exchange operate independently and letting traders arbitrage between them to even out the prices, there is a “consolidated quotation system” that gets data feeds from all the exchanges and computes a “national best bid and offer”: a listing, for every stock, of the best price at which anyone will buy and the best price at which anyone will sell. But this transmission and consolidation of data takes time (several milliseconds), so the system can never be quite up to date.
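In sketch form (with invented exchange names and prices, and ignoring the real system’s many additional rules), the consolidator’s core computation is just this:

```python
# Each exchange reports its best bid and ask; the consolidator picks
# the best of each across all exchanges.
quotes = {
    "NYSE": {"bid": 10.01, "ask": 10.03},
    "BATS": {"bid": 10.02, "ask": 10.05},
    "ARCA": {"bid": 10.00, "ask": 10.04},
}

best_bid_venue = max(quotes, key=lambda v: quotes[v]["bid"])
best_ask_venue = min(quotes, key=lambda v: quotes[v]["ask"])
print(f"NBBO: bid {quotes[best_bid_venue]['bid']} ({best_bid_venue}), "
      f"ask {quotes[best_ask_venue]['ask']} ({best_ask_venue})")
```

The computation itself is trivial; the trouble is that each of those per-exchange quotes took milliseconds to arrive, so the “national best” is always a snapshot of the slightly stale past.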

The rules dictate that orders sent into any exchange must be routed to the exchange with the best price. But that is more a promise than a guarantee, since the transmission takes time; information can’t travel any faster than the speed of light. So when the order arrives the price may no longer be available. And it’s not just that accidents occasionally happen; high-frequency traders play games with this, quickly withdrawing orders and substituting ones with worse prices. To really get the simplicity that the system pretends to, the regulators would have to dictate that each stock be traded only on a single exchange, with all orders for it sent to that exchange. There could still be competition between exchanges, but each company would have to decide which of the competing exchanges its stock was to be listed on.

Alternatively, they could just drop the pretense of the system being a unified whole and rely on arbitrage to equalize prices between exchanges; even that would be simpler than a system that tries to be unified but really isn’t.

In any case, that is the environment that the flash crash took place in: multiple exchanges which are supposed to act like a single unified system but don’t quite, with the consequence that high-frequency traders and other heavy hitters find themselves needing to get direct data feeds from all the exchanges rather than just subscribing to the consolidated feed. But since direct feeds are expensive, the vast majority of traders just get the consolidated feed.

Though that first Nanex flash crash report identifies mis-timestamping, it doesn’t say who exactly did it. Subsequent reports (e.g. this one) fill in that detail: it was the consolidated quote system, which timestamped the quotes after they arrived at the data center which does the consolidation. Indeed, it is still doing so, though that should change due to the recent SEC ruling that exchanges must start sending timestamps to the consolidator – a ruling that came to public attention two weeks ago when the NYSE, in preparing to implement it, broke their systems for a large part of a day. (What seem to be disconfirmed, though, are the Nanex claims that there were already timestamps on the data coming into the consolidator and that those correct timestamps were being replaced with incorrect timestamps.)

Another Nanex article adds a further detail: while the direct feeds to colocated high-frequency traders use the UDP protocol, the feed to the consolidated system uses TCP. (Yes, the stock exchanges use the same sort of networking hardware that the Internet does, though of course operating over private links rather than over the public net.) TCP, it will be recalled, is the main victim of bufferbloat; UDP is less affected. The “U” in UDP doesn’t actually stand for “unreliable”, but that’s the easiest way to remember it: in UDP, data packets are just sent over the network without keeping track of whether any of them have been lost or making sure they arrive in the right order. UDP may in fact be reliable if the underlying network is reliable, but it doesn’t protect the data like TCP does, and thus is not subject to the complicated pathologies that can arise from trying to protect the data. Even with UDP, though, the large buffers are still there: invisible until some link gets overloaded, but then adding delays to the transmission of data.
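For the curious, this is roughly all there is to sending data over UDP (addresses and payloads invented). Note what is absent: no connection, no acknowledgments, no retransmission, no ordering.

```python
import socket

# Fire-and-forget: each sendto() hands one datagram to the network.
# If a buffer along the way overflows, the datagram silently vanishes.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for seq, quote in enumerate([b"XYZ 10.01x10.03", b"XYZ 10.02x10.04"]):
    sock.sendto(b"%d %s" % (seq, quote), ("127.0.0.1", 9999))
sock.close()
```

A receiver that cares about losses or reordering has to check the sequence numbers itself, which is what direct-feed consumers typically do.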

In 2010, the year of the flash crash, the available networking hardware and software generally suffered from severe bufferbloat; widespread awareness of the problem dates only from late that year, when Jim Gettys coined the term and wrote several articles on it. Though good algorithms for managing buffers have since been developed, getting them widely fielded in networking hardware is another matter, and is still (in 2015) very far from complete. The flash crash involved, at its worst, delays of tens of seconds; that’s more than enough for a TCP connection to exhibit pathological behavior. So bufferbloat probably made the flash crash worse, though to what extent is hard to tell.

The official SEC / CFTC report on the flash crash reads like its authors had heard of the Nanex argument that delay was the cause, thought they had to say something regarding that argument, but didn’t think they had to really take it seriously. Indeed, from a social point of view they didn’t; the Nanex report gives an account of how delay caused the flash crash, but it’s the sort of account that even when correct does not convince people. It is terse enough and has enough jargon that it can’t reach the general public; and specialists will wonder what parts of it the author really knows and what parts he’s just guessing at. I am not too sure myself – which is why, though not disagreeing with it, I have not relied on it here, but instead have just been arguing for the general proposition that delays in feedback are destabilizing. But for those who want a blow-by-blow account, this, again, is the link.

The official report’s section on delays focuses on the worst delays, which occurred rather late in the crash and thus couldn’t have had a causative role. The word “timestamp” (or even just “stamp”) occurs nowhere in the entire report. There is a mention of a delay of five seconds in a direct feed; but there are many direct feeds, and nothing is said about how prevalent this might have been, nor about whether any delays in direct feeds occurred early in the crash and might have fed it. On the whole the report reads like they were looking for a villain, rather than regarding the situation as a dynamical system and trying to understand how the system’s reactions could have been so messed up.

They did in fact find a villain, or at least thought they did: a company which had been selling a large number of “E-Mini” futures (futures on the S&P 500 stock index). The Chicago Mercantile Exchange disagreed about this being at all villainous, as did Nanex, who said that the SEC/CFTC had even gotten their facts wrong about the trading algorithm that the company used. That Nanex article then goes on to argue that the actual cause of the system overload and delays was that high-frequency traders had dumped a lot of E-Mini contracts very quickly – and that since the E-Mini is something like a future on the whole market, this propagated via arbitrage into quite a lot of buy and sell orders on various stocks and other financial instruments, which is what produced the overload and delays in the quote system.

Looking at the graph where those large dumps are marked, they seem to have a total value of something like 500 million dollars: about ten thousand contracts, each worth 50 times the S&P 500 index, which was then roughly a thousand, giving 10,000 × 50 × $1,000 ≈ $500 million. This is still nowhere near the sort of impetus that is normally required to create a trillion-dollar shift in the stock market. But it is plausible as something that increased the level of market activity to where the quote system became overloaded.

This tactic of sending a big pulse of orders strongly resembles the behavior of the “Thor” algorithm that Michael Lewis describes in Flash Boys as being a way for investors to escape the depredations of high-frequency traders. The idea is that the big pulse of orders all gets filled before anyone can react by moving prices. But here it was done by high-frequency traders. It’s also something that can be done to people like that guy in London who was trying to manipulate the market: slamming the market with a massive pulse of buy orders would make his not-meant-to-be-filled sell orders actually get filled, giving him notice that he was swimming in deeper water than he’d thought and with larger predators.

In any case, the details of who exactly overloaded the system seem relatively unimportant: any system has to expect occasional overloads and should handle them gracefully. That might include a delay but does not include mis-timestamping which conceals the existence of the delay.

To finally get back to the question of why there haven’t been more flash crashes: as mentioned above, high-frequency traders and other large players in the market have direct feeds; they don’t rely on the consolidated feed. Its misbehavior in the flash crash no doubt prompted them to rely on it even less, at least as a source of ground truth; they probably use it more as a source of information about what other traders are misinformed about. Preying on the misinformed can be quite profitable even if the misinformation is just a matter of 200 milliseconds of delay. And though predatory, this probably stabilizes the market, since taking advantage of misinformed people moves the market in the opposite direction from the way they’d move it.

But it would be even better if there were no misinformation in the first place. Having correct timestamps, so that delays are evident, is a good first step. As mentioned above, the SEC has dictated that timestamps be sent from the exchanges, and this is being implemented. If these timestamps are passed on to the consolidated feed’s customers (as they should be), a lot of customers will suddenly notice that they’re getting a fair bit more delay than they thought they were getting, and will complain.

And they should indeed complain, since there’s no reason for quotes to get stuck in buffers on the channel between the exchange and the consolidator. The fix (courtesy of the bufferbloat community) is for the software that sends the quotes to rate-limit its output. It would keep its own modest buffer (perhaps enough to hold a millisecond’s worth of quotes) and would discard any quote that overflowed the buffer. It would empty that buffer at a fixed rate which was slightly below the data rate of the outgoing channel. This way data would always be arriving at any point in the channel at a slower pace than it could be sent out at, so buffers in the channel would not fill up.
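A minimal sketch of that fix, with the buffer size and drain rate being assumptions of mine rather than anything an exchange has published:

```python
import time
from collections import deque

BUFFER_QUOTES = 1000        # roughly a millisecond's worth (assumed)
DRAIN_PER_SEC = 900_000     # a bit under an assumed 1M-quote/sec channel

buffer = deque()

def enqueue(quote):
    """Called for each new quote; drops it if the buffer is full."""
    if len(buffer) >= BUFFER_QUOTES:
        return False        # better a dropped quote than a hidden stale one
    buffer.append(quote)
    return True

def drain(send, quotes_to_send):
    """Empty the buffer at a fixed pace, independent of how full it is.
    (time.sleep() is far too coarse for microsecond pacing; real code
    would send in batches. This just shows the shape of the policy.)"""
    interval = 1.0 / DRAIN_PER_SEC
    for _ in range(quotes_to_send):
        if buffer:
            send(buffer.popleft())
        time.sleep(interval)
```

Because the drain rate is pinned below the channel’s capacity, data arrives everywhere downstream slower than it can be forwarded, and the channel’s own buffers stay empty.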

At least that would be the simplest fix; more sophistication could be applied. For instance, quotes closer to the best bid/offer could be given priority over quotes farther from it, bumping them from the buffer when space ran short. Also, a level of fairness among stocks could be enforced, so that huge quote activity in one stock wouldn’t crowd out occasional quotes in other stocks. But even the simplest solution would be an improvement over leaving the buffers uncontrolled.
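The priority variant could look something like this (again just a sketch; per-stock fairness would add a queue per symbol, which is not shown):

```python
import heapq
import itertools

CAPACITY = 1000
counter = itertools.count()   # tie-breaker so the heap never compares quotes
buffer = []                   # heap with the lowest-priority quote on top

def priority(quote, best_price):
    # Closer to the best bid/offer = more important = smaller number.
    return abs(quote["price"] - best_price)

def enqueue(quote, best_price):
    entry = (-priority(quote, best_price), next(counter), quote)
    if len(buffer) < CAPACITY:
        heapq.heappush(buffer, entry)
        return True
    if entry[0] > buffer[0][0]:           # new quote outranks the current worst
        heapq.heapreplace(buffer, entry)  # bump the worst quote out
        return True
    return False                          # the new quote is the least interesting
```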

The above fix is just for delays on the link from an exchange to the consolidator. Delays from the consolidator to subscribers are harder, since they go over many different channels and share them with other data. There can also be delays in the matching engine of the exchange itself; and there can be congestion on the way to the exchange. In those cases, too, it’s probably best to adopt the general policy of throttling rather than delaying – that is, of discarding orders that are coming in too fast rather than letting them accumulate in a big buffer to later emerge stale. How exactly that might be done is a large topic; but at first glance it seems that what is needed is not some Big Idea but rather a lot of petty negotiations about who pays for what network services and exactly what they get in exchange – negotiations that already take place, but that should be expanded to include stipulations about what exactly happens when a link gets congested: whose packets get priority (if anyone’s), what data rate is guaranteed to the subscriber, and what the maximum delay might be.

In modern programming, there are a lot of things that are done not because the computer needs them for the code to work but because the programmer needs them to cover for his natural human failings; they make the programming environment simpler and more predictable. In the stock market, eliminating congestion delays falls into this category. Code that deals with markets will normally be written and tested with delays that are relatively minimal. When all this code is suddenly thrown into a high-delay environment, the results will be unpredictable; even if each individual piece does something locally sane, the pieces may interact badly. The players with direct feeds perhaps have enough of an information advantage and enough money to take advantage of the confused and ruthlessly crush their inadvertent attempts to destabilize the market, but it still would be best if they didn’t have to.

Also, in the case of high-frequency traders, the “enough money” part may be doubted: such companies try hard to keep the level of their holdings small. After the flash crash was well under way, there was a “hot potato” period where HFTs traded E-Mini contracts among one another, continually passing them on at lower and lower prices. If any of those firms had had the nerve to hold onto those contracts for a few minutes until the market had recovered, it would have made a very healthy profit. They no doubt realized this to their chagrin after the fact, and at least thought of changing their algorithms to deal better with such situations.

Indeed, in general, flash excursions like this (“excursions” because they can happen upwards as well as downwards) represent someone being foolish and are an opportunity for others to take advantage of that foolishness. Thus does the market discipline its own. But the more complicated the system is – the more it misbehaves in weird ways – the longer it takes for participants to learn how to deal with it. And when an event is as large as the flash crash was, one can expect that the system itself, not just participants, displayed some rather serious misbehavior. The hidden delays in the consolidated feed definitely qualify as serious misbehavior.

Of course, while I think this was the cause of the crash, I’m not pretending to have offered proof of it – not only because I haven’t delved deep into details of what happened that day, but also because in cases like this even people who agree about what caused what in the chain of events can disagree about which of those causes was truly blameworthy and points to something that should be changed to prevent a recurrence. But in this case eliminating congestion delays seems to be the easiest prescription: it does not involve any moral crusades but is just a matter of fixing things on a technical level.