How exchanges turn order books into distributed logs
Posted by rundef 7 days ago
Comments
Comment by alexpotato 2 days ago
A couple of quants had built a random forest regression model that could take inputs like time of day, exchange, order volume etc and spit out an interval of what latency had historically been in that range.
If the latency moved outside that range, an alert would fire and then I would coordinate a response with a variety of teams, e.g. trading, networking, Linux, etc.
If we excluded changes on our side as the culprit, we would reach out to the exchange and talk to our sales rep, who might also pull in networking, etc.
Some exchanges, EUREX comes to mind, were phenomenal at helping us identify issues. E.g. they once swapped in a cable that was a few feet longer than the old one, and that turned out to be why the latency increased.
One day, it's IEX, of Flash Boys fame, that triggers an alert. Nothing changed on our side so we call them. We are going back and forth with the networking engineer and then the sales rep says, in almost hushed tones:
"Look, I've worked at other exchanges so I get where you are coming from in asking these questions. Problem is, b/c of our founding ethos, we are actually not allowed to track our own internal latency, so we really can't help you identify the root cause. I REALLY wish it was different."
I love this story b/c HN, as a technology-focused site, often assumes all problems have technical solutions, but sometimes the fix is actually a people or process one.
Also, incentives and "philosophy of the founders" matter a lot too.
Comment by philipov 2 days ago
This company's official ethical foundation is "Don't Get Caught."
Comment by WJW 2 days ago
Comment by pants2 2 days ago
Comment by atomicnumber3 2 days ago
Flash Boys was always poorly researched and largely ignorant of actual market microstructure and who the relevant market participants were, but it also aged quite poorly, as all of its "activism" proved useless because the market participants, being purely profit-driven, just all smartened up.
If you want to be an activist about something, the best bet for 2026 is probably that so much volume is moving off the lit exchanges into internal matching that it degrades the quality of price discovery. But honestly even that's a hard sell, because much of that flow is "dumb money" just wanting to transact at the NBBO.
Actually, here's the best thing to be upset about: apps gamifying stock trading / investing into basically SEC-regulated gambling.
Comment by Workaccount2 2 days ago
Or leave things in place, but put a one-minute transaction freeze during binary events, and fill the order book during that time with no regard for when an order was placed: just random allocation of order fills coming out of the one-minute pause.
These funds would lose their shit if they had to go back to knowledge being the only edge rather than speed and knowledge.
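A sketch of what that random-allocation fill might look like (entirely my own illustrative structure, not a real auction mechanism): shuffle the orders collected during the freeze, then match crossing pairs with no time priority at all.

```python
import random

def fill_after_freeze(buys, sells, rng):
    """Fill orders collected during the freeze window in random order,
    ignoring arrival time entirely (arrival order carries no priority)."""
    rng.shuffle(buys)
    rng.shuffle(sells)
    fills = []
    # Naive O(n^2) crossing check: fine for a sketch, not for an exchange.
    for b in buys:
        for s in sells:
            if b["qty"] > 0 and s["qty"] > 0 and b["px"] >= s["px"]:
                qty = min(b["qty"], s["qty"])
                b["qty"] -= qty
                s["qty"] -= qty
                fills.append((b["id"], s["id"], qty))
    return fills
```

With a seeded `rng` the allocation is reproducible for testing, but in production the whole point would be that the permutation is unpredictable to participants.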
Comment by neonbrain 2 days ago
Comment by stuxnet79 2 days ago
It might add a bit of color to this conversation.
Comment by lopatin 2 days ago
Comment by noitpmeder 2 days ago
Comment by alexpotato 2 days ago
Many firms have them and they are a hybrid of:
- DevOps (e.g. we help, or own, deployments to production)
- SRE (e.g. we own the dashboards that monitored trading and would manage outages etc)
- Trading Operations (e.g. we would work with exchanges to set up connections, cancel orders etc)
My background is:
- CompSci/Economics BA
- MBA
- ~20 years of basically doing the above roles. I started supporting an in house Order Management System at a large bank and then went from there.
For more detail, here is my LinkedIn: https://www.linkedin.com/in/alex-elliott-3210352/
I also have a thread about the types of outages you see in this line of work here: https://x.com/alexpotato/status/1215876962809339904?s=20
(I have a lot of other trading/SRE related threads here: https://x.com/alexpotato/status/1212223167944478720?s=20)
Comment by noitpmeder 2 days ago
I'm a front office engineer at a prop firm -- always interesting to get insight into how others do it.
We have fairly similar parallels, maybe with the exception of throwing new exchange connections to the dedicated networking group.
Always love watching their incident responses from afar (usually while getting impacted desks to put away the pitchforks). Great examples of crisis management, effectiveness and prioritization under pressure, ... All while being extremely pragmatic about actual vs perceived risk.
(I'm sure joining KCG in August of 2012 was a wild time...)
Comment by alexpotato 2 days ago
It's definitely a job that you don't hear much about, but it has a lot of interesting upsides for people who like technology and trading, especially if you prefer shorter-term, high-intensity work over long-term projects (i.e. what developers typically do).
> Always love watching their incident responses from afar
I actually have a thread on that too: https://x.com/alexpotato/status/1227335960788160513?s=20
> (I'm sure joining KCG in August of 2012 was a wild time...)
And not surprisingly, a thread on that as well: https://x.com/alexpotato/status/1501174282969305093?s=20
Comment by 8cvor6j844qw_d6 2 days ago
I did consider applying for a role in a very similar field, but figured I'd be fighting an uphill battle with no knowledge of trading/the stock market/etc.
Comment by alexpotato 2 days ago
but that story is not the most efficient way (although I do talk about a better approach at the end).
To summarize:
A LOT of hedge funds hire non-finance people for specific roles, e.g. cloud, Linux tuning, networking, etc.
The smarter ones have realized that there are great people everywhere e.g. Gaming company SREs have a lot of relevant experience due to high traffic load, short SLAs and lots of financial risk due to outages. Applying for a role in one of those departments is a lot easier than trying to jump directly into a trading desk/operations role.
Finally, knowing someone on the inside also helps a lot which is made MUCH easier by LinkedIn, Twitter etc
Comment by TheJoeMan 2 days ago
Comment by kasey_junk 2 days ago
Comment by noitpmeder 1 day ago
Now, I'm not disqualifying them if they don't read the Journal. But if they can't demonstrate any proactive interest in finance, or tell me about some happenings/events/stories they personally find interesting (there's a ton of interesting stuff happening), it's definitely an amber flag.
Comment by reactordev 2 days ago
Comment by dboreham 2 days ago
That was not why. Possibly the cable made a difference (e.g. an open circuit that made the NICs negotiate down to a lower speed, or a noisy link leading to retransmissions), but it wasn't the length per se.
Comment by thadt 2 days ago
When we're measuring time on the scale of nanoseconds then, yes, cable length is definitely something we care about and will reliably show up in measurements. In some situations, we not only care about the cable length, but also its temperature.
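To put numbers on that: signal propagation in copper or fiber runs at roughly two-thirds the speed of light, so each extra meter of cable costs about 5 ns one way. A quick back-of-the-envelope (the velocity factor is a typical value and varies by medium):

```python
C = 299_792_458          # speed of light in vacuum, m/s
VELOCITY_FACTOR = 0.66   # typical for twinax/fiber; varies by medium

def one_way_delay_ns(cable_length_m: float) -> float:
    """One-way propagation delay through a cable, in nanoseconds."""
    return cable_length_m / (C * VELOCITY_FACTOR) * 1e9

# A cable "a few feet" (~1 m) longer adds roughly 5 ns one way,
# which is easily visible when you measure at nanosecond resolution.
extra_ns = one_way_delay_ns(1.0)
```

At that scale even thermal expansion and temperature-dependent propagation speed become measurable, which is the point the parent is making.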
Comment by thijson 2 days ago
Comment by jshaqaw 2 days ago
Comment by shawabawa3 2 days ago
I'm probably missing some second-order effects, but it feels like this would remove the need for race-to-the-bottom latencies, and would also provide protection against fat-fingered executions, in that every trading algorithm would have a full second to arbitrage it.
Comment by quickthrowman 2 days ago
I’d rather have penny-wide spreads on SPY than restrict trading speed for HFTs. Providing liquidity is beneficial to everyone, even if insane amounts of money are spent by HFTs to gain an edge.
Comment by Workaccount2 2 days ago
The bad part of HFT is paying the smartest young minds this country has to offer to figure out how to parse GDP data as fast as computationally possible so they can send in an order before other players can. That's a dumb game that doesn't provide much benefit (besides the speed in sparse critical moments adding a few % to the fund's ROI).
They can arbitrage all day, but don't let them buy every Taylor Swift concert ticket the moment it goes on sale because they have a co-located office with a direct fiber line, ASIC filled servers, and API access.
Comment by jshaqaw 1 day ago
I have also seen enough to be quite sure that many hft strategies are quite normie investor predatory.
Again, I’m no zealot. I trade stuff. I love liquidity. I’m happy to pay someone some fraction of a penny to change my mind. Service provided. But the returns from vanilla liquidity provision commoditized long ago to uninteresting margins. That leaves a lot more of the HFT alpha pool in the predatory strategies, and capital flows to where the incentives are.
Comment by blibble 2 days ago
"continuous periodic auctions"
Comment by dmurray 2 days ago
First, it is of course possible to apply horizontal scaling through sharding. My order on Tesla doesn't affect your order on Apple, so it's possible to run each product on its own matching engine, its own set of gateways, etc. Most exchanges don't go this far: they might have one cluster for stocks starting A-E, etc. So they don't even exhaust the benefits available from horizontal scaling, partly because this would be expensive.
On the other hand, it's not just the sequencer that has to process all these events in strict order - which might make you think it's just a matter of returning a single increasing sequence number for every request. The matching engine which sits downstream of the sequencer also has to consume all the events and apply a much more complicated algorithm: the matching algorithm described in the article as "a pure function of the log".
Components outside of that can generally be scaled more easily: for example, a gateway cares only about activity on the orders it originally received.
The article is largely correct that separating the sequencer from the matching engine allows you to recover if the latter crashes. But this may only be a theoretical benefit. Replaying and reprocessing a day's worth of messages takes a substantial fraction of the day, because the system is already operating close to its capacity. And after it crashed, you still need to figure out which customers think they got their orders executed, and allow them to cancel outstanding orders.
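As an illustration of the "pure function of the log" idea the parent references, here is a toy price-time-priority matcher whose entire state is derived by folding over the sequenced events. All names and structures here are my own sketch, not the article's or any real engine's:

```python
from dataclasses import dataclass, field

@dataclass
class Book:
    bids: list = field(default_factory=list)  # (price, seq, order_id, qty), best first
    asks: list = field(default_factory=list)
    fills: list = field(default_factory=list)

def apply(book: Book, event) -> Book:
    """Apply one sequenced event. Deterministic: same log in, same book out."""
    seq, side, price, qty, oid = event
    resting, own = (book.asks, book.bids) if side == "buy" else (book.bids, book.asks)
    crosses = (lambda p: price >= p) if side == "buy" else (lambda p: price <= p)
    # Match against the best resting orders while prices cross.
    while qty > 0 and resting and crosses(resting[0][0]):
        rp, rseq, roid, rqty = resting[0]
        take = min(qty, rqty)
        book.fills.append((oid, roid, rp, take))
        qty -= take
        if take == rqty:
            resting.pop(0)
        else:
            resting[0] = (rp, rseq, roid, rqty - take)
    # Any remainder rests on the book with price-time priority.
    if qty > 0:
        own.append((price, seq, oid, qty))
        own.sort(key=lambda x: (-x[0] if side == "buy" else x[0], x[1]))
    return book

def replay(log) -> Book:
    """Recover the full book state by replaying the log from the start,
    which is exactly what makes crash recovery possible (if slow)."""
    book = Book()
    for event in log:
        book = apply(book, event)
    return book
```

The parent's caveat applies directly: `replay` is linear in the log, so recovering a full day near capacity takes a substantial fraction of a day unless you also snapshot intermediate state.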
Comment by londons_explore 2 days ago
For example, order A and order B might interact with each other... but they also might not. If we assume they do not, we can process them totally independently and in parallel; only if we later determine they should have interacted do we throw away the results and reprocess.
It is very similar to the way speculative execution happens in CPUs: assume something, then throw away the results if your assumption was wrong.
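A minimal sketch of that optimistic strategy (hypothetical structure, not any real matching engine): run orders in parallel under the independence assumption, then discard and fall back to the slow sequential path if the assumption was wrong.

```python
from concurrent.futures import ThreadPoolExecutor

def crosses(a, b):
    """Two orders interact if they are on opposite sides and prices cross."""
    if a["side"] == b["side"]:
        return False
    buy, sell = (a, b) if a["side"] == "buy" else (b, a)
    return buy["px"] >= sell["px"]

def speculative_match(orders, match_one, match_sequential):
    """Process orders in parallel assuming independence; if any pair would
    have interacted, discard the speculative results and reprocess in order."""
    with ThreadPoolExecutor() as pool:
        speculative = list(pool.map(match_one, orders))
    conflict = any(
        crosses(a, b)
        for i, a in enumerate(orders)
        for b in orders[i + 1:]
    )
    if conflict:
        return match_sequential(orders)  # slow, correct fallback path
    return speculative
```

In practice the win depends entirely on how often speculation succeeds; if most incoming orders cross the book, you pay for both paths.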
Comment by noitpmeder 2 days ago
Happy to discuss more, might be off the mark... these optimizations are always very interesting in their theoretical vs actual perf impact.
Comment by eep_social 2 days ago
Comment by noitpmeder 2 days ago
Comment by lanstin 2 days ago
Comment by blibble 2 days ago
not necessarily
many exchanges allow orders into one instrument to match on another
(very, very common on derivatives exchanges)
Comment by dmurray 2 days ago
a) I don't know of any exchange where this could be true for specifically Apple and Tesla, so the example is OK
b) you can still get some level of isolation, even on commodities exchanges you can't typically affect the gold book with your crude oil order (the typical case is that your order to buy oil futures in 2027 matches against someone selling oil in 2026, plus someone selling a calendar spread)
c) for exchanges that do offer this kind of functionality, one of the ways they deal with high volumes is by temporarily disabling this feature.
Comment by halfmatthalfcat 2 days ago
Comment by alexpotato 2 days ago
A notable edge case here: if EVERYTHING (e.g. market data AND orders) goes through the sequencer, then you can, essentially, denial-of-service key parts of the trading flow.
e.g. one of the first exchanges to switch to a sequencer model was famous for having big market data bursts and then huge order entry delays b/c each order got stuck in the sequencer queue. In other words, the queue would be 99.99% market data with orders sprinkled in randomly.
Comment by blibble 2 days ago
for an exchange: market data is a projection of the order book, an observer that sits on the stream but doesn't contribute to it
and client ports have rate limits
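Those client-port rate limits are commonly described as token buckets; a generic sketch of the mechanism (not any particular exchange's scheme; the `start`/`now` parameters exist only to make it testable):

```python
import time

class TokenBucket:
    """Allow bursts of up to `capacity` messages, refilling at `rate` msgs/sec."""
    def __init__(self, rate, capacity, start=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic() if start is None else start

    def allow(self, now=None):
        """Return True if one message may pass right now, else False."""
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller rejects or queues the message
```

The bucket lets a client burst briefly without letting any one port monopolize the sequencer's input.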
Comment by alexpotato 2 days ago
e.g. a lot of these systems have a "replay" node that can be used by components that just restarted. You want the replay to include ALL of the messages seen so you can rebuild the state at any given point.
(There are, of course, tradeoffs to this so I'm just commenting on the "single sequencer" design philosophy)
Comment by blibble 2 days ago
even for systems built on a sequencer which do (e.g. an OMS), the volume is too large
the usual strategy, for processes which require it, is to sample it and then stamp it on commands
which maintains the invariants
(my background: I have been a developer on one of Mike Blum's original sequencers)
Comment by alexpotato 1 day ago
Comment by usefulcat 2 days ago
Comment by nly 4 hours ago
One thing often missed here is that most orders, even from most hedge funds and prop trading shops, still go via broker systems. Direct Market Access is getting more common but it's often a pain in the arse from a regulatory and disclosures perspective, and means you lose out on short locate (shares that you can borrow from your broker to short sell).
"Sponsored Access", where you connect directly to an exchange but your broker monitors your activity via a drop copy, is a happy middle ground.
Surprisingly though, I've heard of at least one trading venue where going direct is slower, because the venue's own risk checks are slower than the ones implemented by at least one broker, and the broker themselves are allowed to bypass the risk checks put in place at the exchange for general DMA clients. "Direct" is clearly subject to negotiation.
I've also heard of brokers who tried to implement their gateways in FPGA, and have later shuttered the project, having gone back to relatively slow software gateways for the flexibility.
A lot of trading still happens via FIX, which is a slow ASCII protocol. Most prop shops will have aggressively optimized FIX parsers and serialisers out of necessity.
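For context, FIX messages are ASCII `tag=value` pairs separated by the SOH (0x01) byte, so a naive parser is trivial, which is also part of why unoptimized FIX handling is slow relative to binary protocols. A sketch (no validation, checksum, or repeating-group handling):

```python
SOH = "\x01"

def parse_fix(raw: str) -> dict:
    """Split a FIX message into a {tag: value} dict. Repeating groups
    (duplicate tags) would clobber earlier values in this naive approach;
    real parsers also avoid the per-field string allocation done here,
    which is where the 'aggressively optimized' part comes in."""
    fields = {}
    for pair in raw.strip(SOH).split(SOH):
        tag, _, value = pair.partition("=")
        fields[tag] = value
    return fields

# e.g. a fragment of a NewOrderSingle (MsgType 35=D):
msg = "8=FIX.4.2\x0135=D\x0155=AAPL\x0154=1\x0138=100\x01"
parsed = parse_fix(msg)  # {'8': 'FIX.4.2', '35': 'D', '55': 'AAPL', ...}
```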
People think all trading happens in these elite, bleeding-edge hardcore sub-microsecond systems, but a lot of it is just dogshit.
Things are a bit more optimised in the derivatives space because of the insane volumes (Options trading just for US equities is easily into the petabytes of storage per year).
Comment by croemer 2 days ago
Comment by bwfan123 2 days ago
Comment by andrepd 2 days ago
Comment by genidoi 2 days ago
> The log is the truth; the order book is just a real-time projection of this sequence.
> The book is fast; the log is truth.
> Matching engines can crash; the log cannot.
Comment by bwfan123 2 days ago
Comment by genidoi 2 days ago
Now I've made a note to auto-distrust any "article" that lacks an author name, i.e. someone willing to personally own any accusations of the article being AI slop.
Comment by Nextgrid 2 days ago
Comment by rundef 2 days ago
Comment by genidoi 2 days ago
Comment by Scubabear68 2 days ago
What happens outside the exchange really doesn’t matter. The ordering will not happen until it hits the exchange.
And that is why algorithmic traders want their algos in a closet as close to the exchange both physically and also in terms of network hops as possible.
Comment by cgio 2 days ago
Comment by tcbawo 2 days ago
Comment by cgio 2 days ago
Comment by teleforce 1 day ago
But for the required stringent latency, Kafka's head-of-line (HoL) blocking under concurrent events can be an issue [1].
[1] What If We Could Rebuild Kafka from Scratch? (220 comments)
Comment by rhodey 2 days ago
Comment by nick0garvey 2 days ago
How is this avoiding data loss if the lead sequencer goes down after acking but without the replica receiving the write?
Comment by thijson 2 days ago
GPS can provide fairly accurate timestamps. There are a few other GNSS constellations as well (GLONASS, Galileo, etc.) for extra reliability.
Comment by SOTGO 2 days ago
Comment by extraduder_ire 2 days ago
Comment by hamiecod 2 days ago
Comment by bob1029 1 day ago
Comment by 8cvor6j844qw_d6 2 days ago
Comment by hamiecod 2 days ago
Comment by HolyLampshade 2 days ago
Of the many things trading platforms are attempting to do, the two most relevant here are the overall latency and, more importantly, where serialization occurs on the system.
Latency itself is only relevant as it applies to the “uncertainty” period where capital is tied up before the result of the instruction is acknowledged. Firms can only have so much capital risk, and so these moments end up being little dead periods. So long as the latency is reasonably deterministic though it’s mostly inconsequential if a platform takes 25us or 25ms to return an order acknowledgement (this is slightly more relevant in environments where there are potentially multiple venues to trade a product on, but in terms of global financial systems these environments are exceptions and not the norm). Latency is really only important when factored alongside some metric indicating a failure of business logic (failures to execute on aggressive orders or failures to cancel in time are two typical metrics)
Most important to many participants is where serialization occurs on the trading venue (what the initial portion of this blog is about: determining who was "first"). Usually this is to the tune of 1-2ns (in some cases lower). There are diminishing returns, however, to making this absolute in physical terms. A small handful of venues have attempted to address serialization at the very edge of their systems, but the net result is just a change in how firms that are extremely sensitive to being first apply technical expertise to the problem.
Most “good” venues permit an amount of slop in their systems (usually to the tune of 5-10% of the overall latency) which reduces the benefits of playing the sorts of ridiculous games to be “first”. There ends up being a hard limit to the economic benefit of throwing man hours and infrastructure at the problem.
Comment by contingencies 1 day ago