How exchanges turn order books into distributed logs

Posted by rundef 7 days ago

Comments

Comment by alexpotato 2 days ago

At a past job (hedge fund), my role was to co-ordinate investigations into why latency may have changed when sending orders.

A couple of quants had built a random forest regression model that could take inputs like time of day, exchange, order volume etc and spit out an interval of what latency had historically been in that range.

If the latency moved outside that range, an alert would fire and then I would co-ordinate a response with the a variety of teams e.g. trading, networking, Linux etc

If we excluded changes on our side as the culprit, we would reach out to the exchange and talk to our sales rep there would might also pull in networking etc.

Some exchanges, EUREX comes to mind, were phenomenal at helping us identify issues. e.g. they once swapped out a cable that was a few feet longer than the older cable and that's why the latency increased.

One day, it's IEX, of Flash Boys fame, that triggers an alert. Nothing changed on our side so we call them. We are going back and forth with the networking engineer and then the sales rep says, in almost hushed tones:

"Look, I've worked at other exchange so I get where you are coming from in asking these questions. Problem is, b/c of our founding ethos, we are actually not allowed to track our own internal latency so we really can't help you identify the root cause. I REALLY wish it was different."

I love this story b/c HN, as a technology focused site, often thinks all problems have technical solutions but sometimes it's actually a people or process solution.

Also, incentives and "philosophy of the founders" matter a lot too.

Comment by philipov 2 days ago

What kind of founding ethos doesn't allow tracking internal latency? Is their founding ethos "Never Admit Responsibility?"; "Never Leave A Paper Trail?"

This company's official ethical foundation is "Don't Get Caught."

Comment by WJW 2 days ago

From the wiki about IEX: "It was founded in 2012 in order to mitigate the effects of high-frequency trading." I can see how they don't want to track internal latency as part of that, or at least not share those numbers with outsiders. That just encourages high frequency traders again.

Comment by pants2 2 days ago

One would hope for a more technical solution to HFT than willful ignorance lol. For example, they could batch up orders every second and randomize them.

Comment by atomicnumber3 2 days ago

I worked in HFT. (Though am now completely out of fintech and have no skin in the game). "Flash Boys" traditional HFT is dead already, the trade collapsed in 2016-2018 when both larger institutions got less dumb with order execution, and also several HFTs "switched sides" and basically offered "non-dumb order execution" as a service to any institutions who were unable to play the speed game themselves. Look at how Virtu's revenue changed from mostly trading to mostly order execution services over that time period.

Flash Boys was always poorly researched and largely ignorant of actual market microstructure and who the relevant market participants were, but it also aged quite poorly as all of their "activism" was useless because the market participants just all smartened up purely profit-driven.

If you want to be activist about something, the best bet for 2026 is probably that so much volume is moving off the lit exchanges into internal matching and it degrades the quality of price discovery happening. But honestly even that's a hard sell because much of that flow is "dumb money" just wanting to transact at the NBBO.

Actually, here's the best thing to be upset about: apps gamifying stock trading / investing into basically SEC-regulated gambling.

Comment by Workaccount2 2 days ago

This is what should happen, because what the game actually being played is to profit off those who cannot react fast enough to news event, rather than profit off those who mispriced their order.

Or leave things in place, but put a 1 minute transaction freeze during binary events, and fill the order book during that time with no regard for when an order was placed, just random allocation of order fills coming out of the 1 minute pause.

These funds would lose their shit if they had to go back to knowledge being the only edge rather than speed and knowledge.

Comment by neonbrain 2 days ago

This isn't a good approach because it assumes there are no market makers on trading venues, and that they (as well as exchanges) do not compete for order flow. Also, maybe you haven't noticed, but stocks are often frozen during news announcements by regulatory request, so such pauses are already in place and are designed to maintain market integrity, not disrupt it with arbitrary fills.

Comment by stuxnet79 2 days ago

The founder Brad Katsuyama talks about his background and motivation for starting the company here - https://www.youtube.com/watch?v=N9hoqFpDjVs

It might add a bit of color to this conversation.

Comment by lopatin 2 days ago

Can you talk a bit more about the incentives to trade latency sensitive strategies on IEX in the first place? Is it still lucrative for its liquidity despite them artificially slowing down orders? Does a meta game evolve with HFTs all working around their system, essentially making it still a HFT playground but with extra steps? Do you think their unexpected latency increase for you guys was intentional, to free the water from sharks?

Comment by noitpmeder 2 days ago

Curious what your actual role was -- sounds very interesting! Project manager? Dev? Operations specialist? E.g. were you hired into this role, and what were the requisites?

Comment by alexpotato 2 days ago

I was what was called "Trade Desk".

Many firms have them and they are a hybrid of:

- DevOps (e.g. we help, or own, deployments to production)

- SRE (e.g. we own the dashboards that monitored trading and would manage outages etc)

- Trading Operations (e.g. we would work with exchanges to set up connections, cancel orders etc)

My background is:

- CompSci/Economics BA

- MBA

- ~20 years of basically doing the above roles. I started supporting an in house Order Management System at a large bank and then went from there.

For more detail, here is my LinkedIn: https://www.linkedin.com/in/alex-elliott-3210352/

I also have a thread about the types of outages you see in this line of work here: https://x.com/alexpotato/status/1215876962809339904?s=20

(I have a lot of other trading/SRE related threads here: https://x.com/alexpotato/status/1212223167944478720?s=20)

Comment by noitpmeder 2 days ago

Thanks for all the info!

I'm a front office engineer at a prop firm -- always interesting to get insight into how others do it.

We have fairly similar parallels, maybe with the exception of throwing new exchange connections to the dedicated networking group.

Always love watching their incident responses from afar (usually while getting impacted desks to put away the pitchforks). Great examples of crisis management, effectiveness and prioritization under pressure, ... All while being extremely pragmatic about actual vs perceived risk.

(I'm sure joining KCG in August of 2012 was a wild time...)

Comment by alexpotato 2 days ago

You are very welcome!

It's definitely a job that you don't hear much about but has a lot of interesting positives for people who like technology and trading. Especially if you prefer shorter term, high intensity work vs long term projects (e.g. like developers).

> Always love watching their incident responses from afar

I actually have a thread on that too: https://x.com/alexpotato/status/1227335960788160513?s=20

> (I'm sure joining KCG in August of 2012 was a wild time...)

And not surprisingly, a thread on that as well: https://x.com/alexpotato/status/1501174282969305093?s=20

Comment by 8cvor6j844qw_d6 2 days ago

May I know if someone with no trading knowledge can get into this field? Or do new hires that you've seen generally have some background knowledge on related to trading, etc.?

I did consider applying for a role in a very similar field, but figured I'll be fighting an uphill battle with no knowledge in trading/stock market/etc.

Comment by alexpotato 2 days ago

So I wrote up how I ended there: https://x.com/alexpotato/status/1663668616233885699?s=20

but that story is not the most efficient way (although I do talk about a better approach at the end).

To summarize:

A LOT of hedge funds hire non finance people for specific roles e.g. cloud, Linux tuning, networking etc

The smarter ones have realized that there are great people everywhere e.g. Gaming company SREs have a lot of relevant experience due to high traffic load, short SLAs and lots of financial risk due to outages. Applying for a role in one of those departments is a lot easier than trying to jump directly into a trading desk/operations role.

Finally, knowing someone on the inside also helps a lot which is made MUCH easier by LinkedIn, Twitter etc

Comment by TheJoeMan 2 days ago

You know how coders are expected to grind out leetcode interviews? For the finance fields, a common interview topic is what you read in “The Journal” (WSJ). So just stay on top of it for a few weeks, see some trends, etc.

Comment by kasey_junk 2 days ago

I’ve worked in finance for 25 years and never even heard of this coming up in an interview.

Comment by noitpmeder 1 day ago

Imo it's not out of place in context if one is trying to determine if the candidate has an interest in finance.

Now, I'm not disqualifying them if they dont read the journal. But if they can't demonstrate any proactive interest in finance, or tell me about some happenings/events/stories they personally find interesting (theres a ton of interesting stuff happening) its definitely an amber flag.

Comment by reactordev 2 days ago

All technical problems are people problems

Comment by dboreham 2 days ago

> they once swapped out a cable that was a few feet longer than the older cable and that's why the latency increased

That was not why. Possibly the cable made a difference (had an open circuit that made the NICs back down to a lower speed; noisy leading to retransmissions) but it wasn't the length per se.

Comment by thadt 2 days ago

Well, it depends on the granularity of the time scale right? When you're measuring milliseconds, then the cable length probably isn't a thing factoring into your latency calculation.

When we're measuring time on the scale of nanoseconds then, yes, cable length is definitely something we care about and will reliably show up in measurements. In some situations, we not only care about the cable length, but also its temperature.

Comment by thijson 2 days ago

It can be the length, there's about 1ns latency per foot.

Comment by jshaqaw 2 days ago

This is interesting but also just hilarious at a meta level. I was a “low frequency” ie manual fundamental based hedge fund investor for many years. In general I think hft is a net benefit to liquidity when done in compliance with the text and spirit of regulations. But no real world allocation of resources is improved by having to game transactions to this level of time granularity. This is just society pouring resources down a zero sum black hole. Open to hearing contrary views of course.

Comment by shawabawa3 2 days ago

I've been wondering if the stock market would be more efficient if trades executed only every <small time interval> instead of continuously, i.e. every 1 second an opening trade style cross book clearance happens. Orders would have to be on the book for a full interval to execute to prevent last millisecond rushes at the end of an interval

I'm probably missing some second order effects but it feels like this would mitigate the need for race to the bottom latencies and would also provide protection against fat fingered executions in that every trading algorithm would have a full second to arbitrage it

Comment by quickthrowman 2 days ago

You could do this but the cost would be wider bid/ask spreads for all market participants. If you make it harder for market makers to hedge their position, they will collect a larger spread to account for that. A whole lot of liquidity can disappear in a second when news hits.

I’d rather have penny-wide spreads on SPY than restrict trading speed for HFTs. Providing liquidity is beneficial to everyone, even if insane amounts of money are spent by HFTs to gain an edge.

Comment by Workaccount2 2 days ago

It's really binary events that they should throttle execution and do batch orders.

The bad part of HFT is paying the smartest young minds this country has to offer to figure out how the parse GDP data as fast as computationally possible so they can send in an order before other players can. That's a dumb game that doesn't provide much benefit (besides speed in sparse critical moments adding a few % to the funds ROI).

They can arbitrage all day, but don't let them buy every Taylor Swift concert ticket the moment it goes on sale because they have a co-located office with a direct fiber line, ASIC filled servers, and API access.

Comment by jshaqaw 1 day ago

Would be interested to see real numbers around societal value from marginal added liquidity versus aggregate spend into the zero sum arms race.

I have also seen enough to be quite sure that many hft strategies are quite normie investor predatory.

Again, I’m not zealot. I trade stuff. I love liquidity. I’m happy to pay someone some fraction of a penny to change my mind. Service provided. But the returns from vanilla liquidity provision commoditized long ago to uninteresting margins. That leaves a lot more of the hft alpha pool in the predatory strategies and capital flows where the incentives are.

Comment by blibble 2 days ago

this is exactly what many dark pools do

"continuous periodic auctions"

Comment by dmurray 2 days ago

This article both undersells and oversells the technical challenge exchanges solve.

First, it is of course possible to apply horizontal scaling through sharding. My order on Tesla doesn't affect your order on Apple, so it's possible to run each product on its own matching engine, its own set of gateways, etc. Most exchanges don't go this far: they might have one cluster for stocks starting A-E, etc. So they don't even exhaust the benefits available from horizontal scaling, partly because this would be expensive.

On the other hand, it's not just the sequencer that has to process all these events in strict order - which might make you think it's just a matter of returning a single increasing sequence number for every request. The matching engine which sits downstream of the sequencer also has to consume all the events and apply a much more complicated algorithm: the matching algorithm described in the article as "a pure function of the log".

Components outside of that can generally be scaled more easily: for example, a gateway cares only about activity on the orders it originally received.

The article is largely correct that separating the sequencer from the matching engine allows you to recover if the latter crashes. But this may only be a theoretical benefit. Replaying and reprocessing a day's worth of messages takes a substantial fraction of the day, because the system is already operating close to its capacity. And after it crashed, you still need to figure out which customers think they got their orders executed, and allow them to cancel outstanding orders.

Comment by londons_explore 2 days ago

Once sequencing is done, the matching algorithm can run with some parallelism.

For example, Order A and order B might interact with eachother... but they also might not. If we assume they do not, we can have them processed totally independently and in parallel, and then only if we later determine they should have interacted with each other then we throw away the results and reprocess.

It is very similar to the way speculative execution happens in CPU's. Assume something then throw away the results if your assumption was wrong.

Comment by noitpmeder 2 days ago

Off the cuff, id expect this leads to less improvement than you might think. The vast majority of orders, especially orders arriving in sequence close to one another, are likely on a small set of extremely liquid symbols, and usually all for prices at or near the top of the book for those symbols.

Happy to discuss more, might be off the mark... these optimizations are always very interesting in their theoretical vs actual perf impact.

Comment by eep_social 2 days ago

in high scale stateless app services this approach is typically used to lower tail latency. two identical service instances will be sent the same request and whichever one returns faster “wins” which protects you from a bad instance or even one which happens to be heavily loaded.

Comment by noitpmeder 2 days ago

I'm not sure I follow. In this instance we're talking about multiple backend matching engines... Correct? By definition they must be kept in sync, or at least have total omnipotent knowledge about the state of all other backend book states.

Comment by lanstin 2 days ago

And the tail latencies are wildly improved with each addition dup. Has to be idempotent of course.

Comment by blibble 2 days ago

> My order on Tesla doesn't affect your order on Apple

not necessarily

many exchanges allow orders into one instrument to match on another

(very, very common on derivatives exchanges)

Comment by dmurray 2 days ago

Yes, I was going to note that this doesn't necessarily apply on derivatives exchanges. But

a) I don't know of any exchange where this could be true for specifically Apple and Tesla, so the example is OK

b) you can still get some level of isolation, even on commodities exchanges you can't typically affect the gold book with your crude oil order (the typical case is that your order to buy oil futures in 2027 matches against someone selling oil in 2026, plus someone selling a calendar spread)

c) for exchanges that do offer this kind of functionality, one of the ways they deal with high volumes is by temporarily disabling this feature.

Comment by 2 days ago

Comment by halfmatthalfcat 2 days ago

The amount of multi-leg strategies is insane

Comment by alexpotato 2 days ago

> Every modern exchange has a single logical sequencer. No matter how many gateways feed the system, all events flow into one component whose job is to assign the next sequence number. That integer defines the global timeline.

A notable edge case here is that if EVERYTHING (e.g. market data AND orders) goes through the sequencer then you can, essentially, Denial of Service to key parts of the trading flow.

e.g. one of the first exchanges to switch to a sequencer model was famous for having big market data bursts and then huge order entry delays b/c each order got stuck in the sequencer queue. In other words, the queue would be 99.99% market data with orders sprinkled in randomly.

Comment by blibble 2 days ago

why would market data go through the sequenced stream on an exchange?

for an exchange: market data is a projection of the order book, an observer that sits on the stream but doesn't contribute to it

and client ports have rate limits

Comment by alexpotato 2 days ago

B/c, by design, you want the archived stream of events to include everything.

e.g. a lot of these systems have a "replay" node that can be used by components that just restarted. You want the replay to include ALL of the messages seen so you can rebuild the state at any given point.

(There are, of course, tradeoffs to this so I'm just commenting on the "single sequencer" design philosophy)

Comment by blibble 2 days ago

by definition: an exchange doesn't need any reference to outside market data

even for systems built on a sequencer which do (e.g. an OMS), the volume is too large

the usual strategy is for processes which require it, is to sample it and them stamp it on commands

which maintains the invariants

(my background: I have been a developer on one of Mike Blum's original sequencers)

Comment by alexpotato 1 day ago

Fair point that you don't actually need the market data you send out if you have the original order book data internally.

Comment by usefulcat 2 days ago

How would an exchange enforce the trade through rule without any outside market data?

Comment by nly 4 hours ago

These distributed sequencer solutions are for resilience, and they add a lot of latency because each node needs to do something like RAFT. Exchanges generally don't care aggressively about low latency, they care about resilience and fairness. It's the hedge funds etc looking for an edge.

One thing often missed here is that most orders, even from most hedge funds and prop trading shops, still go via broker systems. Direct Market Access is getting more common but it's often a pain in the arse from a regulatory and disclosures perspective, and means you lose out on short locate (shares that you can borrow from your broker to short sell).

"Sponsored Access", where you connect directly to an exchange but your broker monitors your activity via a drop copy, is a happy middle ground.

Surprisingly though, I've heard of at least one trading venue where going direct is slower, because the venues own risk checks are slower than the the ones implemented by at least one broker, and the broker themselves are allowed to bypass the risk checks put in place at the exchange for general DMA clients. "Direct" is clearly subject to negotiation.

I've also heard of brokers who tried to implement their gateways in FPGA, and have later shuttered the project, having gone back to relatively slow software gateways for the flexibility.

A lot of trading still happens via FIX, which is a slow ASCII protocol. Most prop shops will have aggressively optimized FIX parsers and serialisers out of necessity.

People think all trading happens in these elite, bleeding-edge hardcore sub-microsecond systems, but a lot of it is just dogshit.

Things are a bit more optimised in the derivatives space because of the insane volumes (Options trading just for US equities is easily into the petabytes of storage per year).

Comment by croemer 2 days ago

Smells of AI writing: "Timestamps aren't enough. Exchanges need a stronger ordering primitive." etc

Comment by bwfan123 2 days ago

Interesting comment, I "felt" the ai too in an undescribable way. What are some obvious tells ?

Comment by andrepd 2 days ago

The incessant bullet lists and the conclusion titled "Conclusion" give it away. And above all, the complete lack of "voice". You can tell when a human is speaking and when a sanitised amorphous blob of averageness is speaking.

Comment by genidoi 2 days ago

Also three uses of a semi-colon for no reason. Nobody writes like this.

> The log is the truth; the order book is just a real-time projection of this sequence.

> The book is fast; the log is truth.

> Matching engines can crash; the log cannot.

Comment by bwfan123 2 days ago

I need to sharpen my BS sensor. At first glance, I struggled to parse the voice in the article. Going back to the article I can now see the obvious gaps. Generally, AI tends to say things that make us go - "what the hell is this" ? for example in the article "The Problem: Ordering Chaos" is a very weird way to phrase it. As a human I struggled to accept it, and I did by stretching the meaning of that phrase to world model where it made sense. ie, our tendency is to give wide leeway to what we read or see and be very "accepting" in that sense. Instead, i think a better option is to reject everything we see or read as the default.

Comment by genidoi 2 days ago

I didn’t catch it either on the first pass but also felt something was off about the article, as if a human had sanitised most of the AI idiosyncrasies out.

Now I have taken note to auto-distrust any “article” that lacks an author name, who is willing to personally own any accusations of the article being AI slop.

Comment by Nextgrid 2 days ago

"It isn't <this>, it's <that>" and variations thereof are the biggest tell.

Comment by rundef 2 days ago

yes, this is AI-assisted writing, I'm not hiding from it. but this isn't just copy-pasting. I still spend up to 15 hours working on each article.

Comment by genidoi 2 days ago

Considering that you chose to not include your name or even a HN username in the byline of the article, there is an argument to be made that you are, in fact, hiding from it.

Comment by Scubabear68 2 days ago

I wish the article had stuck with the technical topic at hand and left out the embellishment. In particular the opening piece talking about what is happening outside the exchange.

What happens outside the exchange really doesn’t matter. The ordering will not happen until it hits the exchange.

And that is why algorithmic traders want their algos in a closet as close to the exchange both physically and also in terms of network hops as possible.

Comment by 2 days ago

Comment by cgio 2 days ago

The title is obviously the wrong way around, exchanges turn distributed logs into order books. The distributed part is a resilience decision but not essential to the design (technically writing to a disk would give persistence with less ability to recover, or with some potential gaps in the case of failure (remember there is a sequence published on the other end too, the market data feed)). As noted in the article, the sequencer is a single-threaded, not parallelisable process. Distribution is just a configuration of that single threaded path. Parallelisation is feasible to some extent by sharding across order books themselves (dependencies between books may complicate this).

Comment by tcbawo 2 days ago

It would not surprise me at all if the sequencing step was done via FPGA processing many network inputs at line rate with a shared monotonic clock. This would give it some amount of parallelism.

Comment by cgio 2 days ago

good point, sequencing is very minimal, therefore some parallelism is feasible that way, but the pipeline is not that deep, at least ideally. Of course if people are chasing nano-seconds, it may make sense.

Comment by teleforce 1 day ago

This distributed logs nature of the exchanges is very much suitable for Kafka.

But for the required stringent latency, Kafka for head of line (HoL) blocking under concurrent events can be an issue though [1].

[1] What If We Could Rebuild Kafka from Scratch? (220 comments)

https://news.ycombinator.com/item?id=43790420

Comment by rhodey 2 days ago

Always fun to read about HFT. If anyone wants to learn about the Order Book data structure you can find it in JS here:

https://github.com/rhodey/limit-order-book

https://www.npmjs.com/package/limit-order-book

Comment by nick0garvey 2 days ago

> Pipelined replication: the sequencer assigns a sequence number immediately and ships the event to replicas in parallel. Matching doesn't wait for the replicas to acknowledge.

How is this avoiding data loss if the lead sequencer goes down after acking but without the replica receiving the write?

Comment by thijson 2 days ago

The article says it's not enough to accurately timestamp orders at the various order entry portals. I didn't understand why that's not enough.

GPS can provide fairly accurate timestamps. There's a few other GLONASS systems as well for extra reliability.

Comment by SOTGO 2 days ago

It's probably possible to use timestamps, but I suppose you would have to handle ties in more places, with sequence numbers you only break ties once. It appears that the FIX specifications allows up to microsecond precision, but given the volume of messages it's still likely a problem. It's also easier to work with integer sequence numbers than timestamps, but that's also a small consideration.

Comment by extraduder_ire 2 days ago

GNSS is the generic term, GLONASS is the name for the Russian system.

Comment by hamiecod 2 days ago

How long can the exchanges scale their sequencer systems (which are sequential) vertically? The trading volume is only rising with time at a higher rate than the advancement of low latency tech.

Comment by bob1029 1 day ago

In the most ideal case a sequencer can handle ~half a billion orders per second if all it's doing is assigning a number to each item. LMAX Disruptor using value types pushes 4-500 million events/s on modern hardware.

Comment by 8cvor6j844qw_d6 2 days ago

Very interesting. I wished to know the author. The site doesn't seem to have readily available information on the author.

Comment by hamiecod 2 days ago

After reading some other articles on the site, I have a feeling that it could be written by AI.

Comment by HolyLampshade 2 days ago

I’m a tad late to the party, but it’s worth providing a little context to the technical conversation.

Of the many thing trading platforms are attempting to do, the two most relevant here are the overall latency and more importantly where serialization occurs on the system.

Latency itself is only relevant as it applies to the “uncertainty” period where capital is tied up before the result of the instruction is acknowledged. Firms can only have so much capital risk, and so these moments end up being little dead periods. So long as the latency is reasonably deterministic though it’s mostly inconsequential if a platform takes 25us or 25ms to return an order acknowledgement (this is slightly more relevant in environments where there are potentially multiple venues to trade a product on, but in terms of global financial systems these environments are exceptions and not the norm). Latency is really only important when factored alongside some metric indicating a failure of business logic (failures to execute on aggressive orders or failures to cancel in time are two typical metrics)

The most important to many participants is where serialization occurs on the trading venue (what the initial portion of this blog is about; determining who was “first”). Usually this is to the tune of 1-2ns (in some cases lower). There are diminishing returns however to making this absolute in physical terms. A small handful of venues have attempted to address serialization at the very edge of their systems, but the net result is just a change in how firms that are extremely sensitive to being first apply technical expertise to the problem.

Most “good” venues permit an amount of slop in their systems (usually to the tune of 5-10% of the overall latency) which reduces the benefits of playing the sorts of ridiculous games to be “first”. There ends up being a hard limit to the economic benefit of throwing man hours and infrastructure at the problem.

Comment by contingencies 1 day ago

So many wrong statements here it's difficult to know where to start. Perhaps "Why Eventual Consistency Is Impossible in Finance" which is glaring: most of the economy runs on eventual consistency (brokers, banks, credit cards, crypto consensus).