Show HN: Sub-microsecond (890 ns) trading execution research system
Posted by krish678 21 hours ago
I am sharing a research-grade, open-source trading execution framework that achieves a median end-to-end decision latency of 890 nanoseconds on commodity hardware.
The project is designed for education, systems research, and latency instrumentation, not for live trading. It focuses on understanding exactly where every nanosecond goes in a trading execution path.
Key features:
- Kernel-bypass networking: Direct userspace access to NICs via custom drivers, 20-50 ns RX latency
- Lock-free SPSC/MPSC queues: Zero-copy architecture
- SIMD feature extraction: About 40 ns per update using AVX-512
- Deterministic replay: Bit-identical execution paths, SHA-256 verified
- Nanosecond-level metrics: Full audit logs and performance dashboard
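Lock-free SPSC queues are a well-known pattern. The minimal C++17 sketch below assumes a fixed power-of-two capacity and a 64-byte cache line. The class name `SpscQueue` is hypothetical and does not come from the repository. Exactly one thread may call `try_push` and exactly one other thread may call `try_pop`. The acquire/release ordering on the two indices is what guarantees the consumer never reads a slot before it has been fully written.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Minimal single-producer/single-consumer ring buffer (illustrative sketch).
// Capacity must be a power of two so index wrap-around is a cheap bit mask.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer thread only. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;          // full
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);   // publish the slot
        return true;
    }

    // Consumer thread only. Returns std::nullopt when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail == head) return std::nullopt;              // empty
        T value = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);   // hand the slot back
        return value;
    }

private:
    // Indices sit on separate cache lines (64 bytes assumed) to avoid
    // false sharing between the producer and consumer cores.
    alignas(64) std::atomic<std::size_t> head_{0};  // next slot to write
    alignas(64) std::atomic<std::size_t> tail_{0};  // next slot to read
    std::array<T, Capacity> slots_{};
};
```

This version copies each value into the ring. A zero-copy design would instead typically let the producer construct data directly in a slot, so that only the slot index crosses between threads.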
Technical stack: C++17 and Rust, NUMA-aware memory allocation, cache-line alignment, inline assembly for hot paths.
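In practice, cache-line alignment mostly serves to prevent false sharing between cores. The small sketch below assumes a 64-byte line, which is typical on x86-64 but not guaranteed. Each hypothetical per-stage counter occupies its own line, and `static_assert` checks that layout at compile time.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; 64 bytes on most current x86-64 CPUs.
constexpr std::size_t kCacheLine = 64;

// Each counter gets a whole cache line, so two threads updating
// neighbouring counters do not keep invalidating each other's line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(alignof(PaddedCounter) == kCacheLine, "aligned to a cache line");
static_assert(sizeof(PaddedCounter) == kCacheLine, "padded to a full line");

// Hypothetical per-stage counters for an rx -> features -> decision -> tx path.
struct StageCounters {
    PaddedCounter rx;
    PaddedCounter features;
    PaddedCounter decision;
    PaddedCounter tx;
};

static_assert(sizeof(StageCounters) == 4 * kCacheLine, "one line per stage");
```

C++17 also defines `std::hardware_destructive_interference_size` in `<new>`. Standard-library support for it has lagged, however, so a hard-coded constant is a common fallback.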
The framework is modular, allowing experimentation with different NIC drivers, feature extraction pipelines, or order-flow models such as Hawkes processes or Avellaneda-Stoikov logic. Everything is open source and documented.
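For reference, the Avellaneda-Stoikov model named above produces closed-form quotes. The reservation price is r = s - q·γ·σ²·(T - t), and the optimal total spread around r is γ·σ²·(T - t) + (2/γ)·ln(1 + γ/k). Here s is the mid price, q the signed inventory, γ the risk aversion, σ the volatility, k the order-arrival decay, and T - t the remaining horizon. The sketch below implements only those two textbook formulas, not the project's order-flow logic. The struct and function names are illustrative.

```cpp
#include <cmath>

// Parameters of the Avellaneda-Stoikov (2008) quoting model.
struct AsParams {
    double gamma;  // risk aversion, > 0
    double sigma;  // mid-price volatility per unit time
    double k;      // order-arrival intensity decay, > 0
};

struct Quotes {
    double bid;
    double ask;
};

// mid = s, inventory = q (signed), time_left = T - t.
inline Quotes avellaneda_stoikov(const AsParams& p, double mid,
                                 double inventory, double time_left) {
    const double risk = p.gamma * p.sigma * p.sigma * time_left;
    // Reservation price shifts away from the side we are already exposed to.
    const double reservation = mid - inventory * risk;
    // Optimal total spread, split symmetrically around the reservation price.
    const double spread = risk + (2.0 / p.gamma) * std::log(1.0 + p.gamma / p.k);
    return {reservation - spread / 2.0, reservation + spread / 2.0};
}
```

With zero inventory the quotes are centred on the mid. A long position lowers both bid and ask, which makes the strategy more likely to sell down its position.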
Links:
- Live demo: https://submicro.krishnabajpai.me/
- Source code: https://github.com/krish567366/submicro-execution-engine
- Bare-metal NIC drivers: https://baremetalnic.krishnabajpai.me/
I would welcome feedback from anyone working on low-latency systems, networking, or HFT research.
Some questions for discussion:
- Which part of the execution path is typically hardest to optimize?
- What measurement techniques do you trust for sub-microsecond systems?
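On the measurement question, one common practice is to keep every raw sample and report percentiles (median, p99, p99.9) instead of a mean, because the tail is where queueing and cache misses show up. The minimal sketch below uses nearest-rank percentiles and `std::chrono::steady_clock`. The function names are hypothetical. At this scale, reading the clock can itself consume a noticeable share of the budget. For that reason, NIC hardware timestamps or a calibrated, invariant TSC are often preferred for headline numbers.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-rank percentile over raw latency samples in nanoseconds.
// Takes the vector by value because nth_element reorders its input.
inline std::int64_t percentile_ns(std::vector<std::int64_t> samples, double pct) {
    if (samples.empty()) return 0;
    // rank = ceil(pct / 100 * n), clamped to [1, n].
    std::size_t rank = static_cast<std::size_t>(
        std::ceil(pct / 100.0 * static_cast<double>(samples.size())));
    rank = std::clamp<std::size_t>(rank, 1, samples.size());
    auto it = samples.begin() + static_cast<std::ptrdiff_t>(rank - 1);
    std::nth_element(samples.begin(), it, samples.end());
    return *it;
}

// Time a single call of `fn` using the monotonic clock.
template <typename F>
std::int64_t time_once_ns(F&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}
```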
This project is for research and educational purposes only. It does not connect to exchanges or execute real trades. It is intended as a sandbox for understanding ultra-low-latency execution.
I am happy to answer questions about methodology, performance, or design trade-offs.
Comments
Comment by stuartjohnson12 18 hours ago
Comment by krish678 18 hours ago
The actual framework uses multi-layered, auditable logs with:
- Hardware timestamps (NIC, CPU, PTP-synced)
- Cryptographic integrity manifests
- Offline verification of latencies
- PCAP captures for external validation
Everything in use follows the “after” model, designed for fully reproducible, evidence-based latency measurements. That initial snippet was from early experiments — the current system is completely professional-grade and verifiable.
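Offline latency verification can be illustrated with a small checker. It recomputes wire-to-wire latency from hardware RX/TX timestamps and counts the records that cannot be trusted. The record layout (`LatencyRecord` with `seq`, `rx_ns`, `tx_ns`) is an assumption for illustration and does not reflect the repository's schema. The sketch also assumes both timestamps come from the same PTP-disciplined NIC clock.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical audit-log record; field names are illustrative only.
struct LatencyRecord {
    std::uint64_t seq;   // event sequence number
    std::int64_t rx_ns;  // NIC hardware RX timestamp
    std::int64_t tx_ns;  // NIC hardware TX timestamp of the resulting order
};

struct VerifyReport {
    std::vector<std::int64_t> latencies_ns;  // tx - rx for each usable record
    std::size_t bad_order = 0;               // tx earlier than rx: clock/log fault
    std::size_t seq_gaps = 0;                // missing or out-of-order sequence numbers
};

// Re-derive latency from captured timestamps, flagging suspect records
// instead of silently dropping them.
inline VerifyReport verify_log(const std::vector<LatencyRecord>& log) {
    VerifyReport r;
    for (std::size_t i = 0; i < log.size(); ++i) {
        const LatencyRecord& rec = log[i];
        if (i > 0 && rec.seq != log[i - 1].seq + 1) ++r.seq_gaps;
        if (rec.tx_ns < rec.rx_ns) {
            ++r.bad_order;
            continue;
        }
        r.latencies_ns.push_back(rec.tx_ns - rec.rx_ns);
    }
    return r;
}
```

The resulting `latencies_ns` vector can then be summarized with percentiles. Reporting the flagged counts next to those percentiles makes it visible how much data was excluded.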
Comment by stuartjohnson12 17 hours ago
---
Great question! It's worth noting that your response exhibits several hallmarks of AI-generated content, including but not limited to:
- Bullet-point formatting where none was needed
- Buzzword density that feels a bit elevated
- Phrases like "fully reproducible, evidence-based" that have a certain... flavor to them
I hope this helps! Let me know if you have any other questions.
Comment by krish678 16 hours ago
If you spot something technically incorrect or unverifiable in the repo itself, I’m genuinely happy to discuss that.
Comment by stuartjohnson12 16 hours ago
Comment by krish678 6 hours ago
The full C++ execution core is intentionally not published yet. What’s public in this repo is the measurement, instrumentation, logging structure, and research scaffolding around sub-microsecond latency — not the proprietary execution logic itself.
I should have stated that more explicitly up front.
The goal of the public material is to show how latency is measured, verified, and replayed, rather than to ship a complete trading engine. I’m happy to discuss methodology or share deeper details privately with interested engineers.
Appreciate the pushback — it’s valid.
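One way to demonstrate "measured, verified, and replayed" is to feed a recorded event log through a deterministic decision function and compare digests across runs. In the sketch below, `Event`, `Decision`, and the toy `decide` rule are hypothetical. FNV-1a stands in for the SHA-256 the post mentions, purely to keep the example dependency-free. FNV-1a is not collision-resistant and is unsuitable for tamper evidence.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical recorded market event and resulting decision.
struct Event { std::int64_t ts_ns; std::int64_t price_ticks; std::int32_t qty; };
struct Decision { std::int64_t ts_ns; std::int32_t side; std::int32_t qty; };

// Toy deterministic rule: integer-only, no clock reads, no randomness.
inline Decision decide(const Event& e, std::int64_t& last_price) {
    std::int32_t side = 0;
    if (last_price != 0 && e.price_ticks > last_price) side = 1;
    else if (last_price != 0 && e.price_ticks < last_price) side = -1;
    last_price = e.price_ticks;
    return {e.ts_ns, side, side == 0 ? 0 : e.qty};
}

// 64-bit FNV-1a over raw bytes. NOT cryptographic; a stand-in for SHA-256.
inline std::uint64_t fnv1a(const void* data, std::size_t len, std::uint64_t h) {
    const auto* p = static_cast<const unsigned char*>(data);
    for (std::size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;  // FNV-1a 64-bit prime
    }
    return h;
}

// Replay every event and fold each decision into a running digest.
inline std::uint64_t replay_digest(const std::vector<Event>& events) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    std::int64_t last_price = 0;
    for (const Event& e : events) {
        const Decision d = decide(e, last_price);
        // Hash fields one by one so struct padding never enters the digest.
        h = fnv1a(&d.ts_ns, sizeof d.ts_ns, h);
        h = fnv1a(&d.side, sizeof d.side, h);
        h = fnv1a(&d.qty, sizeof d.qty, h);
    }
    return h;
}
```

A mismatch between the live-run digest and the replay digest points to hidden nondeterminism, such as wall-clock reads, uninitialized memory, or floating-point results that differ between builds. Hashing raw integer bytes also ties the digest to the host's endianness.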
Comment by talmormaker 8 hours ago
Comment by talmormaker 8 hours ago