A cache-friendly IPv6 LPM with AVX-512 (linearized B+-tree, real BGP benchmarks)
Posted by debugga 1 day ago
Comments
Comment by debugga 1 day ago
Includes:
- AVX-512 SIMD path + scalar fallback
- Wait-free lookups with rebuild-and-swap dynamic FIB
- Benchmarks on synthetic data and real RIPE RIS BGP (~254K prefixes)
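A minimal sketch of the rebuild-and-swap pattern named above, under my own assumptions (the names `Fib`, `lookup`, and `rebuild` are illustrative, not the project's API): readers perform one atomic acquire load per lookup and never block, while the writer builds a complete replacement table off to the side and publishes it with a single pointer store.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Toy "FIB": index -> next hop. Illustrative stand-in for the real table.
struct Fib {
    std::vector<int> next_hop;
};

std::atomic<const Fib*> g_fib{nullptr};

// Wait-free reader: a single acquire load, no locks, no retries.
// Assumes rebuild() has run at least once before the first lookup.
int lookup(std::size_t key) {
    const Fib* fib = g_fib.load(std::memory_order_acquire);
    return fib->next_hop[key];
}

// Writer: rebuild a complete new table, then publish it atomically.
// The old table is leaked here for brevity; a real system reclaims it
// only after readers drain (RCU, epochs, or hazard pointers).
void rebuild(std::vector<int> fresh) {
    auto* next = new Fib{std::move(fresh)};
    g_fib.store(next, std::memory_order_release);
}
```

The key property: a reader always sees either the old table or the new one in full, never a half-updated structure, which is what makes lookups wait-free under concurrent updates.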
Interesting result: on real BGP + uniform random lookups, a plain Patricia trie can sometimes match or beat the SIMD tree due to cache locality and early exits.
Would love feedback, especially comparisons with PopTrie / CP-Trie.
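To illustrate the early-exit behavior mentioned above, here is a minimal binary-trie LPM over 32-bit keys (illustrative only, my own sketch: the post's FIBs are IPv6/128-bit, and a true Patricia trie additionally path-compresses one-child chains). A lookup walks at most prefix-length bits and stops as soon as the path runs out, so on real prefix distributions most lookups touch only a few shallow, hot nodes.

```cpp
#include <cstdint>
#include <memory>
#include <optional>

// One trie node: an optional stored route plus 0/1 children.
struct Node {
    std::optional<int> next_hop;
    std::unique_ptr<Node> child[2];
};

// Insert `prefix`/`len` with the given next hop, walking MSB-first.
void insert(Node& root, std::uint32_t prefix, int len, int next_hop) {
    Node* n = &root;
    for (int i = 0; i < len; ++i) {
        int bit = (prefix >> (31 - i)) & 1;
        if (!n->child[bit]) n->child[bit] = std::make_unique<Node>();
        n = n->child[bit].get();
    }
    n->next_hop = next_hop;
}

// Longest-prefix match: remember the deepest next_hop seen on the way
// down, and exit early as soon as the path ends.
std::optional<int> lookup(const Node& root, std::uint32_t addr) {
    std::optional<int> best;
    const Node* n = &root;
    for (int i = 0; i < 32 && n; ++i) {
        if (n->next_hop) best = n->next_hop;
        n = n->child[(addr >> (31 - i)) & 1].get();
    }
    if (n && n->next_hop) best = n->next_hop;
    return best;
}
```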
Comment by talsania 1 day ago
Comment by Sesse__ 1 day ago
Comment by zx2c4 1 day ago
Comment by newman314 1 day ago
Comment by matt-p 19 hours ago
E.g. with a CPU with, say, 256 MB of L3 cache, lookups are many times faster because many (or all) of them never need to touch RAM. Ideally: hot top levels in L2 > hot path in local CCD L3 > rest somewhere in socket L3 > DRAM misses (ideally almost 0)
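That hierarchy can be made concrete with a back-of-envelope calculation, assuming a fanout-8 linearized tree with 64-byte nodes (8 x 8-byte keys, one cache line per node; these parameters are my assumptions for illustration, not taken from the post):

```cpp
#include <cstddef>

// Cumulative bytes of the top `levels` levels of a complete fanout-8
// tree with one-cache-line (64-byte) nodes. Parameters are assumed.
std::size_t tree_bytes(int levels, std::size_t fanout = 8,
                       std::size_t node_bytes = 64) {
    std::size_t nodes = 1, total = 0;
    for (int l = 0; l < levels; ++l) {
        total += nodes * node_bytes;  // add this level's nodes
        nodes *= fanout;              // next level is 8x wider
    }
    return total;
}
```

Under those assumptions, ~254K prefixes need about 6 levels (8^6 = 262,144 leaf slots), and the first five levels total `tree_bytes(5)` = 299,584 bytes (about 293 KiB), which fits in a modern L2, so in the ideal case only the last level or two ever leaves core-local cache.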
Comment by throwaway81523 1 day ago
Comment by ozgrakkurt 1 day ago
Comment by ozgrakkurt 1 day ago
[0] https://github.com/esutcu/planb-lpm/blob/748d19d5fbd945cefa3...
[1] https://github.com/esutcu/planb-lpm/blob/748d19d5fbd945cefa3...
Comment by NooneAtAll3 1 day ago
Comment by camel-cdr 1 day ago
__m512i vx = _mm512_set1_epi64(static_cast<long long>(x));
__m512i vk = _mm512_load_si512(reinterpret_cast<const __m512i*>(base));
__mmask8 m = _mm512_cmp_epu64_mask(vx, vk, _MM_CMPINT_GE);
return static_cast<std::uint32_t>(__builtin_popcount(m));
would be replaced with: return __riscv_vcpop(__riscv_vmsgeu(__riscv_vle64_v_u64m1(base, FANOUT), x, FANOUT), FANOUT);
and you set FANOUT to __riscv_vsetvlmax_e64m1() at runtime (e64, since the keys are 64-bit). Alternatively, if you don't want a dynamic FANOUT, you keep FANOUT=8 (or another constant) and do a stripmining loop:
size_t cnt = 0;
for (size_t vl, n = 8; n > 0; n -= vl, base += vl) {
vl = __riscv_vsetvl_e64m1(n);
cnt += __riscv_vcpop(__riscv_vmsgeu(__riscv_vle64_v_u64m1(base, vl), x, vl), vl);
}
return cnt;
This will take FANOUT/VLEN iterations and the branches will be essentially always correctly predicted. If you know FANOUT is always 8 and you'll never want to change it, you could alternatively select the optimal LMUL:
size_t vl = __riscv_vsetvlmax_e64m1();
if (vl == 2) return __riscv_vcpop(__riscv_vmsgeu(__riscv_vle64_v_u64m4(base, 8), x, 8), 8);
if (vl == 4) return __riscv_vcpop(__riscv_vmsgeu(__riscv_vle64_v_u64m2(base, 8), x, 8), 8);
return __riscv_vcpop(__riscv_vmsgeu(__riscv_vle64_v_u64m1(base, 8), x, 8), 8);
Comment by sylware 1 day ago
Comment by ozgrakkurt 1 day ago
It does look a bit AI generated though
Comment by sylware 2 hours ago
Comment by simoncion 1 day ago
These days, when I hear a project owner/manager describe the project as a "clean room reimplementation", I expect that they got an LLM [0] to extrude it. This expectation will not always be correct, but it'll be correct more likely than not.
[0] ...whose "training" data almost certainly contains at least one implementation of whatever it is that it's being instructed to extrude...
Comment by sylware 2 hours ago
It seems there is a signal (here on HN) that coding LLMs would be really good at mass-porting C++ code to plain and simple C, to remove the C++ kludge dependency.
Comment by pmarreck 22 hours ago
Comment by simoncion 14 hours ago
There's also the matter of copyright laundering and the still-unsettled issue of license laundering, but I understand that a very vocal subset of programmers and tech management gives zero shit about those sorts of things. [0]
[0] I would argue that -most of the time- a program that you're not legally permitted to run (or distribute to others, if your intention was to distribute that program) is just as incorrect as one that produces the wrong output. If a program-extrusion tool intermittently produces programs that you're not permitted to distribute, then that tool is broken. [1]
[1] For those with sensitive knees: do note that I said "the still-unsettled issue of license laundering" in my last paragraph. Footnote zero is talking about a possible future where it is determined that the mere act of running gobs of code through an LLM does not mean that the output of that LLM is not a derived work of the code the tool was "trained" on. Perhaps license-washing will end up being legal, but I don't see Google, Microsoft, and other tech megacorps being very happy about the possibility of someone being totally free to run their cash cow codebases through an LLM, produce a good-enough "reimplementation", and stand up a competitor business on the cheap [2] by bypassing the squillions of dollars in R&D costs needed to produce those cash cow codebases.
[2] ...or simply release the code as Free Software...
Comment by alex_duf 21 hours ago
That goes for a lot of comments here accusing each other of being a bot.
I feel like we've seen internet trust at its highest, and it can only go one way now.