The Cost of a Closure in C
Posted by ingve 5 days ago
Comments
Comment by kazinator 4 days ago
[disclaimer] Without brushing up on the details of this, I strongly suspect that this is more about removing the need for executable stacks than about performance. Allocating a trampoline on the stack rather than the heap is good for efficiency.
These days, many GNU/Linux distros are disabling executable stacks by default in their toolchain configuration, both for building the distro and for the toolchain offered by the system to the user.
When you use GCC local functions, it overrides the linker behavior so that the executable is marked as requiring an executable stack.
Of course, that is a security concession: when your stack is executable, it enables remote-code-execution attacks that rely on injecting code into the stack via a buffer overflow and tricking the process into jumping to it.
If trampolines can be allocated in a heap, then you don't need an executable stack. You do need an executable heap, or an executable dedicated heap for these allocations. (Trampolines are all the same size, so they could be packed into an array.)
Programs which indirect upon GCC local functions are not aware of the trampolines. The trampolines are deallocated naturally when the stack rolls back on function return, on longjmp, or when a C++ exception passes through.
Heap-allocated trampolines have an obvious deallocation problem; it would be interesting to see what strategy is used for that.
Comment by fuhsnn 4 days ago
With -ftrampoline-impl=heap, GCC automatically inserts[1] pairs of constructor/destructor routines from libgcc which were built around mmap/munmap.
Comment by kazinator 4 days ago
It probably has some logic for dealing with different threads, and also for dealing with stack frames abandoned by longjmp or a longjmp-like mechanism.
Comment by mananaysiempre 3 days ago
Long story short:
# (*funp)();
testl $1, %eax
jz 1f
movq 8(%rax), %r10
movq (%rax), %rax
1: callq *%rax
[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2018-12/msg00853.h...
Comment by mananaysiempre 1 day ago
Comment by Rochus 4 days ago
The most striking surprise is the magnitude of the gap between std::function and std::function_ref. It turns out std::function (the owning container) forces "copy-by-value" semantics deep into the recursion. In the "Man-or-Boy" test, this apparently causes an exponential explosion of copying of the closure state at every recursive step. std::function_ref (the non-owning view) avoids this entirely.
Comment by gpderetta 4 days ago
Comment by Rochus 4 days ago
Comment by boris 4 days ago
Comment by gpderetta 4 days ago
Differently from GCC 14, GCC 15 itself does seem to be able to optimize away the allocation (and the whole std::function) in trivial cases, though, independently of what the library does.
Comment by unwind 4 days ago
Therefore it's very jarring to see this text after the first C code example:
This uses a static variable to have it persist between both the compare function calls that qsort makes and the main call which (potentially) changes its value to be 1 instead of 0
This feels completely made up, and/or like some confusion about things that I would expect the author of a piece like this to really know.
In reality, in this usage (at the global, outermost scope level) `static` has nothing to do with persistence. All it does is make the variable "private" to the translation unit (C parlance, read as "C source code file"). The value will "persist" regardless, since the global outermost scope can't go out of scope while the program is running.
It's different when used inside a function, then it makes the value persist between invocations, in practice typically by moving the variable from the stack to the "global data" which is generally heap-allocated as the program loads. Note that C does not mention the existence of a stack for local variables, but of course that is the typical implementation on modern systems.
Comment by CerryuDu 4 days ago
The fact that you are questioning the use of the term shows that you are not familiar with the ISO C standard. What the author alludes to is static storage duration. And whether or not you use the "static" keyword in that declaration (also definition), the storage duration of the object remains "static". People mostly call those things "global variables", but the proper standardese is "static storage duration". In that sense, the author was right to use "static" for the lifetime of the object.
EDIT: if you drop "static" from that declaration, what changes is the linkage of the identifier (from internal to external).
Comment by gldrk 4 days ago
The only misleading thing here is that ‘static’ is monospaced in the article (this can’t be seen on HN). Other than that, ‘static variable’ can plausibly refer to an object with a static storage duration, which is what the C standard would call it.
>moving the variable from the stack to the "global data" which is generally heap-allocated as the program loads
It is not heap-allocated because you can’t free() it. Non-zero static data is not even anonymously mapped, it is file-backed with copy-on-write.
Comment by sfpotter 4 days ago
Comment by pjmlp 4 days ago
Comment by steveklabnik 4 days ago
This doesn’t mean that it’s impossible to make mistakes, but still.
Comment by uecker 4 days ago
Comment by steveklabnik 4 days ago
Comment by uecker 4 days ago
Comment by debugnik 4 days ago
Comment by kreco 4 days ago
If I follow your comment, you mean that he could have used a non-static global variable instead and avoided mentioning the "static" keyword afterward?
Comment by unwind 4 days ago
Yes, the `static` can simply be dropped, it does no additional work for a single-file snippet like this.
I tried diving into Compiler Explorer to examine this, and it actually produces slightly different code for the with/without `static` cases, but it was too confusing to untangle quickly enough to use the output here. Sorry.
Comment by mananaysiempre 4 days ago
Comment by Chabsff 4 days ago
Also, the difference manifests in the symbols table, not the assembly.
Comment by mananaysiempre 4 days ago
Comment by RossBencina 4 days ago
Something I've been thinking about lately is having a "state" keyword for declaring variables in a "stateful" function. This works just like "static", except that instead of there being a single global instance of each variable, the variables are added to an automatically defined struct whose type is available via "statetype(foo)" or some other mechanism; you can then invoke foo with an instance of that state (in C this would be an explicit first parameter, also marked with the "state" keyword). Stateful functions are colored, in the sense that if you invoke a nested stateful function its state gets added to the caller's state. This probably won't fly with separate compilation though.
Comment by vintagedave 4 days ago
C++Builder’s entire UI system is built around __closure and it is remarkably efficient: effectively, a very neat fat pointer of object instance and method.
[*] Edit: two dates on the paper, but “bound pointer to member” and they note the connection to events too: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2002/n13...
Comment by 1f60c 4 days ago
Raku (née Perl 6) has this! https://docs.raku.org/language/variables#The_state_declarato...
Comment by juvoly 4 days ago
- where does the automatically defined struct live? Data segment might work for static, but doesn't allow dynamic use. Stack will be garbage if closure outlives function context (ie. callback, future). Heap might work, but how do you prevent leaks without C++/Rust RAII?
- while a function pointer may be copied or moved, the state area probably cannot. It may contain pointers to stack object or point into itself (think Rust's pinning)
- you already mention recursion, compilation
- ...
Comment by fuhsnn 4 days ago
Comment by RossBencina 2 days ago
Comment by RossBencina 2 days ago
Another complication is that it would be beneficial to be able to optimize state storage in the same way that stack frame resources are optimized, including things like coalescing equal values in conceptually distinct state instances. This would (I think) preclude things like sizeof(statetype(f)) which you really want for certain types of manual memory management, or it would require multiple compiler passes.
Comment by fuhsnn 4 days ago
[1] https://github.com/ThePhD/future_cxx/issues/55#issuecomment-...
Comment by tyushk 4 days ago
Comment by RossBencina 2 days ago
Comment by Progge 4 days ago
    int main(int argc, char* argv[]) {
        if (argc > 1) {
            char* r_loc = strchr(argv[1], 'r');
            if (r_loc != NULL) {
                ptrdiff_t r_from_start = (r_loc - argv[1]);
                if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                    in_reverse = 1;
                }
            }
        }
        ...
    }

Why not

    if (argc > 1 && strcmp(argv[1], "-r") == 0) {
        in_reverse = 1;
    }

for example?
Comment by tapete2 4 days ago
Your solution is perfectly fine. Even if you don't have access to strchr for some reason, the original snippet is really convoluted.
You could just write (strlen(argv[1]) > 1 && argv[1][0] == '-' && argv[1][1] == 'r') if you really want to.
Comment by microtherion 4 days ago
And if you ever find yourself actually doing command line parsing, use getopt(). It handles all the corner cases reliably, and consistent with other tools.
Comment by unwind 4 days ago
Also, the use of a convoluted `if` to conditionally assign a literal boolean is a code smell (to me), I would drop the `if` and just use:
in_reverse = argc > 1 && argv[1][0] == '-' && argv[1][1] == 'r';
if a more forward-thinking/strict check is not needed.
Comment by eska 4 days ago
Comment by CerryuDu 4 days ago
And then we have this "modern" way of spelling pointers, "const int* right" (note the space). In C, declaration syntax mirrors use, so it should be "const int *right", because "*right" is a "const int".
I feel too old for this shit. :(
Comment by Joker_vD 4 days ago
    const int left = *(const int*)untyped_left, right = *(const int*)untyped_right;
    return in_reverse ?
        (right > left) - (right < left)
      : (left > right) - (left < right);

I wonder if there is a way to actually do it with only arithmetic, without comparisons?
Comment by adrian_b 3 days ago
The feature that integer comparison must be correct regardless of overflow has been a requirement in CPU design since the earliest times. Very few CPUs, the most notable examples being the Intel 8080 and RISC-V, have lacked this feature. The competitors and successors of the Intel 8080, i.e. the Motorola MC6800 and Zilog Z80, added it, making them much more similar to the previously existing CPUs than the Intel 8080 was. The fact that even microprocessors implemented with a few thousand transistors around 1975 had this feature emphasizes how weird it is that RISC-V lacks it in 2025, after 50 years, being implemented with a number of transistors many orders of magnitude greater.
Comment by Joker_vD 3 days ago
SLT a2, a0, a1
SLT a0, a1, a0
SUB a0, a0, a2
RET

Comment by camel-cdr 3 days ago
ARM:    subs w0, w10, w11
        b.vs trap

RISC-V: subw a0, t0, t1
        sub  a1, t0, t1
        bne  a0, a1, trap

Comment by adrian_b 3 days ago
The workaround suggested by the RISC-V documentation consists of replacing a very large fraction of all instructions of a program (because there are a lot of integer additions, subtractions and comparisons in any program, close to half of all instructions) with 3 or more instructions, in order to approximate what in any other CPU is done with single instructions.
The other ridiculous workaround proposed to save RISC-V is that any high-performance implementation must supplant its missing features with instruction fusion.
Yes, the missing hardware for overflow detection can be replaced by multiplying the number of instructions for any operation, and the missing addressing modes can be synthesized by instruction fusion, but such implementation solutions are extraordinarily more expensive than the normal solutions used for three quarters of a century in other computers, ever since they were made with vacuum tubes.
Because of the extreme overhead of checking for overflow, I bet that most programs compiled for RISC-V do not check for overflow, which is not acceptable in reliable programs (even when using C/C++, I always compile with overflow checking enabled; it should have been the default option, disabled only in specific cases where it can be proven that overflow is impossible and the checks reduce performance).
Comment by Joker_vD 3 days ago
Comment by zozbot234 3 days ago
Comment by adrian_b 3 days ago
This is not what normal overflow checking is. Normal overflow checking just raises a specific exception when integer overflow happens.
This has absolutely no effect upon compiler optimizations. The compiler always generates code ignoring the possibility of exceptions. When exceptions happen, the control is passed far away to the exception handler, which decides what to do, e.g. to save debugging information and abort partially or totally the offending program, because an overflow is normally the consequence of an unforeseen program bug and it is impossible to do any action that will allow the continuation of the execution.
You should remember that there is nothing special about integer overflow, almost every instruction that the compiler generates can raise an exception at run time. Any branch instruction, any memory-access instruction, any floating-point instruction, any vector instruction can raise an exception due to hardware. In recent CPUs, integer overflow is not raised implicitly, so you have to insert a conditional branch, but this is irrelevant.
If your theory that the possibility of raising exceptions can influence compiler optimizations were true, there would be no compiler optimizations, because of every 10 or so instructions generated by a compiler, at least half can raise various kinds of exceptions, in a manner completely unpredictable by the compiler. Adding integer-overflow exceptions changes nothing.
Comment by camel-cdr 3 days ago
For testing, I use a custom qemu plugin to calculate the dynamic instruction count, dynamic uop count, and dynamic instruction size. Every instruction with multiple register writebacks was counted as one uop per writeback, and to make the results more comparable, SIMD was disabled.
I used this setup to run self-compiling single-file versions of chibicc (assembling) and tinycc (generating object file), which are small C compilers of 9K and 24K LOC respectively. Both compilers were cross-compiled using clang-22 and were benchmarked cross-compiling themselves to x86.
Let's look at the impact of -ftrapv first. In chibicc O3/O2/Os the dynamic uops increased due to -ftrapv for RISC-V by 5.3%/5.1%/6.7%, and for ARM by 5.1%/5.0%/6.4%. Interestingly, in tinycc it only increased for RISC-V by 1.6%/1.0%/1.0%, while ARM increased slightly more with 1.6%/2.0%/1.3%.
In terms of dynamic instruction count, ARM needed to execute 6%/15% fewer instructions than RISC-V for chibicc/tinycc. Looking at the uops, RISC-V needs to execute 6% more uops in tinycc, but ARM needs to execute 0.5% more uops with chibicc. The dynamic instruction size, which estimates the pressure on icache and fetch bandwidth, was 24%/10% lower in RISC-V for chibicc/tinycc.
Note that this did not model any instruction fusion in RISC-V and only treated incrementing loads and load pairs as multiple uops (to mirror Apple Silicon).
If the only fusion pair you implement is adjacent compressed sp relative stores, then RISC-V ends up with a lower uop count for both programs. They are trivial to implement because you can just interpret the two adjacent 16-bit instructions as a single 32-bit instruction, and compilers always generate them next to each other and in sorted order in function prolog code. You can do this directly in your RVC expander; it only adds minimal additional delay (zero with a trick), which is constant regardless of decode width.
Raw data:
chibicc/clang-O3-armv9: insns: 419886184 uops: 450136257 bytes: 1679544736
chibicc/clang-O3-armv9-trap: insns: 450205913 uops: 474206409 bytes: 1800823652
chibicc/clang-O3-rva23: insns: 449328186 uops: 449328186 bytes: 1288202666
chibicc/clang-O3-rva23-trap: insns: 474623648 uops: 474623648 bytes: 1375991094
chibicc/clang-O2-armv9: insns: 421810039 uops: 451501004 bytes: 1687240156
chibicc/clang-O2-armv9-trap: insns: 451642152 uops: 475084965 bytes: 1806568608
chibicc/clang-O2-rva23: insns: 449625081 uops: 449625081 bytes: 1286452180
chibicc/clang-O2-rva23-trap: insns: 473682134 uops: 473682134 bytes: 1369720036
chibicc/clang-Os-armv9: insns: 457841653 uops: 489902437 bytes: 1831366612
chibicc/clang-Os-armv9-trap: insns: 497189616 uops: 523323893 bytes: 1988758464
chibicc/clang-Os-rva23: insns: 486216287 uops: 486216287 bytes: 1363135906
chibicc/clang-Os-rva23-trap: insns: 520889604 uops: 520889604 bytes: 1473263784
tinycc/clang-O3-armv9: insns: 115189179 uops: 126358884 bytes: 460756716
tinycc/clang-O3-armv9-trap: insns: 117139555 uops: 128361973 bytes: 468558220
tinycc/clang-O3-rva23: insns: 137035509 uops: 137035509 bytes: 427878586
tinycc/clang-O3-rva23-trap: insns: 139248009 uops: 139248009 bytes: 436548988
tinycc/clang-O2-armv9: insns: 115184314 uops: 126568360 bytes: 460737256
tinycc/clang-O2-armv9-trap: insns: 117651772 uops: 129195276 bytes: 470607088
tinycc/clang-O2-rva23: insns: 137362294 uops: 137362294 bytes: 420468990
tinycc/clang-O2-rva23-trap: insns: 138649335 uops: 138649335 bytes: 428680948
tinycc/clang-Os-armv9: insns: 130661270 uops: 144718253 bytes: 522645080
tinycc/clang-Os-armv9-trap: insns: 132574148 uops: 146565708 bytes: 530296592
tinycc/clang-Os-rva23: insns: 152798316 uops: 152798316 bytes: 452181732
tinycc/clang-Os-rva23-trap: insns: 154232874 uops: 154232874 bytes: 458257882
Comment by CerryuDu 3 days ago
    return in_reverse ?
        (right > left) - (right < left)
      : (left > right) - (left < right);

I prefer (with "greater" being ±1, defaulting to +1):

    return left < right ? -greater :
           left > right ? greater :
           0;

Comment by adrian_b 3 days ago
When there is no overflow, the sign of the subtraction result provides the same information as a comparison operator, but this is no longer true when overflow happens.
Comment by Joker_vD 4 days ago
Comment by uecker 4 days ago
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3654.pdf
(and I am not impressed by micro benchmarks)
Comment by CerryuDu 4 days ago
Comment by uecker 4 days ago
Comment by CerryuDu 3 days ago
I disagree because I've seen closures shine (in OCaml) and suck terribly (in C++). Same concept, extremely different programming experience and debuggability. Syntax matters, language psychology matters. Closures naturally and obviously suit programming languages that are genuinely functional, or at least manage memory transparently for you. C is neither, and so "alloca", "defer", the cleanup attribute, closures -- all stick out sorely. The whole selling point of C is explicitness. The tedium of C is the price we pay for the great control that C offers us with a relatively simple vocabulary. I couldn't be more content with that deal.
C++ is impossible for any single person to learn, there is so much insanely complicated implicit behavior in it. C can mostly be learned by a single (persistent) person, but it's been getting harder.
I don't want C to be fashionable, or attractive. I want it to remain minimal. If someone feels hamstrung by it, there are so many other languages to choose from. I simply want the particular tradeoff that C offers (or used to offer) to remain in existence. And that is what's been going away, with each issue of ISO C being the only official standard (obsoleting/superseding all earlier issues of the standard).
Why give people closures or "defer" or ... whatever ... when they can't even remember the concept of the usual arithmetic conversions? Which has been standard since C89? Have you met a "practitioner" (= any C programmer with no particular interest in the standard proper) that could explain the effective type rules? Why make it more complicated?
I apologize -- I guess this is just my semi-diplomatic way to say, "please, get off my lawn". (Not to you personally, of course!) I'm very sorry.
EDIT: for more of the same sentiment, see the reddit thread: https://old.reddit.com/r/programming/comments/1pk3whx/the_co...
Comment by uecker 2 days ago
Well, I also do not want to see C++ style closures in C and I fully agree about your point regarding control and explicitness. I also agree that some of the initiatives we see now are regrettably motivated by the attempt to make C fashionable, and sometimes by poorly adopting C++ features.
Yet, I think nested functions fit C perfectly, and I have used them for a long time in some projects. They exist in very similar languages (Pascal, Ada, D, ...) and even in C's ancestor ALGOL. This also shows that this type of nested function is not a functional-programming concept. There is not really anything to learn, as syntax and semantics follow very naturally from all existing rules, and the improvement in code quality for callbacks or higher-level iteration over data types is very real. I agree that it gets much more complicated once you treat them more as objects that can escape their original function; pointing this out is what I tried to do with my paper.
The usual arithmetic conversions have seen unfair criticism, in my opinion. The effective type rules are a mess, to some degree because compilers invented their own rules or simply ignored the C standard. But this is a different topic. From a programmer's point of view, the rule that each variable should be accessed through one type, used consistently, is enough to stay on the safe side.
Comment by dfawcus 2 days ago
There they allowed nested functions, but also what they termed "full function values", a form of fat pointer. I certainly came across it in High C v1.7 in 1990, and the full manual for an earlier version (1.2?) from around '85 can be found on Bitsavers.
It had a syntax like:
    extern void Quick_sort(
        int Lo, int Hi, int Compare(int a, int b)!,
        void Swap(int a, int b)!
    );

    static Sort_private_table() {
        Entry Entries[100];
        int Compare(int a, int b) {
            return Entries[a] < Entries[b];
        }
        void Swap(int a, int b) {
            Entry Temp = Entries[a];
            Entries[a] = Entries[b];
            Entries[b] = Temp;
        }
        ...
        Quick_sort(1, 100, Compare, Swap);
    }

The above is an extract from their language reference, which you can find here: https://archive.org/download/Yoshizuki_UnRenamed_Files__D-V/...
Comment by dfawcus 2 days ago
I believe the High-C compiler with this support is still available, for modern embedded CPUs.
Comment by CerryuDu 2 days ago
Comment by sirwhinesalot 4 days ago
You can call the local functions directly and get the benefits of the specialized code.
There's no way to spell out this function's type, and no way to store it anywhere. This is true of regular functions too!
To pass it around you need to use the type-erased "fat pointer" version.
I don't see how anything else makes sense for C.
Comment by __phantomderp 2 days ago
https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Functio...
Comment by sirwhinesalot 22 hours ago
I do like the trampoline trick in 3.2.4, though; a neat alternative to a fat pointer!
Comment by gpderetta 4 days ago
Well, regular functions decay to function pointers. You could of course have the moral equivalent of std::function_ref (or, similarly, Borland __closure) in C, and have closures decay to it.
Comment by nutjob2 4 days ago
I'm a fan of nested functions but don't think the executable-stack hack is worth it; using a 'display' is a better solution.
See the Dragon Book or Compiler Construction: Principles and Practice (1984) by Louden
Comment by sirwhinesalot 4 days ago
Comment by nutjob2 4 days ago
Comment by LegionMammal978 4 days ago
Comment by dfawcus 3 days ago
Comment by sirwhinesalot 3 days ago
Comment by sylware 4 days ago
Comment by zzo38computer 3 days ago
With "static", it is implemented as an ordinary function, but the name is local to the function that contains it; it cannot access stuff within the function containing it unless those things are also declared as "static".
With "register", the address of the function cannot be taken, and if the function accesses other stuff within the function that contains it then the compiler will add additional arguments to the function so that its type does not necessarily match the type which is specified in the program.
This is not good enough for many uses though, so having the other extensions would also be helpful (possibly including implementing Apple Blocks in GCC).
Comment by kazinator 4 days ago
Comment by CerryuDu 4 days ago
I've used lambdas extensively in modern C++. I hate them with a passion.
I've also used OCaml. An awesome language where this stuff is super natural and beautiful.
I don't understand why people want to shoehorn functional programming into C. C++ was terrible already, and is now worse for it.
> we’re going to be focusing on and looking specifically at Closures in C and C++, since this is going to be about trying to work with and – eventually – standardize something for ISO C that works for everyone.
Sigh. My heart sinks.
Comment by Asooka 4 days ago
Comment by eqvinox 4 days ago
Comment by dfawcus 3 days ago
(I can't be bothered to run his benchmarks)
    #include <stdio.h>

    typedef struct env_ E;
    typedef struct fat_ptr_ Fp;
    typedef int fn(E*);

    struct fat_ptr_ {
        fn *f;
        E *e;
    };

    #define INT(body) ({ int lambda(E*){ return body; }; (Fp){lambda, 0}; })

    struct env_ {
        int k;
        Fp x1; Fp x2; Fp x3; Fp x4;
    };

    #define FpMk(fn, e) {fn, e}
    #define FpCall(fn) (fn.f(fn.e))

    int main() {
        int a(E env, Fp x5) {
            int b(E *ep) {
                return a((E){--(ep->k), FpMk(b, ep), ep->x1, ep->x2, ep->x3}, ep->x4);
            }
            return env.k <= 0 ? FpCall(env.x4) + FpCall(x5) : b(&env);
        }
        printf(" %d\n", a((E){10, INT(1), INT(-1), INT(-1), INT(1)}, INT(0)));
    }

Comment by groundzeros2015 4 days ago
Comment by sparkie 4 days ago
    #include <stdlib.h>
    #include <string.h>
    #include <stddef.h>

    typedef int (*comp)(const void *untyped_left, const void *untyped_right);

    thread_local int in_reverse = 0;

    __attribute__((noinline))
    int compare_impl(const void *untyped_left, const void *untyped_right, int in_reverse) {
        const int* left = untyped_left;
        const int* right = untyped_right;
        return (in_reverse) ? *right - *left : *left - *right;
    }

    comp make_sort(int direction) {
        in_reverse = direction;
        int compare(const void *untyped_left, const void *untyped_right) {
            return compare_impl(untyped_left, untyped_right, in_reverse);
        }
        return compare;
    }

    int main(int argc, char* argv[]) {
        int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
        comp normal_sort = make_sort(0);
        comp reverse_sort = make_sort(1);
        qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), normal_sort);
        return list[0];
    }

Because we create `reverse_sort` between creating `normal_sort` and calling it, we end up with a reverse sort despite clearly asking for a normal sort.

Comment by groundzeros2015 4 days ago
Comment by sparkie 4 days ago
Then it's clearly only half a solution.
The example I gave above should work fine in any language with first-class closures.
Comment by groundzeros2015 4 days ago
    _Thread_local struct {
        void *data;
        int (*compare)(const void *a, const void *b, void *data);
    } _qsort2_closure;

    static int _qsort2_helper(const void *a, const void *b) {
        return _qsort2_closure.compare(a, b, _qsort2_closure.data);
    }

    void qsort2(void *base, size_t elements, size_t width,
                int (*compare)(const void *a, const void *b, void *data),
                void *userData)
    {
        _qsort2_closure.data = userData;
        _qsort2_closure.compare = compare;
        qsort(base, elements, width, _qsort2_helper);
    }

Comment by gpderetta 3 days ago
Comment by groundzeros2015 3 days ago
No, I do not. It will be reassigned on the next call.
> But again you are reinventing dynamic scoping
No. I’m not reinventing anything. I’m using the existing feature of thread local variables.
The usage of such is entirely an implementation detail of qsort2 with the exception of recursion.
Dynamic scoping typically refers to defining variables which have scope outside of their call stack. No usage of this API requires it.
Can you just try to learn something new?
Comment by gpderetta 3 days ago
Comment by groundzeros2015 3 days ago
Once again, the caller of the API does not declare any variables so there is no dynamic scoping.
Comment by srcreigh 4 days ago
It’s confusing to me that thread locals are “not the best idea outside small snippets” meanwhile the top solution is templating on recursion depth with a constexpr limit of 11.
Comment by groundzeros2015 4 days ago
Comment by gpderetta 4 days ago
Comment by groundzeros2015 4 days ago
Comment by Quekid5 4 days ago
(You could solve that with a manually maintained stack for the context in a thread local, but you'd have to do that case-by-case)
Comment by groundzeros2015 3 days ago
I think the times you need to do this are few. And this version is much more prudent.
Comment by Quekid5 3 days ago
Anyway, the larger point is that a re-entrant general solution is desirable. The sort example might be a bit misguided, because who calls sort-inside-sort[0]? Nobody, realistically, but these types of issues are prevalent in the "how to do closures" area... and in C every API does it slightly differently, if they're even aware of the issues.
[0] Because there's no community that likes nitpicking like the C (or C++) community. I considered preempting that objection :). C++ has solved this, so there's that.
Comment by groundzeros2015 2 days ago
That you do not call it recursively, by checking that the thread local is null before invocation.
> a re-entrant general solution is desirable.
I know what you mean, but I just don't know why you want to emulate that in C. There is a real problem of people writing APIs that don't let you pass in data with your function pointer - the thread local method can solve 99% of those without changes to the original API.
But if you really want to do all kinds of first class functions with data, do you want to use C?
Comment by nesarkvechnep 4 days ago
I have a case where I need to create a static templated lambda to be passed to C as a pointer. Such a thing is impossible in Rust, which I considered at first.
Comment by pornel 4 days ago
    let mut state = 1;
    let mut fat_closure = || state += 1;
    let (fnptr, userdata) = make_trampoline(&mut &mut fat_closure);
    unsafe {
        fnptr(userdata);
    }
    assert_eq!(state, 2);

    use std::ffi::c_void;

    fn make_trampoline<C: FnMut()>(closure: &mut &mut C) -> (unsafe fn(*mut c_void), *mut c_void) {
        let fnptr = |userdata: *mut c_void| {
            let closure: *mut &mut C = userdata.cast();
            (unsafe { &mut *closure })()
        };
        (fnptr, closure as *mut _ as *mut c_void)
    }

It requires a userdata arg for the C function, since there's no allocation or executable-stack magic to give a unique function pointer to each data instance. OTOH it's zero-cost. The generic make_trampoline inlines the code of the closure, so there's no extra indirection.

Comment by skavi 4 days ago
This isn’t fully accurate. In your example, `&mut C` actually has the same layout as usize. It’s not a fat pointer. `C` is a concrete type and essentially just an anonymous struct with FnMut implemented for it.
You’re probably thinking of `&mut dyn FnMut` which is a fat pointer that pairs a pointer to the data with a pointer to a VTable.
So in your specific example, the double indirection is unnecessary.
The following passes miri: https://play.rust-lang.org/?version=nightly&mode=debug&editi...
(did this on mobile, so please excuse any messiness).
Comment by MindSpunk 4 days ago
Well, capturing closures that are implemented like C++ lambdas or Rust closures anyway. The executable stack crimes do make a thin fn-ptr with state.
Comment by eqvinox 4 days ago
Unfortunately a lot of existing C APIs won't have the user arg in the place you need it, it's a mix of first, last, and sometimes even middle.
Comment by nesarkvechnep 4 days ago
Comment by pornel 4 days ago
The only unsafe here is to demonstrate it works with C/C++ FFI (where void* userdata is actually not type safe)
Comment by nesarkvechnep 4 days ago
Comment by queuebert 4 days ago
Comment by hyperbolablabla 4 days ago
    // imagine my_function takes 3 ints; the first 2 args are captured and curried.
    Function<void(int)> my_closure(&my_function, 1, 2);
    my_closure(3);

I've never implemented it myself, as I don't use C++ features all too much, but as a pet project I'd like to someday. I wonder how something like that compares!

Comment by spacechild1 4 days ago
Comment by mgaunard 4 days ago
Practically speaking all lambda options except for the one involving allocation (why would you even do that) are equivalent modulo inlining.
In particular, the caveat with the type erasure/helper variants is precisely that it prevents inlining, but given everything is in the same translation unit and isn't runtime-driven, it's still possible for the compiler to devirtualize.
I think it would be more interesting to make measurements when controlling explicitly whether inlining happens or the function type can be deduced statically.
Comment by gpderetta 4 days ago
In a simple test I see that GCC14 has no problems completely removing the overhead of std::function_ref, but plain std::function is a huge mess.
Eventually we will get there [1], but in the meantime I prefer not to rely on devirtualization, and heap elision is more of a party trick.
edit: to compare early vs late inlining: while GCC 14 can remove one layer of function_ref, it seems that it cannot remove two layers, as it apparently doesn't rerun the required passes to take advantage of the new opportunity. It has no problem, of course, removing an arbitrarily large (but finite) number of layers of plain lambdas.
edit2: GCC15 can remove trivial uses of std::function, but this is very fragile. It still can't remove two function_ref.
[1] for example 25 years ago compilers were terrible at removing abstraction overhead of the STL, today there is very little cost.
Comment by mgaunard 4 days ago
Comment by ddtaylor 4 days ago
Comment by keymasta 4 days ago
Comment by psyclobe 4 days ago
Comment by capestart 4 days ago
Comment by stefantalpalaru 4 days ago
Comment by trgn 4 days ago