43 points by matt_d 4 days ago | 18 comments

cryptonector 2 hours ago [-]

> Fil-C's pollchecks also support stop-the-world (via the FILC_THREAD_STATE_STOP_REQUESTED bit). This is used for:

> - Implementing fork(2), which needs all threads to stop at a known-good point before the fork child jettisons them.

Makes me wonder how it handles `vfork()`, but I think it needs just a safepoint and no stop-the-world since, after all, the child side of `vfork()` is executing in the same address space as the parent (until exec-or-exit), so it's as though it's the same thread as in the parent (which is stopped waiting for the child to exec-or-exit). All the more reason why `fork()` is evil and `vfork()` is much better.
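The stop-the-world mechanism quoted above can be made concrete with a minimal pthreads sketch. It assumes a single global stop flag guarded by a mutex and condition variable; Fil-C's real implementation uses a per-thread state word with a `FILC_THREAD_STATE_STOP_REQUESTED` bit and differs in detail. The names `pollcheck`, `stop_the_world`, `resume_the_world`, and `demo_stop_the_world` are hypothetical. Workers run a pollcheck after each unit of work, and the stopper raises the flag and waits until every worker has parked, which is the "known-good point" where something like `fork(2)` could run.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define NWORKERS 4

/* Hypothetical runtime state: one global flag stands in for Fil-C's
   per-thread state word. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static atomic_bool stop_requested;   /* only written while holding lock */
static int parked;                   /* workers stopped at a pollcheck; guarded by lock */
static atomic_bool shutting_down;
static atomic_ulong work_done[NWORKERS];

/* Pollcheck: the fast path is a single load and branch. */
static void pollcheck(void)
{
    if (!atomic_load_explicit(&stop_requested, memory_order_acquire))
        return;
    pthread_mutex_lock(&lock);
    parked++;
    pthread_cond_broadcast(&cond);          /* tell the stopper we are parked */
    while (atomic_load(&stop_requested))
        pthread_cond_wait(&cond, &lock);
    parked--;
    pthread_cond_broadcast(&cond);          /* tell the resumer we are running */
    pthread_mutex_unlock(&lock);
}

static void *worker(void *arg)
{
    int id = *(int *)arg;
    while (!atomic_load(&shutting_down)) {
        atomic_fetch_add(&work_done[id], 1); /* stand-in for real work */
        pollcheck();
    }
    return NULL;
}

/* Returns once every worker is parked at a pollcheck. */
static void stop_the_world(void)
{
    pthread_mutex_lock(&lock);
    atomic_store(&stop_requested, true);
    while (parked < NWORKERS)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
}

/* Clears the flag and waits until every worker has left its pollcheck. */
static void resume_the_world(void)
{
    pthread_mutex_lock(&lock);
    atomic_store(&stop_requested, false);
    pthread_cond_broadcast(&cond);
    while (parked > 0)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
}

/* Demo: while the world is stopped, no worker makes progress.
   Error handling for pthread_create is omitted for brevity. */
bool demo_stop_the_world(void)
{
    pthread_t threads[NWORKERS];
    int ids[NWORKERS];
    unsigned long snapshot[NWORKERS];
    bool frozen = true;

    for (int i = 0; i < NWORKERS; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    stop_the_world();                       /* e.g. fork(2) would run here */
    for (int i = 0; i < NWORKERS; i++)
        snapshot[i] = atomic_load(&work_done[i]);
    struct timespec pause = {0, 10 * 1000 * 1000};  /* 10 ms */
    nanosleep(&pause, NULL);
    for (int i = 0; i < NWORKERS; i++)
        frozen = frozen && atomic_load(&work_done[i]) == snapshot[i];
    resume_the_world();
    atomic_store(&shutting_down, true);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(threads[i], NULL);
    return frozen;
}
```

Writing `stop_requested` only under the mutex is what prevents a lost wakeup between a worker's re-check of the flag and its `pthread_cond_wait`.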

foota 1 hours ago [-]

I was looking at https://fil-c.org/programs_that_work and saw "dash 0.5.12. One tiny change: use fork(2) instead of vfork(2).", so maybe it doesn't :)

pizlonator 41 minutes ago [-]

I don't support vfork(2) right now, only fork(2).

I have some super crazy ideas about how to make vfork(2) work, but I don't particularly like any of them. I'll implement it if I find some situation where vfork(2) is strongly required (so far I haven't found even one such case). The closest is that I've found cases where I have to add the CLOEXEC pipe trick to do error handling the fork(2) way rather than the vfork(2) way.
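The CLOEXEC pipe trick mentioned above is the usual way to learn, with `fork(2)`, whether the child's `exec` succeeded (with `vfork(2)`, the child can simply leave `errno` in memory the parent can see). Below is a minimal sketch; `spawn_checked` is a hypothetical helper name. The write end of the pipe is marked close-on-exec, so a successful exec closes it and the parent reads EOF, while a failed exec writes `errno` into the pipe before `_exit`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork(2) + execv(3), with exec failures reported
   through a close-on-exec pipe. Returns the child's pid and sets *err to 0
   on success; returns -1 and sets *err to an errno value on failure. */
pid_t spawn_checked(const char *path, char *const argv[], int *err)
{
    int fds[2];
    if (pipe(fds) == -1) { *err = errno; return -1; }
    /* A successful exec closes the write end, so the parent's read() sees
       EOF. Where available, pipe2(fds, O_CLOEXEC) closes the window between
       pipe() and fcntl() in multithreaded programs. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) == -1) {
        *err = errno;
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid == -1) {
        *err = errno;
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                 /* child */
        close(fds[0]);
        execv(path, argv);
        int e = errno;              /* only reached if execv failed */
        ssize_t ignored = write(fds[1], &e, sizeof e);
        (void)ignored;
        _exit(127);                 /* skip atexit handlers and stdio flushes */
    }
    close(fds[1]);                  /* parent */
    int child_errno;
    ssize_t n;
    do {
        n = read(fds[0], &child_errno, sizeof child_errno);
    } while (n == -1 && errno == EINTR);
    close(fds[0]);
    if (n == (ssize_t)sizeof child_errno) {
        /* exec failed: reap the child, then report why. */
        while (waitpid(pid, NULL, 0) == -1 && errno == EINTR) {}
        *err = child_errno;
        return -1;
    }
    *err = 0;                       /* EOF: the exec succeeded */
    return pid;
}
```

The parent blocks in `read()` only until the child either execs or fails, which is the same synchronization point `vfork(2)` gives for free.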

cryptonector 19 minutes ago [-]

Remember that the child side of `vfork(2)` can only do async-signal-safe things as `vfork(2)` is documented, and it's really as if it is executing in the same thread as the parent (down to thread-locals, I do believe), so `vfork(2)` shouldn't be all that special for Fil-C! You might want to try it.

As u/foota notes, it's important. The point of `vfork(2)` is that it's _fast_. https://news.ycombinator.com/item?id=30502392

pizlonator 15 minutes ago [-]

> Remember that the child side of `vfork(2)` can only do async-signal-safe things as `vfork(2)` is documented

And if it does anything that isn't async-signal-safe, then all bets are off.

So, the Fil-C implementation of vfork(2) would have to have some way of checking (either statically or dynamically) that nothing async-signal-unsafe happens. That's the hard bit. That's why I think that implementing it is crazy.

I'd have to have so many checks in so many weird places!

> The point of `vfork(2)` is that it's _fast_

Yup. That's the reason to have it. But I haven't yet found a situation where a program I want to run is slow because I don't have vfork(2). The closest thing is that bash is visibly slower in Fil-C due to forking, but bash always uses fork(2) anyway - so in that case what I need to do is make my fork(2) impl faster, not implement vfork(2).

cryptonector 9 minutes ago [-]

> And if it does anything that isn't async-signal-safe, then all bets are off.

> So, the Fil-C implementation of vfork(2) would have to have some way of checking (either statically or dynamically) that nothing async-signal-unsafe happens. That's the hard bit. That's why I think that implementing it is crazy.

Not really. See, Fil-C already makes safe the things that would otherwise be unsafe, except for returning from the caller of `vfork(2)`. Treat the child as if it's the same thread as in the parent and let it do whatever it would do.

Well, I guess the biggest issue is that you'd have to deal with how to communicate between the child and the GC thread, if you have to. One option is to just not allow the GC to run while a child of `vfork(2)` hasn't exec'ed-or-exit'ed.

> > The point of `vfork(2)` is that it's _fast_

> Yup. That's the reason to have it. But I haven't yet found a situation where a program I want to run is slow because I don't have vfork(2). The closest thing is that bash is visibly slower in Fil-C due to forking, but bash always uses fork(2) anyway - so in that case what I need to do is make my fork(2) impl faster, not implement vfork(2).

Typically it's programs with large RSS that need it, so things like JVMs, which probably wouldn't run under Fil-C.

cryptonector 2 hours ago [-]

@pizlonator I wonder if you couldn't bracket all assembly with `filc_exit`/`filc_enter` as you do system calls. When you know the assembly doesn't allocate memory, this should work fine, but... ah, it's the stack allocations that are also interesting, right? So you'd have to ensure that the assembly has enough stack space to execute _and_ that it does not invoke the heap allocator. But that seems doable for things like cryptography in OpenSSL's libcrypto.

pizlonator 38 minutes ago [-]

Yeah you could do that. It's tantamount to having an FFI. It's super risky though - lots of ways that the assembly code could break Fil-C's assumptions and then you'd get memory safety issues.

My long-term plan is to give folks a way to write memory safe assembly. I have some crazy ideas for how to do that.

cryptonector 21 minutes ago [-]

Yeah, but it would go a looong way to getting us (the public) to use Fil-C in production. Like I'd really like to be able to run OpenSSH using Fil-C and I really don't want to have to worry about the C-coded crypto being non-constant-time.

What I suggest as a medium-term solution might be to have macros that expand to nothing if Fil-C is not used but which expand to a Fil-C macro decoration that indicates that the author thinks the assembly (or assembly coded function) is Fil-C-safe.

Now... I know, there's more, right, like you burn a register to keep a pointer to the Fil-C thread data structure, and the assembly needs to not step on that, so ok, it's probably harder than I'm making it seem, but maybe not much harder because you can save that pointer as part of the exit and restore it as part of the enter.

I do trust that your crazy ideas are crazy but workable though!
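A minimal sketch of the opt-in macro suggested above: it expands to nothing for ordinary compilers and to a marker only when a build opts in. Everything Fil-C-specific here is hypothetical: `HYPOTHETICAL_FILC_ASM_ANNOTATIONS` and the `"filc_asm_trusted"` string are invented names, and nothing in this thread indicates Fil-C reads any such annotation. Clang's `annotate` attribute is real and is used only as a carrier; `rotl64` stands in for an assembly-coded crypto primitive.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opt-in switch: a build would define
   HYPOTHETICAL_FILC_ASM_ANNOTATIONS only for a Fil-C that understood the
   decoration. Clang's annotate attribute just carries the marker string. */
#if defined(HYPOTHETICAL_FILC_ASM_ANNOTATIONS) && defined(__clang__)
#  define FILC_ASM_TRUSTED __attribute__((annotate("filc_asm_trusted")))
#else
#  define FILC_ASM_TRUSTED /* expands to nothing for ordinary compilers */
#endif

/* The author vouches that this asm touches only its register operands,
   allocates nothing, and does not move the stack pointer. */
FILC_ASM_TRUSTED
static inline uint64_t rotl64(uint64_t x, unsigned r)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* rolq masks the count in %cl to 6 bits, matching the C fallback. */
    __asm__("rolq %%cl, %0" : "+r"(x) : "c"(r) : "cc");
    return x;
#else
    r &= 63;
    return r ? (x << r) | (x >> (64 - r)) : x;
#endif
}
```

The annotation only records the author's claim; checking it (or saving and restoring the runtime's reserved register around the call) would still be the compiler's job.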

pizlonator 13 minutes ago [-]

That's a good point.

Note that statically checked inline asm is very achievable, so those folks who do constant time crypto by concealing their math operators behind inline asm will get what they need.

But I guess you really want the OpenSSL out-of-line assembly to work?
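The "concealing math operators behind inline asm" pattern mentioned above usually looks like an empty asm statement that claims to read and rewrite a value, so the optimizer can no longer reason about it and turn branch-free code back into branches. This is a minimal sketch of that widely used pattern; `value_barrier_u32`, `ct_select_u32`, and `ct_eq_u32` are illustrative names, not taken from any particular library. An asm block this simple (no instructions, one register operand) is the kind of thing a static inline-asm checker could plausibly accept.

```c
#include <assert.h>
#include <stdint.h>

/* Empty inline asm that claims to read and rewrite x: the optimizer can no
   longer see what value x holds, so it cannot, for example, notice that a
   mask is always 0 or all-ones and turn the select below into a branch.
   GCC/Clang extended-asm syntax; other compilers get no barrier here. */
static inline uint32_t value_barrier_u32(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    __asm__("" : "+r"(x));
#endif
    return x;
}

/* Returns a if bit is 1, b if bit is 0, with no data-dependent branch. */
uint32_t ct_select_u32(uint32_t bit, uint32_t a, uint32_t b)
{
    uint32_t mask = value_barrier_u32(0u - (bit & 1u)); /* 0xFFFFFFFF or 0 */
    return (a & mask) | (b & ~mask);
}

/* Returns 1 if x == y, else 0, with no data-dependent branch. */
uint32_t ct_eq_u32(uint32_t x, uint32_t y)
{
    uint32_t d = value_barrier_u32(x ^ y);
    /* For d != 0, d | -d has its top bit set; for d == 0 it is 0. */
    return 1u ^ ((d | (0u - d)) >> 31);
}
```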

correct_horse 2 hours ago [-]

Fil-C seems interesting, and I didn’t understand the details of how multi-threaded garbage collectors worked before reading it (I still don’t but I’m closer!). The tradeoff between a compacting garbage collector (Java) vs what you can bolt on to C without forking LLVM is particularly interesting.

pjmlp 41 minutes ago [-]

Note that it is very simplistic to say Java has a compacting garbage collector: between Oracle JDK, OpenJDK, Temurin, Azul, PTC, Aicas, microEJ, OpenJ9, GraalVM, Jikes RVM, and cousin Android, there is a plethora of GC configurations to choose from that go beyond that.

whizzter 7 minutes ago [-]

Not to mention that many of them have multiple GCs that are selected as startup options.

Personally though, I can't wait for Azul's patents to start running out so that OS vendors can start implementing similar machinery into mainline OSes for all runtimes to use.

cryptonector 2 hours ago [-]

TFA is really good at explaining how a threaded GC works with minimal impact when it's not running.

cyberax 45 minutes ago [-]

Polling for the safepoint signal adds overhead to tight loops, so Golang uses asynchronous signals to interrupt threads. It needs this for async preemption, not just GC. It also means the runtime doesn't know the exact state of the interrupted frame, so it has to scan the last frame of the stack conservatively.

There are no good ways to deal with that currently. Some VMs even used CPU instruction simulators to interpret the code until it hit a known-good safepoint.

I had one idea about improving that: have a "doppelganger" mirror for the inner tight loops that is instruction-by-instruction identical to the regular version, except that backjumps are replaced with calls into the GC safepoint handler.

It'd be interesting to try this approach with Fil-C.

pizlonator 39 minutes ago [-]

> Polling for the safepoint signal adds overhead to tight loops

The usual trick, which Fil-C doesn't do yet, is to just unroll super tight loops.

Also, the pollcheck doesn't have to be a load-and-branch. There are other ways. There's a bottomless pit of optimization strategies you can do.

Conservative scanning wouldn't be sound in Fil-C, because the LLVM passes that run after FilPizlonator could break the fundamental assumptions of conservative scanning.
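Unrolling amortizes a load-and-branch pollcheck over several iterations. The sketch below assumes a made-up per-thread state word whose bit merely echoes Fil-C's `FILC_THREAD_STATE_STOP_REQUESTED`; `pollcheck`, `pollcheck_slow`, `sum_naive`, and `sum_unrolled` are hypothetical names, and the counters exist only so the effect can be measured. Unrolling by 4 executes roughly a quarter as many pollchecks while still bounding the time between them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread state word; this layout is made up. */
#define STOP_REQUESTED 1u
static _Atomic unsigned thread_state;

/* Demo instrumentation only; a real pollcheck would not count itself. */
static unsigned long pollchecks_executed;
static unsigned long slow_paths_taken;

static void pollcheck_slow(void)
{
    /* A real runtime would handshake or park here. */
    slow_paths_taken++;
    atomic_fetch_and_explicit(&thread_state, ~STOP_REQUESTED, memory_order_acq_rel);
}

/* Load-and-branch pollcheck (plus the demo counter). */
static inline void pollcheck(void)
{
    pollchecks_executed++;
    if (atomic_load_explicit(&thread_state, memory_order_acquire) & STOP_REQUESTED)
        pollcheck_slow();
}

/* Naive: one pollcheck on every backedge. */
uint64_t sum_naive(const uint32_t *a, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
        pollcheck();
    }
    return s;
}

/* Unrolled by 4: one pollcheck per 4 elements, plus one per leftover. */
uint64_t sum_unrolled(const uint32_t *a, size_t n)
{
    uint64_t s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += (uint64_t)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        pollcheck();
    }
    for (; i < n; i++) {
        s += a[i];
        pollcheck();
    }
    return s;
}
```

For 10 elements, the naive loop runs 10 pollchecks and the unrolled one runs 4 (two unrolled iterations plus two leftovers).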

cyberax 19 minutes ago [-]

What other methods of stopping at safepoints do you think are viable? Another approach is code patching, but it's not possible on iOS, and it's a bad idea in general.

pizlonator 11 minutes ago [-]

The most commonly used optimization is to have just a load, or just a store, rather than a load-and-branch. Basically you access a page that you mprotect to trigger the handshake.

Unrolling loops is also super common. Recognizing loops that have a bounded runtime is also common.

My favorite technique to try one day is:

1. just record where the pollcheck points are but don't emit code there

2. to handshake with a thread, shoot it with a signal and have the signal handler interpret the machine code starting at the ucontext until it gets to a pollcheck.
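The load-only pollcheck described above (a polling page that gets `mprotect`ed to request a handshake) can be sketched on Linux with `mmap(2)`, `mprotect(2)`, and a `SIGSEGV` handler. This is a single-threaded demonstration under stated assumptions: `polling_page_init`, `pollcheck`, and `request_handshake` are hypothetical names; the handler re-enables the page itself, whereas a real runtime would keep it protected until every thread had checked in; and calling `mprotect()` from a signal handler relies on Linux behavior, since POSIX does not list it as async-signal-safe. Some platforms (for example macOS) may deliver `SIGBUS` instead.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *poll_page;                 /* hypothetical polling page */
static size_t poll_page_size;
static volatile sig_atomic_t handshakes_taken;

static void poll_fault_handler(int sig, siginfo_t *info, void *uctx)
{
    (void)uctx;
    if (info->si_addr != (void *)poll_page) {
        /* A genuine crash, not a pollcheck: restore the default action so
           the faulting instruction kills the process when it is retried. */
        signal(sig, SIG_DFL);
        return;
    }
    /* A real runtime would run its handshake callback here. */
    handshakes_taken++;
    /* Make the page readable again so the retried load succeeds.
       Not async-signal-safe per POSIX; works on Linux in practice. */
    mprotect(poll_page, poll_page_size, PROT_READ);
}

/* Map a readable page and install the fault handler. */
int polling_page_init(void)
{
    poll_page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, poll_page_size, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    poll_page = p;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = poll_fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* The pollcheck: one load, no compare, no branch. */
static inline void pollcheck(void)
{
    (void)*(volatile char *)poll_page;
}

/* Ask for a handshake: the next pollcheck faults into the handler. */
int request_handshake(void)
{
    return mprotect(poll_page, poll_page_size, PROT_NONE);
}
```

The fast path costs a single load that almost always hits the cache; the price is paid only when a handshake is requested, in the form of a fault and a trip through the kernel.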
