GCC Builds Failing After sbuild Refactoring – Emanuele Rocca
Something is causing the build to end prematurely. It’s not the OOM killer, and
the kernel does not have anything useful to say in the logs. Can it be that the
D language tests are sending signals to some process, and that is what’s
killing make? We start tracing signals sent with bpftrace by writing the
following script, signals.bt:
tracepoint:signal:signal_generate {
    printf("%s (PID: %d) sent signal %d to PID %d\n", comm, pid, args->sig, args->pid);
}
We then run it with sudo bpftrace signals.bt.
The build takes its sweet time, and it fails. Looking at the trace output,
there’s a suspicious process.exe terminating stuff:
process.exe (PID: 2868133) sent signal 15 to PID 711826
That looks interesting, but we have no clue what PID 711826 may be. Let’s change
the script a bit, and trace signals received as well.
tracepoint:signal:signal_generate {
    printf("PID %d (%s) sent signal %d to %d\n", pid, comm, args->sig, args->pid);
}
tracepoint:signal:signal_deliver {
    printf("PID %d (%s) received signal %d\n", pid, comm, args->sig);
}
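The traces print raw signal numbers only. As a small aid for reading them, here is a sketch in Python (not part of the original investigation) that maps the numbers seen in this post to their symbolic names. It assumes a Linux host, where 2, 9 and 15 have their usual meaning; the helper name signal_name is made up for illustration.

```python
import signal


def signal_name(number):
    """Return the symbolic name for a signal number, e.g. 15 -> 'SIGTERM'."""
    return signal.Signals(number).name


if __name__ == "__main__":
    # The three signal numbers that show up in the bpftrace output.
    for number in (2, 9, 15):
        print(number, signal_name(number))
```

On Linux this identifies signal 2 as SIGINT, 9 as SIGKILL and 15 as SIGTERM.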
The working version of sbuild was using dumb-init, whereas the new one
features a little init in perl. We patch the current version of sbuild by
making it use dumb-init instead, and trace two builds: one with the perl init,
one with dumb-init.
Here are the signals observed when building with dumb-init:
PID 3590011 (process.exe) sent signal 2 to 3590014
PID 3590014 (sleep) received signal 9
PID 3590011 (process.exe) sent signal 15 to 3590063
PID 3590063 (std.process tem) received signal 9
PID 3590011 (process.exe) sent signal 9 to 3590065
PID 3590065 (std.process tem) received signal 9
And this is what happens with the new init in perl:
PID 3589274 (process.exe) sent signal 2 to 3589291
PID 3589291 (sleep) received signal 9
PID 3589274 (process.exe) sent signal 15 to 3589338
PID 3589338 (std.process tem) received signal 9
PID 3589274 (process.exe) sent signal 9 to 3589340
PID 3589340 (std.process tem) received signal 9
PID 3589274 (process.exe) sent signal 15 to 3589341
PID 3589274 (process.exe) sent signal 15 to 3589323
PID 3589274 (process.exe) sent signal 15 to 3589320
PID 3589274 (process.exe) sent signal 15 to 3589274
PID 3589274 (process.exe) received signal 9
PID 3589341 (sleep) received signal 9
PID 3589273 (sbuild-usernsex) sent signal 9 to 3589320
PID 3589273 (sbuild-usernsex) sent signal 9 to 3589323
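Since PIDs differ between runs, the two traces are easier to compare once PIDs are ignored and only the sender name and signal number are counted. The following Python sketch does that for the "sent" lines of the two traces quoted in this post. The function names are hypothetical, and the regular expression assumes the exact output format of the second bpftrace script.

```python
import re
from collections import Counter

# Matches the "sent" lines printed by the signal_generate probe.
SENT = re.compile(
    r"^PID \d+ \((?P<comm>.*)\) sent signal (?P<sig>\d+) to \d+$",
    re.MULTILINE,
)

DUMB_INIT_TRACE = """\
PID 3590011 (process.exe) sent signal 2 to 3590014
PID 3590014 (sleep) received signal 9
PID 3590011 (process.exe) sent signal 15 to 3590063
PID 3590063 (std.process tem) received signal 9
PID 3590011 (process.exe) sent signal 9 to 3590065
PID 3590065 (std.process tem) received signal 9
"""

PERL_INIT_TRACE = """\
PID 3589274 (process.exe) sent signal 2 to 3589291
PID 3589291 (sleep) received signal 9
PID 3589274 (process.exe) sent signal 15 to 3589338
PID 3589338 (std.process tem) received signal 9
PID 3589274 (process.exe) sent signal 9 to 3589340
PID 3589340 (std.process tem) received signal 9
PID 3589274 (process.exe) sent signal 15 to 3589341
PID 3589274 (process.exe) sent signal 15 to 3589323
PID 3589274 (process.exe) sent signal 15 to 3589320
PID 3589274 (process.exe) sent signal 15 to 3589274
PID 3589274 (process.exe) received signal 9
PID 3589341 (sleep) received signal 9
PID 3589273 (sbuild-usernsex) sent signal 9 to 3589320
PID 3589273 (sbuild-usernsex) sent signal 9 to 3589323
"""


def sent_counts(trace):
    """Count (sender command, signal number) pairs, ignoring PIDs."""
    return Counter((m["comm"], int(m["sig"])) for m in SENT.finditer(trace))


def extra_signals(baseline, candidate):
    """Return the signals sent more often in `candidate` than in `baseline`."""
    return sent_counts(candidate) - sent_counts(baseline)


if __name__ == "__main__":
    extras = extra_signals(DUMB_INIT_TRACE, PERL_INIT_TRACE)
    for (comm, sig), count in extras.items():
        print(f"{comm} sent signal {sig} {count} more time(s) with the perl init")
```

For these two traces, the difference is four more signal 15 from process.exe and two signal 9 from sbuild-usernsex.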
A few additional SIGTERMs are being sent when using the perl init, which is a
helpful clue. At this point we are fairly convinced that process.exe is worth
additional inspection. The source code of process.d shows something
interesting:
@system unittest
{
    [...]
    auto pid = spawnProcess(["sleep", "10000"],
    [...]
    // kill the spawned process with SIGINT
    // and send its return code
    spawn((shared Pid pid) {
        auto p = cast() pid;
        kill(p, SIGINT);
So yes, there’s our sleep and the SIGINT (signal 2) right in the unit tests
of process.d, just like we have observed in the bpftrace output.
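To get a feel for what this part of the test does, the same spawn-then-interrupt pattern can be sketched in Python. This is only a stand-in for illustration, not the D unit test itself. It assumes a sleep binary on the PATH, and the function names are made up.

```python
import signal
import subprocess


def _default_sigint():
    # Runs in the child before exec: make sure SIGINT is not inherited as
    # "ignored" from the parent, otherwise sleep would never be interrupted.
    signal.signal(signal.SIGINT, signal.SIG_DFL)


def interrupt_sleep():
    """Spawn `sleep 10000`, send it SIGINT, and return its exit status."""
    child = subprocess.Popen(["sleep", "10000"], preexec_fn=_default_sigint)
    child.send_signal(signal.SIGINT)
    # On POSIX, a negative status -N means "terminated by signal N".
    return child.wait(timeout=5)


if __name__ == "__main__":
    print(interrupt_sleep())
```

Running this while signals.bt is active should produce a "sent signal 2" line similar to the one from process.exe, with python as the sender.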
Can we study the behavior of process.exe in isolation, separately from the
build? Indeed we can. Let’s take the executable from a failed build, and try
running it under /usr/libexec/sbuild-usernsexec.
First, we prepare a chroot inside a suitable user namespace:
unshare --map-auto --setuid 0 --setgid 0 mkdir /tmp/rootfs
cd /tmp/rootfs
cat /home/ema/.cache/sbuild/unstable-arm64.tar | unshare --map-auto --setuid 0 --setgid 0 tar xf -
unshare --map-auto --setuid 0 --setgid 0 mkdir /tmp/rootfs/whatever
unshare --map-auto --setuid 0 --setgid 0 cp process.exe /tmp/rootfs/
Now we can run process.exe on its own using the perl init, and trace signals at will:
/usr/libexec/sbuild-usernsexec --pivotroot --nonet u:0:100000:65536 g:0:100000:65536 /tmp/rootfs ema /whatever -- /process.exe
We can compare the behavior of the perl init vis-a-vis the one using
dumb-init in milliseconds instead of minutes.