On Thu, 9 Feb 2023 11:10:21 +0100, Michal Prívozník wrote:
On 2/9/23 10:56, Daniel P. Berrangé wrote:
On Thu, Feb 09, 2023 at 09:52:00AM +0100, Michal Prívozník wrote:
On 2/9/23 00:13, Laine Stump wrote:
I initially had the passt process being started in an identical fashion to the slirp-helper - libvirt was daemonizing the new process and recording its pid in a pidfile. The problem with this is that, since it is daemonized immediately, any startup error in passt happens after the daemonization, and thus isn't seen by libvirt - libvirt believes that the process has started successfully and continues on its merry way. The result was that sometimes a guest would be started, but there would be no passt process for qemu to use for network traffic.
Instead, we should be starting passt in the same manner we start dnsmasq - we just exec it as normal (along with a request that passt create the pidfile, which is just another option on the passt commandline) and wait for the child process to exit. passt then has a chance to parse its commandline and complete all of its setup prior to daemonizing itself; if it encounters an error and exits with a non-zero code, libvirt will see that code and know about the failure. We can then grab the output from stderr, log it so the "user" has some idea of what went wrong, and fail the guest startup.
Signed-off-by: Laine Stump
---
 src/qemu/qemu_passt.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/qemu/qemu_passt.c b/src/qemu/qemu_passt.c
index 0f09bf3db8..f640a69c00 100644
--- a/src/qemu/qemu_passt.c
+++ b/src/qemu/qemu_passt.c
@@ -141,24 +141,23 @@ qemuPasstStart(virDomainObj *vm,
     g_autofree char *passtSocketName = qemuPasstCreateSocketPath(vm, net);
     g_autoptr(virCommand) cmd = NULL;
     g_autofree char *pidfile = qemuPasstCreatePidFilename(vm, net);
+    g_autofree char *errbuf = NULL;
     char macaddr[VIR_MAC_STRING_BUFLEN];
     size_t i;
     pid_t pid = (pid_t) -1;
     int exitstatus = 0;
     int cmdret = 0;
-    VIR_AUTOCLOSE errfd = -1;

     cmd = virCommandNew(PASST);

     virCommandClearCaps(cmd);
-    virCommandSetPidFile(cmd, pidfile);
-    virCommandSetErrorFD(cmd, &errfd);
-    virCommandDaemonize(cmd);
+    virCommandSetErrorBuffer(cmd, &errbuf);

     virCommandAddArgList(cmd,
                          "--one-off",
                          "--socket", passtSocketName,
                          "--mac-addr", virMacAddrFormat(&net->mac, macaddr),
+                         "--pid", pidfile,
The only problem with this approach is that our virPidFile*() functions rely on locking the very first byte of the pidfile. When reading the pidfile, we try to lock the file; if we succeed, the file wasn't locked, which means the process that held the lock has died and the pid in the pidfile is stale.
Now, I don't see passt locking the pidfile at all. So effectively, after this patch qemuPasstStop() would do nothing (well, okay, it'll remove the pidfile), qemuPasstSetupCgroup() does nothing, etc.
What we usually do in this case is let our code write the pidfile (just like the current code does), and then loop, waiting a bit for the socket to show up. If it doesn't show up within, say, 5 seconds, we kill the child process (whose PID we know). You can take inspiration from qemuDBusStart() or qemuProcessStartManagedPRDaemon().
Busy waiting for sockets is nasty though. Depending on how passt is written it might not be needed. If passt creates the listen() socket and does all the important initialization steps that are liable to fail, *before* it daemonizes, then we can synchronize without busy waiting.
It does. In my opinion it could simply be handled like it's done for dnsmasq -- from networkStartDhcpDaemon():

    if (virCommandRun(cmd, NULL) < 0)
        return -1;

    /*
     * There really is no race here - when dnsmasq daemonizes, its
     * leader process stays around until its child has actually
     * written its pidfile. So by time virCommandRun exits it has
     * waitpid'd and guaranteed the process has started and written a
     * pid
     */
i.e. waitpid() for the passt leader process to exit, then check whether the socket exists. If it does, passt has daemonized and is listening and running; if it does not, passt failed.
That still requires passt to hold the pidfile open and locked, neither of which is happening with the current code.
...is this still a requirement even if qemuPasstStop() just needs to remove the PID file? -- Stefano