On Wed, 13 May 2026 14:14:23 +1000
David Gibson
Most of our operation is asynchronous, based on non-blocking fds handled in our epoll loop. However, our several Unix sockets (tap client, repair helper, control client) are all blocking fds after accept().
That's correct for the repair helper, and (for now) correct for the control client. However, the reasons for that might not be obvious, so add some extra comments giving the rationale.
I don't believe it's correct for the tap client; having this socket be blocking means we could potentially block the main loop if we ever got a a spurious EPOLL{IN,OUT} event on the tap socket. Switch the tap socket to non-blocking for better robustness, and consistency with nearly every other fd we track.
That socket needs to be blocking for the second usage we make of it in tap_send_frames_passt(), that is, the one via write_remainder() without MSG_DONTWAIT. While a part of https://bugs.passt.top/show_bug.cgi?id=38 is solved (there are no blocking reads left in tap_passt_input()), this isn't the case for the writing side of it. Some nits below:
Signed-off-by: David Gibson
--- conf.c | 6 ++++++ repair.c | 4 ++++ tap.c | 3 ++- 3 files changed, 12 insertions(+), 1 deletion(-) diff --git a/conf.c b/conf.c index dec43fca..dc85f0f8 100644 --- a/conf.c +++ b/conf.c @@ -2082,6 +2082,12 @@ static void conf_accept(struct ctx *c) int fd, rc;
retry: + /* Currently we perform the configuration transaction more-or-less + * synchronously, so we want the accepted socket to be blocking. + * + * FIXME: We should make the configuration update asynchronous, like + * most of our operation, so a misbehaving configuration client can't + * block the main forwarding loop */
* ... loop. */
fd = accept4(c->fd_control_listen, NULL, NULL, SOCK_CLOEXEC); if (fd < 0) { if (errno != EAGAIN) diff --git a/repair.c b/repair.c index 42c4ae97..8a2d119d 100644 --- a/repair.c +++ b/repair.c @@ -99,6 +99,10 @@ int repair_listen_handler(struct ctx *c, uint32_t events) return EEXIST; }
+ /* We want accepted socket to be blocking; we use it during migration
"the accepted socket"
+ * which is a synchronous interruption to our normal non-blocking + * behaviour. + */ if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL, SOCK_CLOEXEC)) < 0) { if ((rc = errno) != EAGAIN) diff --git a/tap.c b/tap.c index fda2da9b..3b8a3f3d 100644 --- a/tap.c +++ b/tap.c @@ -1490,7 +1490,8 @@ void tap_listen_handler(struct ctx *c, uint32_t events) return; }
- c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, SOCK_CLOEXEC); + c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, + SOCK_NONBLOCK | SOCK_CLOEXEC);
...so this part would need to be dropped.
if (c->fd_tap < 0) { if (errno != EAGAIN) warn_perror("Error accepting tap client");
The rest looks good to me. -- Stefano