[PATCH v2 0/7] UDP flow socket preliminaries
As discussed on our recent call, I'm working towards using connected sockets on both sides of UDP flows. This series makes some preliminary reworks that simplify things and make that easier. v2: * Added patches 5..7, other patches unchanged. David Gibson (7): udp: Common invocation of udp_sock_errs() for vhost-user and "buf" paths udp: Simplify checking of epoll event bits udp_vu: Factor things out of udp_vu_reply_sock_data() loop udp: Share more logic between vu and non-vu reply socket paths udp: Better handling of failure to forward from reply socket udp: Always hash socket facing flowsides udp: Add helper function for creating connected UDP socket udp.c | 121 +++++++++++++++++++++-------------------- udp_flow.c | 143 ++++++++++++++++++++++++++----------------------- udp_internal.h | 2 +- udp_vu.c | 68 +++++++---------------- udp_vu.h | 9 ++-- 5 files changed, 166 insertions(+), 177 deletions(-) -- 2.49.0
The vhost-user and non-vhost-user paths for both udp_listen_sock_handler()
and udp_reply_sock_handler() are more or less completely separate. Both,
however, start with essentially the same invocation of udp_sock_errs(), so
that can be made common.
Signed-off-by: David Gibson
udp_{listen,reply}_sock_handler() can accept both EPOLLERR and EPOLLIN
events. However, unlike most epoll event handlers we don't check the
event bits right there. EPOLLERR is checked within udp_sock_errs() which
we call unconditionally. Checking EPOLLIN is still more buried: it is
checked within both udp_sock_recv() and udp_vu_sock_recv().
We can simplify the logic and pass less extraneous parameters around by
moving the checking of the event bits to the top level event handlers.
This makes udp_{buf,vu}_{listen,reply}_sock_handler() no longer general
event handlers, but specific to EPOLLIN events, meaning new data. So,
rename those functions to udp_{buf,vu}_{listen,reply}_sock_data() to better
reflect their function.
Signed-off-by: David Gibson
At the start of every cycle of the loop in udp_vu_reply_sock_data() we:
- ASSERT that uflow is not NULL
- Check if the target pif is PIF_TAP
- Initialize the v6 boolean
However, all of these depend only on the flow, which doesn't change across
the loop. This is probably a duplication from udp_vu_listen_sock_data(),
where the flow can be different for each packet. For the reply socket
case, however, factor that logic out of the loop.
Signed-off-by: David Gibson
Share some additional miscellaneous logic between the vhost-user and "buf"
paths for data on udp reply sockets. The biggest piece is error handling
of cases where we can't forward between the two pifs of the flow. We also
make common some more simple logic locating the correct flow and its
parameters.
This adds some lines of code due to extra comment lines, but nonetheless
reduces logic duplication.
Signed-off-by: David Gibson
On Wed, 26 Mar 2025 14:44:04 +1100
David Gibson
Share some additional miscellaneous logic between the vhost-user and "buf" paths for data on udp reply sockets. The biggest piece is error handling of cases where we can't forward between the two pifs of the flow. We also make common some more simple logic locating the correct flow and its parameters.
This adds some lines of code due to extra comment lines, but nonetheless reduces logic duplication.
Signed-off-by: David Gibson
--- udp.c | 41 ++++++++++++++++++++++++++--------------- udp_vu.c | 26 +++++++++++--------------- udp_vu.h | 3 ++- 3 files changed, 39 insertions(+), 31 deletions(-) diff --git a/udp.c b/udp.c index 55021ac4..4258812e 100644 --- a/udp.c +++ b/udp.c @@ -754,24 +754,25 @@ void udp_listen_sock_handler(const struct ctx *c, /** * udp_buf_reply_sock_data() - Handle new data from flow specific socket * @c: Execution context - * @ref: epoll reference + * @s: Socket to read data from + * @tosidx: Flow & side to forward data from @s to * @now: Current timestamp * + * Return: true on success, false if can't forward from socket to flow's pif
"on success"...
+ * * #syscalls recvmmsg */ -static void udp_buf_reply_sock_data(const struct ctx *c, union epoll_ref ref, +static bool udp_buf_reply_sock_data(const struct ctx *c, + int s, flow_sidx_t tosidx, const struct timespec *now) { - flow_sidx_t tosidx = flow_sidx_opposite(ref.flowside); const struct flowside *toside = flowside_at_sidx(tosidx); - struct udp_flow *uflow = udp_at_sidx(ref.flowside); + struct udp_flow *uflow = udp_at_sidx(tosidx); uint8_t topif = pif_at_sidx(tosidx); - int n, i, from_s; - - from_s = uflow->s[ref.flowside.sidei]; + int n, i;
- if ((n = udp_sock_recv(c, from_s, udp_mh_recv)) <= 0) - return; + if ((n = udp_sock_recv(c, s, udp_mh_recv)) <= 0) + return true;
is not exactly accurate: 0 means either no datagrams (should never happen) or error on recvmmsg(). But I think it's clear enough, and you'll have follow-ups anyway, so I went ahead and applied this. Maybe: * Return: false if we can't forward from socket to flow's pif, true otherwise ? Actually, looking at the caller:
flow_trace(uflow, "Received %d datagrams on reply socket", n); uflow->ts = now->tv_sec; @@ -790,11 +791,10 @@ static void udp_buf_reply_sock_data(const struct ctx *c, union epoll_ref ref, } else if (topif == PIF_TAP) { tap_send_frames(c, &udp_l2_iov[0][0], UDP_NUM_IOVS, n); } else { - uint8_t frompif = pif_at_sidx(ref.flowside); - - flow_err(uflow, "No support for forwarding UDP from %s to %s", - pif_name(frompif), pif_name(topif)); + return false; } + + return true; }
/** @@ -821,10 +821,21 @@ void udp_reply_sock_handler(const struct ctx *c, union epoll_ref ref, }
if (events & EPOLLIN) { + flow_sidx_t tosidx = flow_sidx_opposite(ref.flowside); + int s = ref.fd; + bool ret; + if (c->mode == MODE_VU) - udp_vu_reply_sock_data(c, ref, now); + ret = udp_vu_reply_sock_data(c, s, tosidx, now); else - udp_buf_reply_sock_data(c, ref, now); + ret = udp_buf_reply_sock_data(c, s, tosidx, now); + + if (!ret) {
the opposite would probably be less surprising, if (ret) flow_err(...).
+ flow_err(uflow, + "No support for forwarding UDP from %s to %s", + pif_name(pif_at_sidx(ref.flowside)), + pif_name(pif_at_sidx(tosidx))); + } } }
diff --git a/udp_vu.c b/udp_vu.c index 6e1823a9..06bdeae6 100644 --- a/udp_vu.c +++ b/udp_vu.c @@ -273,38 +273,32 @@ void udp_vu_listen_sock_data(const struct ctx *c, union epoll_ref ref, /** * udp_vu_reply_sock_data() - Handle new data from flow specific socket * @c: Execution context - * @ref: epoll reference + * @s: Socket to read data from + * @tosidx: Flow & side to forward data from @s to * @now: Current timestamp + * + * Return: true on success, false if can't forward from socket to flow's pif */ -void udp_vu_reply_sock_data(const struct ctx *c, union epoll_ref ref, +bool udp_vu_reply_sock_data(const struct ctx *c, int s, flow_sidx_t tosidx, const struct timespec *now) { - flow_sidx_t tosidx = flow_sidx_opposite(ref.flowside); const struct flowside *toside = flowside_at_sidx(tosidx); bool v6 = !(inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr)); - struct udp_flow *uflow = udp_at_sidx(ref.flowside); - int from_s = uflow->s[ref.flowside.sidei]; + struct udp_flow *uflow = udp_at_sidx(tosidx); struct vu_dev *vdev = c->vdev; struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE]; - uint8_t topif = pif_at_sidx(tosidx); int i;
ASSERT(uflow);
- if (topif != PIF_TAP) { - uint8_t frompif = pif_at_sidx(ref.flowside); - - flow_err(uflow, - "No support for forwarding UDP from %s to %s", - pif_name(frompif), pif_name(topif)); - return; - } + if (pif_at_sidx(tosidx) != PIF_TAP) + return false;
for (i = 0; i < UDP_MAX_FRAMES; i++) { ssize_t dlen; int iov_used;
- iov_used = udp_vu_sock_recv(c, from_s, v6, &dlen); + iov_used = udp_vu_sock_recv(c, s, v6, &dlen); if (iov_used <= 0) break; flow_trace(uflow, "Received 1 datagram on reply socket"); @@ -318,4 +312,6 @@ void udp_vu_reply_sock_data(const struct ctx *c, union epoll_ref ref, } vu_flush(vdev, vq, elem, iov_used); } + + return true; } diff --git a/udp_vu.h b/udp_vu.h index 4f2262d0..2299b51f 100644 --- a/udp_vu.h +++ b/udp_vu.h @@ -8,6 +8,7 @@
void udp_vu_listen_sock_data(const struct ctx *c, union epoll_ref ref, const struct timespec *now); -void udp_vu_reply_sock_data(const struct ctx *c, union epoll_ref ref, +bool udp_vu_reply_sock_data(const struct ctx *c, int s, flow_sidx_t tosidx, const struct timespec *now); + #endif /* UDP_VU_H */
-- Stefano
On Wed, Mar 26, 2025 at 11:14:33PM +0100, Stefano Brivio wrote:
On Wed, 26 Mar 2025 14:44:04 +1100 David Gibson
wrote: Share some additional miscellaneous logic between the vhost-user and "buf" paths for data on udp reply sockets. The biggest piece is error handling of cases where we can't forward between the two pifs of the flow. We also make common some more simple logic locating the correct flow and its parameters.
This adds some lines of code due to extra comment lines, but nonetheless reduces logic duplication.
Signed-off-by: David Gibson
--- udp.c | 41 ++++++++++++++++++++++++++--------------- udp_vu.c | 26 +++++++++++--------------- udp_vu.h | 3 ++- 3 files changed, 39 insertions(+), 31 deletions(-) diff --git a/udp.c b/udp.c index 55021ac4..4258812e 100644 --- a/udp.c +++ b/udp.c @@ -754,24 +754,25 @@ void udp_listen_sock_handler(const struct ctx *c, /** * udp_buf_reply_sock_data() - Handle new data from flow specific socket * @c: Execution context - * @ref: epoll reference + * @s: Socket to read data from + * @tosidx: Flow & side to forward data from @s to * @now: Current timestamp * + * Return: true on success, false if can't forward from socket to flow's pif
"on success"...
+ * * #syscalls recvmmsg */ -static void udp_buf_reply_sock_data(const struct ctx *c, union epoll_ref ref, +static bool udp_buf_reply_sock_data(const struct ctx *c, + int s, flow_sidx_t tosidx, const struct timespec *now) { - flow_sidx_t tosidx = flow_sidx_opposite(ref.flowside); const struct flowside *toside = flowside_at_sidx(tosidx); - struct udp_flow *uflow = udp_at_sidx(ref.flowside); + struct udp_flow *uflow = udp_at_sidx(tosidx); uint8_t topif = pif_at_sidx(tosidx); - int n, i, from_s; - - from_s = uflow->s[ref.flowside.sidei]; + int n, i;
- if ((n = udp_sock_recv(c, from_s, udp_mh_recv)) <= 0) - return; + if ((n = udp_sock_recv(c, s, udp_mh_recv)) <= 0) + return true;
is not exactly accurate: 0 means either no datagrams (should never happen) or error on recvmmsg(). But I think it's clear enough, and you'll have follow-ups anyway, so I went ahead and applied this.
Maybe:
* Return: false if we can't forward from socket to flow's pif, true otherwise
? Actually, looking at the caller:
Yeah. While writing subsequent patches, I think I spotted a better way to structure this. I'll put some fixups in the next series.
flow_trace(uflow, "Received %d datagrams on reply socket", n); uflow->ts = now->tv_sec; @@ -790,11 +791,10 @@ static void udp_buf_reply_sock_data(const struct ctx *c, union epoll_ref ref, } else if (topif == PIF_TAP) { tap_send_frames(c, &udp_l2_iov[0][0], UDP_NUM_IOVS, n); } else { - uint8_t frompif = pif_at_sidx(ref.flowside); - - flow_err(uflow, "No support for forwarding UDP from %s to %s", - pif_name(frompif), pif_name(topif)); + return false; } + + return true; }
/** @@ -821,10 +821,21 @@ void udp_reply_sock_handler(const struct ctx *c, union epoll_ref ref, }
if (events & EPOLLIN) { + flow_sidx_t tosidx = flow_sidx_opposite(ref.flowside); + int s = ref.fd; + bool ret; + if (c->mode == MODE_VU) - udp_vu_reply_sock_data(c, ref, now); + ret = udp_vu_reply_sock_data(c, s, tosidx, now); else - udp_buf_reply_sock_data(c, ref, now); + ret = udp_buf_reply_sock_data(c, s, tosidx, now); + + if (!ret) {
the opposite would probably be less surprising, if (ret) flow_err(...).
+ flow_err(uflow, + "No support for forwarding UDP from %s to %s", + pif_name(pif_at_sidx(ref.flowside)), + pif_name(pif_at_sidx(tosidx))); + } } }
diff --git a/udp_vu.c b/udp_vu.c index 6e1823a9..06bdeae6 100644 --- a/udp_vu.c +++ b/udp_vu.c @@ -273,38 +273,32 @@ void udp_vu_listen_sock_data(const struct ctx *c, union epoll_ref ref, /** * udp_vu_reply_sock_data() - Handle new data from flow specific socket * @c: Execution context - * @ref: epoll reference + * @s: Socket to read data from + * @tosidx: Flow & side to forward data from @s to * @now: Current timestamp + * + * Return: true on success, false if can't forward from socket to flow's pif */ -void udp_vu_reply_sock_data(const struct ctx *c, union epoll_ref ref, +bool udp_vu_reply_sock_data(const struct ctx *c, int s, flow_sidx_t tosidx, const struct timespec *now) { - flow_sidx_t tosidx = flow_sidx_opposite(ref.flowside); const struct flowside *toside = flowside_at_sidx(tosidx); bool v6 = !(inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr)); - struct udp_flow *uflow = udp_at_sidx(ref.flowside); - int from_s = uflow->s[ref.flowside.sidei]; + struct udp_flow *uflow = udp_at_sidx(tosidx); struct vu_dev *vdev = c->vdev; struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE]; - uint8_t topif = pif_at_sidx(tosidx); int i;
ASSERT(uflow);
- if (topif != PIF_TAP) { - uint8_t frompif = pif_at_sidx(ref.flowside); - - flow_err(uflow, - "No support for forwarding UDP from %s to %s", - pif_name(frompif), pif_name(topif)); - return; - } + if (pif_at_sidx(tosidx) != PIF_TAP) + return false;
for (i = 0; i < UDP_MAX_FRAMES; i++) { ssize_t dlen; int iov_used;
- iov_used = udp_vu_sock_recv(c, from_s, v6, &dlen); + iov_used = udp_vu_sock_recv(c, s, v6, &dlen); if (iov_used <= 0) break; flow_trace(uflow, "Received 1 datagram on reply socket"); @@ -318,4 +312,6 @@ void udp_vu_reply_sock_data(const struct ctx *c, union epoll_ref ref, } vu_flush(vdev, vq, elem, iov_used); } + + return true; } diff --git a/udp_vu.h b/udp_vu.h index 4f2262d0..2299b51f 100644 --- a/udp_vu.h +++ b/udp_vu.h @@ -8,6 +8,7 @@
void udp_vu_listen_sock_data(const struct ctx *c, union epoll_ref ref, const struct timespec *now); -void udp_vu_reply_sock_data(const struct ctx *c, union epoll_ref ref, +bool udp_vu_reply_sock_data(const struct ctx *c, int s, flow_sidx_t tosidx, const struct timespec *now); + #endif /* UDP_VU_H */
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
In udp_reply_sock_handler() if we're unable to forward the datagrams we
just print an error. Generally this means we have an unsupported pair of
pifs in the flow table, though, and that hasn't change. So, next time we
get a matching packet we'll just get the same failure. In vhost-user mode
we don't even dequeue the incoming packets which triggered this so we're
likely to get the same failure immediately.
Instead, close the flow, in the same we we do for an unrecoverable error.
Signed-off-by: David Gibson
For UDP packets from the tap interface (like TCP) we use a hash table to
look up which flow they belong to. Unlike TCP, we sometimes also create a
hash table entry for the socket side of UDP flows. We need that when we
receive a UDP packet from a "listening" socket which isn't specific to a
single flow.
At present we only do this for the initiating side of flows, which re-use
the listening socket. For the target side we use a connected "reply"
socket specific to the single flow.
We have in mind changes that maye introduce some edge cases were we could
receive UDP packets on a non flow specific socket more often. To allow for
those changes - and slightly simplifying things in the meantime - always
put both sides of a UDP flow - tap or socket - in the hash table. It's
not that costly, and means we always have the option of falling back to a
hash lookup.
Signed-off-by: David Gibson
Currently udp_flow_new() open codes creating and connecting a socket to use
for reply messages. We have in mind some more places to use this logic,
plus it just makes for a rather large function. Split this handling out
into a new udp_flow_sock() function.
Signed-off-by: David Gibson
On Wed, 26 Mar 2025 14:44:00 +1100
David Gibson
As discussed on our recent call, I'm working towards using connected sockets on both sides of UDP flows. This series makes some preliminary reworks that simplify things and make that easier.
v2: * Added patches 5..7, other patches unchanged.
David Gibson (7): udp: Common invocation of udp_sock_errs() for vhost-user and "buf" paths udp: Simplify checking of epoll event bits udp_vu: Factor things out of udp_vu_reply_sock_data() loop udp: Share more logic between vu and non-vu reply socket paths udp: Better handling of failure to forward from reply socket udp: Always hash socket facing flowsides udp: Add helper function for creating connected UDP socket
Applied. -- Stefano
participants (2)
-
David Gibson
-
Stefano Brivio