[PATCH 00/10] RFC: Unify and simplify tap send path
Although we have an abstraction for the "slow path" (DHCP, NDP) guest bound packets, the TCP and UDP forwarding paths write directly to the tap fd. However, it turns out how they send frames to the tap device is more similar than it originally appears. This series unifies the low-level tap send functions for TCP and UDP, and makes some clean ups along the way. David Gibson (10): pcap: Introduce pcap_frame() helper pcap: Replace pcapm() with pcap_multiple() tcp: Combine two parts of passt tap send path together tcp: Don't keep compute total bytes in a message until we need it tcp: Improve interface to tcp_l2_buf_flush() tcp: Combine two parts of pasta tap send path together tap, tcp: Move tap send path to tap.c tcp,tap: Use different io vector bases depending on tap type udp: Use tap_send_frames() tap: Improve handling of partial frame sends pcap.c | 78 ++++++++----------------------- pcap.h | 3 +- tap.c | 108 ++++++++++++++++++++++++++++++++++++++++++ tap.h | 1 + tcp.c | 145 +++++++++++++-------------------------------------------- udp.c | 145 +++------------------------------------------------------ udp.h | 2 +- 7 files changed, 169 insertions(+), 313 deletions(-) -- 2.38.1
pcap(), pcapm() and pcapmm() duplicate some code, for the actual writing
to the capture file. The main purpose pf pcapm() and pcampp() not calling
pcap seems to be to avoid repeatedly calling gettimeofday(). We can
accomplish that while still sharing code by adding a new helper which
takes the packet timestamp as a parameter.
Signed-off-by: David Gibson
pcapm() captures multiple frames from a msghdr, however the only thing it
cares about in the msghdr is the list of buffers, where it assumes there is
one frame to capture per buffer. That's what we want for its single caller
but it's not the only obvious choice here (one frame per msghdr would
arguably make more sense in isolation). In addition pcapm() has logic
that only makes sense in the context of the passt specific path its called
from: it skips the first 4 bytes of each buffer, because those have the
qemu vnet_len rather than the frame proper.
Make this clearer by replacing pcapm() with pcap_multiple() which more
explicitly takes one struct iovec per frame, and parameterizes how much of
each buffer to skip (i.e. the offset of the frame within the buffer).
Signed-off-by: David Gibson
tcp_l2_buf_flush() open codes the "primary" send of message to the passt
tap interface, but calls tcp_l2_buf_flush_part() to handle the case of a
short send. Combine these two passt-specific operations into
tcp_l2_buf_flush_passt() which is a little cleaner and will enable furrther
cleanups.
Signed-off-by: David Gibson
tcp[46]_l2_buf_bytes keep track of the total number of bytes we have
queued to send to the tap interface. tcp_l2_buf_flush_passt() uses this
to determine if sendmsg() has sent all the data we requested, or whether
we need to resend a trailing portion.
However, the logic for finding where we're up to in the case of a short
sendmsg() can equally well tell whether we've had one at all, without
knowing the total number in advance. This does require an extra loop after
each sendmsg(), but it's doing simple arithmetic on values we've already
been accessing, and it leads to overall simpler code.
Signed-off-by: David Gibson
Currently this takes a msghdr, but the only thing we actually care about
in there in is the io vector. Make it take an io vector directly. We also
have a weird side effect of zeroing @buf_used. Just pass this by value
and zero it in the caller instead.
Signed-off-by: David Gibson
tcp_l2_buf_flush() open codes the loop across each frame in a group, but
but calls tcp_l2_buf_write_one() to send each frame to the pasta tuntap
device. Combine these two pasta-specific operations into
tcp_l2_buf_flush_pasta() which is a little cleaner and will enable further
cleanups.
Signed-off-by: David Gibson
The functions which do the final steps of sending TCP packets on through
the tap interface - tcp_l2_buf_flush*() - no longer have anything that's
actually specific to TCP in them, other than comments and names. Move them
all to tap.c.
Signed-off-by: David Gibson
Currently tap_send_frames() expects the frames it is given to include the
vnet_len field, even in pasta mode which doesn't use it (although it need
not be initialized in that case). This will inconvenience future changes,
so alter it to expect just the frame as appropriate for the tap backend
type. We alter the TCP code which uses it to match, setting up the base
of iovec to include or exclude the vnet_len as needed.
Signed-off-by: David Gibson
To send frames on the tap interface, the UDP uses a fairly complicated two
level batching. First multiple frames are gathered into a single "message"
for the qemu stream socket, then multiple messages are send with
sendmmsg(). We now have tap_send_frames() which already deals with sending
a number of frames, including batching and handling partial sends. Use
that to considerably simplify things.
This does make a couple of behavioural changes:
* We no longer split messages to keep them under 64kiB, which comments
say is necessary. However the TCP code didn't have equivalent code, so
either this isn't actually needed, or we should implement for both
(which is now easier since it can be done in one place).
* Previously when we got a partial send on UDP, we would resend the
remainder of the entire "message", including multiple frames. The
common code now only resends the remainder of a single frame, simply
dropping any frames which weren't even partially sent. This is what
TCP always did and is probably a better idea for UDP too.
Signed-off-by: David Gibson
In passt mode, when writing frames to the qemu socket, we might get a short
send. If we ignored this and carried on, the qemu socket would get out of
sync, because the bytes we actually sent wouldn't correspond to the length
header we already sent. tap_send_frames_passt() handles that by doing a
a blocking send to complete the message, but it has a few flaws:
* We only attempt to resend once: although it's unlikely in practice,
nothing prevents the blocking send() from also being short
* We print a debug error if send() returns non-zero.. but send() returns
the number of bytes sent, so we actually want it to return the length
of the remaining data.
Correct those flaws and also be a bit more thorough about reporting
problems here.
Signed-off-by: David Gibson
participants (1)
-
David Gibson