[PATCH v3 00/11] RFC: ICMP reworks preliminary to flow table integration
As with TCP, it turns out that there are a bunch of clean ups and reworks to the ICMP code which will make integration with the flow table easier, even before introducing a non-trivial version of the flow table itself. Based on the flow based dispatch/allocation, and bind/addressing cleanup series. Changes since v2: * Rebased on current main and revised flow dispatch series * Standardise on the name 'id_sock' instead of 'id_map' for pointers to specific entries in the id_map * Avoid confusing usage of the word "sequence" to mean a flow of ping packets as opposed to the ping sequence number. * Use packet_get() rather than explicit comparisons to validate packet lengths Changes since v1: * Rebased on newer version of flow dispatch & allocation series * Added 12/12 splitting out close and new sequence functions David Gibson (11): icmp: Don't set "port" on destination sockaddr for ping sockets icmp: Remove redundant initialisation of sendto() address icmp: Don't attempt to handle "wrong direction" ping socket traffic icmp: Don't attempt to match host IDs to guest IDs icmp: Use -1 to represent "missing" sockets icmp: Simplify socket expiry scanning icmp: Share more between IPv4 and IPv6 paths in icmp_tap_handler() icmp: Consolidate icmp_sock_handler() with icmpv6_sock_handler() icmp: Warn on receive errors from ping sockets icmp: Validate packets received on ping sockets icmp: Dedicated functions for starting and closing ping sequences icmp.c | 329 ++++++++++++++++++++++++++++---------------------------- icmp.h | 5 +- passt.c | 4 +- 3 files changed, 166 insertions(+), 172 deletions(-) -- 2.43.0
We set the port to the ICMP id on the sendto() address when using ICMP
ping sockets. However, this has no effect: the ICMP id the kernel
uses is determined only by the "port" on the socket's *bound* address
(which is constructed inside sock_l4(), using the id we also pass to
it).
For unclear reasons this change triggers cppcheck 2.13.0 to give new
"variable could be const pointer" warnings, so make *ih const as well to
fix that.
Signed-off-by: David Gibson
We initialise the address portion of the sockaddr for sendto() to the
unspecified address, but then always overwrite it with the actual
destination address before we call the sendto().
Signed-off-by: David Gibson
Linux ICMP "ping" sockets are very specific in what they do. They let
userspace send ping requests (ICMP_ECHO or ICMP6_ECHO_REQUEST), and receive
matching replies (ICMP_ECHOREPLY or ICMP6_ECHO_REPLY). They don't let you
intercept or handle incoming ping requests.
In the case of passt/pasta that means we can process echo requests from tap
and forward them to a ping socket, then take the replies from the ping
socket and forward them to tap. We can't do the reverse: take echo
requests from the host and somehow forward them to the guest. There's
really no way for something outside to initiate a ping to a passt/pasta
connected guest and if there was we'd need an entirely different mechanism
to handle it.
However, we have some logic to deal with packets going in that reverse
direction. Remove it, since it can't ever be used that way. While we're
there use defines for the ICMPv6 types, instead of open coded type values.
Signed-off-by: David Gibson
When forwarding pings from tap, currently we create a ping socket with
a socket address whose port is set to the ID of the ping received from the
guest. This causes the socket to send pings with the same ID on the host.
Although this seems look a good idea for maximum transparency, it's
probably unwise.
First, it's fallible - the bind() could fail, and we already have fallback
logic which will overwrite the packets with the expected guest id if the
id we get on replies doesn't already match. We might as well do that
unconditionally.
But more importantly, we don't know what else on the host might be using
ping sockets, so we could end up with an ID that's the same as an existing
socket. You'd expect that to fail the bind() with EADDRINUSE, which would
be fine: we'd fall back to rewriting the reply ids. However it appears the
kernel (v6.6.3 at least), does *not* fail the bind() and instead it's
"last socket wins" in terms of who gets the replies. So we could
accidentally intercept ping replies for something else on the host.
So, instead of using bind() to set the id, just let the kernel pick one
and expect to translate the replies back. Although theoretically this
makes the passt/pasta link a bit less "transparent", essentially nothing
cares about specific ping IDs, much like TCP source ports, which we also
don't preserve.
Signed-off-by: David Gibson
icmp_id_map[] contains, amongst other things, fds for "ping" sockets
associated with various ICMP echo ids. However, we only lazily open()
those sockets, so many will be missing. We currently represent that with
a 0, which isn't great, since that's technically a valid fd. Use -1
instead. This does require initializing the fields in icmp_id_map[] but
we already have an obvious place to do that.
Signed-off-by: David Gibson
Currently we use icmp_act[] to scan for ICMP ids which might have an open
socket which could time out. However icmp_act[] contains no information
that's not already in icmp_id_map[] - it's just an "index" which allows
scanning for relevant entries with less cache footprint.
We only scan for ICMP socket expiry every 1s, though, so it's not clear
that cache footprint really matters. Furthermore, there's no strong reason
we need to scan even that often - the timeout is fairly arbitrary and
approximate.
So, eliminate icmp_act[] in favour of directly scanning icmp_id_map[] and
compensate for the cache impact by reducing the scan frequency to once
every 10s.
Signed-off-by: David Gibson
Currently icmp_tap_handler() consists of two almost disjoint paths for the
IPv4 and IPv6 cases. The only thing they share is an error message.
We can use some intermediate variables to refactor this to share some more
code between those paths.
Signed-off-by: David Gibson
Currently we have separate handlers for ICMP and ICMPv6 ping replies.
Although there are a number of points of difference, with some creative
refactoring we can combine these together sensibly. Although it doesn't
save a vast amount of code, it does make it clearer that we're performing
basically the same steps for each case.
Signed-off-by: David Gibson
Currently we silently ignore an errors receiving a packet from a ping
socket. We don't expect that to happen, so it's probably worth reporting
if it does.
Signed-off-by: David Gibson
We access fields of packets received from ping sockets assuming they're
echo replies, without actually checking that. Of course, we don't expect
anything else from the kernel, but it's probably best to verify.
While we're at it, also check for short packets, or a receive address of
the wrong family.
Signed-off-by: David Gibson
ICMP sockets are cleaned up on a timeout implemented in icmp_timer_one(),
and the logic to do that cleanup is open coded in that function. Similarly
new sockets are opened when we discover we don't have an existing one in
icmp_tap_handler(), and again the logic is open-coded.
That's not the worst thing, but it's a bit cleaner to have dedicated
functions for the creation and destruction of ping sockets. This will also
make things a bit easier for future changes we have in mind.
Signed-off-by: David Gibson
On Tue, 16 Jan 2024 16:16:07 +1100
David Gibson
As with TCP, it turns out that there are a bunch of clean ups and reworks to the ICMP code which will make integration with the flow table easier, even before introducing a non-trivial version of the flow table itself.
Based on the flow based dispatch/allocation, and bind/addressing cleanup series.
Applied. -- Stefano
participants (2)
-
David Gibson
-
Stefano Brivio