[PATCH 00/18] More pesto preliminaries
The number of preliminary patches in the pesto series has grown to the point that I think it's worth trying to get a bunch merged, while we polish the rest. Changes from pesto series v3 * Extracted just the preliminary patches, not introducing pesto itself * Several small revisions based on Stefano's feedback * Added several extra bits of preliminary rework. David Gibson (18): conf: runas can be const fwd: Comparing rule can be const vhost_user: Fix assorted minor cppcheck warnings serialise: Split functions user for serialisation from util.c serialise: Add helpers for serialising unsigned integers fwd: Move selecting correct scan bitmap into fwd_sync_one() fwd: Look up rule index in fwd_sync_one() fwd: Store forwarding tables indexed by (origin) pif fwd: Allow FWD_DUAL_STACK_ANY flag to be passed directly to fwd_rule_add() fwd, conf: Expose ephemeral ports as bitmap rather than function conf: Don't bother complaining about overlapping excluded ranges conf: Move check for mapping port 0 to caller conf: Move check for disabled interfaces earlier conf: Remove redundant warning when SO_BINDTODEVICE is unavailable pif: Limit pif names to IFNAMSIZ (16) bytes ip: Define a bound for the string returned by ipproto_name() bitmap: Split bitmap helper functions into their own module fwd: Split forwading rule specification from its implementation state Makefile | 24 +++---- bitmap.c | 99 ++++++++++++++++++++++++++++ bitmap.h | 24 +++++++ conf.c | 113 ++++++++++++++++---------------- flow.c | 35 +++++----- fwd.c | 177 ++++++++++++++++++++++++++------------------------- fwd.h | 38 +++-------- fwd_rule.h | 44 +++++++++++++ ip.c | 18 ++++-- ip.h | 2 + migrate.c | 1 + passt.h | 6 +- pcap.c | 1 + pif.c | 2 +- pif.h | 4 +- serialise.c | 123 +++++++++++++++++++++++++++++++++++ serialise.h | 21 ++++++ tcp.c | 1 + util.c | 154 +------------------------------------------- util.h | 12 ---- vhost_user.c | 16 +++-- virtio.h | 2 +- vu_common.c | 2 +- 23 files changed, 535 insertions(+), 384 deletions(-) create mode 100644 bitmap.c create mode 100644 bitmap.h create mode 100644 fwd_rule.h create mode 100644 serialise.c create mode 100644 serialise.h -- 2.53.0
This pointer stepping through the array of existing rules can be const.
For some reason cppcheck doesn't catch this at the moment, but does with
certain changes we're looking at for the dynamic configuration client.
Signed-off-by: David Gibson
This fixes a batch of minor incorrect format specifier and "could be const"
warnings in the vhost_user code. For some very strange reason, cppcheck
doesn't catch these errors right now, but does after code rearrangements
we're making for the dynamic forwarding update stuff.
Signed-off-by: David Gibson
Currently we store the inbound (PIF_HOST) and outbound (PIF_SPLICE)
forwarding tables in separate fields of struct ctx. In a number of places
this requires somewhat awkward if or switch constructs to select the
right table for updates. Conceptually simplify that by using an index of
forwarding tables by pif, which as a bonus keeps track generically which
pifs have implemented forwarding tables so far.
For now this doesn't simplify a lot textually, because many places that
need this also have other special cases to apply by pif. It does simplify
a few crucial places though, and we expect it will become more useful as
the flexibility of the forwarding table is improved.
Signed-off-by: David Gibson
In conf_ports() we die with an error if a port specification includes
overlapping excluded ranges. This is contrary to our usual convention of
handling conflicting options (last option wins, rather than error). Plus,
these options don't even conflict, they're just redundant.
This behaviour potentially makes life harder for scripts or other tools
invoking pasta - if they might have the same port excluded for multiple
different reasons, they have to explicitly deduplicate the list, rather
than just list everything on the command line. So, don't give this error,
let a port be excluded as many times as you like.
Signed-off-by: David Gibson
Add helpers to serialise/deserialise unsigned integers, handling endian
conversion so that the stream format is consistent regardless of host
endiannness. Currently we only use this on one place: sending the number
of flows during migration. We're going to have more use for this as we
add dynamic configuration updates, so these will become more useful.
For now we only need a 32-bit version, however define the functions with
a macro so we can easily add other integer widths when we need them.
Signed-off-by: David Gibson
Currently we have the slightly silly setup that fwd_listen_sync_() looks
up the pointer to a rule based on its index, then fwd_sync_one() turns it
back into an index. Avoid this by passing the index directly into
fwd_sync_one().
Signed-off-by: David Gibson
fwd_rule_add() takes a flags parameter, but it only allows the FWD_WEAK
and FWD_SCAN flags to be specified there. It doesn't allow the
FWD_DUAL_STACK_ANY flag to be set, instead expecting a [*] address to be
indicated by passing NULL as @addr.
However, for upcoming dynamic rule updates, it's more convenient to be able
to explicitly pass FWD_DUAL_STACK_ANY along with an address of ::. Allow
that mode of calling.
Signed-off-by: David Gibson
Both the runas variable in conf() and the parameter to conf_ugid() can be
a const pointer. For some reason cppcheck doesn't catch this now, but does
after some upcoming rearrangements for sharing code with the dynamic update
tool.
Signed-off-by: David Gibson
conf_ports_range_except() checks that it hasn't been asked to map port 0,
which generally won't work. However, there's only one callsite that
doesn't statically pass a non-zero value here. Change the check to an
assert() and move the actual error message to that callsite. This will
allow further cleanups later.
Signed-off-by: David Gibson
It turns out the only callers of fwd_port_is_ephemeral() use it to build a
bitmap of ephemeral ports. So, replace it with fwd_port_map_ephemeral(),
which directly builds that bitmap. As a bonus this allows a slightly
cheaper implementation of building the map, since inside fwd.c we know that
the ephemeral ports form a single range.
Signed-off-by: David Gibson
conf_ports_range_except() generates errors if asked to map an address of
a family that's not available. This is quite late to perform the check,
most callsites pass a NULL address, making the test moot, the one remaining
can check the address earlier. Move the check there, simplifying the
lower level function.
Signed-off-by: David Gibson
conf_ports() has logic to warn if -[TU] auto is specified but we can't use
SO_BINDTODEVICE. However, this is redundant with similar logic in
conf_ports_range_except(). The latter will be triggered both for an
explicit -[TU] auto and when it's invoked as a default option, so keep that
one and drop the one in conf_ports().
Signed-off-by: David Gibson
ipproto_name() returns a static string of theoretically unbounded length.
That's going to be inconvenient in future, so introduce IPPROTO_STRLEN
giving an explicit bound on the length. Use static_assert() and some
macros to ensure nothing we return can exceed this.
Signed-off-by: David Gibson
All current pif names are quite short, and we expect them to remain short
when/if we allow arbitrary pifs. However, because of the structure of
the current code we don't enforce any limit on the length.
This will become more important with dynamic configuration updates, so
start enforcing a length limit. Specifically we allow pif names to be up
to IFNAMSIZ bytes, including the terminating \0. This is semi-arbitrary -
there's no particular reason we have to use the same length limit as
kernel netif names. However, when we do allow arbitrary pifs, we expect
that we might support a similar number to the number of kernel interfaces.
It might make sense to use names matching kernel interface names in that
future. So, re-use IFNAMSIZ to avoid surprise.
Signed-off-by: David Gibson
Most of the fields in struct fwd_rule give the parameters of the forwarding
rule itself. The @socks field gives the list of listening sockets we use
to implement it. In order to share code with the configuraiton update tool
pesto, we want to split these. Move the rule specification itself into
fwd_rule.h, which can be shared with pesto.
Signed-off-by: David Gibson
Currently fwd_listen_sync_() determines which of the two port scanned
bitmaps is the correct one for a rule before passing it into
fwd_sync_one(). However, fwd_sync_one() is already looking at the rule
internals so is better placed to do this.
Signed-off-by: David Gibson
Currently bitmap functions are in util.[ch] along with a lot of other
stuff. In preparation for sharing them with a configuration client, move
these out into their own files.
Signed-off-by: David Gibson
On Fri, 27 Mar 2026 15:34:27 +1100
David Gibson
All current pif names are quite short, and we expect them to remain short when/if we allow arbitrary pifs. However, because of the structure of the current code we don't enforce any limit on the length.
This will become more important with dynamic configuration updates, so start enforcing a length limit. Specifically we allow pif names to be up to IFNAMSIZ bytes, including the terminating \0. This is semi-arbitrary - there's no particular reason we have to use the same length limit as kernel netif names. However, when we do allow arbitrary pifs, we expect that we might support a similar number to the number of kernel interfaces. It might make sense to use names matching kernel interface names in that future. So, re-use IFNAMSIZ to avoid surprise.
And what if... we used 128 instead, which is reasonably longer than UNIX_PATH_MAX (108, which despite the application usage in POSIX 2024.1: https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/sys_un.h.html is still commonly used a path length limit, also by passt itself)? At that point we could embed UNIX domain socket paths as PIF name (possibly with some additional specifier) which _might_ be useful to forward UNIX sockets in the https://bugs.passt.top/show_bug.cgi?id=200 sense. It's not the only way to implement it, but perhaps it's one possibility that might make sense for what we know now? What do you think? It also has the advantage of being sufficiently longer than IFNAMSIZ, so that should we ever need to have stuff like "container_A:eth0" in a PIF name, we could have it as well. -- Stefano
On Fri, 27 Mar 2026 15:34:12 +1100
David Gibson
The number of preliminary patches in the pesto series has grown to the point that I think it's worth trying to get a bunch merged, while we polish the rest.
Changes from pesto series v3 * Extracted just the preliminary patches, not introducing pesto itself * Several small revisions based on Stefano's feedback * Added several extra bits of preliminary rework.
Applied, except for 15/18 (see pending question), with two small typos in 18/18's message fixed (forwarding, configuration). -- Stefano
On Sun, Mar 29, 2026 at 02:02:27PM +0200, Stefano Brivio wrote:
On Fri, 27 Mar 2026 15:34:27 +1100 David Gibson
wrote: All current pif names are quite short, and we expect them to remain short when/if we allow arbitrary pifs. However, because of the structure of the current code we don't enforce any limit on the length.
This will become more important with dynamic configuration updates, so start enforcing a length limit. Specifically we allow pif names to be up to IFNAMSIZ bytes, including the terminating \0. This is semi-arbitrary - there's no particular reason we have to use the same length limit as kernel netif names. However, when we do allow arbitrary pifs, we expect that we might support a similar number to the number of kernel interfaces. It might make sense to use names matching kernel interface names in that future. So, re-use IFNAMSIZ to avoid surprise.
And what if... we used 128 instead, which is reasonably longer than UNIX_PATH_MAX (108, which despite the application usage in POSIX 2024.1: https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/sys_un.h.html is still commonly used a path length limit, also by passt itself)?
Well, certainly we could make pif names longer...
At that point we could embed UNIX domain socket paths as PIF name (possibly with some additional specifier) which _might_ be useful to forward UNIX sockets in the https://bugs.passt.top/show_bug.cgi?id=200 sense.
...but I don't think that rationale is compelling. The unix socket path would be the equivalent of IP+port, not the pif. I don't see how embedding a unix path in the pif would be useful.
It's not the only way to implement it, but perhaps it's one possibility that might make sense for what we know now? What do you think?
It also has the advantage of being sufficiently longer than IFNAMSIZ, so that should we ever need to have stuff like "container_A:eth0" in a PIF name, we could have it as well.
That's a more convincing case. Ok, I'll expand the pif name size for the next spin. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Mon, 30 Mar 2026 12:08:58 +1100
David Gibson
On Sun, Mar 29, 2026 at 02:02:27PM +0200, Stefano Brivio wrote:
On Fri, 27 Mar 2026 15:34:27 +1100 David Gibson
wrote: All current pif names are quite short, and we expect them to remain short when/if we allow arbitrary pifs. However, because of the structure of the current code we don't enforce any limit on the length.
This will become more important with dynamic configuration updates, so start enforcing a length limit. Specifically we allow pif names to be up to IFNAMSIZ bytes, including the terminating \0. This is semi-arbitrary - there's no particular reason we have to use the same length limit as kernel netif names. However, when we do allow arbitrary pifs, we expect that we might support a similar number to the number of kernel interfaces. It might make sense to use names matching kernel interface names in that future. So, re-use IFNAMSIZ to avoid surprise.
And what if... we used 128 instead, which is reasonably longer than UNIX_PATH_MAX (108, which despite the application usage in POSIX 2024.1: https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/sys_un.h.html is still commonly used a path length limit, also by passt itself)?
Well, certainly we could make pif names longer...
At that point we could embed UNIX domain socket paths as PIF name (possibly with some additional specifier) which _might_ be useful to forward UNIX sockets in the https://bugs.passt.top/show_bug.cgi?id=200 sense.
...but I don't think that rationale is compelling. The unix socket path would be the equivalent of IP+port, not the pif.
Right, that was the "obvious" idea we assumed so far, but I was thinking that if you want to stick to only PIFs and addresses in rules and flow tables (from what I understood, you would see some value in keeping that), we could think of modelling that UNIX domain socket as a PIF that can only map to a given TCP or UDP port (depending on the UNIX domain socket type) instead. It's a bit of a stretch but maybe it's convenient? On the other hand it would conflict with the current (and perhaps only reasonable) concept of PIF, that is, "the host", "the guest", etc.
I don't see how embedding a unix path in the pif would be useful.
...and maybe something else entirely: should we, one day, add support for multiple virtual machines, with each one connecting to different UNIX domain sockets (either data or vhost-user control sockets), would it be convenient to differentiate between those "TAP" (current name) PIFs using their UNIX domain socket path instead? Or would there be a separate table mapping PIF names to socket paths? This looks more elegant and flexible but I'm not sure if it's preferable to the former.
It's not the only way to implement it, but perhaps it's one possibility that might make sense for what we know now? What do you think?
It also has the advantage of being sufficiently longer than IFNAMSIZ, so that should we ever need to have stuff like "container_A:eth0" in a PIF name, we could have it as well.
That's a more convincing case. Ok, I'll expand the pif name size for the next spin.
By the way, I forgot to mention, Podman container IDs (the ID you get from 'podman create', and if I recall correctly also from Docker) are 64 hex digits, which might be convenient to represent as 64 (printable) characters (instead of 32 bytes) for users to refer to. The OCI specification (surprise) just says that it's a "string", without defining the maximum length: https://github.com/opencontainers/runtime-spec/blob/main/runtime.md#state ...anyway, if Podman, Docker, or their users want to refer to containers using those IDs, that plus IFNAMSIZ would be 80 bytes. -- Stefano
participants (2)
-
David Gibson
-
Stefano Brivio