[PATCH v3] pasta: make it possible to disable socket splicing
During testing it is sometimes useful to force traffic which would
normally be forwarded by socket splicing through the tap interface.
In this commit, we add a command line switch enabling such functionality
for inbound local traffic.
For outbound local traffic this is much trickier, if even possible,
so leave that for a later commit.
Suggested-by: David Gibson
On Tue, 3 Dec 2024 16:53:02 -0500 Jon Maloy wrote:
diff --git a/passt.1 b/passt.1
index b2896a2..c8a5783 100644
--- a/passt.1
+++ b/passt.1
@@ -695,6 +695,10 @@ Configure MAC address \fIaddr\fR on the tap interface in the namespace.
 Default is to let the tap driver build a pseudorandom hardware address.

+.TP
+.BR \-\-no-splice
+Disable socket splicing for host to NS traffic.
It's not necessarily clear to users what "NS" is: we never use it in
the man page. To keep this quick, if you agree, I would fix this up on
merge with:

.TP
.BR \-\-no-splice
Disable the bypass path for inbound, local traffic. See the section
\fBHandling of local traffic in pasta\fR in the \fBNOTES\fR for more details.

Everything else looks good to me.

-- 
Stefano
On Tue, Dec 03, 2024 at 04:53:02PM -0500, Jon Maloy wrote:
During testing it is sometimes useful to force traffic which would normally be forwarded by socket splicing through the tap interface.
In this commit, we add a command line switch enabling such functionality for inbound local traffic.
For outbound local traffic this is much trickier, if even possible, so leave that for a later commit.
Suggested-by: David Gibson
Signed-off-by: Jon Maloy
Reviewed-by: David Gibson
---
v2: Some minor changes based on feedback from PASST team
v3: More changes based on feedback from D. Gibson and S. Brivio
    -Moved new option to pasta-only section
    -Added description to man-page
---
 conf.c  | 7 ++++++-
 fwd.c   | 2 +-
 passt.1 | 4 ++++
 passt.h | 2 ++
 4 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/conf.c b/conf.c
index eaa7d99..53f6770 100644
--- a/conf.c
+++ b/conf.c
@@ -977,7 +977,8 @@ pasta_opts:
 		" Don't copy all routes to namespace\n"
 		" --no-copy-addrs DEPRECATED:\n"
 		" Don't copy all addresses to namespace\n"
-		" --ns-mac-addr ADDR Set MAC address on tap interface\n");
+		" --ns-mac-addr ADDR Set MAC address on tap interface\n"
+		" --no-splice Disable inbound socket splicing\n");

 	exit(status);
 }
@@ -1319,6 +1320,7 @@ void conf(struct ctx *c, int argc, char **argv)
 		{"no-dhcpv6", no_argument, &c->no_dhcpv6, 1 },
 		{"no-ndp", no_argument, &c->no_ndp, 1 },
 		{"no-ra", no_argument, &c->no_ra, 1 },
+		{"no-splice", no_argument, &c->no_splice, 1 },
 		{"freebind", no_argument, &c->freebind, 1 },
 		{"no-map-gw", no_argument, &no_map_gw, 1 },
 		{"ipv4-only", no_argument, NULL, '4' },
@@ -1756,6 +1758,9 @@ void conf(struct ctx *c, int argc, char **argv)
 		}
 	} while (name != -1);

+	if (c->mode == MODE_PASST)
+		c->no_splice = 1;
+
 	if (c->mode == MODE_PASTA && !c->pasta_conf_ns) {
 		if (copy_routes_opt)
 			die("--no-copy-routes needs --config-net");
diff --git a/fwd.c b/fwd.c
index 0b7f8b1..2829cd2 100644
--- a/fwd.c
+++ b/fwd.c
@@ -443,7 +443,7 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
 	else if (proto == IPPROTO_UDP)
 		tgt->eport += c->udp.fwd_in.delta[tgt->eport];

-	if (c->mode == MODE_PASTA && inany_is_loopback(&ini->eaddr) &&
+	if (!c->no_splice && inany_is_loopback(&ini->eaddr) &&
 	    (proto == IPPROTO_TCP || proto == IPPROTO_UDP)) {
 		/* spliceable */

diff --git a/passt.1 b/passt.1
index b2896a2..c8a5783 100644
--- a/passt.1
+++ b/passt.1
@@ -695,6 +695,10 @@ Configure MAC address \fIaddr\fR on the tap interface in the namespace.
 Default is to let the tap driver build a pseudorandom hardware address.

+.TP
+.BR \-\-no-splice
+Disable socket splicing for host to NS traffic.
+
 .SH EXAMPLES

 .SS \fBpasta
diff --git a/passt.h b/passt.h
index c038630..0dd4efa 100644
--- a/passt.h
+++ b/passt.h
@@ -229,6 +229,7 @@ struct ip6_ctx {
  * @no_dhcpv6: Disable DHCPv6 server
  * @no_ndp: Disable NDP handler altogether
  * @no_ra: Disable router advertisements
+ * @no_splice: Disable socket splicing for inbound traffic
  * @host_lo_to_ns_lo: Map host loopback addresses to ns loopback addresses
  * @freebind: Allow binding of non-local addresses for forwarding
  * @low_wmem: Low probed net.core.wmem_max
@@ -291,6 +292,7 @@ struct ctx {
 	int no_dhcpv6;
 	int no_ndp;
 	int no_ra;
+	int no_splice;
 	int host_lo_to_ns_lo;
 	int freebind;
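
For readers less familiar with getopt_long(), the new option table entry relies on the "flag" form of struct option: when the third member is a non-NULL pointer, getopt_long() stores the fourth member through it and returns 0, so no extra case in the option switch is needed. Below is a minimal sketch of that mechanism. The names demo_ctx and demo_parse() are made up for illustration and are not part of passt, and the loop is far simpler than the real conf().

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical stand-in for the one field of struct ctx we care about */
struct demo_ctx {
	int no_splice;
};

/* Hypothetical helper: parse long options only.
 * Returns 0 on success, -1 on an unrecognised option.
 */
static int demo_parse(struct demo_ctx *c, int argc, char **argv)
{
	const struct option options[] = {
		/* flag != NULL: getopt_long() sets *flag = val and returns 0 */
		{ "no-splice",	no_argument,	&c->no_splice,	1 },
		{ NULL,		0,		NULL,		0 },
	};
	int name;

	optind = 1;	/* restart scanning so the helper can be called again */
	opterr = 0;	/* keep getopt_long() from printing its own errors */

	while ((name = getopt_long(argc, argv, "", options, NULL)) != -1) {
		if (name == '?')
			return -1;
		/* name == 0: a flag option was handled, nothing else to do */
	}

	return 0;
}
```

Calling demo_parse() with an argument vector containing "--no-splice" leaves c->no_splice set to 1, and without it the field keeps its initial value. That is why any mode-dependent default has to be applied after option parsing, as the conf.c hunk does.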
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
On Tue, 3 Dec 2024 16:53:02 -0500 Jon Maloy wrote:
During testing it is sometimes useful to force traffic which would normally be forwarded by socket splicing through the tap interface.
In this commit, we add a command line switch enabling such functionality for inbound local traffic.
For outbound local traffic this is much trickier, if even possible, so leave that for a later commit.
Suggested-by: David Gibson
Signed-off-by: Jon Maloy
---
v2: Some minor changes based on feedback from PASST team
v3: More changes based on feedback from D. Gibson and S. Brivio
    -Moved new option to pasta-only section
    -Added description to man-page
---
 conf.c  | 7 ++++++-
 fwd.c   | 2 +-
 passt.1 | 4 ++++
 passt.h | 2 ++
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/conf.c b/conf.c
index eaa7d99..53f6770 100644
--- a/conf.c
+++ b/conf.c
@@ -977,7 +977,8 @@ pasta_opts:
 		" Don't copy all routes to namespace\n"
 		" --no-copy-addrs DEPRECATED:\n"
 		" Don't copy all addresses to namespace\n"
-		" --ns-mac-addr ADDR Set MAC address on tap interface\n");
+		" --ns-mac-addr ADDR Set MAC address on tap interface\n"
+		" --no-splice Disable inbound socket splicing\n");

 	exit(status);
 }
@@ -1319,6 +1320,7 @@ void conf(struct ctx *c, int argc, char **argv)
 		{"no-dhcpv6", no_argument, &c->no_dhcpv6, 1 },
 		{"no-ndp", no_argument, &c->no_ndp, 1 },
 		{"no-ra", no_argument, &c->no_ra, 1 },
+		{"no-splice", no_argument, &c->no_splice, 1 },
 		{"freebind", no_argument, &c->freebind, 1 },
 		{"no-map-gw", no_argument, &no_map_gw, 1 },
 		{"ipv4-only", no_argument, NULL, '4' },
@@ -1756,6 +1758,9 @@ void conf(struct ctx *c, int argc, char **argv)
 		}
 	} while (name != -1);

+	if (c->mode == MODE_PASST)
+		c->no_splice = 1;

Oops, sorry, I missed this during review, but tests caught it: this
needs to be

	if (c->mode != MODE_PASTA)

to also include the MODE_VU case, otherwise:

+
 	if (c->mode == MODE_PASTA && !c->pasta_conf_ns) {
 		if (copy_routes_opt)
 			die("--no-copy-routes needs --config-net");
diff --git a/fwd.c b/fwd.c
index 0b7f8b1..2829cd2 100644
--- a/fwd.c
+++ b/fwd.c
@@ -443,7 +443,7 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
 	else if (proto == IPPROTO_UDP)
 		tgt->eport += c->udp.fwd_in.delta[tgt->eport];

-	if (c->mode == MODE_PASTA && inany_is_loopback(&ini->eaddr) &&
+	if (!c->no_splice && inany_is_loopback(&ini->eaddr) &&

...this becomes true, and we eventually hit tcp_splice_conn_from_sock()
with passt in vhost-user mode.

 	    (proto == IPPROTO_TCP || proto == IPPROTO_UDP)) {
 		/* spliceable */
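
The interaction between the two hunks can be reduced to a few lines. The sketch below is only an illustration of the logic discussed above: the demo_* names are invented, the three modes mirror MODE_PASST, MODE_PASTA and MODE_VU, and the loopback and protocol checks from fwd_nat_from_host() are collapsed into a single boolean parameter.

```c
#include <stdbool.h>

/* Hypothetical mirror of the three operating modes */
enum demo_mode { DEMO_MODE_PASST, DEMO_MODE_PASTA, DEMO_MODE_VU };

/* Hypothetical stand-in for the relevant fields of struct ctx */
struct demo_mode_ctx {
	enum demo_mode mode;
	int no_splice;
};

/* Default as posted in v3: only plain passt mode forces no_splice */
static void demo_conf_v3(struct demo_mode_ctx *c)
{
	if (c->mode == DEMO_MODE_PASST)
		c->no_splice = 1;
}

/* Default with the suggested fix: everything except pasta forces it */
static void demo_conf_fixed(struct demo_mode_ctx *c)
{
	if (c->mode != DEMO_MODE_PASTA)
		c->no_splice = 1;
}

/* Reduced form of the new condition in fwd.c: the mode is no longer
 * checked here, so no_splice alone has to keep non-pasta modes out.
 */
static bool demo_spliceable(const struct demo_mode_ctx *c, bool from_loopback)
{
	return !c->no_splice && from_loopback;
}
```

With the v3 default, a context in the vhost-user mode still reports loopback flows as spliceable, which matches the assertion failure in the test output. With the fixed default, only pasta contexts that did not pass --no-splice do.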
Tests fail here (originally a 240-column terminal capture, shown pane by pane):

guest pane:

  guest$ which socat ip jq >/dev/null
  guest$ socat -u TCP4-LISTEN:10001 OPEN:test_big.bin,create,trunc
  guest$ cmp test_big.bin /root/big.bin
  guest$ socat -u OPEN:/root/big.bin TCP4:192.0.2.1:10003
  guest$ socat -u OPEN:/root/big.bin TCP4:192.0.2.2:10002
  guest$ socat -u TCP4-LISTEN:10001 OPEN:test_big.bin,create,trunc

  ==> /home/sbrivio/passt/test/test_logs/context_qemu.log <==
  qemu-system-x86_64: Failed to set msg fds.
  qemu-system-x86_64: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
  qemu-system-x86_64: Failed to set msg fds.
  qemu-system-x86_64: vhost VQ 1 ring restore failed: -22: Invalid argument (22)

namespace pane:

  enp9s0
  ns$ ip addr add 192.0.2.1/32 dev enp9s0
  ns$ ip addr del 192.0.2.1/32 dev enp9s0
  ns$ ip addr add 2001:db8::1 dev enp9s0 && sleep 2
  ns$ ip addr del 2001:db8::1 dev enp9s0
  ns$ which socat ip jq >/dev/null
  ns$ socat -u TCP4-LISTEN:10002 OPEN:/tmp/passt-tests-s5FGIm/passt_vu_in_ns/tcp/test_ns_big.bin,create,trunc
  ns$ socat -u TCP4-LISTEN:10002 OPEN:/tmp/passt-tests-s5FGIm/passt_vu_in_ns/tcp/test_ns_big.bin,create,trunc
  ns$ socat -u OPEN:/home/sbrivio/passt/test/big.bin TCP4:127.0.0.1:10003
  ns$ socat -u OPEN:/home/sbrivio/passt/test/big.bin TCP4:192.0.2.1:10003
  ns$ socat -u OPEN:/home/sbrivio/passt/test/big.bin TCP4:127.0.0.1:10001
  2024/12/05 20:46:19 socat[4796] E write(7, 0x564e4d181000, 8192): Connection reset by peer
  ns$

host pane:

   /'
  2a01:4ff:ff00::add:1
  host$ ip -j -6 addr show|jq -rM '[.[] | select(.ifname == "enp9s0").addr_info[] | select(.scope == "global" and .deprecated != true).local] | .[0]'
  2a01:4f8:222:904::2
  host$ ip -j -6 route show|jq -rM '[.[] | select(.dst == "default").gateway] | .[0]'
  fe80::1
  host$ sed -n 's/^nameserver \([^:]*:\)\([^%]*\).*//p' /etc/resolv.conf | tr '
  ' ',' | sed 's/,$//;s/$/
  /'
  2a01:4ff:ff00::add:2,2a01:4ff:ff00::add:1
  host$ sed 's/\. / /g' /etc/resolv.conf | sed 's/\.$//g' | sed -n 's/^search \(.*\)//p' | tr '
  ' ',' | sed 's/,$//;s/$/
  /'
  host$ which socat ip jq >/dev/null
  host$ socat -u OPEN:/home/sbrivio/passt/test/big.bin TCP4:127.0.0.1:10001
  host$ socat -u OPEN:/home/sbrivio/passt/test/big.bin TCP4:127.0.0.1:10002
  host$ socat -u TCP4-LISTEN:10003 OPEN:/tmp/passt-tests-s5FGIm/passt_vu_in_ns/tcp/test_big.bin,create,trunc
  host$ socat -u TCP4-LISTEN:10003 OPEN:/tmp/passt-tests-s5FGIm/passt_vu_in_ns/tcp/test_big.bin,create,trunc
  host$ socat -u TCP4-LISTEN:10003 OPEN:/tmp/passt-tests-s5FGIm/passt_vu_in_ns/tcp/test_big.bin,create,trunc
  host$

test log pane (passt_vu_in_ns/tcp [7/32] - TCP/IPv4: ns to guest (using loopback address): big transfer):

  Starting tests in file: passt_vu_in_ns/tcp

  Starting test: TCP/IPv4: host to guest: big transfer
  ...passed.

  Starting test: TCP/IPv4: host to ns (spliced): big transfer
  ? cmp /tmp/passt-tests-s5FGIm/passt_vu_in_ns/tcp/test_ns_big.bin /home/sbrivio/passt/test/big.bin
  ...passed.

  Starting test: TCP/IPv4: guest to host: big transfer
  ? cmp /tmp/passt-tests-s5FGIm/passt_vu_in_ns/tcp/test_big.bin /home/sbrivio/passt/test/big.bin
  ...passed.

  Starting test: TCP/IPv4: guest to ns: big transfer
  ? cmp /tmp/passt-tests-s5FGIm/passt_vu_in_ns/tcp/test_ns_big.bin /home/sbrivio/passt/test/big.bin
  ...passed.

  Starting test: TCP/IPv4: ns to host (spliced): big transfer
  ? cmp /tmp/passt-tests-s5FGIm/passt_vu_in_ns/tcp/test_big.bin /home/sbrivio/passt/test/big.bin
  ...passed.

  Starting test: TCP/IPv4: ns to host (via tap): big transfer
  ? cmp /tmp/passt-tests-s5FGIm/passt_vu_in_ns/tcp/test_big.bin /home/sbrivio/passt/test/big.bin
  ...passed.

  Starting test: TCP/IPv4: ns to guest (using loopback address): big transfer

passt in pasta (namespace) pane:

  You can start qemu with:
      kvm ... -chardev socket,id=chr0,path=/tmp/passt-tests-s5FGIm/passt_in_ns/passt.socket -netdev vhost-user,id=netdev0,chardev=chr0 -device virtio-net,netdev=netdev0 -object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE -numa node,memdev=memfd0

  accepted connection from PID 4763
  ==4761== Warning: set address range perms: large range [0x59c8f000, 0x119c8f000) (defined)
  ==4761== Warning: set address range perms: large range [0x119c8f000, 0x519c8f000) (defined)
  NDP: received RS, sending RA
  DHCP: offer to discover
      from 52:54:00:12:34:56
  DHCP: ack to request
      from 52:54:00:12:34:56
  DHCPv6: received SOLICIT, sending ADVERTISE
  DHCPv6: received REQUEST/RENEW/CONFIRM, sending REPLY
  NDP: received NS, sending NA
  ASSERTION FAILED in tcp_splice_conn_from_sock (tcp_splice.c:428): c->mode == MODE_PASTA
  Bad system call

status line:

  Testing commit: bae9a55 udp_vu: update segment size
  PASS: 166 | FAIL: 0 | 2024-12-05T19:57:40+00:00

-- 
Stefano
On Thu, Dec 05, 2024 at 09:06:26PM +0100, Stefano Brivio wrote:
On Tue, 3 Dec 2024 16:53:02 -0500 Jon Maloy wrote:
During testing it is sometimes useful to force traffic which would normally be forwarded by socket splicing through the tap interface.
In this commit, we add a command line switch enabling such functionality for inbound local traffic.
For outbound local traffic this is much trickier, if even possible, so leave that for a later commit.
Suggested-by: David Gibson
Signed-off-by: Jon Maloy
---
v2: Some minor changes based on feedback from PASST team
v3: More changes based on feedback from D. Gibson and S. Brivio
    -Moved new option to pasta-only section
    -Added description to man-page
---
 conf.c  | 7 ++++++-
 fwd.c   | 2 +-
 passt.1 | 4 ++++
 passt.h | 2 ++
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/conf.c b/conf.c
index eaa7d99..53f6770 100644
--- a/conf.c
+++ b/conf.c
@@ -977,7 +977,8 @@ pasta_opts:
 		" Don't copy all routes to namespace\n"
 		" --no-copy-addrs DEPRECATED:\n"
 		" Don't copy all addresses to namespace\n"
-		" --ns-mac-addr ADDR Set MAC address on tap interface\n");
+		" --ns-mac-addr ADDR Set MAC address on tap interface\n"
+		" --no-splice Disable inbound socket splicing\n");

 	exit(status);
 }
@@ -1319,6 +1320,7 @@ void conf(struct ctx *c, int argc, char **argv)
 		{"no-dhcpv6", no_argument, &c->no_dhcpv6, 1 },
 		{"no-ndp", no_argument, &c->no_ndp, 1 },
 		{"no-ra", no_argument, &c->no_ra, 1 },
+		{"no-splice", no_argument, &c->no_splice, 1 },
 		{"freebind", no_argument, &c->freebind, 1 },
 		{"no-map-gw", no_argument, &no_map_gw, 1 },
 		{"ipv4-only", no_argument, NULL, '4' },
@@ -1756,6 +1758,9 @@ void conf(struct ctx *c, int argc, char **argv)
 		}
 	} while (name != -1);

+	if (c->mode == MODE_PASST)
+		c->no_splice = 1;

Oops, sorry, I missed this during review, but tests caught it: this
needs to be

	if (c->mode != MODE_PASTA)

to also include the MODE_VU case, otherwise:

Good point. Sorry I missed this on review.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
participants (3)
- David Gibson
- Jon Maloy
- Stefano Brivio