[PATCH v2 0/6] vhost-user: Add multiqueue support
This series implements multiqueue support for vhost-user mode, allowing passt to use multiple queue pairs. Network performance does not improve yet, because a single thread still services all queues. The implementation introduces a --max-queues parameter to configure up to 16 queue pairs (32 virtqueues) in vhost-user mode. Packets are routed to the appropriate RX queue based on which TX queue they originated from, enabling the guest kernel to distribute network traffic across multiple queues and vCPUs.

This series adds:

- configuration support for multiqueue via the --max-queues parameter
- one packet pool per queue pair instead of shared pools
- queue parameter threading throughout the network stack
- a significant refactoring that propagates queue information through all
  protocol handlers (TCP, UDP, ICMP, ARP, DHCP, DHCPv6, NDP)
- flow-aware queue routing that matches RX queue selection to the incoming
  TX queue, maintaining proper packet affinity
- test coverage with a VHOST_USER_MQ environment variable to validate
  multiqueue functionality across all protocols (TCP, UDP, ICMP) and
  services (DHCP, NDP)

Current behavior: TX queue selection is controlled by the guest kernel, while RX packets are routed to queues based on their associated flows. Host-initiated flows currently default to queue 0. The RX queue of a flow is updated on each new packet from the TX queue to maintain affinity.

The changes maintain backward compatibility: without --max-queues, behavior remains unchanged, with single-queue operation.

v2:
- New patch: "tap: Remove pool parameter from tap4_handler() and
  tap6_handler()" to clean up unused parameters before adding the queue
  pair parameter
- Changed to one packet pool per queue pair instead of pools shared
  across all queue pairs
- Split "multiqueue: Add queue-aware flow management..." into two patches:
  - "tap: Add queue pair parameter throughout the packet processing path"
  - "flow: Add queue pair tracking to flow management"
- Updated test infrastructure patch with refined implementation

Laurent Vivier (6):
  tap: Remove pool parameter from tap4_handler() and tap6_handler()
  vhost-user: Enable multiqueue
  vhost-user: Add queue pair parameter throughout the network stack
  tap: Add queue pair parameter throughout the packet processing path
  flow: Add queue pair tracking to flow management
  test: Add multiqueue support to vhost-user test infrastructure

 arp.c          |  12 ++--
 arp.h          |   4 +-
 conf.c         |  31 ++++++++-
 dhcp.c         |   5 +-
 dhcp.h         |   2 +-
 dhcpv6.c       |  12 ++--
 dhcpv6.h       |   2 +-
 flow.c         |  30 +++++++++
 flow.h         |  17 +++++
 fwd.c          |   4 +-
 icmp.c         |  25 ++++---
 icmp.h         |   2 +-
 ndp.c          |  32 +++++----
 ndp.h          |   5 +-
 passt.h        |   2 +
 tap.c          | 173 +++++++++++++++++++++++++++++--------------------
 tap.h          |  20 +++---
 tcp.c          |  66 +++++++++++--------
 tcp.h          |  11 ++--
 tcp_vu.c       |   8 ++-
 test/lib/setup |  58 +++++++++++++----
 test/run       |  23 +++++++
 udp.c          |  39 ++++++-----
 udp.h          |  12 ++--
 udp_flow.c     |   8 ++-
 udp_flow.h     |   2 +-
 udp_vu.c       |   4 +-
 vhost_user.c   |  38 ++++++-----
 virtio.h       |   2 +-
 vu_common.c    |  15 +++--
 vu_common.h    |   3 +-
 31 files changed, 443 insertions(+), 224 deletions(-)

-- 
2.51.0
These handlers only ever operate on their respective global pools
(pool_tap4 and pool_tap6). The pool parameter was always passed the
same value, making it an unnecessary indirection.
Access the global pools directly instead, simplifying the function
signatures.
Signed-off-by: Laurent Vivier
Add the --max-queues parameter to specify the maximum number of queue
pairs supported in vhost-user mode. This enables multi-queue support
by allowing configuration of up to 16 queue pairs (32 virtqueues).
For the moment, only the first RX queue is used, while the TX queue is
selected by the guest kernel.
Signed-off-by: Laurent Vivier
For vhost-user multiqueue support, we need to maintain separate packet
pools for each queue pair, as packets from different queues should be
processed independently.
Previously, we had single global packet pools (pool_tap4 and pool_tap6)
for IPv4 and IPv6 packets. This worked fine for single-queue scenarios
(passt and pasta), but doesn't support multiple concurrent receive queues.
Convert these to arrays of pools, one per queue pair, indexed by the
queue pair number. The queue pair is simply the virtqueue index divided
by 2 (since each queue pair consists of one RX and one TX virtqueue).
Add a qpair parameter throughout the packet processing pipeline:
- tap4_handler() and tap6_handler(): specify which queue's packets to process
- tap_flush_pools(): specify which queue's pools to flush
- tap_handler(): specify which queue to handle
- tap_add_packet(): specify which queue's pool to add the packet to
For passt and pasta (single queue), all calls use qpair=0. For vhost-user
multiqueue, the queue pair is derived from the virtqueue index (index / 2)
in vu_handle_tx().
The pool initialization in tap_sock_update_pool() is updated to initialize
all queue pairs' pools from the same underlying buffer.
Signed-off-by: Laurent Vivier
Add a queue pair parameter to vu_send_single() and propagate this parameter
through the entire network stack call chain. The queue pair parameter specifies
which queue pair to use for sending packets in vhost-user mode.
Functions modified to accept and propagate the queue pair parameter:
- Core sending: tap_send_single(), vu_send_single()
- UDP/ICMP helpers: tap_udp4_send(), tap_udp6_send(), tap_icmp4_send(),
tap_icmp6_send()
- Protocol handlers: arp(), dhcp(), dhcpv6(), ndp(), tcp_rst_no_conn()
- ARP helpers: arp_announce()
- NDP helpers: ndp_send(), ndp_na(), ndp_ra(), ndp_unsolicited_na()
- UDP error handling: udp_send_tap_icmp4(), udp_send_tap_icmp6()
All callers currently pass queue pair #0 to preserve existing
behavior. This is a preparatory step for enabling multi-queue and
per-queue worker threads in vhost-user mode.
No functional change.
Signed-off-by: Laurent Vivier
With the recent addition of multiqueue support to passt's vhost-user
implementation, we need test coverage to validate the functionality. The
test infrastructure previously only tested single queue configurations.
Add a VHOST_USER_MQ environment variable to control the number of queue
pairs. When set to values greater than 1, the setup scripts pass
--max-qpairs to passt and configure QEMU's vhost-user netdev with the
corresponding queues= parameter.
The test suite now runs an additional set of tests with 8 queue pairs to
exercise the multiqueue paths across all protocols (TCP, UDP, ICMP) and
services (DHCP, NDP). Note that the guest kernel will only enable as many
queues as there are vCPUs.
Signed-off-by: Laurent Vivier
For multiqueue support, we need to ensure packets are routed to the
correct RX queue based on which TX queue they originated from. This
requires tracking the queue pair association for each flow.
Add a qpair field to struct flow_common to store the queue pair number
for each flow (FLOW_QPAIR_INVALID if not assigned). The field uses 5
bits, allowing support for up to 31 queue pairs (index 31 is reserved
for FLOW_QPAIR_INVALID), which we verify is sufficient for
VHOST_USER_MAX_VQS via static assertion.
Introduce flow_qp() to retrieve the queue pair for a flow (returning 0
for NULL flows or flows without a valid assignment), and flow_setqp()
to assign queue pairs. Update all protocol handlers (TCP, UDP, ICMP)
and their tap handlers to accept a qpair parameter and assign it to
flows using FLOW_SETQP().
The vhost-user code now uses FLOW_QP() to select the appropriate RX
queue when sending packets, ensuring they're routed based on the
originating TX queue rather than always using queue 0.
Note that flows initiated from the host side (via sockets, for example
udp_flow_from_sock()) currently default to queue pair 0, as they don't
have an associated incoming queue to derive the assignment from.
Signed-off-by: Laurent Vivier
Nit: It's not very obvious to me what the salient difference is between "throughout the network stack" (previous patch) and "throughout the packet processing path" (this one).

On Fri, Nov 21, 2025 at 05:59:00PM +0100, Laurent Vivier wrote:
For vhost-user multiqueue support, we need to maintain separate packet pools for each queue pair, as packets from different queues should be processed independently.
Previously, we had single global packet pools (pool_tap4 and pool_tap6) for IPv4 and IPv6 packets. This worked fine for single-queue scenarios (passt and pasta), but doesn't support multiple concurrent receive queues.
Convert these to arrays of pools, one per queue pair, indexed by the queue pair number. The queue pair is simply the virtqueue index divided by 2 (since each queue pair consists of one RX and one TX virtqueue).
Add a qpair parameter throughout the packet processing pipeline:
- tap4_handler() and tap6_handler(): specify which queue's packets to process
- tap_flush_pools(): specify which queue's pools to flush
- tap_handler(): specify which queue to handle
- tap_add_packet(): specify which queue's pool to add the packet to
For passt and pasta (single queue), all calls use qpair=0. For vhost-user multiqueue, the queue pair is derived from the virtqueue index (index / 2) in vu_handle_tx().
The pool initialization in tap_sock_update_pool() is updated to initialize all queue pairs' pools from the same underlying buffer.
Signed-off-by: Laurent Vivier
Reviewed-by: David Gibson
---
 tap.c       | 107 ++++++++++++++++++++++++++++++----------------------
 tap.h       |   7 ++--
 vu_common.c |   6 +--
 3 files changed, 68 insertions(+), 52 deletions(-)
diff --git a/tap.c b/tap.c index a842104687b7..529acecc9851 100644 --- a/tap.c +++ b/tap.c @@ -94,9 +94,13 @@ CHECK_FRAME_LEN(L2_MAX_LEN_VU); DIV_ROUND_UP(sizeof(pkt_buf), \ ETH_HLEN + sizeof(struct ipv6hdr) + sizeof(struct udphdr))
-/* IPv4 (plus ARP) and IPv6 message batches from tap/guest to IP handlers */
-static PACKET_POOL_NOINIT(pool_tap4, TAP_MSGS_IP4);
-static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS_IP6);
+/* IPv4 (plus ARP) and IPv6 message batches from tap/guest to IP handlers
+ * One pool per queue pair for multiqueue support
+ */
+static PACKET_POOL_DECL(pool_tap4, TAP_MSGS_IP4) pool_tap4_storage[VHOST_USER_MAX_VQS / 2];
+static struct pool *pool_tap4[VHOST_USER_MAX_VQS / 2];
+static PACKET_POOL_DECL(pool_tap6, TAP_MSGS_IP6) pool_tap6_storage[VHOST_USER_MAX_VQS / 2];
+static struct pool *pool_tap6[VHOST_USER_MAX_VQS / 2];
#define TAP_SEQS 128 /* Different L4 tuples in one batch */ #define FRAGMENT_MSG_RATE 10 /* # seconds between fragment warnings */ @@ -703,21 +707,22 @@ static bool tap4_is_fragment(const struct iphdr *iph, /** * tap4_handler() - IPv4 and ARP packet handler for tap file descriptor * @c: Execution context + * @qpair: Queue pair * @now: Current timestamp * * Return: count of packets consumed by handlers */ -static int tap4_handler(struct ctx *c, const struct timespec *now) +static int tap4_handler(struct ctx *c, unsigned int qpair, const struct timespec *now) { unsigned int i, j, seq_count; struct tap4_l4_t *seq;
-	if (!c->ifi4 || !pool_tap4->count)
-		return pool_tap4->count;
+	if (!c->ifi4 || !pool_tap4[qpair]->count)
+		return pool_tap4[qpair]->count;
i = 0; resume: - for (seq_count = 0, seq = NULL; i < pool_tap4->count; i++) { + for (seq_count = 0, seq = NULL; i < pool_tap4[qpair]->count; i++) { size_t l3len, hlen, l4len; struct ethhdr eh_storage; struct iphdr iph_storage; @@ -727,7 +732,7 @@ resume: struct iov_tail data; struct iphdr *iph;
-		if (!packet_get(pool_tap4, i, &data))
+		if (!packet_get(pool_tap4[qpair], i, &data))
 			continue;
eh = IOV_PEEK_HEADER(&data, eh_storage); @@ -794,8 +799,8 @@ resume: if (iph->protocol == IPPROTO_UDP) { struct iov_tail eh_data;
-			packet_get(pool_tap4, i, &eh_data);
-			if (dhcp(c, 0, &eh_data))
+			packet_get(pool_tap4[qpair], i, &eh_data);
+			if (dhcp(c, qpair, &eh_data))
 				continue;
 		}
@@ -825,7 +830,7 @@ resume: goto append;
 		if (seq_count == TAP_SEQS)
-			break; /* Resume after flushing if i < pool_tap4->count */
+			break; /* Resume after flushing if i < pool_tap4[qpair]->count */
Nit: overlong line
for (seq = tap4_l4 + seq_count - 1; seq >= tap4_l4; seq--) { if (L4_MATCH(iph, uh, seq)) { @@ -871,30 +876,31 @@ append: } }
-	if (i < pool_tap4->count)
+	if (i < pool_tap4[qpair]->count)
 		goto resume;

-	return pool_tap4->count;
+	return pool_tap4[qpair]->count;
 }
/** * tap6_handler() - IPv6 packet handler for tap file descriptor * @c: Execution context + * @qpair: Queue pair * @now: Current timestamp * * Return: count of packets consumed by handlers */ -static int tap6_handler(struct ctx *c, const struct timespec *now) +static int tap6_handler(struct ctx *c, unsigned int qpair, const struct timespec *now) { unsigned int i, j, seq_count = 0; struct tap6_l4_t *seq;
-	if (!c->ifi6 || !pool_tap6->count)
-		return pool_tap6->count;
+	if (!c->ifi6 || !pool_tap6[qpair]->count)
+		return pool_tap6[qpair]->count;
i = 0; resume: - for (seq_count = 0, seq = NULL; i < pool_tap6->count; i++) { + for (seq_count = 0, seq = NULL; i < pool_tap6[qpair]->count; i++) { size_t l4len, plen, check; struct in6_addr *saddr, *daddr; struct ipv6hdr ip6h_storage; @@ -906,7 +912,7 @@ resume: struct ipv6hdr *ip6h; uint8_t proto;
-		if (!packet_get(pool_tap6, i, &data))
+		if (!packet_get(pool_tap6[qpair], i, &data))
 			return -1;
eh = IOV_REMOVE_HEADER(&data, eh_storage); @@ -1014,7 +1020,7 @@ resume: goto append;
 		if (seq_count == TAP_SEQS)
-			break; /* Resume after flushing if i < pool_tap6->count */
+			break; /* Resume after flushing if i < pool_tap6[qpair]->count */
for (seq = tap6_l4 + seq_count - 1; seq >= tap6_l4; seq--) { if (L4_MATCH(ip6h, proto, uh, seq)) { @@ -1061,39 +1067,42 @@ append: } }
-	if (i < pool_tap6->count)
+	if (i < pool_tap6[qpair]->count)
 		goto resume;

-	return pool_tap6->count;
+	return pool_tap6[qpair]->count;
 }
 /**
- * tap_flush_pools() - Flush both IPv4 and IPv6 packet pools
+ * tap_flush_pools() - Flush both IPv4 and IPv6 packet pools for a given qpair
  */
-void tap_flush_pools(void)
+void tap_flush_pools(unsigned int qpair)
 {
-	pool_flush(pool_tap4);
-	pool_flush(pool_tap6);
+	pool_flush(pool_tap4[qpair]);
+	pool_flush(pool_tap6[qpair]);
 }
 /**
  * tap_handler() - IPv4/IPv6 and ARP packet handler for tap file descriptor
  * @c:		Execution context
+ * @qpair:	Queue pair
  * @now:	Current timestamp
  */
-void tap_handler(struct ctx *c, const struct timespec *now)
+void tap_handler(struct ctx *c, unsigned int qpair, const struct timespec *now)
 {
-	tap4_handler(c, now);
-	tap6_handler(c, now);
+	ASSERT(qpair < VHOST_USER_MAX_VQS / 2);
+	tap4_handler(c, qpair, now);
+	tap6_handler(c, qpair, now);
 }
/** * tap_add_packet() - Queue/capture packet, update notion of guest MAC address * @c: Execution context + * @qpair: Queue pair * @data: Packet to add to the pool * @now: Current timestamp */ -void tap_add_packet(struct ctx *c, struct iov_tail *data, +void tap_add_packet(struct ctx *c, unsigned int qpair, struct iov_tail *data, const struct timespec *now) { struct ethhdr eh_storage; @@ -1114,21 +1123,23 @@ void tap_add_packet(struct ctx *c, struct iov_tail *data, proto_update_l2_buf(c->guest_mac); }
+ ASSERT(qpair < VHOST_USER_MAX_VQS / 2); + switch (ntohs(eh->h_proto)) { case ETH_P_ARP: case ETH_P_IP: - if (!pool_can_fit(pool_tap4, data)) { - tap4_handler(c, now); - pool_flush(pool_tap4); + if (!pool_can_fit(pool_tap4[qpair], data)) { + tap4_handler(c, qpair, now); + pool_flush(pool_tap4[qpair]); } - packet_add(pool_tap4, data); + packet_add(pool_tap4[qpair], data); break; case ETH_P_IPV6: - if (!pool_can_fit(pool_tap6, data)) { - tap6_handler(c, now); - pool_flush(pool_tap6); + if (!pool_can_fit(pool_tap6[qpair], data)) { + tap6_handler(c, qpair, now); + pool_flush(pool_tap6[qpair]); } - packet_add(pool_tap6, data); + packet_add(pool_tap6[qpair], data); break; default: break; @@ -1168,7 +1179,7 @@ static void tap_passt_input(struct ctx *c, const struct timespec *now) ssize_t n; char *p;
-	tap_flush_pools();
+	tap_flush_pools(0);
if (partial_len) { /* We have a partial frame from an earlier pass. Move it to the @@ -1212,7 +1223,7 @@ static void tap_passt_input(struct ctx *c, const struct timespec *now) n -= sizeof(uint32_t);
 			data = IOV_TAIL_FROM_BUF(p, l2len, 0);
-			tap_add_packet(c, &data, now);
+			tap_add_packet(c, 0, &data, now);
p += l2len; n -= l2len; @@ -1221,7 +1232,7 @@ static void tap_passt_input(struct ctx *c, const struct timespec *now) partial_len = n; partial_frame = p;
-	tap_handler(c, now);
+	tap_handler(c, 0, now);
 }
/** @@ -1251,7 +1262,7 @@ static void tap_pasta_input(struct ctx *c, const struct timespec *now) { ssize_t n, len;
-	tap_flush_pools();
+	tap_flush_pools(0);
for (n = 0; n <= (ssize_t)(sizeof(pkt_buf) - L2_MAX_LEN_PASTA); @@ -1280,10 +1291,10 @@ static void tap_pasta_input(struct ctx *c, const struct timespec *now) continue;
 		data = IOV_TAIL_FROM_BUF(pkt_buf + n, len, 0);
-		tap_add_packet(c, &data, now);
+		tap_add_packet(c, 0, &data, now);
 	}
-	tap_handler(c, now);
+	tap_handler(c, 0, now);
 }
/** @@ -1487,10 +1498,14 @@ static void tap_sock_tun_init(struct ctx *c) */ static void tap_sock_update_pool(void *base, size_t size) { - int i; + unsigned int i;
-	pool_tap4_storage = PACKET_INIT(pool_tap4, TAP_MSGS_IP4, base, size);
-	pool_tap6_storage = PACKET_INIT(pool_tap6, TAP_MSGS_IP6, base, size);
+	for (i = 0; i < VHOST_USER_MAX_VQS / 2; i++) {
+		pool_tap4_storage[i] = PACKET_INIT(pool_tap4, TAP_MSGS_IP4, base, size);
+		pool_tap4[i] = (struct pool *)&pool_tap4_storage[i];
+		pool_tap6_storage[i] = PACKET_INIT(pool_tap6, TAP_MSGS_IP6, base, size);
+		pool_tap6[i] = (struct pool *)&pool_tap6_storage[i];
+	}
for (i = 0; i < TAP_SEQS; i++) { tap4_l4[i].p = PACKET_INIT(pool_l4, UIO_MAXIOV, base, size); diff --git a/tap.h b/tap.h index 92d3e5446991..f10c2a212a51 100644 --- a/tap.h +++ b/tap.h @@ -118,8 +118,9 @@ void tap_handler_passt(struct ctx *c, uint32_t events, int tap_sock_unix_open(char *sock_path); void tap_sock_reset(struct ctx *c); void tap_backend_init(struct ctx *c); -void tap_flush_pools(void); -void tap_handler(struct ctx *c, const struct timespec *now); -void tap_add_packet(struct ctx *c, struct iov_tail *data, +void tap_flush_pools(unsigned int qpair); +void tap_handler(struct ctx *c, unsigned int qpair, + const struct timespec *now); +void tap_add_packet(struct ctx *c, unsigned int qpair, struct iov_tail *data, const struct timespec *now); #endif /* TAP_H */ diff --git a/vu_common.c b/vu_common.c index 040ad067ffbf..63ead85a4674 100644 --- a/vu_common.c +++ b/vu_common.c @@ -170,7 +170,7 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
ASSERT(VHOST_USER_IS_QUEUE_TX(index));
-	tap_flush_pools();
+	tap_flush_pools(index / 2);
count = 0; out_sg_count = 0; @@ -196,11 +196,11 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
 		data = IOV_TAIL(elem[count].out_sg, elem[count].out_num, 0);
 		if (IOV_DROP_HEADER(&data, struct virtio_net_hdr_mrg_rxbuf))
-			tap_add_packet(vdev->context, &data, now);
+			tap_add_packet(vdev->context, index / 2, &data, now);

 		count++;
 	}
-	tap_handler(vdev->context, now);
+	tap_handler(vdev->context, index / 2, now);
if (count) { int i; -- 2.51.0
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
On Fri, Nov 21, 2025 at 05:58:58PM +0100, Laurent Vivier wrote:
Add the --max-queues parameter to specify the maximum number of queue
Nit: you updated the option to be --max-qpairs, which I think is good, but now the commit message is out of date.

One other query: what makes it "max qpairs" rather than just "qpairs"? Are there (now or in planned work) circumstances where you'd end up with fewer qpairs than specified here?
pairs supported in vhost-user mode. This enables multi-queue support by allowing configuration of up to 16 queue pairs (32 virtqueues).
For the moment, only the first RX queue is used, the TX queue is selected by the guest kernel.
IIUC, with this patch (but not the ones after) things will break if the guest uses a qpair other than 0, right? AFAICT vu_kick_cb() isn't updated so will ignore anything on the other qpairs.
Signed-off-by: Laurent Vivier
--- conf.c | 31 ++++++++++++++++++++++++++++++- passt.h | 2 ++ tap.c | 15 +++++++++++++-- vhost_user.c | 38 +++++++++++++++++++++----------------- virtio.h | 2 +- 5 files changed, 67 insertions(+), 21 deletions(-) diff --git a/conf.c b/conf.c index 66b9e63400ec..99a7995f8038 100644 --- a/conf.c +++ b/conf.c @@ -862,7 +862,9 @@ static void usage(const char *name, FILE *f, int status) " --vhost-user Enable vhost-user mode\n" " UNIX domain socket is provided by -s option\n" " --print-capabilities print back-end capabilities in JSON format,\n" - " only meaningful for vhost-user mode\n"); + " only meaningful for vhost-user mode\n" + " --max-qpairs Specify the maximum number of queue pairs\n" + ); FPRINTF(f, " --repair-path PATH path for passt-repair(1)\n" " default: append '.repair' to UNIX domain path\n"); @@ -1483,6 +1485,7 @@ void conf(struct ctx *c, int argc, char **argv) {"migrate-exit", no_argument, NULL, 29 }, {"migrate-no-linger", no_argument, NULL, 30 }, {"stats", required_argument, NULL, 31 }, + {"max-qpairs", required_argument, NULL, 32 }, { 0 }, }; const char *optstring = "+dqfel:hs:F:I:p:P:m:a:n:M:g:i:o:D:S:H:461t:u:T:U:"; @@ -1514,6 +1517,7 @@ void conf(struct ctx *c, int argc, char **argv) c->tcp.fwd_in.mode = c->tcp.fwd_out.mode = FWD_UNSET; c->udp.fwd_in.mode = c->udp.fwd_out.mode = FWD_UNSET; memcpy(c->our_tap_mac, MAC_OUR_LAA, ETH_ALEN); + c->max_qpairs = 1;
optind = 0; do { @@ -1717,6 +1721,31 @@ void conf(struct ctx *c, int argc, char **argv) die("Can't display statistics if not running in foreground"); c->stats = strtol(optarg, NULL, 0); break; + case 32: { + unsigned long max_qpairs; + char *e; + + if (c->mode != MODE_VU) + die("--max-qpairs is for vhost-user mode only"); + + errno = 0; + max_qpairs = strtoul(optarg, &e, 0); + + if (errno || *e) + die("Invalid max-qpairs: %s", optarg); + + if (max_qpairs < 1) { + die("max-qpairs %lu too small (min 1)", + max_qpairs); + } + + if (max_qpairs * 2 > VHOST_USER_MAX_VQS) { + die("max-qpairs %lu too big (maximum %u)", + max_qpairs, VHOST_USER_MAX_VQS / 2); + } + c->max_qpairs = max_qpairs; + break; + } case 'd': c->debug = 1; c->quiet = 0; diff --git a/passt.h b/passt.h index 15801b44bfa8..c7b6dad69190 100644 --- a/passt.h +++ b/passt.h @@ -205,6 +205,7 @@ struct ip6_ctx { * @low_wmem: Low probed net.core.wmem_max * @low_rmem: Low probed net.core.rmem_max * @vdev: vhost-user device + * @max_qpairs: Maximum number of queue pairs * @device_state_fd: Device state migration channel * @device_state_result: Device state migration result * @migrate_target: Are we the target, on the next migration request? @@ -283,6 +284,7 @@ struct ctx { int low_rmem;
struct vu_dev *vdev; + unsigned int max_qpairs;
/* Migration */ int device_state_fd; diff --git a/tap.c b/tap.c index d7f777fd0b19..d098061ed559 100644 --- a/tap.c +++ b/tap.c @@ -1314,8 +1314,19 @@ static void tap_backend_show_hints(struct ctx *c) break; case MODE_VU: info("You can start qemu with:"); - info(" kvm ... -chardev socket,id=chr0,path=%s -netdev vhost-user,id=netdev0,chardev=chr0 -device virtio-net,netdev=netdev0 -object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE -numa node,memdev=memfd0\n", - c->sock_path); + if (c->max_qpairs > 1) { + info(" kvm ... -chardev socket,id=chr0,path=%s " + "-netdev vhost-user,id=netdev0,chardev=chr0,queues=%d " + "-device virtio-net,netdev=netdev0,mq=true " + "-object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE " + "-numa node,memdev=memfd0\n", c->sock_path, c->max_qpairs); + } else { + info(" kvm ... -chardev socket,id=chr0,path=%s " + "-netdev vhost-user,id=netdev0,chardev=chr0 " + "-device virtio-net,netdev=netdev0 " + "-object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE " + "-numa node,memdev=memfd0\n", c->sock_path); + } break; } } diff --git a/vhost_user.c b/vhost_user.c index aa7c869d9e56..6d3fa04d2119 100644 --- a/vhost_user.c +++ b/vhost_user.c @@ -323,6 +323,7 @@ static bool vu_get_features_exec(struct vu_dev *vdev, uint64_t features = 1ULL << VIRTIO_F_VERSION_1 | 1ULL << VIRTIO_NET_F_MRG_RXBUF | + 1ULL << VIRTIO_NET_F_MQ | 1ULL << VHOST_F_LOG_ALL | 1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
@@ -342,9 +343,9 @@ static bool vu_get_features_exec(struct vu_dev *vdev, */ static void vu_set_enable_all_rings(struct vu_dev *vdev, bool enable) { - uint16_t i; + unsigned int i;
-	for (i = 0; i < VHOST_USER_MAX_VQS; i++)
+	for (i = 0; i < vdev->context->max_qpairs * 2; i++)
 		vdev->vq[i].enable = enable;
 }
@@ -476,7 +477,7 @@ static bool vu_set_mem_table_exec(struct vu_dev *vdev, close(vmsg->fds[i]); }
- for (i = 0; i < VHOST_USER_MAX_VQS; i++) { + for (i = 0; i < vdev->context->max_qpairs * 2; i++) { if (vdev->vq[i].vring.desc) { if (map_ring(vdev, &vdev->vq[i])) die("remapping queue %d during setmemtable", i); @@ -759,15 +760,18 @@ static void vu_set_watch(const struct vu_dev *vdev, int idx) /** * vu_check_queue_msg_file() - Check if a message is valid, * close fds if NOFD bit is set + * @vdev: vhost-user device * @vmsg: vhost-user message */ -static void vu_check_queue_msg_file(struct vhost_user_msg *vmsg) +static void vu_check_queue_msg_file(const struct vu_dev *vdev, + struct vhost_user_msg *vmsg) { bool nofd = vmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK; - int idx = vmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK; + unsigned int idx = vmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
-	if (idx >= VHOST_USER_MAX_VQS)
-		die("Invalid vhost-user queue index: %u", idx);
+	if (idx >= vdev->context->max_qpairs * 2)
+		die("Invalid vhost-user queue index: %u (maximum %u)", idx,
+		    vdev->context->max_qpairs * 2);
if (nofd) { vmsg_close_fds(vmsg); @@ -794,7 +798,7 @@ static bool vu_set_vring_kick_exec(struct vu_dev *vdev,
debug("u64: 0x%016"PRIx64, vmsg->payload.u64);
- vu_check_queue_msg_file(vmsg); + vu_check_queue_msg_file(vdev, vmsg);
if (vdev->vq[idx].kick_fd != -1) { epoll_del(vdev->context->epollfd, vdev->vq[idx].kick_fd); @@ -834,7 +838,7 @@ static bool vu_set_vring_call_exec(struct vu_dev *vdev,
debug("u64: 0x%016"PRIx64, vmsg->payload.u64);
- vu_check_queue_msg_file(vmsg); + vu_check_queue_msg_file(vdev, vmsg);
if (vdev->vq[idx].call_fd != -1) { close(vdev->vq[idx].call_fd); @@ -869,7 +873,7 @@ static bool vu_set_vring_err_exec(struct vu_dev *vdev,
debug("u64: 0x%016"PRIx64, vmsg->payload.u64);
- vu_check_queue_msg_file(vmsg); + vu_check_queue_msg_file(vdev, vmsg);
if (vdev->vq[idx].err_fd != -1) { close(vdev->vq[idx].err_fd); @@ -896,7 +900,8 @@ static bool vu_get_protocol_features_exec(struct vu_dev *vdev, uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK | 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD | 1ULL << VHOST_USER_PROTOCOL_F_DEVICE_STATE | - 1ULL << VHOST_USER_PROTOCOL_F_RARP; + 1ULL << VHOST_USER_PROTOCOL_F_RARP | + 1ULL << VHOST_USER_PROTOCOL_F_MQ;
(void)vdev; vmsg_set_reply_u64(vmsg, features); @@ -935,10 +940,9 @@ static bool vu_get_queue_num_exec(struct vu_dev *vdev, { (void)vdev;
-	/* NOLINTNEXTLINE(misc-redundant-expression) */
-	vmsg_set_reply_u64(vmsg, VHOST_USER_MAX_VQS / 2);
+	vmsg_set_reply_u64(vmsg, vdev->context->max_qpairs);

-	debug("VHOST_USER_MAX_VQS %u", VHOST_USER_MAX_VQS / 2);
+	debug("max_qpairs %u", vdev->context->max_qpairs);
return true; } @@ -959,7 +963,7 @@ static bool vu_set_vring_enable_exec(struct vu_dev *vdev, debug("State.index: %u", idx); debug("State.enable: %u", enable);
-	if (idx >= VHOST_USER_MAX_VQS)
+	if (idx >= vdev->context->max_qpairs * 2)
 		die("Invalid vring_enable index: %u", idx);
vdev->vq[idx].enable = enable; @@ -1047,7 +1051,7 @@ static bool vu_check_device_state_exec(struct vu_dev *vdev, */ void vu_init(struct ctx *c) { - int i; + unsigned int i;
c->vdev = &vdev_storage; c->vdev->context = c; @@ -1074,7 +1078,7 @@ void vu_cleanup(struct vu_dev *vdev) { unsigned int i;
-	for (i = 0; i < VHOST_USER_MAX_VQS; i++) {
+	for (i = 0; i < vdev->context->max_qpairs * 2; i++) {
 		struct vu_virtq *vq = &vdev->vq[i];
vq->started = false; diff --git a/virtio.h b/virtio.h index 12caaa0b6def..176c935cecc7 100644 --- a/virtio.h +++ b/virtio.h @@ -88,7 +88,7 @@ struct vu_dev_region { uint64_t mmap_addr; };
-#define VHOST_USER_MAX_VQS	2
+#define VHOST_USER_MAX_VQS	32
/* * Set a reasonable maximum number of ram slots, which will be supported by -- 2.51.0
On Fri, Nov 21, 2025 at 05:58:59PM +0100, Laurent Vivier wrote:
Add a queue pair parameter to vu_send_single() and propagate this parameter through the entire network stack call chain. The queue pair parameter specifies which queue pair to use for sending packets in vhost-user mode.
"sending" in this case meaning passt -> guest, right? Which is the Rx queue in vu terminology.
Functions modified to accept and propagate the queue pair parameter: - Core sending: tap_send_single(), vu_send_single() - UDP/ICMP helpers: tap_udp4_send(), tap_udp6_send(), tap_icmp4_send(), tap_icmp6_send() - Protocol handlers: arp(), dhcp(), dhcpv6(), ndp(), tcp_rst_no_conn() - ARP helpers: arp_announce() - NDP helpers: ndp_send(), ndp_na(), ndp_ra(), ndp_unsolicited_na() - UDP error handling: udp_send_tap_icmp4(), udp_send_tap_icmp6()
All callers currently pass queue pair #0 to preserve existing behavior. This is a preparatory step for enabling multi-queue and per-queue worker threads in vhost-user mode.
No functional change.
Signed-off-by: Laurent Vivier
Reviewed-by: David Gibson
---
 arp.c       | 12 +++++++-----
 arp.h       |  4 ++--
 dhcp.c      |  5 +++--
 dhcp.h      |  2 +-
 dhcpv6.c    | 12 +++++++-----
 dhcpv6.h    |  2 +-
 fwd.c       |  4 ++--
 icmp.c      |  6 ++++--
 ndp.c       | 32 +++++++++++++++++++-------------
 ndp.h       |  5 +++--
 tap.c       | 37 ++++++++++++++++++++++---------------
 tap.h       | 13 +++++++------
 tcp.c       |  8 +++++---
 udp.c       | 14 ++++++++------
 vu_common.c |  9 ++++++---
 vu_common.h |  3 ++-
 16 files changed, 99 insertions(+), 69 deletions(-)
diff --git a/arp.c b/arp.c index 33b03cf6c316..c43d33ced2c3 100644 --- a/arp.c +++ b/arp.c @@ -63,11 +63,12 @@ static bool ignore_arp(const struct ctx *c, /** * arp() - Check if this is a supported ARP message, reply as needed * @c: Execution context + * @qpair: Queue pair on which to send the reply * @data: Single packet with Ethernet buffer * * Return: 1 if handled, -1 on failure */ -int arp(const struct ctx *c, struct iov_tail *data) +int arp(const struct ctx *c, int qpair, struct iov_tail *data) { union inany_addr tgt; struct { @@ -112,7 +113,7 @@ int arp(const struct ctx *c, struct iov_tail *data) memcpy(resp.am.tha, am->sha, sizeof(resp.am.tha)); memcpy(resp.am.tip, am->sip, sizeof(resp.am.tip));
-	tap_send_single(c, &resp, sizeof(resp));
+	tap_send_single(c, qpair, &resp, sizeof(resp));
return 1; } @@ -148,16 +149,17 @@ void arp_send_init_req(const struct ctx *c) memcpy(req.am.tip, &c->ip4.addr, sizeof(req.am.tip));
 	debug("Sending initial ARP request for guest MAC address");
-	tap_send_single(c, &req, sizeof(req));
+	tap_send_single(c, 0, &req, sizeof(req));
Should this one also go into the caller? I don't think it really matters, it's only a question of what seems logically more sensible.
}
/** * arp_announce() - Send an ARP announcement for an IPv4 host * @c: Execution context + * @qpair: Queue pair on which to send the announcement * @ip: IPv4 address we announce as owned by @mac * @mac: MAC address to advertise for @ip */ -void arp_announce(const struct ctx *c, struct in_addr *ip, +void arp_announce(const struct ctx *c, int qpair, struct in_addr *ip, const unsigned char *mac) { char ip_str[INET_ADDRSTRLEN]; @@ -199,5 +201,5 @@ void arp_announce(const struct ctx *c, struct in_addr *ip, eth_ntop(mac, mac_str, sizeof(mac_str)); debug("ARP announcement for %s / %s", ip_str, mac_str);
- tap_send_single(c, &msg, sizeof(msg)); + tap_send_single(c, qpair, &msg, sizeof(msg)); } diff --git a/arp.h b/arp.h index 4862e90a14ee..7dd872809340 100644 --- a/arp.h +++ b/arp.h @@ -20,9 +20,9 @@ struct arpmsg { unsigned char tip[4]; } __attribute__((__packed__));
-int arp(const struct ctx *c, struct iov_tail *data); +int arp(const struct ctx *c, int qpair, struct iov_tail *data); void arp_send_init_req(const struct ctx *c); -void arp_announce(const struct ctx *c, struct in_addr *ip, +void arp_announce(const struct ctx *c, int qpair, struct in_addr *ip, const unsigned char *mac);
#endif /* ARP_H */ diff --git a/dhcp.c b/dhcp.c index 6b9c2e3b9e5a..dd3c67f52724 100644 --- a/dhcp.c +++ b/dhcp.c @@ -296,11 +296,12 @@ static void opt_set_dns_search(const struct ctx *c, size_t max_len) /** * dhcp() - Check if this is a DHCP message, reply as needed * @c: Execution context + * @qpair: Queue pair on which to send the reply * @data: Single packet with Ethernet buffer * * Return: 0 if it's not a DHCP message, 1 if handled, -1 on failure */ -int dhcp(const struct ctx *c, struct iov_tail *data) +int dhcp(const struct ctx *c, int qpair, struct iov_tail *data) { char macstr[ETH_ADDRSTRLEN]; size_t mlen, dlen, opt_len; @@ -471,7 +472,7 @@ int dhcp(const struct ctx *c, struct iov_tail *data) else dst = c->ip4.addr;
- tap_udp4_send(c, c->ip4.our_tap_addr, 67, dst, 68, &reply, dlen); + tap_udp4_send(c, qpair, c->ip4.our_tap_addr, 67, dst, 68, &reply, dlen);
return 1; } diff --git a/dhcp.h b/dhcp.h index cd50c99b8856..a96991633e58 100644 --- a/dhcp.h +++ b/dhcp.h @@ -6,7 +6,7 @@ #ifndef DHCP_H #define DHCP_H
-int dhcp(const struct ctx *c, struct iov_tail *data); +int dhcp(const struct ctx *c, int qpair, struct iov_tail *data); void dhcp_init(void);
#endif /* DHCP_H */ diff --git a/dhcpv6.c b/dhcpv6.c index e4df0db562e6..b1b792612615 100644 --- a/dhcpv6.c +++ b/dhcpv6.c @@ -369,12 +369,13 @@ notonlink: /** * dhcpv6_send_ia_notonlink() - Send NotOnLink status * @c: Execution context + * @qpair: Queue pair on which to send the reply * @ia_base: Non-appropriate IA_NA or IA_TA base * @client_id_base: Client ID message option base * @len: Client ID length * @xid: Transaction ID for message exchange */ -static void dhcpv6_send_ia_notonlink(struct ctx *c, +static void dhcpv6_send_ia_notonlink(struct ctx *c, int qpair, const struct iov_tail *ia_base, const struct iov_tail *client_id_base, int len, uint32_t xid) @@ -404,7 +405,7 @@ static void dhcpv6_send_ia_notonlink(struct ctx *c,
resp_not_on_link.hdr.xid = xid;
- tap_udp6_send(c, src, 547, tap_ip6_daddr(c, src), 546, + tap_udp6_send(c, qpair, src, 547, tap_ip6_daddr(c, src), 546, xid, &resp_not_on_link, n); }
@@ -539,13 +540,14 @@ static size_t dhcpv6_client_fqdn_fill(const struct iov_tail *data, /** * dhcpv6() - Check if this is a DHCPv6 message, reply as needed * @c: Execution context + * @qpair: Queue pair on which to send the reply * @data: Single packet starting from UDP header * @saddr: Source IPv6 address of original message * @daddr: Destination IPv6 address of original message * * Return: 0 if it's not a DHCPv6 message, 1 if handled, -1 on failure */ -int dhcpv6(struct ctx *c, struct iov_tail *data, +int dhcpv6(struct ctx *c, int qpair, struct iov_tail *data, const struct in6_addr *saddr, const struct in6_addr *daddr) { const struct opt_server_id *server_id = NULL; @@ -627,7 +629,7 @@ int dhcpv6(struct ctx *c, struct iov_tail *data,
if (dhcpv6_ia_notonlink(data, &c->ip6.addr)) {
- dhcpv6_send_ia_notonlink(c, data, &client_id_base, + dhcpv6_send_ia_notonlink(c, qpair, data, &client_id_base, ntohs(client_id->l), mh->xid);
return 1; @@ -677,7 +679,7 @@ int dhcpv6(struct ctx *c, struct iov_tail *data,
resp.hdr.xid = mh->xid;
- tap_udp6_send(c, src, 547, tap_ip6_daddr(c, src), 546, + tap_udp6_send(c, qpair, src, 547, tap_ip6_daddr(c, src), 546, mh->xid, &resp, n); c->ip6.addr_seen = c->ip6.addr;
diff --git a/dhcpv6.h b/dhcpv6.h index c706dfdbb2ac..420caf8b7169 100644 --- a/dhcpv6.h +++ b/dhcpv6.h @@ -6,7 +6,7 @@ #ifndef DHCPV6_H #define DHCPV6_H
-int dhcpv6(struct ctx *c, struct iov_tail *data, +int dhcpv6(struct ctx *c, int qpair, struct iov_tail *data, struct in6_addr *saddr, struct in6_addr *daddr); void dhcpv6_init(const struct ctx *c);
diff --git a/fwd.c b/fwd.c index 68bb11663c46..60c6ec3b6af9 100644 --- a/fwd.c +++ b/fwd.c @@ -147,9 +147,9 @@ void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr, return;
if (inany_v4(addr)) - arp_announce(c, inany_v4(addr), e->mac); + arp_announce(c, 0, inany_v4(addr), e->mac); else - ndp_unsolicited_na(c, &addr->a6); + ndp_unsolicited_na(c, 0, &addr->a6); }
/** diff --git a/icmp.c b/icmp.c index 35faefb91870..a9f0518c2f61 100644 --- a/icmp.c +++ b/icmp.c @@ -132,12 +132,14 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref) const struct in_addr *daddr = inany_v4(&ini->eaddr);
ASSERT(saddr && daddr); /* Must have IPv4 addresses */ - tap_icmp4_send(c, *saddr, *daddr, buf, pingf->f.tap_omac, n); + tap_icmp4_send(c, 0, *saddr, *daddr, buf, + pingf->f.tap_omac, n); } else if (pingf->f.type == FLOW_PING6) { const struct in6_addr *saddr = &ini->oaddr.a6; const struct in6_addr *daddr = &ini->eaddr.a6;
- tap_icmp6_send(c, saddr, daddr, buf, pingf->f.tap_omac, n); + tap_icmp6_send(c, 0, saddr, daddr, buf, + pingf->f.tap_omac, n); } return;
diff --git a/ndp.c b/ndp.c index a33239d4aa81..0963a6392655 100644 --- a/ndp.c +++ b/ndp.c @@ -175,25 +175,27 @@ struct ndp_ns { /** * ndp_send() - Send an NDP message * @c: Execution context + * @qpair: Queue pair on which to send the message * @dst: IPv6 address to send the message to * @buf: ICMPv6 header + message payload * @l4len: Length of message, including ICMPv6 header */ -static void ndp_send(const struct ctx *c, const struct in6_addr *dst, +static void ndp_send(const struct ctx *c, int qpair, const struct in6_addr *dst, const void *buf, size_t l4len) { const struct in6_addr *src = &c->ip6.our_tap_ll;
- tap_icmp6_send(c, src, dst, buf, c->our_tap_mac, l4len); + tap_icmp6_send(c, qpair, src, dst, buf, c->our_tap_mac, l4len); }
/** * ndp_na() - Send an NDP Neighbour Advertisement (NA) message * @c: Execution context + * @qpair: Queue pair on which to send the NA * @dst: IPv6 address to send the NA to * @addr: IPv6 address to advertise */ -static void ndp_na(const struct ctx *c, const struct in6_addr *dst, +static void ndp_na(const struct ctx *c, int qpair, const struct in6_addr *dst, const struct in6_addr *addr) { union inany_addr tgt; @@ -217,25 +219,28 @@ static void ndp_na(const struct ctx *c, const struct in6_addr *dst, inany_from_af(&tgt, AF_INET6, addr); fwd_neigh_mac_get(c, &tgt, na.target_l2_addr.mac);
- ndp_send(c, dst, &na, sizeof(na)); + ndp_send(c, qpair, dst, &na, sizeof(na)); }
/** * ndp_unsolicited_na() - Send unsolicited NA * @c: Execution context + * @qpair: Queue pair on which to send the NA * @addr: IPv6 address to advertise */ -void ndp_unsolicited_na(const struct ctx *c, const struct in6_addr *addr) +void ndp_unsolicited_na(const struct ctx *c, int qpair, + const struct in6_addr *addr) { - ndp_na(c, &in6addr_ll_all_nodes, addr); + ndp_na(c, qpair, &in6addr_ll_all_nodes, addr); }
/** * ndp_ra() - Send an NDP Router Advertisement (RA) message * @c: Execution context + * @qpair: Queue pair on which to send the RA * @dst: IPv6 address to send the RA to */ -static void ndp_ra(const struct ctx *c, const struct in6_addr *dst) +static void ndp_ra(const struct ctx *c, int qpair, const struct in6_addr *dst) { struct ndp_ra ra = { .ih = { @@ -341,18 +346,19 @@ static void ndp_ra(const struct ctx *c, const struct in6_addr *dst) memcpy(&ra.source_ll.mac, c->our_tap_mac, ETH_ALEN);
/* NOLINTNEXTLINE(clang-analyzer-security.PointerSub) */ - ndp_send(c, dst, &ra, ptr - (unsigned char *)&ra); + ndp_send(c, qpair, dst, &ra, ptr - (unsigned char *)&ra); }
/** * ndp() - Check for NDP solicitations, reply as needed * @c: Execution context + * @qpair: Queue pair on which to send replies * @saddr: Source IPv6 address * @data: Single packet with ICMPv6 header * * Return: 0 if not handled here, 1 if handled, -1 on failure */ -int ndp(const struct ctx *c, const struct in6_addr *saddr, +int ndp(const struct ctx *c, int qpair, const struct in6_addr *saddr, struct iov_tail *data) { struct icmp6hdr ih_storage; @@ -381,13 +387,13 @@ int ndp(const struct ctx *c, const struct in6_addr *saddr,
info("NDP: received NS, sending NA");
- ndp_na(c, saddr, &ns->target_addr); + ndp_na(c, qpair, saddr, &ns->target_addr); } else if (ih->icmp6_type == RS) { if (c->no_ra) return 1;
info("NDP: received RS, sending RA"); - ndp_ra(c, saddr); + ndp_ra(c, qpair, saddr); }
return 1; @@ -445,7 +451,7 @@ void ndp_timer(const struct ctx *c, const struct timespec *now)
info("NDP: sending unsolicited RA, next in %llds", (long long)interval);
- ndp_ra(c, &in6addr_ll_all_nodes); + ndp_ra(c, 0, &in6addr_ll_all_nodes);
first: next_ra = now->tv_sec + interval; @@ -468,5 +474,5 @@ void ndp_send_init_req(const struct ctx *c) .target_addr = c->ip6.addr }; debug("Sending initial NDP NS request for guest MAC address"); - ndp_send(c, &c->ip6.addr, &ns, sizeof(ns)); + ndp_send(c, 0, &c->ip6.addr, &ns, sizeof(ns));
As for arp_send_init_req(), would the 0 here make more sense in the caller?
} diff --git a/ndp.h b/ndp.h index 56b756d8400b..927e69eb4649 100644 --- a/ndp.h +++ b/ndp.h @@ -8,10 +8,11 @@
struct icmp6hdr;
-int ndp(const struct ctx *c, const struct in6_addr *saddr, +int ndp(const struct ctx *c, int qpair, const struct in6_addr *saddr, struct iov_tail *data); void ndp_timer(const struct ctx *c, const struct timespec *now); void ndp_send_init_req(const struct ctx *c); -void ndp_unsolicited_na(const struct ctx *c, const struct in6_addr *addr); +void ndp_unsolicited_na(const struct ctx *c, int qpair, + const struct in6_addr *addr);
#endif /* NDP_H */ diff --git a/tap.c b/tap.c index d098061ed559..a842104687b7 100644 --- a/tap.c +++ b/tap.c @@ -125,10 +125,12 @@ unsigned long tap_l2_max_len(const struct ctx *c) /** * tap_send_single() - Send a single frame * @c: Execution context + * @qpair: Queue pair on which to send the frame * @data: Packet buffer * @l2len: Total L2 packet length */ -void tap_send_single(const struct ctx *c, const void *data, size_t l2len) +void tap_send_single(const struct ctx *c, int qpair, const void *data, + size_t l2len) { uint32_t vnet_len = htonl(l2len); struct iovec iov[2]; @@ -147,7 +149,7 @@ void tap_send_single(const struct ctx *c, const void *data, size_t l2len) tap_send_frames(c, iov, iovcnt, 1); break; case MODE_VU: - vu_send_single(c, data, l2len); + vu_send_single(c, qpair, data, l2len); break; } } @@ -250,6 +252,7 @@ void *tap_push_uh4(struct udphdr *uh, struct in_addr src, in_port_t sport, /** * tap_udp4_send() - Send UDP over IPv4 packet * @c: Execution context + * @qpair: Queue pair on which to send packet * @src: IPv4 source address * @sport: UDP source port * @dst: IPv4 destination address @@ -257,7 +260,7 @@ void *tap_push_uh4(struct udphdr *uh, struct in_addr src, in_port_t sport, * @in: UDP payload contents (not including UDP header) * @dlen: UDP payload length (not including UDP header) */ -void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport, +void tap_udp4_send(const struct ctx *c, int qpair, struct in_addr src, in_port_t sport, struct in_addr dst, in_port_t dport, const void *in, size_t dlen) { @@ -268,20 +271,22 @@ void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport, char *data = tap_push_uh4(uh, src, sport, dst, dport, in, dlen);
memcpy(data, in, dlen); - tap_send_single(c, buf, dlen + (data - buf)); + tap_send_single(c, qpair, buf, dlen + (data - buf)); }
/** * tap_icmp4_send() - Send ICMPv4 packet * @c: Execution context + * @qpair: Queue pair on which to send packet * @src: IPv4 source address * @dst: IPv4 destination address * @in: ICMP packet, including ICMP header * @src_mac: MAC address to be used as source for message * @l4len: ICMP packet length, including ICMP header */ -void tap_icmp4_send(const struct ctx *c, struct in_addr src, struct in_addr dst, - const void *in, const void *src_mac, size_t l4len) +void tap_icmp4_send(const struct ctx *c, int qpair, struct in_addr src, + struct in_addr dst, const void *in, const void *src_mac, + size_t l4len) { char buf[USHRT_MAX]; struct iphdr *ip4h = tap_push_l2h(c, buf, src_mac, ETH_P_IP); @@ -291,7 +296,7 @@ void tap_icmp4_send(const struct ctx *c, struct in_addr src, struct in_addr dst, memcpy(icmp4h, in, l4len); csum_icmp4(icmp4h, icmp4h + 1, l4len - sizeof(*icmp4h));
- tap_send_single(c, buf, l4len + ((char *)icmp4h - buf)); + tap_send_single(c, qpair, buf, l4len + ((char *)icmp4h - buf)); }
/** @@ -355,6 +360,7 @@ void *tap_push_uh6(struct udphdr *uh, /** * tap_udp6_send() - Send UDP over IPv6 packet * @c: Execution context + * @qpair: Queue pair on which to send packet * @src: IPv6 source address * @sport: UDP source port * @dst: IPv6 destination address @@ -363,7 +369,7 @@ void *tap_push_uh6(struct udphdr *uh, * @in: UDP payload contents (not including UDP header) * @dlen: UDP payload length (not including UDP header) */ -void tap_udp6_send(const struct ctx *c, +void tap_udp6_send(const struct ctx *c, int qpair, const struct in6_addr *src, in_port_t sport, const struct in6_addr *dst, in_port_t dport, uint32_t flow, void *in, size_t dlen) @@ -376,19 +382,20 @@ void tap_udp6_send(const struct ctx *c, char *data = tap_push_uh6(uh, src, sport, dst, dport, in, dlen);
memcpy(data, in, dlen); - tap_send_single(c, buf, dlen + (data - buf)); + tap_send_single(c, qpair, buf, dlen + (data - buf)); }
/** * tap_icmp6_send() - Send ICMPv6 packet * @c: Execution context + * @qpair: Queue pair on which to send packet * @src: IPv6 source address * @dst: IPv6 destination address * @in: ICMP packet, including ICMP header * @src_mac: MAC address to be used as source for message * @l4len: ICMP packet length, including ICMP header */ -void tap_icmp6_send(const struct ctx *c, +void tap_icmp6_send(const struct ctx *c, int qpair, const struct in6_addr *src, const struct in6_addr *dst, const void *in, const void *src_mac, size_t l4len) { @@ -400,7 +407,7 @@ void tap_icmp6_send(const struct ctx *c, memcpy(icmp6h, in, l4len); csum_icmp6(icmp6h, src, dst, icmp6h + 1, l4len - sizeof(*icmp6h));
- tap_send_single(c, buf, l4len + ((char *)icmp6h - buf)); + tap_send_single(c, qpair, buf, l4len + ((char *)icmp6h - buf)); }
/** @@ -727,7 +734,7 @@ resume: if (!eh) continue; if (ntohs(eh->h_proto) == ETH_P_ARP) { - arp(c, &data); + arp(c, 0, &data); continue; }
@@ -788,7 +795,7 @@ resume: struct iov_tail eh_data;
packet_get(pool_tap4, i, &eh_data); - if (dhcp(c, &eh_data)) + if (dhcp(c, 0, &eh_data)) continue; }
@@ -954,7 +961,7 @@ resume: continue;
ndp_data = data; - if (ndp(c, saddr, &ndp_data)) + if (ndp(c, 0, saddr, &ndp_data)) continue;
tap_packet_debug(NULL, ip6h, NULL, proto, NULL, 1); @@ -973,7 +980,7 @@ resume: if (proto == IPPROTO_UDP) { struct iov_tail uh_data = data;
- if (dhcpv6(c, &uh_data, saddr, daddr)) + if (dhcpv6(c, 0, &uh_data, saddr, daddr)) continue; }
diff --git a/tap.h b/tap.h index 1864173cc9b0..92d3e5446991 100644 --- a/tap.h +++ b/tap.h @@ -87,24 +87,25 @@ void *tap_push_ip6h(struct ipv6hdr *ip6h, const struct in6_addr *src, const struct in6_addr *dst, size_t l4len, uint8_t proto, uint32_t flow); -void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport, +void tap_udp4_send(const struct ctx *c, int qpair, struct in_addr src, in_port_t sport, struct in_addr dst, in_port_t dport, const void *in, size_t dlen); -void tap_icmp4_send(const struct ctx *c, struct in_addr src, struct in_addr dst, - const void *in, const void *src_mac, size_t l4len); +void tap_icmp4_send(const struct ctx *c, int qpair, struct in_addr src, + struct in_addr dst, const void *in, const void *src_mac, + size_t l4len); const struct in6_addr *tap_ip6_daddr(const struct ctx *c, const struct in6_addr *src); void *tap_push_ip6h(struct ipv6hdr *ip6h, const struct in6_addr *src, const struct in6_addr *dst, size_t l4len, uint8_t proto, uint32_t flow); -void tap_udp6_send(const struct ctx *c, +void tap_udp6_send(const struct ctx *c, int qpair, const struct in6_addr *src, in_port_t sport, const struct in6_addr *dst, in_port_t dport, uint32_t flow, void *in, size_t dlen); -void tap_icmp6_send(const struct ctx *c, +void tap_icmp6_send(const struct ctx *c, int qpair, const struct in6_addr *src, const struct in6_addr *dst, const void *in, const void *src_mac, size_t l4len); -void tap_send_single(const struct ctx *c, const void *data, size_t l2len); +void tap_send_single(const struct ctx *c, int qpair, const void *data, size_t l2len); size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t bufs_per_frame, size_t nframes); void eth_update_mac(struct ethhdr *eh, diff --git a/tcp.c b/tcp.c index 3202d3385a63..76f3273bb93f 100644 --- a/tcp.c +++ b/tcp.c @@ -1985,6 +1985,7 @@ static void tcp_conn_from_sock_finish(const struct ctx *c, /** * tcp_rst_no_conn() - Send RST in response to a packet with no connection * @c: 
Execution context + * @qpair: Queue pair on which to send the reply * @af: Address family, AF_INET or AF_INET6 * @saddr: Source address of the packet we're responding to * @daddr: Destination address of the packet we're responding to @@ -1992,7 +1993,7 @@ static void tcp_conn_from_sock_finish(const struct ctx *c, * @th: TCP header of the packet we're responding to * @l4len: Packet length, including TCP header */ -static void tcp_rst_no_conn(const struct ctx *c, int af, +static void tcp_rst_no_conn(const struct ctx *c, int qpair, int af, const void *saddr, const void *daddr, uint32_t flow_lbl, const struct tcphdr *th, size_t l4len) @@ -2050,7 +2051,7 @@ static void tcp_rst_no_conn(const struct ctx *c, int af,
tcp_update_csum(psum, rsth, &payload); rst_l2len = ((char *)rsth - buf) + sizeof(*rsth); - tap_send_single(c, buf, rst_l2len); + tap_send_single(c, qpair, buf, rst_l2len); }
/** @@ -2109,7 +2110,8 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, tcp_conn_from_tap(c, af, saddr, daddr, th, opts, optlen, now); else - tcp_rst_no_conn(c, af, saddr, daddr, flow_lbl, th, l4len); + tcp_rst_no_conn(c, 0, af, saddr, daddr, flow_lbl, th, + l4len); return 1; }
diff --git a/udp.c b/udp.c index 9c00950250a0..2c74b42a3d95 100644 --- a/udp.c +++ b/udp.c @@ -384,13 +384,14 @@ static void udp_tap_prepare(const struct mmsghdr *mmh, /** * udp_send_tap_icmp4() - Construct and send ICMPv4 to local peer * @c: Execution context + * @qpair: Queue pair on which to send the ICMPv4 packet * @ee: Extended error descriptor * @toside: Destination side of flow * @saddr: Address of ICMP generating node * @in: First bytes (max 8) of original UDP message body * @dlen: Length of the read part of original UDP message body */ -static void udp_send_tap_icmp4(const struct ctx *c, +static void udp_send_tap_icmp4(const struct ctx *c, int qpair, const struct sock_extended_err *ee, const struct flowside *toside, struct in_addr saddr, @@ -426,13 +427,14 @@ static void udp_send_tap_icmp4(const struct ctx *c, /* Try to obtain the MAC address of the generating node */ saddr_any = inany_from_v4(saddr); fwd_neigh_mac_get(c, &saddr_any, tap_omac); - tap_icmp4_send(c, saddr, eaddr, &msg, tap_omac, msglen); + tap_icmp4_send(c, qpair, saddr, eaddr, &msg, tap_omac, msglen); }
/** * udp_send_tap_icmp6() - Construct and send ICMPv6 to local peer * @c: Execution context + * @qpair: Queue pair on which to send the ICMPv6 packet * @ee: Extended error descriptor * @toside: Destination side of flow * @saddr: Address of ICMP generating node @@ -440,7 +442,7 @@ static void udp_send_tap_icmp4(const struct ctx *c, * @dlen: Length of the read part of original UDP message body * @flow: IPv6 flow identifier */ -static void udp_send_tap_icmp6(const struct ctx *c, +static void udp_send_tap_icmp6(const struct ctx *c, int qpair, const struct sock_extended_err *ee, const struct flowside *toside, const struct in6_addr *saddr, @@ -474,7 +476,7 @@ static void udp_send_tap_icmp6(const struct ctx *c,
/* Try to obtain the MAC address of the generating node */ fwd_neigh_mac_get(c, (union inany_addr *) saddr, tap_omac); - tap_icmp6_send(c, saddr, eaddr, &msg, tap_omac, msglen); + tap_icmp6_send(c, qpair, saddr, eaddr, &msg, tap_omac, msglen); }
/** @@ -634,12 +636,12 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, if (hdr->cmsg_level == IPPROTO_IP && (o4 = inany_v4(&otap)) && inany_v4(&toside->eaddr)) { dlen = MIN(dlen, ICMP4_MAX_DLEN); - udp_send_tap_icmp4(c, ee, toside, *o4, data, dlen); + udp_send_tap_icmp4(c, 0, ee, toside, *o4, data, dlen); return 1; }
if (hdr->cmsg_level == IPPROTO_IPV6 && !inany_v4(&toside->eaddr)) { - udp_send_tap_icmp6(c, ee, toside, &otap.a6, data, dlen, + udp_send_tap_icmp6(c, 0, ee, toside, &otap.a6, data, dlen, FLOW_IDX(uflow)); return 1; } diff --git a/vu_common.c b/vu_common.c index b13b7c308fd8..040ad067ffbf 100644 --- a/vu_common.c +++ b/vu_common.c @@ -235,23 +235,26 @@ void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref, }
/** - * vu_send_single() - Send a buffer to the front-end using the RX virtqueue + * vu_send_single() - Send a buffer to the front-end using a specified virtqueue * @c: execution context + * @qpair: Queue pair on which to send the buffer * @buf: address of the buffer * @size: size of the buffer * * Return: number of bytes sent, -1 if there is an error */ -int vu_send_single(const struct ctx *c, const void *buf, size_t size) +int vu_send_single(const struct ctx *c, int qpair, const void *buf, size_t size) { struct vu_dev *vdev = c->vdev; - struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE]; struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE]; struct iovec in_sg[VIRTQUEUE_MAX_SIZE]; + struct vu_virtq *vq; size_t total; int elem_cnt; int i;
+ vq = &vdev->vq[qpair << 1]; + trace("vu_send_single size %zu", size);
if (!vu_queue_enabled(vq) || !vu_queue_started(vq)) { diff --git a/vu_common.h b/vu_common.h index f538f237790b..25b824c51d1d 100644 --- a/vu_common.h +++ b/vu_common.h @@ -56,6 +56,7 @@ void vu_flush(const struct vu_dev *vdev, struct vu_virtq *vq, struct vu_virtq_element *elem, int elem_cnt); void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref, const struct timespec *now); -int vu_send_single(const struct ctx *c, const void *buf, size_t size); +int vu_send_single(const struct ctx *c, int qpair, const void *buf, + size_t size);
#endif /* VU_COMMON_H */ -- 2.51.0
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Fri, Nov 21, 2025 at 05:59:02PM +0100, Laurent Vivier wrote:
With the recent addition of multiqueue support to passt's vhost-user implementation, we need test coverage to validate the functionality. The test infrastructure previously tested only single-queue configurations.
Add a VHOST_USER_MQ environment variable to control the number of queue pairs. When set to values greater than 1, the setup scripts pass --max-qpairs to passt and configure QEMU's vhost-user netdev with the corresponding queues= parameter.
The test suite now runs an additional set of tests with 8 queue pairs to exercise the multiqueue paths across all protocols (TCP, UDP, ICMP) and services (DHCP, NDP). Note that the guest kernel will only enable as many queues as there are vCPUs.
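The conditional wiring described above can be sketched as a standalone helper (the `build_netdev` function is invented for this illustration; only the option strings themselves come from the patch):

```shell
# build_netdev is a hypothetical helper mirroring the setup scripts'
# logic: with more than one queue pair, the virtio-net device gets
# mq=true and the vhost-user netdev gets a matching queues= count;
# otherwise the legacy single-queue arguments are emitted unchanged.
build_netdev() {
	__mq="$1"
	if [ "${__mq}" -gt 1 ]; then
		printf '%s' "-device virtio-net,netdev=v,mq=true -netdev vhost-user,id=v,chardev=c,queues=${__mq}"
	else
		printf '%s' "-device virtio-net,netdev=v -netdev vhost-user,id=v,chardev=c"
	fi
}

build_netdev 8; echo
build_netdev 1; echo
```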
Signed-off-by: Laurent Vivier
--- test/lib/setup | 58 +++++++++++++++++++++++++++++++++++++++----------- test/run | 23 ++++++++++++++++++++ 2 files changed, 69 insertions(+), 12 deletions(-) diff --git a/test/lib/setup b/test/lib/setup index 5994598744a3..2af34d670473 100755 --- a/test/lib/setup +++ b/test/lib/setup @@ -18,6 +18,8 @@ VCPUS="$( [ $(nproc) -ge 8 ] && echo 6 || echo $(( $(nproc) / 2 + 1 )) )" MEM_KIB="$(sed -n 's/MemTotal:[ ]*\([0-9]*\) kB/\1/p' /proc/meminfo)" QEMU_ARCH="$(uname -m)" [ "${QEMU_ARCH}" = "i686" ] && QEMU_ARCH=i386 +VHOST_USER=0 +VHOST_USER_MQ=1
# setup_build() - Set up pane layout for build tests setup_build() { @@ -45,7 +47,8 @@ setup_passt() { [ ${PCAP} -eq 1 ] && __opts="${__opts} -p ${LOGDIR}/passt.pcap" [ ${DEBUG} -eq 1 ] && __opts="${__opts} -d" [ ${TRACE} -eq 1 ] && __opts="${__opts} --trace" - [ ${VHOST_USER} -eq 1 ] && __opts="${__opts} --vhost-user" + [ ${VHOST_USER} -eq 1 ] && __opts="${__opts} --vhost-user" && \ + [ ${VHOST_USER_MQ} -gt 1 ] && __opts="${__opts} --max-qpairs ${VHOST_USER_MQ}"
context_run passt "make clean" context_run passt "make valgrind" @@ -59,10 +62,18 @@ setup_passt() { __vmem="$(((${__vmem} + 500) / 1000))G" __qemu_netdev=" \ -chardev socket,id=c,path=${STATESETUP}/passt.socket \ - -netdev vhost-user,id=v,chardev=c \ - -device virtio-net,netdev=v \ -object memory-backend-memfd,id=m,share=on,size=${__vmem} \ -numa node,memdev=m" + + if [ ${VHOST_USER_MQ} -gt 1 ]; then + __qemu_netdev="${__qemu_netdev} \ + -device virtio-net,netdev=v,mq=true \ + -netdev vhost-user,id=v,chardev=c,queues=${VHOST_USER_MQ}" + else + __qemu_netdev="${__qemu_netdev} \ + -device virtio-net,netdev=v \ + -netdev vhost-user,id=v,chardev=c"
Is there a difference for qemu between omitting queues= and using queues=1? If not, we can simplify this. For the passt option it's worth explicitly not setting it for the single-queue case, so that we're exercising the command line option as well. But exercising qemu's options is not our concern, so we can use queues=1 if it means the same thing as omitting it entirely. Otherwise LGTM.
+ fi else __qemu_netdev="-device virtio-net-pci,netdev=s \ -netdev stream,id=s,server=off,addr.type=unix,addr.path=${STATESETUP}/passt.socket" @@ -155,7 +166,8 @@ setup_passt_in_ns() { [ ${PCAP} -eq 1 ] && __opts="${__opts} -p ${LOGDIR}/passt_in_pasta.pcap" [ ${DEBUG} -eq 1 ] && __opts="${__opts} -d" [ ${TRACE} -eq 1 ] && __opts="${__opts} --trace" - [ ${VHOST_USER} -eq 1 ] && __opts="${__opts} --vhost-user" + [ ${VHOST_USER} -eq 1 ] && __opts="${__opts} --vhost-user" && \ + [ ${VHOST_USER_MQ} -gt 1 ] && __opts="${__opts} --max-qpairs ${VHOST_USER_MQ}"
if [ ${VALGRIND} -eq 1 ]; then context_run passt "make clean" @@ -173,10 +185,18 @@ setup_passt_in_ns() { __vmem="$(((${__vmem} + 500) / 1000))G" __qemu_netdev=" \ -chardev socket,id=c,path=${STATESETUP}/passt.socket \ - -netdev vhost-user,id=v,chardev=c \ - -device virtio-net,netdev=v \ -object memory-backend-memfd,id=m,share=on,size=${__vmem} \ -numa node,memdev=m" + + if [ ${VHOST_USER_MQ} -gt 1 ]; then + __qemu_netdev="${__qemu_netdev} \ + -device virtio-net,netdev=v,mq=true \ + -netdev vhost-user,id=v,chardev=c,queues=${VHOST_USER_MQ}" + else + __qemu_netdev="${__qemu_netdev} \ + -device virtio-net,netdev=v \ + -netdev vhost-user,id=v,chardev=c" + fi else __qemu_netdev="-device virtio-net-pci,netdev=s \ -netdev stream,id=s,server=off,addr.type=unix,addr.path=${STATESETUP}/passt.socket" @@ -241,7 +261,8 @@ setup_two_guests() { [ ${PCAP} -eq 1 ] && __opts="${__opts} -p ${LOGDIR}/passt_1.pcap" [ ${DEBUG} -eq 1 ] && __opts="${__opts} -d" [ ${TRACE} -eq 1 ] && __opts="${__opts} --trace" - [ ${VHOST_USER} -eq 1 ] && __opts="${__opts} --vhost-user" + [ ${VHOST_USER} -eq 1 ] && __opts="${__opts} --vhost-user" && \ + [ ${VHOST_USER_MQ} -gt 1 ] && __opts="${__opts} --max-qpairs ${VHOST_USER_MQ}"
context_run_bg passt_1 "./passt -s ${STATESETUP}/passt_1.socket -P ${STATESETUP}/passt_1.pid -f ${__opts} --fqdn fqdn1.passt.test -H hostname1 -t 10001 -u 10001" wait_for [ -f "${STATESETUP}/passt_1.pid" ] @@ -250,7 +271,8 @@ setup_two_guests() { [ ${PCAP} -eq 1 ] && __opts="${__opts} -p ${LOGDIR}/passt_2.pcap" [ ${DEBUG} -eq 1 ] && __opts="${__opts} -d" [ ${TRACE} -eq 1 ] && __opts="${__opts} --trace" - [ ${VHOST_USER} -eq 1 ] && __opts="${__opts} --vhost-user" + [ ${VHOST_USER} -eq 1 ] && __opts="${__opts} --vhost-user" && \ + [ ${VHOST_USER_MQ} -gt 1 ] && __opts="${__opts} --max-qpairs ${VHOST_USER_MQ}"
context_run_bg passt_2 "./passt -s ${STATESETUP}/passt_2.socket -P ${STATESETUP}/passt_2.pid -f ${__opts} --hostname hostname2 --fqdn fqdn2 -t 10004 -u 10004" wait_for [ -f "${STATESETUP}/passt_2.pid" ] @@ -260,16 +282,28 @@ setup_two_guests() { __vmem="$(((${__vmem} + 500) / 1000))G" __qemu_netdev1=" \ -chardev socket,id=c,path=${STATESETUP}/passt_1.socket \ - -netdev vhost-user,id=v,chardev=c \ - -device virtio-net,netdev=v \ -object memory-backend-memfd,id=m,share=on,size=${__vmem} \ -numa node,memdev=m" __qemu_netdev2=" \ -chardev socket,id=c,path=${STATESETUP}/passt_2.socket \ - -netdev vhost-user,id=v,chardev=c \ - -device virtio-net,netdev=v \ -object memory-backend-memfd,id=m,share=on,size=${__vmem} \ -numa node,memdev=m" + + if [ ${VHOST_USER_MQ} -gt 1 ]; then + __qemu_netdev1="${__qemu_netdev1} \ + -device virtio-net,netdev=v,mq=true \ + -netdev vhost-user,id=v,chardev=c,queues=${VHOST_USER_MQ}" + __qemu_netdev2="${__qemu_netdev2} \ + -device virtio-net,netdev=v,mq=true \ + -netdev vhost-user,id=v,chardev=c,queues=${VHOST_USER_MQ}" + else + __qemu_netdev1="${__qemu_netdev1} \ + -device virtio-net,netdev=v \ + -netdev vhost-user,id=v,chardev=c" + __qemu_netdev2="${__qemu_netdev2} \ + -device virtio-net,netdev=v \ + -netdev vhost-user,id=v,chardev=c" + fi else __qemu_netdev1="-device virtio-net-pci,netdev=s \ -netdev stream,id=s,server=off,addr.type=unix,addr.path=${STATESETUP}/passt_1.socket" diff --git a/test/run b/test/run index f858e5586847..652cc12b1234 100755 --- a/test/run +++ b/test/run @@ -190,6 +190,29 @@ run() { test passt_vu_in_ns/shutdown teardown passt_in_ns
+ VHOST_USER=1 + VHOST_USER_MQ=8 + setup passt_in_ns + test passt_vu/ndp + test passt_vu_in_ns/dhcp + test passt_vu_in_ns/icmp + test passt_vu_in_ns/tcp + test passt_vu_in_ns/udp + test passt_vu_in_ns/shutdown + teardown passt_in_ns + + setup two_guests + test two_guests_vu/basic + teardown two_guests + + setup passt_in_ns + test passt_vu/ndp + test passt_vu_in_ns/dhcp + test perf/passt_vu_tcp + test perf/passt_vu_udp + test passt_vu_in_ns/shutdown + teardown passt_in_ns + # TODO: Make those faster by at least pre-installing gcc and make on # non-x86 images, then re-enable. skip_distro() { -- 2.51.0
On Fri, Nov 21, 2025 at 05:59:01PM +0100, Laurent Vivier wrote:
For multiqueue support, we need to ensure packets are routed to the correct RX queue based on which TX queue they originated from. This
I know what you mean, but I don't love this phrasing. The packet itself didn't originate from a Tx queue - it's only the flow that connects this packet to an earlier one that did come from a Tx queue.
requires tracking the queue pair association for each flow.
Add a qpair field to struct flow_common to store the queue pair number for each flow (FLOW_QPAIR_INVALID if not assigned). The field uses 5 bits, allowing support for up to 31 queue pairs (index 31 is reserved for FLOW_QPAIR_INVALID), which we verify is sufficient for VHOST_USER_MAX_VQS via static assertion.
Introduce flow_qp() to retrieve the queue pair for a flow (returning 0 for NULL flows or flows without a valid assignment), and flow_setqp() to assign queue pairs. Update all protocol handlers (TCP, UDP, ICMP) and their tap handlers to accept a qpair parameter and assign it to flows using FLOW_SETQP().
The vhost-user code now uses FLOW_QP() to select the appropriate RX queue when sending packets, ensuring they're routed based on the originating TX queue rather than always using queue 0.
Note that flows initiated from the host side (via sockets, for example udp_flow_from_sock()) currently default to queue pair 0, as they don't have an associated incoming queue to derive the assignment from.
Signed-off-by: Laurent Vivier
--- flow.c | 30 +++++++++++++++++++++++++++ flow.h | 17 ++++++++++++++++ icmp.c | 23 ++++++++++++--------- icmp.h | 2 +- tap.c | 18 ++++++++-------- tcp.c | 60 +++++++++++++++++++++++++++++++----------------------- tcp.h | 11 +++++----- tcp_vu.c | 8 +++++--- udp.c | 29 ++++++++++++++------------ udp.h | 12 ++++++----- udp_flow.c | 8 +++++++- udp_flow.h | 2 +- udp_vu.c | 4 +++- 13 files changed, 149 insertions(+), 75 deletions(-) diff --git a/flow.c b/flow.c index 278a9cf0ac6d..8bf18bdca170 100644 --- a/flow.c +++ b/flow.c @@ -405,6 +405,35 @@ void flow_epollid_register(int epollid, int epollfd) epoll_id_to_fd[epollid] = epollfd; }
+/**
+ * flow_qp() - Get the queue pair for a flow
+ * @f:	Flow to query (may be NULL)
+ *
+ * Return: queue pair number for the flow, or 0 if flow is NULL or has no
+ * valid queue pair assignment
+ */
+unsigned int flow_qp(const struct flow_common *f)
+{
+	if (f == NULL || f->qpair == FLOW_QPAIR_INVALID)
Are there any instances where you actually want to pass a NULL flow to this? If you're going to return 0 anyway, why not just set f->qpair to 0 by default, rather than using FLOW_QPAIR_INVALID? Rest of the patch LGTM.
+		return 0;
+	return f->qpair;
+}
+
+/**
+ * flow_setqp() - Set queue pair assignment for a flow
+ * @f:		Flow to update
+ * @qpair:	Queue pair number to assign
+ */
+void flow_setqp(struct flow_common *f, unsigned int qpair)
+{
+	ASSERT(qpair < FLOW_QPAIR_MAX);
+
+	flow_trace((union flow *)f, "updating queue pair from %d to %d",
+		   f->qpair, qpair);
+
+	f->qpair = qpair;
+}
+
 /**
  * flow_initiate_() - Move flow to INI, setting pif[INISIDE]
  * @flow:	Flow to change state
@@ -609,6 +638,7 @@ union flow *flow_alloc(void)
 	flow_new_entry = flow;
 	memset(flow, 0, sizeof(*flow));
 	flow_epollid_clear(&flow->f);
+	flow->f.qpair = FLOW_QPAIR_INVALID;
 	flow_set_state(&flow->f, FLOW_STATE_NEW);
return flow; diff --git a/flow.h b/flow.h index b43b0b1dd7f2..a48c00c5b621 100644 --- a/flow.h +++ b/flow.h @@ -179,6 +179,8 @@ int flowside_connect(const struct ctx *c, int s, * @side[]: Information for each side of the flow * @tap_omac: MAC address of remote endpoint as seen from the guest * @epollid: epollfd identifier, or EPOLLFD_ID_INVALID + * @qpair: Queue pair number assigned to this flow + * (FLOW_QPAIR_INVALID if not assigned) */ struct flow_common { #ifdef __GNUC__ @@ -199,6 +201,8 @@ struct flow_common {
 #define EPOLLFD_ID_BITS 8
 	unsigned int epollid:EPOLLFD_ID_BITS;
+#define FLOW_QPAIR_BITS 5
+	unsigned int qpair:FLOW_QPAIR_BITS;
 };
#define EPOLLFD_ID_DEFAULT 0 @@ -206,6 +210,12 @@ struct flow_common { #define EPOLLFD_ID_MAX (EPOLLFD_ID_SIZE - 1) #define EPOLLFD_ID_INVALID EPOLLFD_ID_MAX
+#define FLOW_QPAIR_NUM		(1 << FLOW_QPAIR_BITS)
+#define FLOW_QPAIR_MAX		(FLOW_QPAIR_NUM - 1)
+#define FLOW_QPAIR_INVALID	FLOW_QPAIR_MAX
+
+static_assert(VHOST_USER_MAX_VQS <= FLOW_QPAIR_MAX * 2);
+
 #define FLOW_INDEX_BITS		17	/* 128k - 1 */
 #define FLOW_MAX		MAX_FROM_BITS(FLOW_INDEX_BITS)
@@ -266,6 +276,13 @@ int flow_epollfd(const struct flow_common *f);
 void flow_epollid_set(struct flow_common *f, int epollid);
 void flow_epollid_clear(struct flow_common *f);
 void flow_epollid_register(int epollid, int epollfd);
+unsigned int flow_qp(const struct flow_common *f);
+#define FLOW_QP(flow_) \
+	(flow_qp(&(flow_)->f))
+void flow_setqp(struct flow_common *f, unsigned int qpair);
+#define FLOW_SETQP(flow_, _qpair) \
+	(flow_setqp(&(flow_)->f, _qpair))
+
I don't love flow_setqp() just being a standalone call, since it means we don't know from the flow state whether the qpair field is initialised or not. I'd prefer to require that the qpair is set on one of the existing state transitions (flow_initiate(), flow_target(), FLOW_SET_TYPE() or FLOW_ACTIVATE). If it absolutely can't go at one of those points, I think we should introduce a new state transition at which the qpair is added. (flow.h describes the state transitions in detail).
void flow_defer_handler(const struct ctx *c, const struct timespec *now); int flow_migrate_source_early(struct ctx *c, const struct migrate_stage *stage, int fd); diff --git a/icmp.c b/icmp.c index a9f0518c2f61..04f21f758998 100644 --- a/icmp.c +++ b/icmp.c @@ -132,13 +132,13 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref) const struct in_addr *daddr = inany_v4(&ini->eaddr);
ASSERT(saddr && daddr); /* Must have IPv4 addresses */ - tap_icmp4_send(c, 0, *saddr, *daddr, buf, + tap_icmp4_send(c, FLOW_QP(pingf), *saddr, *daddr, buf, pingf->f.tap_omac, n); } else if (pingf->f.type == FLOW_PING6) { const struct in6_addr *saddr = &ini->oaddr.a6; const struct in6_addr *daddr = &ini->eaddr.a6;
- tap_icmp6_send(c, 0, saddr, daddr, buf, + tap_icmp6_send(c, FLOW_QP(pingf), saddr, daddr, buf, pingf->f.tap_omac, n); } return; @@ -238,17 +238,18 @@ cancel:
/** * icmp_tap_handler() - Handle packets from tap - * @c: Execution context - * @pif: pif on which the packet is arriving - * @af: Address family, AF_INET or AF_INET6 - * @saddr: Source address - * @daddr: Destination address - * @data: Single packet with ICMP/ICMPv6 header - * @now: Current timestamp + * @c: Execution context + * @qpair: Queue pair + * @pif: pif on which the packet is arriving + * @af: Address family, AF_INET or AF_INET6 + * @saddr: Source address + * @daddr: Destination address + * @data: Single packet with ICMP/ICMPv6 header + * @now: Current timestamp * * Return: count of consumed packets (always 1, even if malformed) */ -int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, +int icmp_tap_handler(const struct ctx *c, int qpair, uint8_t pif, sa_family_t af, const void *saddr, const void *daddr, struct iov_tail *data, const struct timespec *now) { @@ -309,6 +310,8 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, else if (!(pingf = icmp_ping_new(c, af, id, saddr, daddr))) return 1;
+	FLOW_SETQP(pingf, qpair);
+
 	tgt = &pingf->f.side[TGTSIDE];
ASSERT(flow_proto[pingf->f.type] == proto); diff --git a/icmp.h b/icmp.h index 1a0e6205f087..f78508ba3bc9 100644 --- a/icmp.h +++ b/icmp.h @@ -10,7 +10,7 @@ struct ctx; struct icmp_ping_flow;
void icmp_sock_handler(const struct ctx *c, union epoll_ref ref); -int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, +int icmp_tap_handler(const struct ctx *c, int qpair, uint8_t pif, sa_family_t af, const void *saddr, const void *daddr, struct iov_tail *data, const struct timespec *now); void icmp_init(void); diff --git a/tap.c b/tap.c index 529acecc9851..ccb47cf82fd4 100644 --- a/tap.c +++ b/tap.c @@ -739,7 +739,7 @@ resume: if (!eh) continue; if (ntohs(eh->h_proto) == ETH_P_ARP) { - arp(c, 0, &data); + arp(c, qpair, &data); continue; }
@@ -786,7 +786,7 @@ resume:
tap_packet_debug(iph, NULL, NULL, 0, NULL, 1);
- icmp_tap_handler(c, PIF_TAP, AF_INET, + icmp_tap_handler(c, qpair, PIF_TAP, AF_INET, &iph->saddr, &iph->daddr, &data, now); continue; @@ -863,14 +863,14 @@ append: if (c->no_tcp) continue; for (k = 0; k < p->count; ) - k += tcp_tap_handler(c, PIF_TAP, AF_INET, + k += tcp_tap_handler(c, qpair, PIF_TAP, AF_INET, &seq->saddr, &seq->daddr, 0, p, k, now); } else if (seq->protocol == IPPROTO_UDP) { if (c->no_udp) continue; for (k = 0; k < p->count; ) - k += udp_tap_handler(c, PIF_TAP, AF_INET, + k += udp_tap_handler(c, qpair, PIF_TAP, AF_INET, &seq->saddr, &seq->daddr, seq->ttl, p, k, now); } @@ -967,12 +967,12 @@ resume: continue;
ndp_data = data; - if (ndp(c, 0, saddr, &ndp_data)) + if (ndp(c, qpair, saddr, &ndp_data)) continue;
tap_packet_debug(NULL, ip6h, NULL, proto, NULL, 1);
- icmp_tap_handler(c, PIF_TAP, AF_INET6, + icmp_tap_handler(c, qpair, PIF_TAP, AF_INET6, saddr, daddr, &data, now); continue; } @@ -986,7 +986,7 @@ resume: if (proto == IPPROTO_UDP) { struct iov_tail uh_data = data;
- if (dhcpv6(c, 0, &uh_data, saddr, daddr)) + if (dhcpv6(c, qpair, &uh_data, saddr, daddr)) continue; }
@@ -1054,14 +1054,14 @@ append: if (c->no_tcp) continue; for (k = 0; k < p->count; ) - k += tcp_tap_handler(c, PIF_TAP, AF_INET6, + k += tcp_tap_handler(c, qpair, PIF_TAP, AF_INET6, &seq->saddr, &seq->daddr, seq->flow_lbl, p, k, now); } else if (seq->protocol == IPPROTO_UDP) { if (c->no_udp) continue; for (k = 0; k < p->count; ) - k += udp_tap_handler(c, PIF_TAP, AF_INET6, + k += udp_tap_handler(c, qpair, PIF_TAP, AF_INET6, &seq->saddr, &seq->daddr, seq->hop_limit, p, k, now); } diff --git a/tcp.c b/tcp.c index 76f3273bb93f..40cf4a5415e5 100644 --- a/tcp.c +++ b/tcp.c @@ -1497,21 +1497,23 @@ static void tcp_bind_outbound(const struct ctx *c,
/** * tcp_conn_from_tap() - Handle connection request (SYN segment) from tap - * @c: Execution context - * @af: Address family, AF_INET or AF_INET6 - * @saddr: Source address, pointer to in_addr or in6_addr - * @daddr: Destination address, pointer to in_addr or in6_addr - * @th: TCP header from tap: caller MUST ensure it's there - * @opts: Pointer to start of options - * @optlen: Bytes in options: caller MUST ensure available length - * @now: Current timestamp + * @c: Execution context + * @qpair: Queue pair for the flow + * @af: Address family, AF_INET or AF_INET6 + * @saddr: Source address, pointer to in_addr or in6_addr + * @daddr: Destination address, pointer to in_addr or in6_addr + * @th: TCP header from tap: caller MUST ensure it's there + * @opts: Pointer to start of options + * @optlen: Bytes in options: caller MUST ensure available length + * @now: Current timestamp * * #syscalls:vu getsockname */ -static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af, - const void *saddr, const void *daddr, - const struct tcphdr *th, const char *opts, - size_t optlen, const struct timespec *now) +static void tcp_conn_from_tap(const struct ctx *c, int qpair, + sa_family_t af, const void *saddr, + const void *daddr, const struct tcphdr *th, + const char *opts, size_t optlen, + const struct timespec *now) { in_port_t srcport = ntohs(th->source); in_port_t dstport = ntohs(th->dest); @@ -1623,6 +1625,7 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af, conn_event(c, conn, TAP_SYN_ACK_SENT); }
+	FLOW_SETQP(conn, qpair);
 	tcp_epoll_ctl(c, conn);
if (c->mode == MODE_VU) { /* To rebind to same oport after migration */ @@ -2056,21 +2059,23 @@ static void tcp_rst_no_conn(const struct ctx *c, int qpair, int af,
/** * tcp_tap_handler() - Handle packets from tap and state transitions - * @c: Execution context - * @pif: pif on which the packet is arriving - * @af: Address family, AF_INET or AF_INET6 - * @saddr: Source address - * @daddr: Destination address - * @flow_lbl: IPv6 flow label (ignored for IPv4) - * @p: Pool of TCP packets, with TCP headers - * @idx: Index of first packet in pool to process - * @now: Current timestamp + * @c: Execution context + * @qpair: Queue pair + * @pif: pif on which the packet is arriving + * @af: Address family, AF_INET or AF_INET6 + * @saddr: Source address + * @daddr: Destination address + * @flow_lbl: IPv6 flow label (ignored for IPv4) + * @p: Pool of TCP packets, with TCP headers + * @idx: Index of first packet in pool to process + * @now: Current timestamp * * Return: count of consumed packets */ -int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, - const void *saddr, const void *daddr, uint32_t flow_lbl, - const struct pool *p, int idx, const struct timespec *now) +int tcp_tap_handler(const struct ctx *c, int qpair, uint8_t pif, + sa_family_t af, const void *saddr, const void *daddr, + uint32_t flow_lbl, const struct pool *p, int idx, + const struct timespec *now) { struct tcp_tap_conn *conn; struct tcphdr th_storage; @@ -2107,10 +2112,10 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, /* New connection from tap */ if (!flow) { if (opts && th->syn && !th->ack) - tcp_conn_from_tap(c, af, saddr, daddr, th, + tcp_conn_from_tap(c, qpair, af, saddr, daddr, th, opts, optlen, now); else - tcp_rst_no_conn(c, 0, af, saddr, daddr, flow_lbl, th, + tcp_rst_no_conn(c, qpair, af, saddr, daddr, flow_lbl, th, l4len); return 1; } @@ -2119,6 +2124,9 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, ASSERT(pif_at_sidx(sidx) == PIF_TAP); conn = &flow->tcp;
+	/* update queue pair */
+	FLOW_SETQP(flow, qpair);
+
 	flow_trace(conn, "packet length %zu from tap", l4len);
if (th->rst) { diff --git a/tcp.h b/tcp.h index 0082386725c2..de29220c6ac2 100644 --- a/tcp.h +++ b/tcp.h @@ -13,11 +13,12 @@ struct ctx; void tcp_timer_handler(const struct ctx *c, union epoll_ref ref); void tcp_listen_handler(const struct ctx *c, union epoll_ref ref, const struct timespec *now); -void tcp_sock_handler(const struct ctx *c, union epoll_ref ref, - uint32_t events); -int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, - const void *saddr, const void *daddr, uint32_t flow_lbl, - const struct pool *p, int idx, const struct timespec *now); +void tcp_sock_handler(const struct ctx *c, + union epoll_ref ref, uint32_t events); +int tcp_tap_handler(const struct ctx *c, int qpair, uint8_t pif, + sa_family_t af, const void *saddr, const void *daddr, + uint32_t flow_lbl, const struct pool *p, int idx, + const struct timespec *now); int tcp_sock_init(const struct ctx *c, const union inany_addr *addr, const char *ifname, in_port_t port); int tcp_init(struct ctx *c); diff --git a/tcp_vu.c b/tcp_vu.c index 1c81ce376dad..1044491d404c 100644 --- a/tcp_vu.c +++ b/tcp_vu.c @@ -71,14 +71,15 @@ static size_t tcp_vu_hdrlen(bool v6) int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) { struct vu_dev *vdev = c->vdev; - struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE]; - size_t optlen, hdrlen; + int rx_queue = FLOW_QP(conn) * 2; + struct vu_virtq *vq = &vdev->vq[rx_queue]; struct vu_virtq_element flags_elem[2]; struct ipv6hdr *ip6h = NULL; struct iphdr *ip4h = NULL; struct iovec flags_iov[2]; struct tcp_syn_opts *opts; struct iov_tail payload; + size_t optlen, hdrlen; struct tcphdr *th; struct ethhdr *eh; uint32_t seq; @@ -349,7 +350,8 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn) { uint32_t wnd_scaled = conn->wnd_from_tap << conn->ws_from_tap; struct vu_dev *vdev = c->vdev; - struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE]; + int rx_queue = FLOW_QP(conn) * 2; + struct vu_virtq *vq = 
&vdev->vq[rx_queue]; ssize_t len, previous_dlen; int i, iov_cnt, head_cnt; size_t hdrlen, fillsize; diff --git a/udp.c b/udp.c index 2c74b42a3d95..e2b367b07eb8 100644 --- a/udp.c +++ b/udp.c @@ -636,12 +636,14 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, if (hdr->cmsg_level == IPPROTO_IP && (o4 = inany_v4(&otap)) && inany_v4(&toside->eaddr)) { dlen = MIN(dlen, ICMP4_MAX_DLEN); - udp_send_tap_icmp4(c, 0, ee, toside, *o4, data, dlen); + udp_send_tap_icmp4(c, FLOW_QP(uflow), ee, toside, + *o4, data, dlen); return 1; }
if (hdr->cmsg_level == IPPROTO_IPV6 && !inany_v4(&toside->eaddr)) { - udp_send_tap_icmp6(c, 0, ee, toside, &otap.a6, data, dlen, + udp_send_tap_icmp6(c, FLOW_QP(uflow), ee, + toside, &otap.a6, data, dlen, FLOW_IDX(uflow)); return 1; } @@ -970,21 +972,22 @@ fail:
/** * udp_tap_handler() - Handle packets from tap - * @c: Execution context - * @pif: pif on which the packet is arriving - * @af: Address family, AF_INET or AF_INET6 - * @saddr: Source address - * @daddr: Destination address - * @ttl: TTL or hop limit for packets to be sent in this call - * @p: Pool of UDP packets, with UDP headers - * @idx: Index of first packet to process - * @now: Current timestamp + * @c: Execution context + * @qpair: Queue pair + * @pif: pif on which the packet is arriving + * @af: Address family, AF_INET or AF_INET6 + * @saddr: Source address + * @daddr: Destination address + * @ttl: TTL or hop limit for packets to be sent in this call + * @p: Pool of UDP packets, with UDP headers + * @idx: Index of first packet to process + * @now: Current timestamp * * Return: count of consumed packets * * #syscalls sendmmsg */ -int udp_tap_handler(const struct ctx *c, uint8_t pif, +int udp_tap_handler(const struct ctx *c, int qpair, uint8_t pif, sa_family_t af, const void *saddr, const void *daddr, uint8_t ttl, const struct pool *p, int idx, const struct timespec *now) @@ -1018,7 +1021,7 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif, src = ntohs(uh->source); dst = ntohs(uh->dest);
- tosidx = udp_flow_from_tap(c, pif, af, saddr, daddr, src, dst, now); + tosidx = udp_flow_from_tap(c, qpair, pif, af, saddr, daddr, src, dst, now); if (!(uflow = udp_at_sidx(tosidx))) { char sstr[INET6_ADDRSTRLEN], dstr[INET6_ADDRSTRLEN];
diff --git a/udp.h b/udp.h index f1d83f380b3f..20418e34b0cc 100644 --- a/udp.h +++ b/udp.h @@ -7,11 +7,13 @@ #define UDP_H
void udp_portmap_clear(void); -void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref, - uint32_t events, const struct timespec *now); -void udp_sock_handler(const struct ctx *c, union epoll_ref ref, - uint32_t events, const struct timespec *now); -int udp_tap_handler(const struct ctx *c, uint8_t pif, +void udp_listen_sock_handler(const struct ctx *c, + union epoll_ref ref, uint32_t events, + const struct timespec *now); +void udp_sock_handler(const struct ctx *c, + union epoll_ref ref, uint32_t events, + const struct timespec *now); +int udp_tap_handler(const struct ctx *c, int qpair, uint8_t pif, sa_family_t af, const void *saddr, const void *daddr, uint8_t ttl, const struct pool *p, int idx, const struct timespec *now); diff --git a/udp_flow.c b/udp_flow.c index 8907f2f72741..9ba1af64b833 100644 --- a/udp_flow.c +++ b/udp_flow.c @@ -266,17 +266,19 @@ flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif, /** * udp_flow_from_tap() - Find or create UDP flow for tap packets * @c: Execution context + * @qpair: Queue pair for the flow * @pif: pif on which the packet is arriving * @af: Address family, AF_INET or AF_INET6 * @saddr: Source address on guest side * @daddr: Destination address guest side * @srcport: Source port on guest side * @dstport: Destination port on guest side + * @now: Current timestamp * * Return: sidx for the destination side of the flow for this packet, or * FLOW_SIDX_NONE if we couldn't find or create a flow. */ -flow_sidx_t udp_flow_from_tap(const struct ctx *c, +flow_sidx_t udp_flow_from_tap(const struct ctx *c, int qpair, uint8_t pif, sa_family_t af, const void *saddr, const void *daddr, in_port_t srcport, in_port_t dstport, @@ -293,6 +295,8 @@ flow_sidx_t udp_flow_from_tap(const struct ctx *c, srcport, dstport); if ((uflow = udp_at_sidx(sidx))) { uflow->ts = now->tv_sec; + /* update qpair */ + FLOW_SETQP(uflow, qpair); return flow_sidx_opposite(sidx); }
@@ -316,6 +320,8 @@ flow_sidx_t udp_flow_from_tap(const struct ctx *c, return FLOW_SIDX_NONE; }
+	FLOW_SETQP(flow, qpair);
+
 	return udp_flow_new(c, flow, now);
 }
diff --git a/udp_flow.h b/udp_flow.h index 4c528e95ca66..7c0fc3830b50 100644 --- a/udp_flow.h +++ b/udp_flow.h @@ -36,7 +36,7 @@ flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif, const union inany_addr *dst, in_port_t port, const union sockaddr_inany *s_in, const struct timespec *now); -flow_sidx_t udp_flow_from_tap(const struct ctx *c, +flow_sidx_t udp_flow_from_tap(const struct ctx *c, int qpair, uint8_t pif, sa_family_t af, const void *saddr, const void *daddr, in_port_t srcport, in_port_t dstport, diff --git a/udp_vu.c b/udp_vu.c index 099677f914e7..f3cf97393d0a 100644 --- a/udp_vu.c +++ b/udp_vu.c @@ -202,9 +202,11 @@ static void udp_vu_csum(const struct flowside *toside, int iov_used) void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx) { const struct flowside *toside = flowside_at_sidx(tosidx); + const struct udp_flow *uflow = udp_at_sidx(tosidx); bool v6 = !(inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr)); struct vu_dev *vdev = c->vdev; - struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE]; + int rx_queue = FLOW_QP(uflow) * 2; + struct vu_virtq *vq = &vdev->vq[rx_queue]; int i;
for (i = 0; i < n; i++) { -- 2.51.0
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On 11/26/25 03:20, David Gibson wrote:
On Fri, Nov 21, 2025 at 05:58:58PM +0100, Laurent Vivier wrote:
Add the --max-queues parameter to specify the maximum number of queue pairs supported in vhost-user mode. This enables multi-queue support by allowing configuration of up to 16 queue pairs (32 virtqueues).

Nit: you updated the option to be --max-qpairs, which I think is good, but now the commit message is out of date.

One other query: what makes it "max qpairs" rather than just "qpairs" - are there (now or in planned work) circumstances where you'd end up with fewer qpairs than specified here?

In fact, the number of qpairs is negotiated by the guest. If you start passt with max-qpairs=16 but QEMU with queues=4, it will only use 4 qpairs. If you start QEMU with queues=17, it will fail. Perhaps we can remove the parameter and rely only on the hardcoded one (32 virtqueues = 16 qpairs)? I can add a "--mq" parameter instead to enable multiqueue.
For the moment, only the first RX queue is used, the TX queue is selected by the guest kernel.
IIUC, with this patch (but not the ones after) things will break if the guest uses a qpair other than 0, right? AFAICT vu_kick_cb() isn't updated so will ignore anything on the other qpairs.
No, I think it should work. For the TX part (from the guest) we add all the kick_fds to the epollfd, so all the TX queues are managed. But we use only RX queue 0, which is not what the guest kernel expects - we should use the RX queue of the same pair - but I think the kernel will process the packets anyway.

Thanks,
Laurent
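The negotiation Laurent describes above can be pictured with a hypothetical pair of command lines (socket path and option values are purely illustrative, not taken from the series):

```shell
# Illustrative only: passt advertises up to 16 queue pairs, but the
# guest driver only negotiates what QEMU exposes via queues=.
passt --vhost-user --max-qpairs 16 -s /tmp/passt.sock &

qemu-system-x86_64 \
    -chardev socket,id=c,path=/tmp/passt.sock \
    -netdev vhost-user,id=v,chardev=c,queues=4 \
    -device virtio-net-pci,netdev=v,mq=true
# guest ends up with 4 queue pairs; queues=17 would exceed the
# hardcoded limit (32 virtqueues = 16 qpairs) and fail
```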
On 11/26/25 04:41, David Gibson wrote:
+/**
+ * flow_qp() - Get the queue pair for a flow
+ * @f:	Flow to query (may be NULL)
+ *
+ * Return: queue pair number for the flow, or 0 if flow is NULL or has no
+ * valid queue pair assignment
+ */
+unsigned int flow_qp(const struct flow_common *f)
+{
+	if (f == NULL || f->qpair == FLOW_QPAIR_INVALID)

Are there any instances where you actually want to pass a NULL flow to this?
If you're going to return 0 anyway, why not just set f->qpair to 0 by default, rather than using FLOW_QPAIR_INVALID?
In fact, in the multithread part I need to know whether the queue pair has been set or not. I agree that we should avoid this if possible because it takes one slot in the array: as we have 16 qpairs, we must use 5 bits (we need 17 values) rather than 4 bits. I'm reworking the multithread part; I will see if I can remove this.

Thanks,
Laurent
On 11/26/25 04:45, David Gibson wrote:
+
+	if [ ${VHOST_USER_MQ} -gt 1 ]; then
+		__qemu_netdev="${__qemu_netdev} \
+			-device virtio-net,netdev=v,mq=true \
+			-netdev vhost-user,id=v,chardev=c,queues=${VHOST_USER_MQ}"
+	else
+		__qemu_netdev="${__qemu_netdev} \
+			-device virtio-net,netdev=v \
+			-netdev vhost-user,id=v,chardev=c"

Is there a difference for qemu between omitting queues= and using queues=1? If not we can simplify this. For the passt option it's worth explicitly not-setting it for the single-queue case, so that we're exercising the command line option as well. But exercising qemu's options is not our concern, so we can use queues=1 if it means the same thing as omitting it entirely.
I think the important parameter here is mq=true, which sets the feature or not. This exercises the interface between QEMU and passt. I will try to see if we can set queues unconditionally (with 1 or more).

Thanks,
Laurent
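If queues=1 does turn out to be equivalent to omitting the option, the if/else in the quoted hunk could collapse to a single branch, keeping only mq= conditional. An untested sketch, reusing the variable names from the quoted test/lib/setup hunk:

```shell
# Sketch only: assumes QEMU accepts queues=1 as equivalent to
# omitting queues= entirely (to be verified). mq=true stays
# conditional since it controls the negotiated feature.
VHOST_USER_MQ="${VHOST_USER_MQ:-1}"

__mq_opt=""
[ "${VHOST_USER_MQ}" -gt 1 ] && __mq_opt=",mq=true"

__qemu_netdev="${__qemu_netdev} \
	-device virtio-net,netdev=v${__mq_opt} \
	-netdev vhost-user,id=v,chardev=c,queues=${VHOST_USER_MQ}"
```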
On Thu, Nov 27, 2025 at 10:17:43AM +0100, Laurent Vivier wrote:
On 11/26/25 04:41, David Gibson wrote:
+/**
+ * flow_qp() - Get the queue pair for a flow
+ * @f:	Flow to query (may be NULL)
+ *
+ * Return: queue pair number for the flow, or 0 if flow is NULL or has no
+ * valid queue pair assignment
+ */
+unsigned int flow_qp(const struct flow_common *f)
+{
+	if (f == NULL || f->qpair == FLOW_QPAIR_INVALID)

Are there any instances where you actually want to pass a NULL flow to this?
If you're going to return 0 anyway, why not just set f->qpair to 0 by default, rather than using FLOW_QPAIR_INVALID?
In fact, in the multithread part I need to know if the queue pair has been set or not. I agree that we should avoid this if possible because it takes one slot in the array. And as we have 16 qpairs we must use 5 bits (we need 17 values) rather than 4 bits. I'm reworking the multithread part, I will see if I can remove this.
Ok. It's not obvious to me why there would be a reason to explicitly track "unset" rather than setting a default value early. So, I guess we'll see in the next spin. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Thu, Nov 27, 2025 at 10:20:37AM +0100, Laurent Vivier wrote:
On 11/26/25 04:45, David Gibson wrote:
+
+	if [ ${VHOST_USER_MQ} -gt 1 ]; then
+		__qemu_netdev="${__qemu_netdev} \
+			-device virtio-net,netdev=v,mq=true \
+			-netdev vhost-user,id=v,chardev=c,queues=${VHOST_USER_MQ}"
+	else
+		__qemu_netdev="${__qemu_netdev} \
+			-device virtio-net,netdev=v \
+			-netdev vhost-user,id=v,chardev=c"

Is there a difference for qemu between omitting queues= and using queues=1? If not we can simplify this. For the passt option it's worth explicitly not-setting it for the single-queue case, so that we're exercising the command line option as well. But exercising qemu's options is not our concern, so we can use queues=1 if it means the same thing as omitting it entirely.
I think the important parameter here is mq=true, which sets the feature or not. This exercises the interface between QEMU and passt.
Ah, good point, I missed that.
I will try to see if we can set queues unconditionally (with 1 or more).
Thanks, Laurent
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson