Usually, of course, it's invalid to pass a NULL buffer to recv(). However, it's acceptable when using MSG_TRUNC, because that suppresses actually writing to the buffer. So, we pass NULL in tcp_sock_consume(). Unfortunately, checker tools aren't always aware of that special case: we already have a suppression for cppcheck to cover this. valgrind-3.22.0 (present in Fedora 39) has a similar problem, generating a spurious warning here. We could generate another suppression for valgrind, however, it so happens that we already have tcp_buf_discard ready to hand. If we pass this instead of NULL it makes both cppcheck and valgrind happy. We're still using the MSG_TRUNC flag, the kernel doesn't actually have to copy data, so we should still have the performance benefits of it. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tcp.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/tcp.c b/tcp.c index cfcd40a..f51d27a 100644 --- a/tcp.c +++ b/tcp.c @@ -2106,8 +2106,13 @@ static int tcp_sock_consume(const struct tcp_tap_conn *conn, uint32_t ack_seq) if (SEQ_LE(ack_seq, conn->seq_ack_from_tap)) return 0; - /* cppcheck-suppress [nullPointer, unmatchedSuppression] */ - if (recv(conn->sock, NULL, ack_seq - conn->seq_ack_from_tap, + /* Since we're using MSG_TRUNC, it's allowed to pass NULL instead of a + * real buffer. However some checker tools (including at least some + * versions of cppcheck and valgrind) aren't aware of that special case. + * We so happen to have a convenient discard buffer, so we might as well + * pass it to avoid spurious complaints from those tools. + */ + if (recv(conn->sock, tcp_buf_discard, ack_seq - conn->seq_ack_from_tap, MSG_DONTWAIT | MSG_TRUNC) < 0) return -errno; -- 2.41.0
On Wed, 15 Nov 2023 15:41:24 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Usually, of course, it's invalid to pass a NULL buffer to recv(). However, it's acceptable when using MSG_TRUNC, because that suppresses actually writing to the buffer. So, we pass NULL in tcp_sock_consume(). Unfortunately, checker tools aren't always aware of that special case: we already have a suppression for cppcheck to cover this. valgrind-3.22.0 (present in Fedora 39) has a similar problem, generating a spurious warning here.I haven't tried valgrind 3.22 yet, but... do you happen to know why test/valgrind.supp doesn't cover this anymore?We could generate another suppression for valgrind, however, it so happens that we already have tcp_buf_discard ready to hand. If we pass this instead of NULL it makes both cppcheck and valgrind happy. We're still using the MSG_TRUNC flag, the kernel doesn't actually have to copy data, so we should still have the performance benefits of it.I'm not enthusiastic about this, because using tcp_buf_discard there might tell an optimising compiler that it's useful to prefetch it. We would also pass the actual address of tcp_buf_discard to the kernel, and I'm not sure if this has further subtle implications on possible optimisations in the kernel implementation (even though as you said no data is actually copied). -- Stefano
On Wed, Nov 15, 2023 at 06:32:59AM +0100, Stefano Brivio wrote:On Wed, 15 Nov 2023 15:41:24 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Huh.. I hadn't spotted there was an existing suppression. I don't know why that's not working any more, I can have a closer look.Usually, of course, it's invalid to pass a NULL buffer to recv(). However, it's acceptable when using MSG_TRUNC, because that suppresses actually writing to the buffer. So, we pass NULL in tcp_sock_consume(). Unfortunately, checker tools aren't always aware of that special case: we already have a suppression for cppcheck to cover this. valgrind-3.22.0 (present in Fedora 39) has a similar problem, generating a spurious warning here.I haven't tried valgrind 3.22 yet, but... do you happen to know why test/valgrind.supp doesn't cover this anymore?Ok, fair points. I'll revisit this. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibsonWe could generate another suppression for valgrind, however, it so happens that we already have tcp_buf_discard ready to hand. If we pass this instead of NULL it makes both cppcheck and valgrind happy. We're still using the MSG_TRUNC flag, the kernel doesn't actually have to copy data, so we should still have the performance benefits of it.I'm not enthusiastic about this, because using tcp_buf_discard there might tell an optimising compiler that it's useful to prefetch it. We would also pass the actual address of tcp_buf_discard to the kernel, and I'm not sure if this has further subtle implications on possible optimisations in the kernel implementation (even though as you said no data is actually copied).
On Wed, Nov 15, 2023 at 07:11:19PM +1100, David Gibson wrote:On Wed, Nov 15, 2023 at 06:32:59AM +0100, Stefano Brivio wrote:Huh.. so.. this actually intersects with the stuff we discussed on the last call about whether it's a good idea to build without optimization for the valgrind tests (we currently do). So, in terms of -g, my understanding is that valgrind doesn't need debug symbols for its actual test mechanisms. But, I realised later, that it obviously does in order to identify meaningfully where the problems occurred - which also includes matching then to suppressions. So it seems the change that's caused the error for me is not in valgrind, but in the compiler. Even with -O0, the compiler in Fedora 39 is inlining tcp_sock_consume() (confirmed with objdump). Since there's no stack frame for it, valgrind doesn't match it against the suppression. I'm now testing a new spin that uses an explicit ((noinline) to fix the suppression. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibsonOn Wed, 15 Nov 2023 15:41:24 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Huh.. I hadn't spotted there was an existing suppression. I don't know why that's not working any more, I can have a closer look.Usually, of course, it's invalid to pass a NULL buffer to recv(). However, it's acceptable when using MSG_TRUNC, because that suppresses actually writing to the buffer. So, we pass NULL in tcp_sock_consume(). Unfortunately, checker tools aren't always aware of that special case: we already have a suppression for cppcheck to cover this. valgrind-3.22.0 (present in Fedora 39) has a similar problem, generating a spurious warning here.I haven't tried valgrind 3.22 yet, but... do you happen to know why test/valgrind.supp doesn't cover this anymore?Ok, fair points. I'll revisit this.We could generate another suppression for valgrind, however, it so happens that we already have tcp_buf_discard ready to hand. If we pass this instead of NULL it makes both cppcheck and valgrind happy. We're still using the MSG_TRUNC flag, the kernel doesn't actually have to copy data, so we should still have the performance benefits of it.I'm not enthusiastic about this, because using tcp_buf_discard there might tell an optimising compiler that it's useful to prefetch it. We would also pass the actual address of tcp_buf_discard to the kernel, and I'm not sure if this has further subtle implications on possible optimisations in the kernel implementation (even though as you said no data is actually copied).