With the new versions of compiler and valgrind in Fedora 39, I hit some spurious test failures. Here are some workarounds. Changes since v1: * Took entirely different approach, fixing the existing suppression * Extra patch allowing optimised valgrind builds David Gibson (2): valgrind: Adjust suppression for MSG_TRUNC with NULL buffer valgrind: Don't disable optimizations for valgrind builds Makefile | 2 +- tcp.c | 9 +++++++++ test/valgrind.supp | 3 +-- 3 files changed, 11 insertions(+), 3 deletions(-) -- 2.41.0
valgrind complains if we pass a NULL buffer to recv(), even if we use MSG_TRUNC, in which case it's actually safe. For a long time we've had a valgrind suppression for this. It singles out the recv() in tcp_sock_consume(), the only place we use MSG_TRUNC. However, tcp_sock_consume() only has a single caller, which makes it a prime candidate for inlining. If inlined, it won't appear on the stack and valgrind won't match the correct suppression. It appears that certain compiler versions (for example gcc-13.2.1 in Fedora 39) will inline this function even with the -O0 we use for valgrind builds. This breaks the suppression leading to a spurious failure in the tests. There's not really any way to adjust the suppression itself without making it overly broad (we don't want to match other recv() calls). So, as a hack explicitly prevent inlining of this function when we're making a valgrind build. To accomplish this add an explicit -DVALGRIND when making such a build. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- Makefile | 2 +- tcp.c | 9 +++++++++ test/valgrind.supp | 3 +-- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/Makefile b/Makefile index ff21459..57b2544 100644 --- a/Makefile +++ b/Makefile @@ -128,7 +128,7 @@ qrap: $(QRAP_SRCS) passt.h valgrind: EXTRA_SYSCALLS += rt_sigprocmask rt_sigtimedwait rt_sigaction \ getpid gettid kill clock_gettime mmap \ munmap open unlink gettimeofday futex -valgrind: FLAGS:=-g -O0 $(filter-out -O%,$(FLAGS)) +valgrind: FLAGS:=-g -O0 $(filter-out -O%,$(FLAGS)) -DVALGRIND valgrind: all .PHONY: clean diff --git a/tcp.c b/tcp.c index cfcd40a..b86d433 100644 --- a/tcp.c +++ b/tcp.c @@ -2097,6 +2097,15 @@ static void tcp_conn_from_tap(struct ctx *c, * * Return: 0 on success, negative error code from recv() on failure */ +#ifdef VALGRIND +/* valgrind doesn't realise that passing a NULL buffer to recv() is ok if using + * MSG_TRUNC. We have a supression for this in the tests, but it relies on + * valgrind being able to see the tcp_sock_consume() stack frame, which it won't + * if this gets inlined. This has a single caller making it a likely inlining + * candidate, and certain compiler versions will do so even at -O0. + */ + __attribute__((noinline)) +#endif /* VALGRIND */ static int tcp_sock_consume(const struct tcp_tap_conn *conn, uint32_t ack_seq) { /* Simply ignore out-of-order ACKs: we already consumed the data we diff --git a/test/valgrind.supp b/test/valgrind.supp index 1228056..a158394 100644 --- a/test/valgrind.supp +++ b/test/valgrind.supp @@ -3,7 +3,6 @@ passt_recv_MSG_TRUNC_into_NULL_buffer Memcheck:Param socketcall.recvfrom(buf) - fun:recv ... - fun:tcp_sock_consume* + fun:tcp_sock_consume } -- 2.41.0
On Thu, 16 Nov 2023 16:43:49 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:valgrind complains if we pass a NULL buffer to recv(), even if we use MSG_TRUNC, in which case it's actually safe. For a long time we've had a valgrind suppression for this. It singles out the recv() in tcp_sock_consume(), the only place we use MSG_TRUNC. However, tcp_sock_consume() only has a single caller, which makes it a prime candidate for inlining. If inlined, it won't appear on the stack and valgrind won't match the correct suppression. It appears that certain compiler versions (for example gcc-13.2.1 in Fedora 39) will inline this function even with the -O0 we use for valgrind builds. This breaks the suppression leading to a spurious failure in the tests. There's not really any way to adjust the suppression itself without making it overly broad (we don't want to match other recv() calls). So, as a hack explicitly prevent inlining of this function when we're making a valgrind build. To accomplish this add an explicit -DVALGRIND when making such a build. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- Makefile | 2 +- tcp.c | 9 +++++++++ test/valgrind.supp | 3 +-- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/Makefile b/Makefile index ff21459..57b2544 100644 --- a/Makefile +++ b/Makefile @@ -128,7 +128,7 @@ qrap: $(QRAP_SRCS) passt.h valgrind: EXTRA_SYSCALLS += rt_sigprocmask rt_sigtimedwait rt_sigaction \ getpid gettid kill clock_gettime mmap \ munmap open unlink gettimeofday futex -valgrind: FLAGS:=-g -O0 $(filter-out -O%,$(FLAGS)) +valgrind: FLAGS:=-g -O0 $(filter-out -O%,$(FLAGS)) -DVALGRIND valgrind: all .PHONY: clean diff --git a/tcp.c b/tcp.c index cfcd40a..b86d433 100644 --- a/tcp.c +++ b/tcp.c @@ -2097,6 +2097,15 @@ static void tcp_conn_from_tap(struct ctx *c, * * Return: 0 on success, negative error code from recv() on failure */ +#ifdef VALGRIND +/* valgrind doesn't realise that passing a NULL buffer to recv() is ok if using + * MSG_TRUNC. We have a supression for this in the tests, but it relies on^ pp (I can fix it up on merge but I guess you prefer to repost). Rest of the series looks good to me. -- Stefano
On Thu, Nov 16, 2023 at 08:21:34AM +0100, Stefano Brivio wrote:On Thu, 16 Nov 2023 16:43:49 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Oops, yes. Resent. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibsonvalgrind complains if we pass a NULL buffer to recv(), even if we use MSG_TRUNC, in which case it's actually safe. For a long time we've had a valgrind suppression for this. It singles out the recv() in tcp_sock_consume(), the only place we use MSG_TRUNC. However, tcp_sock_consume() only has a single caller, which makes it a prime candidate for inlining. If inlined, it won't appear on the stack and valgrind won't match the correct suppression. It appears that certain compiler versions (for example gcc-13.2.1 in Fedora 39) will inline this function even with the -O0 we use for valgrind builds. This breaks the suppression leading to a spurious failure in the tests. There's not really any way to adjust the suppression itself without making it overly broad (we don't want to match other recv() calls). So, as a hack explicitly prevent inlining of this function when we're making a valgrind build. To accomplish this add an explicit -DVALGRIND when making such a build. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- Makefile | 2 +- tcp.c | 9 +++++++++ test/valgrind.supp | 3 +-- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/Makefile b/Makefile index ff21459..57b2544 100644 --- a/Makefile +++ b/Makefile @@ -128,7 +128,7 @@ qrap: $(QRAP_SRCS) passt.h valgrind: EXTRA_SYSCALLS += rt_sigprocmask rt_sigtimedwait rt_sigaction \ getpid gettid kill clock_gettime mmap \ munmap open unlink gettimeofday futex -valgrind: FLAGS:=-g -O0 $(filter-out -O%,$(FLAGS)) +valgrind: FLAGS:=-g -O0 $(filter-out -O%,$(FLAGS)) -DVALGRIND valgrind: all .PHONY: clean diff --git a/tcp.c b/tcp.c index cfcd40a..b86d433 100644 --- a/tcp.c +++ b/tcp.c @@ -2097,6 +2097,15 @@ static void tcp_conn_from_tap(struct ctx *c, * * Return: 0 on success, negative error code from recv() on failure */ +#ifdef VALGRIND +/* valgrind doesn't realise that passing a NULL buffer to recv() is ok if using + * MSG_TRUNC. We have a supression for this in the tests, but it relies on^ pp (I can fix it up on merge but I guess you prefer to repost). Rest of the series looks good to me.
When we plan to use valgrind, we need to build passt a bit differently: * We need debug symbols so that valgrind can match problems it finds to meaningful locations * We need to allow additional syscalls in the seccomp filter, because valgrind's wrappers need them Currently we also disable optimization (-O0). This is unfortunate, because it will make valgrind tests even slower than they'd otherwise be. Worse, it's possible that the asm behaviour without optimization might be different enough that valgrind could miss a use of uninitialized variable or other fault it would detect. I suspect this was originally done because without it inlining could mean that suppressions we use don't reliably match the places we want them to. Alas, it turns out this is true even with -O0. We've now implemented a more robust workaround for this (explicit ((noinline)) attributes when compiled with -DVALGRIND). So, we can re-enable optimization for valgrind builds, making them closer to regular builds. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 57b2544..6c53695 100644 --- a/Makefile +++ b/Makefile @@ -128,7 +128,7 @@ qrap: $(QRAP_SRCS) passt.h valgrind: EXTRA_SYSCALLS += rt_sigprocmask rt_sigtimedwait rt_sigaction \ getpid gettid kill clock_gettime mmap \ munmap open unlink gettimeofday futex -valgrind: FLAGS:=-g -O0 $(filter-out -O%,$(FLAGS)) -DVALGRIND +valgrind: FLAGS += -g -DVALGRIND valgrind: all .PHONY: clean -- 2.41.0