On Wed, Sep 04, 2024 at 07:19:22PM +0200, Stefano Brivio wrote:
On Wed, 4 Sep 2024 13:17:53 +1000 David Gibson wrote:
On Tue, Sep 03, 2024 at 09:25:54PM +0200, Stefano Brivio wrote:
On Tue, 3 Sep 2024 22:02:29 +1000 David Gibson wrote:
This is a draft patch working towards adding EPOLLOUT handling to the tap code, which could then be used to "unstick" flows which have unsent data from the socket side. For now that's just a stub, but makes what I think are some worthwhile cleanups to the tap side event handling in the meantime.
Except for the issue in 3/6 and nits elsewhere, it all makes sense and tap-side EPOLLOUT handling is definitely going to be an improvement.
I wonder if it's the right moment for this kind of series, though, in terms of future bisections, as long as we're grappling with https://github.com/containers/podman/issues/23686 and https://bugs.passt.top/show_bug.cgi?id=94. Assuming, of course, that this series doesn't fix anything.
I don't think this series will fix anything as it stands. It is, indirectly, aimed at addressing bug 94. I'm struggling to figure out what to do with bug 94, because I find it almost impossible to reason about the current event masks in TCP.
I don't see at the moment anything indicating TCP issues other than the one you addressed with your tentative debug patch at:
https://passt.top/passt/commit/?h=podman23686&id=026fb71d1dde60135d95741552906fd5320384bc
Given that, with that patch, we had at least another report of event storms, this time on UDP, that is, the one from:
https://github.com/containers/podman/issues/23686#issuecomment-2324945010
I shared this other one on top:
https://passt.top/passt/commit/?h=podman23686&id=0c6c20dee5c24bd324834a99f409ad43c50812ae
Ah, nice.
I'd really like to simplify them so it's clearer what's correct and not and I think the most obvious path to doing so is using EPOLLET all the time. That requires some sort of kick when the tap is ready to accept more data, hence this series as a prerequisite.
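As a rough illustration of why edge-triggered mode needs a kick, here is a minimal sketch. It is not passt code: it assumes Linux, uses a Unix socket pair as a stand-in for the tap device, and the names epollout_kick_demo() and pending_out() are made up for the example. The writer fills its send buffer, registers for EPOLLOUT | EPOLLET, and polls three times without blocking. The expectation is that nothing is reported while the buffer is full, a single EPOLLOUT shows up once the peer drains it, and nothing is reported again afterwards, because no new edge occurred.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: 1 if an EPOLLOUT event is pending right now, else 0 */
static int pending_out(int epfd)
{
	struct epoll_event ev;
	int n = epoll_wait(epfd, &ev, 1, 0);	/* timeout 0: don't block */

	return n == 1 && (ev.events & EPOLLOUT);
}

/*
 * Hypothetical demo: returns 0 on success, -1 on setup failure.
 *   seen[0]: EPOLLOUT pending while the send buffer is full
 *   seen[1]: EPOLLOUT pending right after the peer drained the data
 *   seen[2]: EPOLLOUT pending on the next poll, with nothing changed
 */
int epollout_kick_demo(int seen[3])
{
	struct epoll_event ev = { .events = EPOLLOUT | EPOLLET };
	char buf[4096] = { 0 };
	int sv[2], epfd, ret = -1;
	ssize_t n;

	assert(seen);

	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sv) < 0)
		return -1;

	epfd = epoll_create1(0);
	if (epfd < 0)
		goto out_sockets;

	/* Write until the kernel pushes back: the "stuck" sender */
	while ((n = write(sv[0], buf, sizeof(buf))) > 0)
		;
	if (n == 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
		goto out_epoll;

	ev.data.fd = sv[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0)
		goto out_epoll;

	seen[0] = pending_out(epfd);

	/* The peer drains everything, making the sender writable again */
	while (read(sv[1], buf, sizeof(buf)) > 0)
		;

	seen[1] = pending_out(epfd);	/* the edge: this is the kick */
	seen[2] = pending_out(epfd);	/* no new edge, so nothing here */
	ret = 0;

out_epoll:
	close(epfd);
out_sockets:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

The takeaway from the sketch is that, with EPOLLET, the one EPOLLOUT notification is the only chance to resume a stalled flow, so whatever handles it has to retry all pending sends at that point.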
Sure, it's going to be simpler and more robust, but on the other hand we wouldn't notice these kinds of issues.
Uh.. I'm confused. In what way would we not notice issues, other than the issues not existing which.. would be good, right?
That is, once/if we come up with fixes for those, as they might involve setting different event masks, I'd rather have those in *before* this series, to avoid further noise in case we manage to break something else with those hypothetical fixes.
Right, I understand the impetus. Although as I said I find the current TCP event handling nigh-incomprehensible so I'm not as yet confident we can find a small fix without cleaning up the event handling more generally.
I'm not sure either, but I don't think we have any indication, at the moment, that any of the issues from those two tickets have anything to do with TCP event handling (minus the one you tentatively fixed).
Right, this reasoning is pretty much specific to the EPOLLRDHUP storm. I may have written some of the descriptions before registering that the EPOLLERR storm was UDP and therefore unrelated.
That said, these changes to tap side event handling are a prerequisite / preliminary and shouldn't as yet really alter the TCP event flow. So I don't think this series will of itself make bisection harder, although follow-on things based on it might.
I understand that they shouldn't alter it, but if we missed something subtle and they actually do, they'll make bisection more complicated.
I guess. Seems pretty unlikely to me given this doesn't touch the TCP events themselves.
If this series is only needed for switching TCP sockets to EPOLLET (well, minus 4/6, which is a fix on its own), maybe we could wait until you have the whole thing ready (and, hopefully, we manage to fix those two tickets meanwhile)?
Right, I'm ok to wait on this until I have the whole picture including TCP event masks as well. That's kind of why it's an RFC.

--
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson