The current code assumes that we'll get one event per read() on inotify descriptors, but that's not the case, not from documentation, and not from reports. Add loops in the two inotify handlers we have, in pasta-specific code and passt-repair, to go through all the events we receive. While in pasta the the inotify watch is in an epoll list, and we'll get epoll wakeups as long as there's data to read, in passt-repair that's not the case. So, in pasta we can simply size the buffer for a single event and try to read one, but in passt-repair, we'll need to size the buffer to a safer, reasonable amount of events. Link: https://bugs.passt.top/show_bug.cgi?id=119 Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com> --- I'm posting this as RFC because, while it seems to do the job and I tested all the code paths, Coverity isn't amused by the fact that we assume that inotify 'name' attributes (ev->name) are NULL-terminated. They actually are, but the code is not very robust. Addressing that is kind of trivial but I keep getting it wrong, so I'll start posting this and fix that up later (direct fixes/edits of this patch are also welcome, of course). passt-repair.c | 26 +++++++++++++++++++------- pasta.c | 20 +++++++++++++------- 2 files changed, 32 insertions(+), 14 deletions(-) diff --git a/passt-repair.c b/passt-repair.c index 120f7aa..86581c6 100644 --- a/passt-repair.c +++ b/passt-repair.c @@ -111,14 +111,14 @@ int main(int argc, char **argv) } if ((sb.st_mode & S_IFMT) == S_IFDIR) { - char buf[sizeof(struct inotify_event) + NAME_MAX + 1]; + char buf[(sizeof(struct inotify_event) + NAME_MAX + 1) * 128] + __attribute__ ((aligned(__alignof__(struct inotify_event)))); const struct inotify_event *ev; char path[PATH_MAX + 1]; + bool found = false; ssize_t n; int fd; - ev = (struct inotify_event *)buf; - if ((fd = inotify_init1(IN_CLOEXEC)) < 0) { fprintf(stderr, "inotify_init1: %i\n", errno); _exit(1); @@ -130,6 +130,8 @@ int main(int argc, char **argv) } do { + char *p; + n = read(fd, buf, sizeof(buf)); if (n < 0) { fprintf(stderr, "inotify read: %i", errno); @@ -138,11 +140,21 @@ int main(int argc, char **argv) if (n < (ssize_t)sizeof(*ev)) { fprintf(stderr, "Short inotify read: %zi", n); - _exit(1); + continue; + } + + for (p = buf; p < buf + n; p += sizeof(*ev) + ev->len) { + ev = (const struct inotify_event *)p; + + if (ev->len >= REPAIR_EXT_LEN && + !memcmp(ev->name + strlen(ev->name) - + REPAIR_EXT_LEN, + REPAIR_EXT, REPAIR_EXT_LEN)) { + found = true; + break; + } } - } while (ev->len < REPAIR_EXT_LEN || - memcmp(ev->name + strlen(ev->name) - REPAIR_EXT_LEN, - REPAIR_EXT, REPAIR_EXT_LEN)); + } while (!found); snprintf(path, sizeof(path), "%s/%s", argv[1], ev->name); if ((stat(path, &sb))) { diff --git a/pasta.c b/pasta.c index fa3e7de..017fa32 100644 --- a/pasta.c +++ b/pasta.c @@ -498,17 +498,23 @@ void pasta_netns_quit_init(const struct ctx *c) */ void pasta_netns_quit_inotify_handler(struct ctx *c, int inotify_fd) { - char buf[sizeof(struct inotify_event) + NAME_MAX + 1]; - const struct inotify_event *in_ev = (struct inotify_event *)buf; + char buf[sizeof(struct inotify_event) + NAME_MAX + 1] + __attribute__ ((aligned(__alignof__(struct inotify_event)))); + const struct inotify_event *ev; + ssize_t n; + char *p; - if (read(inotify_fd, buf, sizeof(buf)) < (ssize_t)sizeof(*in_ev)) + if ((n = read(inotify_fd, buf, sizeof(buf))) < (ssize_t)sizeof(*ev)) return; - if (strncmp(in_ev->name, c->netns_base, sizeof(c->netns_base))) - return; + for (p = buf; p < buf + n; p += sizeof(*ev) + ev->len) { + ev = (const struct inotify_event *)p; - info("Namespace %s is gone, exiting", c->netns_base); - _exit(EXIT_SUCCESS); + if (!strncmp(ev->name, c->netns_base, sizeof(c->netns_base))) { + info("Namespace %s is gone, exiting", c->netns_base); + _exit(EXIT_SUCCESS); + } + } } /** -- 2.43.0
On Thu, Mar 27, 2025 at 10:28:22PM +0100, Stefano Brivio wrote:The current code assumes that we'll get one event per read() on inotify descriptors, but that's not the case, not from documentation, and not from reports. Add loops in the two inotify handlers we have, in pasta-specific code and passt-repair, to go through all the events we receive. While in pasta the the inotify watch is in an epoll list, and we'll"the the"get epoll wakeups as long as there's data to read, in passt-repair that's not the case. So, in pasta we can simply size the buffer for a single event and try to read one, but in passt-repair, we'll need to size the buffer to a safer, reasonable amount of events.I'm not following the reasoning here. In passt-repair we're just looping on the read() until we find what we want, so we'll still eventually get what we want. AFAICT it's a stream-like interface, so we won't lose events just because we didn't get them in the first read().Link: https://bugs.passt.top/show_bug.cgi?id=119 Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>Otherwise, LGTM.--- I'm posting this as RFC because, while it seems to do the job and I tested all the code paths, Coverity isn't amused by the fact that we assume that inotify 'name' attributes (ev->name) are NULL-terminated. They actually are, but the code is not very robust. Addressing that is kind of trivial but I keep getting it wrong, so I'll start posting this and fix that up later (direct fixes/edits of this patch are also welcome, of course).Ok, I might try to polish this up todaypasst-repair.c | 26 +++++++++++++++++++------- pasta.c | 20 +++++++++++++------- 2 files changed, 32 insertions(+), 14 deletions(-) diff --git a/passt-repair.c b/passt-repair.c index 120f7aa..86581c6 100644 --- a/passt-repair.c +++ b/passt-repair.c @@ -111,14 +111,14 @@ int main(int argc, char **argv) } if ((sb.st_mode & S_IFMT) == S_IFDIR) { - char buf[sizeof(struct inotify_event) + NAME_MAX + 1]; + char buf[(sizeof(struct inotify_event) + NAME_MAX + 1) * 128] + __attribute__ ((aligned(__alignof__(struct inotify_event)))); const struct inotify_event *ev; char path[PATH_MAX + 1]; + bool found = false; ssize_t n; int fd; - ev = (struct inotify_event *)buf; - if ((fd = inotify_init1(IN_CLOEXEC)) < 0) { fprintf(stderr, "inotify_init1: %i\n", errno); _exit(1); @@ -130,6 +130,8 @@ int main(int argc, char **argv) } do { + char *p; + n = read(fd, buf, sizeof(buf)); if (n < 0) { fprintf(stderr, "inotify read: %i", errno); @@ -138,11 +140,21 @@ int main(int argc, char **argv) if (n < (ssize_t)sizeof(*ev)) { fprintf(stderr, "Short inotify read: %zi", n); - _exit(1); + continue; + } + + for (p = buf; p < buf + n; p += sizeof(*ev) + ev->len) { + ev = (const struct inotify_event *)p; + + if (ev->len >= REPAIR_EXT_LEN && + !memcmp(ev->name + strlen(ev->name) - + REPAIR_EXT_LEN, + REPAIR_EXT, REPAIR_EXT_LEN)) { + found = true; + break; + } } - } while (ev->len < REPAIR_EXT_LEN || - memcmp(ev->name + strlen(ev->name) - REPAIR_EXT_LEN, - REPAIR_EXT, REPAIR_EXT_LEN)); + } while (!found); snprintf(path, sizeof(path), "%s/%s", argv[1], ev->name); if ((stat(path, &sb))) { diff --git a/pasta.c b/pasta.c index fa3e7de..017fa32 100644 --- a/pasta.c +++ b/pasta.c @@ -498,17 +498,23 @@ void pasta_netns_quit_init(const struct ctx *c) */ void pasta_netns_quit_inotify_handler(struct ctx *c, int inotify_fd) { - char buf[sizeof(struct inotify_event) + NAME_MAX + 1]; - const struct inotify_event *in_ev = (struct inotify_event *)buf; + char buf[sizeof(struct inotify_event) + NAME_MAX + 1] + __attribute__ ((aligned(__alignof__(struct inotify_event)))); + const struct inotify_event *ev; + ssize_t n; + char *p; - if (read(inotify_fd, buf, sizeof(buf)) < (ssize_t)sizeof(*in_ev)) + if ((n = read(inotify_fd, buf, sizeof(buf))) < (ssize_t)sizeof(*ev)) return; - if (strncmp(in_ev->name, c->netns_base, sizeof(c->netns_base))) - return; + for (p = buf; p < buf + n; p += sizeof(*ev) + ev->len) { + ev = (const struct inotify_event *)p; - info("Namespace %s is gone, exiting", c->netns_base); - _exit(EXIT_SUCCESS); + if (!strncmp(ev->name, c->netns_base, sizeof(c->netns_base))) { + info("Namespace %s is gone, exiting", c->netns_base); + _exit(EXIT_SUCCESS); + } + } } /**-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Fri, 28 Mar 2025 10:36:39 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Thu, Mar 27, 2025 at 10:28:22PM +0100, Stefano Brivio wrote:Under the assumption that message (event) boundaries are preserved across read() calls, yes. And... this assumption actually holds (I had forgotten about that), so expanding the buffer as I did makes no sense, because by reading sizeof(struct inotify_event) + NAME_MAX + 1) bytes we always read one or more events, but never a partial buffer. So, yes, never mind.The current code assumes that we'll get one event per read() on inotify descriptors, but that's not the case, not from documentation, and not from reports. Add loops in the two inotify handlers we have, in pasta-specific code and passt-repair, to go through all the events we receive. While in pasta the the inotify watch is in an epoll list, and we'll"the the"get epoll wakeups as long as there's data to read, in passt-repair that's not the case. So, in pasta we can simply size the buffer for a single event and try to read one, but in passt-repair, we'll need to size the buffer to a safer, reasonable amount of events.I'm not following the reasoning here. In passt-repair we're just looping on the read() until we find what we want, so we'll still eventually get what we want. AFAICT it's a stream-like interface, so we won't lose events just because we didn't get them in the first read().Thanks! -- StefanoLink: https://bugs.passt.top/show_bug.cgi?id=119 Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>Otherwise, LGTM.--- I'm posting this as RFC because, while it seems to do the job and I tested all the code paths, Coverity isn't amused by the fact that we assume that inotify 'name' attributes (ev->name) are NULL-terminated. They actually are, but the code is not very robust. Addressing that is kind of trivial but I keep getting it wrong, so I'll start posting this and fix that up later (direct fixes/edits of this patch are also welcome, of course).Ok, I might try to polish this up today
On Fri, Mar 28, 2025 at 10:23:40AM +0100, Stefano Brivio wrote:On Fri, 28 Mar 2025 10:36:39 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Right. My observation was that was the case, and inotify(7) also seems to state so outright: | Each successful read(2) returns a buffer containing one or more of the following structures: | | struct inotify_event { | ... | };On Thu, Mar 27, 2025 at 10:28:22PM +0100, Stefano Brivio wrote:Under the assumption that message (event) boundaries are preserved across read() calls, yes.The current code assumes that we'll get one event per read() on inotify descriptors, but that's not the case, not from documentation, and not from reports. Add loops in the two inotify handlers we have, in pasta-specific code and passt-repair, to go through all the events we receive. While in pasta the the inotify watch is in an epoll list, and we'll"the the"get epoll wakeups as long as there's data to read, in passt-repair that's not the case. So, in pasta we can simply size the buffer for a single event and try to read one, but in passt-repair, we'll need to size the buffer to a safer, reasonable amount of events.I'm not following the reasoning here. In passt-repair we're just looping on the read() until we find what we want, so we'll still eventually get what we want. AFAICT it's a stream-like interface, so we won't lose events just because we didn't get them in the first read().And... this assumption actually holds (I had forgotten about that), so expanding the buffer as I did makes no sense, because by reading sizeof(struct inotify_event) + NAME_MAX + 1) bytes we always read one or more events, but never a partial buffer. So, yes, never mind.-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibsonThanks!Link: https://bugs.passt.top/show_bug.cgi?id=119 Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>Otherwise, LGTM.--- I'm posting this as RFC because, while it seems to do the job and I tested all the code paths, Coverity isn't amused by the fact that we assume that inotify 'name' attributes (ev->name) are NULL-terminated. They actually are, but the code is not very robust. Addressing that is kind of trivial but I keep getting it wrong, so I'll start posting this and fix that up later (direct fixes/edits of this patch are also welcome, of course).Ok, I might try to polish this up today