[PATCH 0/4] New line reading implementation
As discussed on our call, the current line_read() implementation is pretty ugly and has some definite bugs. This series replaces it with a new implementation which should be more solid and more readable. The new implementation is larger than I'd really like, but it turns out its fiddlier to handle the various edge cases than you might think. David Gibson (4): Add cleaner line-by-line reading primitives Parse resolv.conf with new lineread implementation Use new lineread implementation for procfs_scan_listen() Remove unused line_read() Makefile | 8 ++-- conf.c | 22 +++++++---- lineread.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++++++ lineread.h | 23 ++++++++++++ util.c | 64 +++---------------------------- 5 files changed, 155 insertions(+), 70 deletions(-) create mode 100644 lineread.c create mode 100644 lineread.h -- 2.36.1
Two places in passt need to read files line by line (one parsing
resolv.conf, the other parsing /proc/net/*. They can't use fgets()
because in glibc that can allocate memory. Instead they use an
implementation line_read() in util.c. This has some problems:
* It has two completely separate modes of operation, one buffering
and one not, the relation between these and how they're activated
is subtle and confusing
* At least in non-buffered mode, it will mishandle an empty line,
folding them onto the start of the next non-empty line
* In non-buffered mode it will use lseek() which prevents using this
on non-regular files (we don't need that at present, but it's a
surprising limitation)
* It has a lot of difficult to read pointer mangling
Add a new cleaner implementation of allocation-free line-by-line
reading in lineread.c. This one already buffers, using a state
structure to keep track of what we need. This is larger than I'd
like, but it turns out handling all the edge cases of line-by-line
reading in C is surprisingly hard.
This just adds the code, subsequent patches will change the existing
users of line_read() to the new implementation.
Signed-off-by: David Gibson
On Fri, 17 Jun 2022 13:10:23 +1000
David Gibson
Two places in passt need to read files line by line (one parsing resolv.conf, the other parsing /proc/net/*. They can't use fgets() because in glibc that can allocate memory. Instead they use an implementation line_read() in util.c. This has some problems:
* It has two completely separate modes of operation, one buffering and one not, the relation between these and how they're activated is subtle and confusing * At least in non-buffered mode, it will mishandle an empty line, folding them onto the start of the next non-empty line * In non-buffered mode it will use lseek() which prevents using this on non-regular files (we don't need that at present, but it's a surprising limitation) * It has a lot of difficult to read pointer mangling
Add a new cleaner implementation of allocation-free line-by-line reading in lineread.c. This one already buffers, using a state structure to keep track of what we need. This is larger than I'd like, but it turns out handling all the edge cases of line-by-line reading in C is surprisingly hard.
Still much simpler (albeit a bit more verbose) than the original version, thanks. :)
This just adds the code, subsequent patches will change the existing users of line_read() to the new implementation.
Signed-off-by: David Gibson
--- Makefile | 8 ++-- lineread.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++++++ lineread.h | 23 ++++++++++++ 3 files changed, 135 insertions(+), 4 deletions(-) create mode 100644 lineread.c create mode 100644 lineread.h diff --git a/Makefile b/Makefile index b0dde68..d059efb 100644 --- a/Makefile +++ b/Makefile @@ -32,16 +32,16 @@ CFLAGS += -DRLIMIT_STACK_VAL=$(RLIMIT_STACK_VAL) CFLAGS += -DARCH=\"$(TARGET_ARCH)\"
PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c icmp.c igmp.c \ - mld.c ndp.c netlink.c packet.c passt.c pasta.c pcap.c siphash.c \ - tap.c tcp.c tcp_splice.c udp.c util.c + lineread.c mld.c ndp.c netlink.c packet.c passt.c pasta.c pcap.c \ + siphash.c tap.c tcp.c tcp_splice.c udp.c util.c QRAP_SRCS = qrap.c SRCS = $(PASST_SRCS) $(QRAP_SRCS)
MANPAGES = passt.1 pasta.1 qrap.1
PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h icmp.h \ - ndp.h netlink.h packet.h passt.h pasta.h pcap.h siphash.h \ - tap.h tcp.h tcp_splice.h udp.h util.h + lineread.h ndp.h netlink.h packet.h passt.h pasta.h pcap.h \ + siphash.h tap.h tcp.h tcp_splice.h udp.h util.h HEADERS = $(PASST_HEADERS)
# On gcc 11.2, with -O2 and -flto, tcp_hash() and siphash_20b(), if inlined, diff --git a/lineread.c b/lineread.c new file mode 100644 index 0000000..3e263cf --- /dev/null +++ b/lineread.c @@ -0,0 +1,108 @@ +// SPDX-License-Identifier: AGPL-3.0-or-later + +/* PASST - Plug A Simple Socket Transport + * for qemu/UNIX domain socket mode + * + * PASTA - Pack A Subtle Tap Abstraction + * for network namespace/tap device mode + * + * lineread.c - Allocation free line-by-line buffered file input + * + * Copyright Red Hat + * Author: David Gibson
+ */ + +#include +#include +#include +#include +#include +#include + +#include "lineread.h" + +/** + * lineread_init() - Prepare for line by line file reading without allocation + * @lr: Line reader state structure to initialize + * @fd: File handle to read lines from
I think by "handle" most people refer to "FILE" stream handles, I would drop "handle" here or replace by "descriptor".
+ */ +void lineread_init(struct lineread *lr, int fd) +{ + lr->fd = fd; + lr->next_line = lr->count = 0; +} + +static int peek_line(struct lineread *lr, bool eof)
This lacks a description in kerneldoc style, I would add: /** * peek_line() - Find and NULL-terminate next line in buffer * @lr: Line reader state structure * @eof: Caller indicates end-of-file was already found by read() * * Return: length of line in bytes, -1 if no line was found */ By the way, if you're wondering why you're introducing the first usage of 'bool', I switched to C99 just recently. :)
+{ + char *nl; + + /* Sanity checks (which also document invariants) */ + assert(lr->count >= 0); + assert(lr->next_line >= 0); + assert(lr->next_line + lr->count >= lr->next_line); + assert(lr->next_line + lr->count <= LINEREAD_BUFFER_SIZE); + + nl = memchr(lr->buf + lr->next_line, '\n', lr->count); + + if (nl) { + *nl = '\0'; + return nl - lr->buf - lr->next_line + 1;
clang-tidy complains about the else-after-return here (llvm-else-after-return, readability-else-after-return), and at a second glance me too :) I would drop them if you're fine with it.
+ } else if (eof) { + lr->buf[lr->next_line + lr->count] = '\0'; + /* + * No trailing newline, so treat all remaining bytes + * as the last line + */ + return lr->count; + } else { + return -1; + } +} + +/** + * lineread_get() - Read a single line from file (no allocation) + * @lr: Line reader state structure + * @line: Place a pointer to the next line in this variable + * + * Return: Length of line read on success, 0 on EOF, negative on error + */ +int lineread_get(struct lineread *lr, char **line) +{ + bool eof = false; + int line_len; + + while ((line_len = peek_line(lr, eof)) < 0) { + int rc; + + if ((lr->next_line + lr->count) == LINEREAD_BUFFER_SIZE) { + /* No space at end */ + if (lr->next_line == 0) { + /*
Nit: elsewhere, I used "net" kernel style comments, which are the same as every other area of the Linux kernel except that there's no opening newline: /* Buffer is full, which means we've I would change it here and in the comment above.
+ * Buffer is full, which means we've + * hit a line too long for us to + * process. FIXME: report error + * better + */ + return -1; + } + memmove(lr->buf, lr->buf + lr->next_line, lr->count); + lr->next_line = 0; + } +
Stray tabs here, dropped.
+ /* Read more data into the end of buffer */ + rc = read(lr->fd, lr->buf + lr->next_line + lr->count, + LINEREAD_BUFFER_SIZE - lr->next_line - lr->count); + if (rc < 0) { + return rc; + } else if (rc == 0) { + eof = true; + } else { + lr->count += rc; + } + } + + *line = lr->buf + lr->next_line; + lr->next_line += line_len; + lr->count -= line_len; + return line_len; +} diff --git a/lineread.h b/lineread.h new file mode 100644 index 0000000..972dc51 --- /dev/null +++ b/lineread.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: AGPL-3.0-or-later + * Copyright Red Hat + * Author: David Gibson
+ */ + +#ifndef LINEREAD_H +#define LINEREAD_H + +#define LINEREAD_BUFFER_SIZE 8192 +
I would also stick to kerneldoc style comment here: /** * struct lineread - Line reader state * @fd: File descriptor lines are read from * @next_line: ...
+struct lineread { + int fd; + int next_line; /* start of next unread line in buffer */ + int count; /* number of unreturned bytes in buffer */ + + /* One extra byte for possible trailing \0 */ + char buf[LINEREAD_BUFFER_SIZE+1]; +}; + +void lineread_init(struct lineread *lr, int fd); +int lineread_get(struct lineread *lr, char **line); + +#endif /* _LINEREAD_H */
I can change stuff on merge, let me know. -- Stefano
On Thu, Jun 23, 2022 at 11:31:43AM +0200, Stefano Brivio wrote:
On Fri, 17 Jun 2022 13:10:23 +1000 David Gibson
wrote: Two places in passt need to read files line by line (one parsing resolv.conf, the other parsing /proc/net/*. They can't use fgets() because in glibc that can allocate memory. Instead they use an implementation line_read() in util.c. This has some problems:
* It has two completely separate modes of operation, one buffering and one not, the relation between these and how they're activated is subtle and confusing * At least in non-buffered mode, it will mishandle an empty line, folding them onto the start of the next non-empty line * In non-buffered mode it will use lseek() which prevents using this on non-regular files (we don't need that at present, but it's a surprising limitation) * It has a lot of difficult to read pointer mangling
Add a new cleaner implementation of allocation-free line-by-line reading in lineread.c. This one already buffers, using a state structure to keep track of what we need. This is larger than I'd like, but it turns out handling all the edge cases of line-by-line reading in C is surprisingly hard.
Still much simpler (albeit a bit more verbose) than the original version, thanks. :)
That's encouraging to hear.
This just adds the code, subsequent patches will change the existing users of line_read() to the new implementation.
Signed-off-by: David Gibson
--- Makefile | 8 ++-- lineread.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++++++ lineread.h | 23 ++++++++++++ 3 files changed, 135 insertions(+), 4 deletions(-) create mode 100644 lineread.c create mode 100644 lineread.h diff --git a/Makefile b/Makefile index b0dde68..d059efb 100644 --- a/Makefile +++ b/Makefile @@ -32,16 +32,16 @@ CFLAGS += -DRLIMIT_STACK_VAL=$(RLIMIT_STACK_VAL) CFLAGS += -DARCH=\"$(TARGET_ARCH)\"
PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c icmp.c igmp.c \ - mld.c ndp.c netlink.c packet.c passt.c pasta.c pcap.c siphash.c \ - tap.c tcp.c tcp_splice.c udp.c util.c + lineread.c mld.c ndp.c netlink.c packet.c passt.c pasta.c pcap.c \ + siphash.c tap.c tcp.c tcp_splice.c udp.c util.c QRAP_SRCS = qrap.c SRCS = $(PASST_SRCS) $(QRAP_SRCS)
MANPAGES = passt.1 pasta.1 qrap.1
PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h icmp.h \ - ndp.h netlink.h packet.h passt.h pasta.h pcap.h siphash.h \ - tap.h tcp.h tcp_splice.h udp.h util.h + lineread.h ndp.h netlink.h packet.h passt.h pasta.h pcap.h \ + siphash.h tap.h tcp.h tcp_splice.h udp.h util.h HEADERS = $(PASST_HEADERS)
# On gcc 11.2, with -O2 and -flto, tcp_hash() and siphash_20b(), if inlined, diff --git a/lineread.c b/lineread.c new file mode 100644 index 0000000..3e263cf --- /dev/null +++ b/lineread.c @@ -0,0 +1,108 @@ +// SPDX-License-Identifier: AGPL-3.0-or-later + +/* PASST - Plug A Simple Socket Transport + * for qemu/UNIX domain socket mode + * + * PASTA - Pack A Subtle Tap Abstraction + * for network namespace/tap device mode + * + * lineread.c - Allocation free line-by-line buffered file input + * + * Copyright Red Hat + * Author: David Gibson
+ */ + +#include +#include +#include +#include +#include +#include + +#include "lineread.h" + +/** + * lineread_init() - Prepare for line by line file reading without allocation + * @lr: Line reader state structure to initialize + * @fd: File handle to read lines from I think by "handle" most people refer to "FILE" stream handles, I would drop "handle" here or replace by "descriptor".
Fair point, adjusted.
+ */ +void lineread_init(struct lineread *lr, int fd) +{ + lr->fd = fd; + lr->next_line = lr->count = 0; +} + +static int peek_line(struct lineread *lr, bool eof)
This lacks a description in kerneldoc style, I would add:
/** * peek_line() - Find and NULL-terminate next line in buffer * @lr: Line reader state structure * @eof: Caller indicates end-of-file was already found by read() * * Return: length of line in bytes, -1 if no line was found */
Ok, I wasn't sure if they were considered necessary for local functions. Added now.
By the way, if you're wondering why you're introducing the first usage of 'bool', I switched to C99 just recently. :)
Heh, ok.
+{ + char *nl; + + /* Sanity checks (which also document invariants) */ + assert(lr->count >= 0); + assert(lr->next_line >= 0); + assert(lr->next_line + lr->count >= lr->next_line); + assert(lr->next_line + lr->count <= LINEREAD_BUFFER_SIZE); + + nl = memchr(lr->buf + lr->next_line, '\n', lr->count); + + if (nl) { + *nl = '\0'; + return nl - lr->buf - lr->next_line + 1;
clang-tidy complains about the else-after-return here (llvm-else-after-return, readability-else-after-return), and at a second glance me too :) I would drop them if you're fine with it.
Fwiw, the reason I used the else style is that these are not "early" returns in the sense that they're not handling error or other exceptional conditions with the bulk of the function's logic to follow in the "else" case. Rather they're the 3 more-or-less co-equal possible conclusions to the function. If this were Rust, I'd drop the explicit 'return's entirely and return the entire if expression. That's only a very weak stylistic preference for me though, so I've changed it to make you and the checkers happy :).
+ } else if (eof) { + lr->buf[lr->next_line + lr->count] = '\0'; + /* + * No trailing newline, so treat all remaining bytes + * as the last line + */ + return lr->count; + } else { + return -1; + } +} + +/** + * lineread_get() - Read a single line from file (no allocation) + * @lr: Line reader state structure + * @line: Place a pointer to the next line in this variable + * + * Return: Length of line read on success, 0 on EOF, negative on error + */ +int lineread_get(struct lineread *lr, char **line) +{ + bool eof = false; + int line_len; + + while ((line_len = peek_line(lr, eof)) < 0) { + int rc; + + if ((lr->next_line + lr->count) == LINEREAD_BUFFER_SIZE) { + /* No space at end */ + if (lr->next_line == 0) { + /*
Nit: elsewhere, I used "net" kernel style comments, which are the same as every other area of the Linux kernel except that there's no opening newline:
/* Buffer is full, which means we've
I would change it here and in the comment above.
Fair enough, done.
+ * Buffer is full, which means we've + * hit a line too long for us to + * process. FIXME: report error + * better + */ + return -1; + } + memmove(lr->buf, lr->buf + lr->next_line, lr->count); + lr->next_line = 0; + } +
Stray tabs here, dropped.
Oops. Usually I notice that when I commit.
+ /* Read more data into the end of buffer */ + rc = read(lr->fd, lr->buf + lr->next_line + lr->count, + LINEREAD_BUFFER_SIZE - lr->next_line - lr->count); + if (rc < 0) { + return rc; + } else if (rc == 0) { + eof = true; + } else { + lr->count += rc; + } + } + + *line = lr->buf + lr->next_line; + lr->next_line += line_len; + lr->count -= line_len; + return line_len; +} diff --git a/lineread.h b/lineread.h new file mode 100644 index 0000000..972dc51 --- /dev/null +++ b/lineread.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: AGPL-3.0-or-later + * Copyright Red Hat + * Author: David Gibson
+ */ + +#ifndef LINEREAD_H +#define LINEREAD_H + +#define LINEREAD_BUFFER_SIZE 8192 + I would also stick to kerneldoc style comment here:
/** * struct lineread - Line reader state * @fd: File descriptor lines are read from * @next_line: ...
Makes sense.
+struct lineread { + int fd; + int next_line; /* start of next unread line in buffer */ + int count; /* number of unreturned bytes in buffer */ + + /* One extra byte for possible trailing \0 */ + char buf[LINEREAD_BUFFER_SIZE+1]; +}; + +void lineread_init(struct lineread *lr, int fd); +int lineread_get(struct lineread *lr, char **line); + +#endif /* _LINEREAD_H */
I can change stuff on merge, let me know.
Well, I just made the changes, so I'll send a v2 shortly. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Switch the resolv.conf parsing in conf.c to use the new lineread
implementation. This means that it can now handle a resolv.conf file which
contains blank lines.
There are quite a few other fragilities with the resolv.conf parsing, but
that's out of scope for this patch.
Signed-off-by: David Gibson
Use the new more solid implementation of line by line reading for
procfs_scan_listen().
Signed-off-by: David Gibson
The old, ugly implementation of line_read() is no longer used. Remove it.
Signed-off-by: David Gibson
participants (2)
-
David Gibson
-
Stefano Brivio