These macros take the end of the array argument implicitly to avoid
programmer mistakes. This guarantees that the input is an array, unlike
snprintf(buf, sizeof(buf), ...);
which is dangerous if the programmer passes a pointer instead of an
array.
These macros are essentially the same as the 2-argument version of
strscpy(), but with a formatted string, and returning a pointer to the
terminating '\0' (or NULL, on error).
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Marco Elver <elver@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
include/linux/sprintf.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/linux/sprintf.h b/include/linux/sprintf.h
index a0dc35574521..8576a543e62c 100644
--- a/include/linux/sprintf.h
+++ b/include/linux/sprintf.h
@@ -4,6 +4,10 @@
#include <linux/compiler_attributes.h>
#include <linux/types.h>
+#include <linux/array_size.h>
+
+#define sprintf_array(a, fmt, ...) sprintf_end(a, ENDOF(a), fmt, ##__VA_ARGS__)
+#define vsprintf_array(a, fmt, ap) vsprintf_end(a, ENDOF(a), fmt, ap)
int num_to_str(char *buf, int size, unsigned long long num, unsigned int width);
--
2.50.0
On Thu, 10 Jul 2025 at 14:31, Alejandro Colomar <alx@kernel.org> wrote: > > These macros are essentially the same as the 2-argument version of > strscpy(), but with a formatted string, and returning a pointer to the > terminating '\0' (or NULL, on error). No. Stop this garbage. You took my suggestion, and then you messed it up. Your version of sprintf_array() is broken. It evaluates 'a' twice. Because unlike ARRAY_SIZE(), your broken ENDOF() macro evaluates the argument. And you did it for no reason I can see. You said that you wanted to return the end of the resulting string, but the fact is, not a single user seems to care, and honestly, I think it would be wrong to care. The size of the result is likely the more useful thing, or you could even make these 'void' or something. But instead you made the macro be dangerous to use. This kind of churn is WRONG. It _looks_ like a cleanup that doesn't change anything, but then it has subtle bugs that will come and bite us later because you did things wrong. I'm NAK'ing all of this. This is BAD. Cleanup patches had better be fundamentally correct, not introduce broken "helpers" that will make for really subtle bugs. Maybe nobody ever ends up having that first argument with a side effect. MAYBE. It's still very very wrong. Linus
Am Donnerstag, dem 10.07.2025 um 14:58 -0700 schrieb Linus Torvalds: > On Thu, 10 Jul 2025 at 14:31, Alejandro Colomar <alx@kernel.org> wrote: > > > > These macros are essentially the same as the 2-argument version of > > strscpy(), but with a formatted string, and returning a pointer to the > > terminating '\0' (or NULL, on error). > > No. > > Stop this garbage. > > You took my suggestion, and then you messed it up. > > Your version of sprintf_array() is broken. It evaluates 'a' twice. > Because unlike ARRAY_SIZE(), your broken ENDOF() macro evaluates the > argument. > > And you did it for no reason I can see. You said that you wanted to > return the end of the resulting string, but the fact is, not a single > user seems to care, and honestly, I think it would be wrong to care. > The size of the result is likely the more useful thing, or you could > even make these 'void' or something. > > But instead you made the macro be dangerous to use. > > This kind of churn is WRONG. It _looks_ like a cleanup that doesn't > change anything, but then it has subtle bugs that will come and bite > us later because you did things wrong. > > I'm NAK'ing all of this. This is BAD. Cleanup patches had better be > fundamentally correct, not introduce broken "helpers" that will make > for really subtle bugs. > > Maybe nobody ever ends up having that first argument with a side > effect. MAYBE. It's still very very wrong. > > Linus What I am puzzled about is that - if you revise your string APIs -, you do not directly go for a safe abstraction that combines length and pointer and instead keep using these fragile 80s-style string functions and open-coded pointer and size computations that everybody gets wrong all the time. String handling could also look like this: https://godbolt.org/z/dqGz9b4sM and be completely bounds safe. (Note that those function abort() on allocation failure, but this is an unfinished demo and also not for kernel use. Also I need to rewrite this using string views.) Martin
On Fri, 11 Jul 2025 08:05:38 +0200 Martin Uecker <ma.uecker@gmail.com> wrote: > Am Donnerstag, dem 10.07.2025 um 14:58 -0700 schrieb Linus Torvalds: > > On Thu, 10 Jul 2025 at 14:31, Alejandro Colomar <alx@kernel.org> wrote: > > > > > > These macros are essentially the same as the 2-argument version of > > > strscpy(), but with a formatted string, and returning a pointer to the > > > terminating '\0' (or NULL, on error). > > > > No. > > > > Stop this garbage. > > > > You took my suggestion, and then you messed it up. > > > > Your version of sprintf_array() is broken. It evaluates 'a' twice. > > Because unlike ARRAY_SIZE(), your broken ENDOF() macro evaluates the > > argument. > > > > And you did it for no reason I can see. You said that you wanted to > > return the end of the resulting string, but the fact is, not a single > > user seems to care, and honestly, I think it would be wrong to care. > > The size of the result is likely the more useful thing, or you could > > even make these 'void' or something. > > > > But instead you made the macro be dangerous to use. > > > > This kind of churn is WRONG. It _looks_ like a cleanup that doesn't > > change anything, but then it has subtle bugs that will come and bite > > us later because you did things wrong. > > > > I'm NAK'ing all of this. This is BAD. Cleanup patches had better be > > fundamentally correct, not introduce broken "helpers" that will make > > for really subtle bugs. > > > > Maybe nobody ever ends up having that first argument with a side > > effect. MAYBE. It's still very very wrong. > > > > Linus > > What I am puzzled about is that - if you revise your string APIs -, > you do not directly go for a safe abstraction that combines length > and pointer and instead keep using these fragile 80s-style string > functions and open-coded pointer and size computations that everybody > gets wrong all the time. > > String handling could also look like this: What does that actually look like behind all the #defines and generics? It it continually doing malloc/free it is pretty much inappropriate for a lot of system/kernel code. David > > > https://godbolt.org/z/dqGz9b4sM > > and be completely bounds safe. > > (Note that those function abort() on allocation failure, but this > is an unfinished demo and also not for kernel use. Also I need to > rewrite this using string views.) > > > Martin > > > >
Am Freitag, dem 11.07.2025 um 18:45 +0100 schrieb David Laight: > On Fri, 11 Jul 2025 08:05:38 +0200 > Martin Uecker <ma.uecker@gmail.com> wrote: > > > Am Donnerstag, dem 10.07.2025 um 14:58 -0700 schrieb Linus Torvalds: > > > On Thu, 10 Jul 2025 at 14:31, Alejandro Colomar <alx@kernel.org> wrote: > > > > > > > > These macros are essentially the same as the 2-argument version of > > > > strscpy(), but with a formatted string, and returning a pointer to the > > > > terminating '\0' (or NULL, on error). > > > > > > No. > > > > > > Stop this garbage. > > > > > > You took my suggestion, and then you messed it up. > > > > > > Your version of sprintf_array() is broken. It evaluates 'a' twice. > > > Because unlike ARRAY_SIZE(), your broken ENDOF() macro evaluates the > > > argument. > > > > > > And you did it for no reason I can see. You said that you wanted to > > > return the end of the resulting string, but the fact is, not a single > > > user seems to care, and honestly, I think it would be wrong to care. > > > The size of the result is likely the more useful thing, or you could > > > even make these 'void' or something. > > > > > > But instead you made the macro be dangerous to use. > > > > > > This kind of churn is WRONG. It _looks_ like a cleanup that doesn't > > > change anything, but then it has subtle bugs that will come and bite > > > us later because you did things wrong. > > > > > > I'm NAK'ing all of this. This is BAD. Cleanup patches had better be > > > fundamentally correct, not introduce broken "helpers" that will make > > > for really subtle bugs. > > > > > > Maybe nobody ever ends up having that first argument with a side > > > effect. MAYBE. It's still very very wrong. > > > > > > Linus > > > > What I am puzzled about is that - if you revise your string APIs -, > > you do not directly go for a safe abstraction that combines length > > and pointer and instead keep using these fragile 80s-style string > > functions and open-coded pointer and size computations that everybody > > gets wrong all the time. > > > > String handling could also look like this: > > What does that actually look like behind all the #defines and generics? > It it continually doing malloc/free it is pretty much inappropriate > for a lot of system/kernel code. The example I linked would allocate behind your back and would clearly not be useful for the kernel also because it would abort() on allocation failure (as I pointed out below). Still, I do not see why similar functions could not work for the kernel. The main point is to keep pointer and length together in a single struct. But it is certainly more difficult to define APIs which make sense for the kernel. I explain a bit how such types work here: https://uecker.codeberg.page/2025-07-02.html https://uecker.codeberg.page/2025-07-09.html Martin > > > > > https://godbolt.org/z/dqGz9b4sM > > > > and be completely bounds safe. > > > > (Note that those function abort() on allocation failure, but this > > is an unfinished demo and also not for kernel use. Also I need to > > rewrite this using string views.) > > > > > > Martin > > > > > > > > >
On Fri, 11 Jul 2025 at 10:45, David Laight <david.laight.linux@gmail.com> wrote: > > What does that actually look like behind all the #defines and generics? > It it continually doing malloc/free it is pretty much inappropriate > for a lot of system/kernel code. Honestly, the kernel approximately *never* has "string handling" in the traditional sense. But we do have "buffers with text". The difference typically exactly being that allocation has to happen separately from any text operation. It's why I already suggested people look at our various existing buffer abstractions: we have several, although they tend to often be somewhat specialized. So, for example, we have things like "struct qstr" for path components: it's specialized not only in having an associated hash value for the string, but because it's a "initialize once" kind of buffer that gets initialized at creation time, and the string contents are constant (it literally contains a "const char *" in addition to the length/hash). That kind of "string buffer" obviously isn't useful for things like the printf family, but we do have others. Like "struct seq_buf", which already has "seq_buf_printf()" helpers. That's the one you probably should use for most kernel "print to buffer", but it has very few users despite not being complicated to use: struct seq_buf s; seq_buf_init(&s, buf, szie); and you're off to the races, and can do things like seq_buf_printf(&s, ....); without ever having to worry about overflows etc. So we already do *have* good interfaces. But they aren't the traditional ones that everybody knows about. Linus
On Fri, Jul 11, 2025 at 10:58:56AM -0700, Linus Torvalds wrote: > struct seq_buf s; > seq_buf_init(&s, buf, szie); And because some folks didn't like this "declaration that requires a function call", we even added: DECLARE_SEQ_BUF(s, 32); to do it in 1 line. :P I would love to see more string handling replaced with seq_buf. -- Kees Cook
Hi Kees, On Mon, Jul 14, 2025 at 10:19:39PM -0700, Kees Cook wrote: > On Fri, Jul 11, 2025 at 10:58:56AM -0700, Linus Torvalds wrote: > > struct seq_buf s; > > seq_buf_init(&s, buf, szie); > > And because some folks didn't like this "declaration that requires a > function call", we even added: > > DECLARE_SEQ_BUF(s, 32); > > to do it in 1 line. :P > > I would love to see more string handling replaced with seq_buf. The thing is, it's not as easy as the fixes I'm proposing, and sprintf_end() solves a lot of UB in a minimal diff that you can dumbly apply. And transitioning from sprintf_end() to seq_buf will still be a possibility --probably even easier, because the code is simpler than with s[c]nprintf()--. Another thing, and this is my opinion, is that I'm not fond of APIs that keep an internal state. With sprintf_end(), the state is minimal and external: the state is the 'p' pointer to where you're going to write. That way, the programmer knows exactly where the writes occur, and can reason about it without having to read the implementation and keep a model of the state in its head. With a struct-based approach, you hide the state inside the structure, which means it's not so easy to reason about how an action will affect the string, at first glance; you need an expert in the API to know how to use it. With sprintf_end(), either one is stupid/careless enough to get the parameters wrong, or the function necessarily works well, *and is simple to fully understand*. And considering that we have ENDOF(), it's hard to understand how one could get it wrong: p = buf; e = ENDOF(buf); p = sprintf_end(p, e, ...); p = sprintf_end(p, e, ...); p = sprintf_end(p, e, ...); p = sprintf_end(p, e, ...); Admittedly, ENDOF() doesn't compile if buf is not an array, so in those cases, there's a chance of a paranoic programmer slapping a -1 just in case, but that doesn't hurt: p = buf; e = buf + size; // Someone might accidentally -1 that? I'm working on extending the _Countof() operator so that it can be applied to array parameters to functions, so that it can be used to count arrays that are not arrays: void f(size_t n, char buf[n]) { p = buf; e = buf + _Countof(buf); // _Countof(buf) will evaluate to n. ... } Which will significantly enhance the usability of sprintf_end(). I want to implement this for GCC next year (there are a few things that need to be improved first to be able to do that), and also propose it for standardization. For a similar comparison of stateful vs stateless functions, there are strtok(3) and strsep(3), which apart from minor differences (strtok(3) collapses adjacent delimiters) are more or less the same. But I'd use strsep(3) over strtok(3), even if just because strtok(3) keeps an internal state, so I always need to be very careful of reading the documentation to remind myself of what happens to the state after each call. strsep(3) is dead simple: you call it, and it updates the pointer you passed; nothing is kept secretly from the programmer. Have a lovely day! Alex -- <https://www.alejandro-colomar.es/>
On Tue, Jul 15, 2025 at 09:08:14AM +0200, Alejandro Colomar wrote: > Hi Kees, > > On Mon, Jul 14, 2025 at 10:19:39PM -0700, Kees Cook wrote: > > On Fri, Jul 11, 2025 at 10:58:56AM -0700, Linus Torvalds wrote: > > > struct seq_buf s; > > > seq_buf_init(&s, buf, szie); > > > > And because some folks didn't like this "declaration that requires a > > function call", we even added: > > > > DECLARE_SEQ_BUF(s, 32); > > > > to do it in 1 line. :P > > > > I would love to see more string handling replaced with seq_buf. > > The thing is, it's not as easy as the fixes I'm proposing, and > sprintf_end() solves a lot of UB in a minimal diff that you can dumbly > apply. Note that I'm not arguing against your idea -- I just think it's not going to be likely to end up in Linux soon given Linus's objections. My perspective is mainly one of pragmatic damage control: what *can* we do in Linux that would make things better? Currently, seq_buf is better than raw C strings... -- Kees Cook
Hi Kees, On Thu, Jul 17, 2025 at 04:47:04PM -0700, Kees Cook wrote: > On Tue, Jul 15, 2025 at 09:08:14AM +0200, Alejandro Colomar wrote: > > Hi Kees, > > > > On Mon, Jul 14, 2025 at 10:19:39PM -0700, Kees Cook wrote: > > > On Fri, Jul 11, 2025 at 10:58:56AM -0700, Linus Torvalds wrote: > > > > struct seq_buf s; > > > > seq_buf_init(&s, buf, szie); > > > > > > And because some folks didn't like this "declaration that requires a > > > function call", we even added: > > > > > > DECLARE_SEQ_BUF(s, 32); > > > > > > to do it in 1 line. :P > > > > > > I would love to see more string handling replaced with seq_buf. > > > > The thing is, it's not as easy as the fixes I'm proposing, and > > sprintf_end() solves a lot of UB in a minimal diff that you can dumbly > > apply. > > Note that I'm not arguing against your idea -- I just think it's not > going to be likely to end up in Linux soon given Linus's objections. It would be interesting to hear if Linus holds his objections on v6. > My > perspective is mainly one of pragmatic damage control: what *can* we do > in Linux that would make things better? Currently, seq_buf is better > than raw C strings... TBH, I'm not fully convinced. While it may look simpler at first glance, I'm worried that it might bite in the details. I default to not trusting APIs that hide the complexity in hidden state. On the other hand, I agree that almost anything is safer than snprintf(3). But one good thing of snprintf(3) is that it's simple, and thus relatively obvious to see that it's wrong, so it's easy to fix (it's easy to transition from snprintf(3) to sprintf_end()). So, maybe keeping it bogus until it's replaced by sprintf_end() is a better approach than using seq_buf. (Unless the current code is found exploitable, but I assume not.) Have a lovely night! Alex -- <https://www.alejandro-colomar.es/>
Am Montag, dem 14.07.2025 um 22:19 -0700 schrieb Kees Cook: > On Fri, Jul 11, 2025 at 10:58:56AM -0700, Linus Torvalds wrote: > > struct seq_buf s; > > seq_buf_init(&s, buf, szie); > > And because some folks didn't like this "declaration that requires a > function call", we even added: > > DECLARE_SEQ_BUF(s, 32); > > to do it in 1 line. :P > > I would love to see more string handling replaced with seq_buf. Why not have? struct seq_buf s = SEQ_BUF(32); So the kernel has safe abstractions, there are just not used enough. Do you also have a string view abstraction? I found this really useful as basic building block for safe string handling, and equally important to a string builder type such as seq_buf. The string builder is for safely construcing new strings, the string view is for safely accessing parts of existing strings. Also what I found really convenient and useful in this context was to have an accessor macro that expose the buffer as a regular array cast to the correct size: *( (char(*)[(x)->N]) (x)->data ) (put into statement expressions to avoid double evaluation) instead of simply returning a char* You can then access the array directly with [] which then can be bounds checked with UBsan, one can measure its length with sizeof, and one can also let it decay and get a char* to pass it to legacy code (and to some degree this can be protected by BDOS). Martin
On Tue, Jul 15, 2025 at 08:24:29AM +0200, Martin Uecker wrote: > Am Montag, dem 14.07.2025 um 22:19 -0700 schrieb Kees Cook: > > On Fri, Jul 11, 2025 at 10:58:56AM -0700, Linus Torvalds wrote: > > > struct seq_buf s; > > > seq_buf_init(&s, buf, szie); > > > > And because some folks didn't like this "declaration that requires a > > function call", we even added: > > > > DECLARE_SEQ_BUF(s, 32); > > > > to do it in 1 line. :P > > > > I would love to see more string handling replaced with seq_buf. > > Why not have? > > struct seq_buf s = SEQ_BUF(32); > > > So the kernel has safe abstractions, there are just not used enough. Yeah, that should be fine. The trouble is encapsulating the actual buffer itself. But things like spinlocks need initialization too, so it's not too unusual to need a constructor for things living in a struct. If the struct had DECLARE which created 2 variables, then an INIT could just reuse the special name... > The string builder is for safely construcing new strings, the > string view is for safely accessing parts of existing strings. seq_buf doesn't currently have a "view" API, just a "make sure the result is NUL terminated, please enjoy this char *" > Also what I found really convenient and useful in this context > was to have an accessor macro that expose the buffer as a > regular array cast to the correct size: > > *( (char(*)[(x)->N]) (x)->data ) > > (put into statement expressions to avoid double evaluation) > > instead of simply returning a char* Yeah, I took a look through your proposed C string library routines. I think it would be pretty nice, but it does feel like it has to go through a lot of hoops when C should have something native. Though to be clear, I'm not saying seq_buf is the answer. :) -- Kees Cook
On Fri, Jul 11, 2025 at 10:58:56AM -0700, Linus Torvalds wrote: > That kind of "string buffer" obviously isn't useful for things like > the printf family, but we do have others. Like "struct seq_buf", which > already has "seq_buf_printf()" helpers. > > That's the one you probably should use for most kernel "print to > buffer", but it has very few users despite not being complicated to > use: > > struct seq_buf s; > seq_buf_init(&s, buf, szie); > > and you're off to the races, and can do things like > > seq_buf_printf(&s, ....); > > without ever having to worry about overflows etc. I actually wanted to go one step further with this (that's why I took readpos out of seq_buf in d0ed46b60396). If you look at the guts of vsprintf.c, it'd be much improved by using seq_buf internally instead of passing around buf and end. Once we've done that, maybe we can strip these annoying %pXYZ out of vsprintf.c and use seq_buf routines like it's a StringBuilder (or whatever other language/library convention you prefer). Anyway, I ran out of time to work on it, but I still think it's worthwhile. And then there'd be a lot more commonality between regular printing and trace printing, which would be nice.
Am Freitag, dem 11.07.2025 um 08:05 +0200 schrieb Martin Uecker: > Am Donnerstag, dem 10.07.2025 um 14:58 -0700 schrieb Linus Torvalds: > > On Thu, 10 Jul 2025 at 14:31, Alejandro Colomar <alx@kernel.org> wrote: > > > > > > These macros are essentially the same as the 2-argument version of > > > strscpy(), but with a formatted string, and returning a pointer to the > > > terminating '\0' (or NULL, on error). > > > > No. > > > > Stop this garbage. > > > > You took my suggestion, and then you messed it up. > > > > Your version of sprintf_array() is broken. It evaluates 'a' twice. > > Because unlike ARRAY_SIZE(), your broken ENDOF() macro evaluates the > > argument. > > > > And you did it for no reason I can see. You said that you wanted to > > return the end of the resulting string, but the fact is, not a single > > user seems to care, and honestly, I think it would be wrong to care. > > The size of the result is likely the more useful thing, or you could > > even make these 'void' or something. > > > > But instead you made the macro be dangerous to use. > > > > This kind of churn is WRONG. It _looks_ like a cleanup that doesn't > > change anything, but then it has subtle bugs that will come and bite > > us later because you did things wrong. > > > > I'm NAK'ing all of this. This is BAD. Cleanup patches had better be > > fundamentally correct, not introduce broken "helpers" that will make > > for really subtle bugs. > > > > Maybe nobody ever ends up having that first argument with a side > > effect. MAYBE. It's still very very wrong. > > > > Linus > > What I am puzzled about is that - if you revise your string APIs -, > you do not directly go for a safe abstraction that combines length > and pointer and instead keep using these fragile 80s-style string > functions and open-coded pointer and size computations that everybody > gets wrong all the time. > > String handling could also look like this: > > > https://godbolt.org/z/dqGz9b4sM > > and be completely bounds safe. > > (Note that those function abort() on allocation failure, but this > is an unfinished demo and also not for kernel use. Also I need to > rewrite this using string views.) > And *if* you want functions that manipulate buffers, why not pass a pointer to the buffer instead of to its first element to not loose the type information. int foo(size_t s, char (*p)[s]); char buf[10; foo(ARRAY_SIZE(buf), &buf); may look slightly unusual but is a lot safer than int foo(char *buf, size_t len); char buf[10]; foo(buf, ARRAY_SIZE(buf); and - once you are used to it - also more logical because why would you pass a pointer to part of an object to a function that is supposed to work on the complete object. Martin
Hi Linus, [I'll reply to both of your emails at once] On Thu, Jul 10, 2025 at 02:58:24PM -0700, Linus Torvalds wrote: > You took my suggestion, and then you messed it up. > > Your version of sprintf_array() is broken. It evaluates 'a' twice. > Because unlike ARRAY_SIZE(), your broken ENDOF() macro evaluates the > argument. An array has no issue being evaluated twice (unless it's a VLA). On the other hand, I agree it's better to not do that in the first place. My bad for forgetting about it. Sorry. On Thu, Jul 10, 2025 at 03:08:29PM -0700, Linus Torvalds wrote: > If you want to return an error on truncation, do it right. Not by > returning NULL, but by actually returning an error. Okay. > For example, in the kernel, we finally fixed 'strcpy()'. After about a > million different versions of 'copy a string' where every single > version was complete garbage, we ended up with 'strscpy()'. Yeah, the > name isn't lovely, but the *use* of it is: I have implemented the same thing in shadow, called strtcpy() (T for truncation). (With the difference that we read the string twice, since we don't care about threads.) I also plan to propose standardization of that one in ISO C. > - it returns the length of the result for people who want it - which > is by far the most common thing people want Agree. > - it returns an actual honest-to-goodness error code if something > overflowed, instead of the absoilutely horrible "source length" of the > string that strlcpy() does and which is fundamentally broken (because > it requires that you walk *past* the end of the source, > Christ-on-a-stick what a broken interface) Agree. > - it can take an array as an argument (without the need for another > name - see my earlier argument about not making up new names by just > having generics) We can't make the same thing with sprintf() variants because they're variadic, so you can't count the number of arguments. And since the 'end' argument is of the same type as the formatted string, we can't do it with _Generic reliably either. > Now, it has nasty naming (exactly the kind of 'add random character' > naming that I was arguing against), and that comes from so many > different broken versions until we hit on something that works. > > strncpy is horrible garbage. strlcpy is even worse. strscpy actually > works and so far hasn't caused issues (there's a 'pad' version for the > very rare situation where you want 'strncpy-like' padding, but it > still guarantees NUL-termination, and still has a good return value). Agree. > Let's agree to *not* make horrible garbage when making up new versions > of sprintf. Agree. I indeed introduced the mistake accidentally in v4, after you complained of having too many functions, as I was introducing not one but two APIs: seprintf() and stprintf(), where seprintf() is what now we're calling sprintf_end(), and stprintf() we could call it sprintf_trunc(). So I did the mistake by trying to reduce the number of functions to just one, which is wrong. So, maybe I should go back to those functions, and just give them good names. What do you think of the following? #define sprintf_array(a, ...) sprintf_trunc(a, ARRAY_SIZE(a), __VA_ARGS__) #define vsprintf_array(a, ap) vsprintf_trunc(a, ARRAY_SIZE(a), ap) char *sprintf_end(char *p, const char end[0], const char *fmt, ...); char *vsprintf_end(char *p, const char end[0], const char *fmt, va_list args); int sprintf_trunc(char *buf, size_t size, const char *fmt, ...); int vsprintf_trunc(char *buf, size_t size, const char *fmt, va_list args); char *sprintf_end(char *p, const char end[0], const char *fmt, ...) { va_list args; va_start(args, fmt); p = vseprintf(p, end, fmt, args); va_end(args); return p; } char *vsprintf_end(char *p, const char end[0], const char *fmt, va_list args) { int len; if (unlikely(p == NULL)) return NULL; len = vsprintf_trunc(p, end - p, fmt, args); if (unlikely(len < 0)) return NULL; return p + len; } int sprintf_trunc(char *buf, size_t size, const char *fmt, ...) { va_list args; int len; va_start(args, fmt); len = vstprintf(buf, size, fmt, args); va_end(args); return len; } int vsprintf_trunc(char *buf, size_t size, const char *fmt, va_list args) { int len; if (WARN_ON_ONCE(size == 0 || size > INT_MAX)) return -EOVERFLOW; len = vsnprintf(buf, size, fmt, args); if (unlikely(len >= size)) return -E2BIG; return len; } sprintf_trunc() is like strscpy(), but with a formatted string. It could replace uses of s[c]nprintf() where there's a single call (no chained calls). sprintf_array() is like the 2-argument version of strscpy(). It could replace s[c]nprintf() calls where there's no chained calls, where the input is an array. sprintf_end() would replace the chained calls. Does this sound good to you? Cheers, Alex -- <https://www.alejandro-colomar.es/>
On Fri, 11 Jul 2025 01:23:49 +0200 Alejandro Colomar <alx@kernel.org> wrote: > Hi Linus, > > [I'll reply to both of your emails at once] > > On Thu, Jul 10, 2025 at 02:58:24PM -0700, Linus Torvalds wrote: > > You took my suggestion, and then you messed it up. > > > > Your version of sprintf_array() is broken. It evaluates 'a' twice. > > Because unlike ARRAY_SIZE(), your broken ENDOF() macro evaluates the > > argument. > > An array has no issue being evaluated twice (unless it's a VLA). On the > other hand, I agree it's better to not do that in the first place. > My bad for forgetting about it. Sorry. Or a function that returns an array... David > > On Thu, Jul 10, 2025 at 03:08:29PM -0700, Linus Torvalds wrote: > > If you want to return an error on truncation, do it right. Not by > > returning NULL, but by actually returning an error. > > Okay. > > > For example, in the kernel, we finally fixed 'strcpy()'. After about a > > million different versions of 'copy a string' where every single > > version was complete garbage, we ended up with 'strscpy()'. Yeah, the > > name isn't lovely, but the *use* of it is: > > I have implemented the same thing in shadow, called strtcpy() (T for > truncation). (With the difference that we read the string twice, since > we don't care about threads.) > > I also plan to propose standardization of that one in ISO C. > > > - it returns the length of the result for people who want it - which > > is by far the most common thing people want > > Agree. > > > - it returns an actual honest-to-goodness error code if something > > overflowed, instead of the absoilutely horrible "source length" of the > > string that strlcpy() does and which is fundamentally broken (because > > it requires that you walk *past* the end of the source, > > Christ-on-a-stick what a broken interface) > > Agree. > > > - it can take an array as an argument (without the need for another > > name - see my earlier argument about not making up new names by just > > having generics) > > We can't make the same thing with sprintf() variants because they're > variadic, so you can't count the number of arguments. And since the > 'end' argument is of the same type as the formatted string, we can't > do it with _Generic reliably either. > > > Now, it has nasty naming (exactly the kind of 'add random character' > > naming that I was arguing against), and that comes from so many > > different broken versions until we hit on something that works. > > > > strncpy is horrible garbage. strlcpy is even worse. strscpy actually > > works and so far hasn't caused issues (there's a 'pad' version for the > > very rare situation where you want 'strncpy-like' padding, but it > > still guarantees NUL-termination, and still has a good return value). > > Agree. > > > Let's agree to *not* make horrible garbage when making up new versions > > of sprintf. > > Agree. I indeed introduced the mistake accidentally in v4, after you > complained of having too many functions, as I was introducing not one > but two APIs: seprintf() and stprintf(), where seprintf() is what now > we're calling sprintf_end(), and stprintf() we could call it > sprintf_trunc(). So I did the mistake by trying to reduce the number of > functions to just one, which is wrong. > > So, maybe I should go back to those functions, and just give them good > names. > > What do you think of the following? > > #define sprintf_array(a, ...) sprintf_trunc(a, ARRAY_SIZE(a), __VA_ARGS__) > #define vsprintf_array(a, ap) vsprintf_trunc(a, ARRAY_SIZE(a), ap) > > char *sprintf_end(char *p, const char end[0], const char *fmt, ...); > char *vsprintf_end(char *p, const char end[0], const char *fmt, va_list args); > int sprintf_trunc(char *buf, size_t size, const char *fmt, ...); > int vsprintf_trunc(char *buf, size_t size, const char *fmt, va_list args); > > char *sprintf_end(char *p, const char end[0], const char *fmt, ...) > { > va_list args; > > va_start(args, fmt); > p = vseprintf(p, end, fmt, args); > va_end(args); > > return p; > } > > char *vsprintf_end(char *p, const char end[0], const char *fmt, va_list args) > { > int len; > > if (unlikely(p == NULL)) > return NULL; > > len = vsprintf_trunc(p, end - p, fmt, args); > if (unlikely(len < 0)) > return NULL; > > return p + len; > } > > int sprintf_trunc(char *buf, size_t size, const char *fmt, ...) > { > va_list args; > int len; > > va_start(args, fmt); > len = vstprintf(buf, size, fmt, args); > va_end(args); > > return len; > } > > int vsprintf_trunc(char *buf, size_t size, const char *fmt, va_list args) > { > int len; > > if (WARN_ON_ONCE(size == 0 || size > INT_MAX)) > return -EOVERFLOW; > > len = vsnprintf(buf, size, fmt, args); > if (unlikely(len >= size)) > return -E2BIG; > > return len; > } > > sprintf_trunc() is like strscpy(), but with a formatted string. It > could replace uses of s[c]nprintf() where there's a single call (no > chained calls). > > sprintf_array() is like the 2-argument version of strscpy(). It could > replace s[c]nprintf() calls where there's no chained calls, where the > input is an array. > > sprintf_end() would replace the chained calls. > > Does this sound good to you? > > > Cheers, > Alex >
Hi David, On Fri, Jul 11, 2025 at 06:43:43PM +0100, David Laight wrote: > On Fri, 11 Jul 2025 01:23:49 +0200 > Alejandro Colomar <alx@kernel.org> wrote: > > > Hi Linus, > > > > [I'll reply to both of your emails at once] > > > > On Thu, Jul 10, 2025 at 02:58:24PM -0700, Linus Torvalds wrote: > > > You took my suggestion, and then you messed it up. > > > > > > Your version of sprintf_array() is broken. It evaluates 'a' twice. > > > Because unlike ARRAY_SIZE(), your broken ENDOF() macro evaluates the > > > argument. > > > > An array has no issue being evaluated twice (unless it's a VLA). On the > > other hand, I agree it's better to not do that in the first place. > > My bad for forgetting about it. Sorry. > > Or a function that returns an array... Actually, I was forgetting that the array could be gotten from a pointer to array: int (*ap)[42] = ...; ENDOF(ap++); // Evaluates ap++ Anyway, fixed in v6. Cheers, Alex -- <https://www.alejandro-colomar.es/>
On Fri, Jul 11, 2025 at 09:17:28PM +0200, Alejandro Colomar wrote: > Hi David, > > On Fri, Jul 11, 2025 at 06:43:43PM +0100, David Laight wrote: > > On Fri, 11 Jul 2025 01:23:49 +0200 > > Alejandro Colomar <alx@kernel.org> wrote: > > > > > Hi Linus, > > > > > > [I'll reply to both of your emails at once] > > > > > > On Thu, Jul 10, 2025 at 02:58:24PM -0700, Linus Torvalds wrote: > > > > You took my suggestion, and then you messed it up. > > > > > > > > Your version of sprintf_array() is broken. It evaluates 'a' twice. > > > > Because unlike ARRAY_SIZE(), your broken ENDOF() macro evaluates the > > > > argument. > > > > > > An array has no issue being evaluated twice (unless it's a VLA). On the > > > other hand, I agree it's better to not do that in the first place. > > > My bad for forgetting about it. Sorry. > > > > Or a function that returns an array... > > Actually, I was forgetting that the array could be gotten from a pointer > to array: > > int (*ap)[42] = ...; > > ENDOF(ap++); // Evaluates ap++ D'oh! That should have been ENDOF(*ap++). > Anyway, fixed in v6. > > > Cheers, > Alex > > -- > <https://www.alejandro-colomar.es/> -- <https://www.alejandro-colomar.es/>
On Fri, Jul 11, 2025 at 01:23:56AM +0200, Alejandro Colomar wrote: > Hi Linus, > > [I'll reply to both of your emails at once] > > On Thu, Jul 10, 2025 at 02:58:24PM -0700, Linus Torvalds wrote: > > You took my suggestion, and then you messed it up. > > > > Your version of sprintf_array() is broken. It evaluates 'a' twice. > > Because unlike ARRAY_SIZE(), your broken ENDOF() macro evaluates the > > argument. > > An array has no issue being evaluated twice (unless it's a VLA). On the > other hand, I agree it's better to not do that in the first place. > My bad for forgetting about it. Sorry. > > On Thu, Jul 10, 2025 at 03:08:29PM -0700, Linus Torvalds wrote: > > If you want to return an error on truncation, do it right. Not by > > returning NULL, but by actually returning an error. > > Okay. > > > For example, in the kernel, we finally fixed 'strcpy()'. After about a > > million different versions of 'copy a string' where every single > > version was complete garbage, we ended up with 'strscpy()'. Yeah, the > > name isn't lovely, but the *use* of it is: > > I have implemented the same thing in shadow, called strtcpy() (T for > truncation). (With the difference that we read the string twice, since > we don't care about threads.) > > I also plan to propose standardization of that one in ISO C. > > > - it returns the length of the result for people who want it - which > > is by far the most common thing people want > > Agree. > > > - it returns an actual honest-to-goodness error code if something > > overflowed, instead of the absoilutely horrible "source length" of the > > string that strlcpy() does and which is fundamentally broken (because > > it requires that you walk *past* the end of the source, > > Christ-on-a-stick what a broken interface) > > Agree. > > > - it can take an array as an argument (without the need for another > > name - see my earlier argument about not making up new names by just > > having generics) > > We can't make the same thing with sprintf() variants because they're > variadic, so you can't count the number of arguments. And since the > 'end' argument is of the same type as the formatted string, we can't > do it with _Generic reliably either. > > > Now, it has nasty naming (exactly the kind of 'add random character' > > naming that I was arguing against), and that comes from so many > > different broken versions until we hit on something that works. > > > > strncpy is horrible garbage. strlcpy is even worse. strscpy actually > > works and so far hasn't caused issues (there's a 'pad' version for the > > very rare situation where you want 'strncpy-like' padding, but it > > still guarantees NUL-termination, and still has a good return value). > > Agree. > > > Let's agree to *not* make horrible garbage when making up new versions > > of sprintf. > > Agree. I indeed introduced the mistake accidentally in v4, after you > complained of having too many functions, as I was introducing not one > but two APIs: seprintf() and stprintf(), where seprintf() is what now > we're calling sprintf_end(), and stprintf() we could call it > sprintf_trunc(). So I did the mistake by trying to reduce the number of > functions to just one, which is wrong. > > So, maybe I should go back to those functions, and just give them good > names. > > What do you think of the following? > > #define sprintf_array(a, ...) sprintf_trunc(a, ARRAY_SIZE(a), __VA_ARGS__) > #define vsprintf_array(a, ap) vsprintf_trunc(a, ARRAY_SIZE(a), ap) Typo: forgot the fmt argument. > > char *sprintf_end(char *p, const char end[0], const char *fmt, ...); > char *vsprintf_end(char *p, const char end[0], const char *fmt, va_list args); > int sprintf_trunc(char *buf, size_t size, const char *fmt, ...); > int vsprintf_trunc(char *buf, size_t size, const char *fmt, va_list args); > > char *sprintf_end(char *p, const char end[0], const char *fmt, ...) > { > va_list args; > > va_start(args, fmt); > p = vseprintf(p, end, fmt, args); > va_end(args); > > return p; > } > > char *vsprintf_end(char *p, const char end[0], const char *fmt, va_list args) > { > int len; > > if (unlikely(p == NULL)) > return NULL; > > len = vsprintf_trunc(p, end - p, fmt, args); > if (unlikely(len < 0)) > return NULL; > > return p + len; > } > > int sprintf_trunc(char *buf, size_t size, const char *fmt, ...) > { > va_list args; > int len; > > va_start(args, fmt); > len = vstprintf(buf, size, fmt, args); > va_end(args); > > return len; > } > > int vsprintf_trunc(char *buf, size_t size, const char *fmt, va_list args) > { > int len; > > if (WARN_ON_ONCE(size == 0 || size > INT_MAX)) > return -EOVERFLOW; > > len = vsnprintf(buf, size, fmt, args); > if (unlikely(len >= size)) > return -E2BIG; > > return len; > } > > sprintf_trunc() is like strscpy(), but with a formatted string. It > could replace uses of s[c]nprintf() where there's a single call (no > chained calls). > > sprintf_array() is like the 2-argument version of strscpy(). It could > replace s[c]nprintf() calls where there's no chained calls, where the > input is an array. > > sprintf_end() would replace the chained calls. > > Does this sound good to you? > > > Cheers, > Alex > > -- > <https://www.alejandro-colomar.es/> -- <https://www.alejandro-colomar.es/>
On Fri, Jul 11, 2025 at 01:23:56AM +0200, Alejandro Colomar wrote: > Hi Linus, > > [I'll reply to both of your emails at once] > > On Thu, Jul 10, 2025 at 02:58:24PM -0700, Linus Torvalds wrote: > > You took my suggestion, and then you messed it up. > > > > Your version of sprintf_array() is broken. It evaluates 'a' twice. > > Because unlike ARRAY_SIZE(), your broken ENDOF() macro evaluates the > > argument. > > An array has no issue being evaluated twice (unless it's a VLA). On the > other hand, I agree it's better to not do that in the first place. > My bad for forgetting about it. Sorry. > > On Thu, Jul 10, 2025 at 03:08:29PM -0700, Linus Torvalds wrote: > > If you want to return an error on truncation, do it right. Not by > > returning NULL, but by actually returning an error. > > Okay. > > > For example, in the kernel, we finally fixed 'strcpy()'. After about a > > million different versions of 'copy a string' where every single > > version was complete garbage, we ended up with 'strscpy()'. Yeah, the > > name isn't lovely, but the *use* of it is: > > I have implemented the same thing in shadow, called strtcpy() (T for > truncation). (With the difference that we read the string twice, since > we don't care about threads.) > > I also plan to propose standardization of that one in ISO C. > > > - it returns the length of the result for people who want it - which > > is by far the most common thing people want > > Agree. > > > - it returns an actual honest-to-goodness error code if something > > overflowed, instead of the absoilutely horrible "source length" of the > > string that strlcpy() does and which is fundamentally broken (because > > it requires that you walk *past* the end of the source, > > Christ-on-a-stick what a broken interface) > > Agree. > > > - it can take an array as an argument (without the need for another > > name - see my earlier argument about not making up new names by just > > having generics) > > We can't make the same thing with sprintf() variants because they're > variadic, so you can't count the number of arguments. And since the > 'end' argument is of the same type as the formatted string, we can't > do it with _Generic reliably either. > > > Now, it has nasty naming (exactly the kind of 'add random character' > > naming that I was arguing against), and that comes from so many > > different broken versions until we hit on something that works. > > > > strncpy is horrible garbage. strlcpy is even worse. strscpy actually > > works and so far hasn't caused issues (there's a 'pad' version for the > > very rare situation where you want 'strncpy-like' padding, but it > > still guarantees NUL-termination, and still has a good return value). > > Agree. > > > Let's agree to *not* make horrible garbage when making up new versions > > of sprintf. > > Agree. I indeed introduced the mistake accidentally in v4, after you > complained of having too many functions, as I was introducing not one > but two APIs: seprintf() and stprintf(), where seprintf() is what now > we're calling sprintf_end(), and stprintf() we could call it > sprintf_trunc(). So I did the mistake by trying to reduce the number of > functions to just one, which is wrong. > > So, maybe I should go back to those functions, and just give them good > names. > > What do you think of the following? > > #define sprintf_array(a, ...) sprintf_trunc(a, ARRAY_SIZE(a), __VA_ARGS__) > #define vsprintf_array(a, ap) vsprintf_trunc(a, ARRAY_SIZE(a), ap) > > char *sprintf_end(char *p, const char end[0], const char *fmt, ...); > char *vsprintf_end(char *p, const char end[0], const char *fmt, va_list args); > int sprintf_trunc(char *buf, size_t size, const char *fmt, ...); > int vsprintf_trunc(char *buf, size_t size, const char *fmt, va_list args); > > char *sprintf_end(char *p, const char end[0], const char *fmt, ...) > { > va_list args; > > va_start(args, fmt); > p = vseprintf(p, end, fmt, args); Typo here. It's vsprintf_end(). > va_end(args); > > return p; > } > > char *vsprintf_end(char *p, const char end[0], const char *fmt, va_list args) > { > int len; > > if (unlikely(p == NULL)) > return NULL; > > len = vsprintf_trunc(p, end - p, fmt, args); > if (unlikely(len < 0)) > return NULL; > > return p + len; > } > > int sprintf_trunc(char *buf, size_t size, const char *fmt, ...) > { > va_list args; > int len; > > va_start(args, fmt); > len = vstprintf(buf, size, fmt, args); Typo here. It's vsprintf_trunc(). > va_end(args); > > return len; > } > > int vsprintf_trunc(char *buf, size_t size, const char *fmt, va_list args) > { > int len; > > if (WARN_ON_ONCE(size == 0 || size > INT_MAX)) > return -EOVERFLOW; > > len = vsnprintf(buf, size, fmt, args); > if (unlikely(len >= size)) > return -E2BIG; > > return len; > } > > sprintf_trunc() is like strscpy(), but with a formatted string. It > could replace uses of s[c]nprintf() where there's a single call (no > chained calls). > > sprintf_array() is like the 2-argument version of strscpy(). It could > replace s[c]nprintf() calls where there's no chained calls, where the > input is an array. > > sprintf_end() would replace the chained calls. > > Does this sound good to you? > > > Cheers, > Alex > > -- > <https://www.alejandro-colomar.es/> -- <https://www.alejandro-colomar.es/>
© 2016 - 2025 Red Hat, Inc.