[RFC v1 0/3] Add and use seprintf() instead of less ergonomic APIs

Alejandro Colomar posted 3 patches 3 months ago
There is a newer version of this series
include/linux/stackdepot.h |  13 +++++
include/linux/stacktrace.h |   3 +
kernel/stacktrace.c        |  28 ++++++++++
lib/stackdepot.c           |  12 ++++
lib/vsprintf.c             | 109 +++++++++++++++++++++++++++++++++++++
mm/kfence/kfence_test.c    |  24 ++++----
mm/kmsan/kmsan_test.c      |   4 +-
mm/mempolicy.c             |  18 +++---
mm/page_owner.c            |  32 ++++++-----
mm/slub.c                  |   5 +-
10 files changed, 208 insertions(+), 40 deletions(-)
[RFC v1 0/3] Add and use seprintf() instead of less ergonomic APIs
Posted by Alejandro Colomar 3 months ago
Hi Kees,

As I anticipated in private, here's an API that we're using in the
shadow project.  I've added it in the kernel, and started replacing some
existing calls to s*printf() calls, and it was a bug mine.

I haven't even built the code yet.  I present it for discussion only at
the moment.  (Thus, RFC, not PATCH.)  Also, I've used ==NULL style for
null checks, to be more explicit, even if that's against the coding
style.  I'll change that for the actual patches, but for now during
discussion, I prefer having the explicit tests for my own readability.

The improvement seems quite obvious.  Please let me know your opinion.
I also have a few questions for the maintainers of the specific code, or
at least for someone who deeply understands it, as I found some
questionable code.  (See the individual commit messages, and code
comments for those.)

On top of that, I have a question about the functions I'm adding,
and the existing kernel snprintf(3): The standard snprintf(3)
can fail (return -1), but the kernel one doesn't seem to return <0 ever.
Should I assume that snprintf(3) doesn't fail here?  (I have a comment
in the source code of the implementation asking for that.)

What do you think?

Alejandro Colomar (3):
  vsprintf: Add [v]seprintf(), [v]stprintf()
  stacktrace, stackdepot: Add seprintf()-like variants of functions
  mm: Use seprintf() instead of less ergonomic APIs

 include/linux/stackdepot.h |  13 +++++
 include/linux/stacktrace.h |   3 +
 kernel/stacktrace.c        |  28 ++++++++++
 lib/stackdepot.c           |  12 ++++
 lib/vsprintf.c             | 109 +++++++++++++++++++++++++++++++++++++
 mm/kfence/kfence_test.c    |  24 ++++----
 mm/kmsan/kmsan_test.c      |   4 +-
 mm/mempolicy.c             |  18 +++---
 mm/page_owner.c            |  32 ++++++-----
 mm/slub.c                  |   5 +-
 10 files changed, 208 insertions(+), 40 deletions(-)

Range-diff against v0:
-:  ------------ > 1:  2d20eaf1752e vsprintf: Add [v]seprintf(), [v]stprintf()
-:  ------------ > 2:  ec2e375c2d1e stacktrace, stackdepot: Add seprintf()-like variants of functions
-:  ------------ > 3:  be193e1856aa mm: Use seprintf() instead of less ergonomic APIs
-- 
2.50.0
Re: [RFC v1 0/3] Add and use seprintf() instead of less ergonomic APIs
Posted by Rasmus Villemoes 3 months ago
On Sat, Jul 05 2025, Alejandro Colomar <alx@kernel.org> wrote:

> On top of that, I have a question about the functions I'm adding,
> and the existing kernel snprintf(3): The standard snprintf(3)
> can fail (return -1), but the kernel one doesn't seem to return <0 ever.
> Should I assume that snprintf(3) doesn't fail here?

Yes. Just because the standard says it may return an error, as a QoI
thing the kernel's implementation never fails. That also means that we
do not ever do memory allocation or similar in the guts of vsnsprintf
(that would anyway be a mine field of locking bugs).

If we hit some invalid or unsupported format specifier (i.e. a bug in
the caller), we return early, but still report what we wrote until
hitting that.

Rasmus
Re: [RFC v1 0/3] Add and use seprintf() instead of less ergonomic APIs
Posted by Alejandro Colomar 3 months ago
Hi Rasmus,

On Tue, Jul 08, 2025 at 08:43:57AM +0200, Rasmus Villemoes wrote:
> On Sat, Jul 05 2025, Alejandro Colomar <alx@kernel.org> wrote:
> 
> > On top of that, I have a question about the functions I'm adding,
> > and the existing kernel snprintf(3): The standard snprintf(3)
> > can fail (return -1), but the kernel one doesn't seem to return <0 ever.
> > Should I assume that snprintf(3) doesn't fail here?
> 
> Yes. Just because the standard says it may return an error, as a QoI
> thing the kernel's implementation never fails. That also means that we
> do not ever do memory allocation or similar in the guts of vsnsprintf
> (that would anyway be a mine field of locking bugs).

All of that sounds reasonable.

> If we hit some invalid or unsupported format specifier (i.e. a bug in
> the caller), we return early, but still report what we wrote until
> hitting that.

However, there's the early return due to size>INT_MAX || size==0, which
results in no string at all, and there's not an error code for this.
A user might think that the string is reliable after a vsprintf(3) call,
as it returned 0 --as if it had written ""--, but it didn't write
anything.

I would have returned -EOVERFLOW in that case.

I think something similar is true of strscpy(): it returns -E2BIG on
size==0 || size>INT_MAX but it should be a different error code, as
there's no string at all.

I'll propose something very close to strscpy() for standardization, but
the behavior for size==0 will either be undefined, or errno will be
EOVERFLOW.


Have a lovely day!
Alex

-- 
<https://www.alejandro-colomar.es/>
Re: [RFC v1 0/3] Add and use seprintf() instead of less ergonomic APIs
Posted by Rasmus Villemoes 3 months ago
On Tue, Jul 08 2025, Alejandro Colomar <alx@kernel.org> wrote:

> Hi Rasmus,
>
> On Tue, Jul 08, 2025 at 08:43:57AM +0200, Rasmus Villemoes wrote:
>> On Sat, Jul 05 2025, Alejandro Colomar <alx@kernel.org> wrote:
>> 
>> > On top of that, I have a question about the functions I'm adding,
>> > and the existing kernel snprintf(3): The standard snprintf(3)
>> > can fail (return -1), but the kernel one doesn't seem to return <0 ever.
>> > Should I assume that snprintf(3) doesn't fail here?
>> 
>> Yes. Just because the standard says it may return an error, as a QoI
>> thing the kernel's implementation never fails. That also means that we
>> do not ever do memory allocation or similar in the guts of vsnsprintf
>> (that would anyway be a mine field of locking bugs).
>
> All of that sounds reasonable.
>
>> If we hit some invalid or unsupported format specifier (i.e. a bug in
>> the caller), we return early, but still report what we wrote until
>> hitting that.
>
> However, there's the early return due to size>INT_MAX || size==0,
> which

First of all, there's no early return for size==0, that's absolutely
supported and the standard way for the caller to figure out how much to
allocate before redoing the formatting - as userspace asprintf() and
kernel kasprintf() does. And one of the primary reasons for me to write
the kernel's printf test suite in the first place, as a number of the %p
extensions weren't conforming to that requirement.

> results in no string at all, and there's not an error code for this.
> A user might think that the string is reliable after a vsprintf(3) call,
> as it returned 0 --as if it had written ""--, but it didn't write
> anything.

No, because when passed invalid/bogus input we cannot trust that we can
write anything at all to the buffer. We don't return a negative value,
true, but it's not exactly silent - there's a WARN_ON to help find such
bogus callers.

So no, there's "no string at all", but nothing vsnprint() could do in
that situation could help - there's a bug in the caller, we point it out
loudly. Returning -Ewhatever would not remove that bug and would only
make a difference if the caller checked for that.

We don't want to force everybody to check the return value of snprintf()
for errors, and having an interface that says "you have to check for
errors if your code might be buggy", well...

In fact, returning -Ewhatever is more likely to make the problem worse;
the caller mismanages buffer/size computations, so probably he's likely
to just be adding the return value to some size_t or char* variable,
making a subsequent use of that variable point to some completely
out-of-bounds memory.

Rasmus
Re: [RFC v1 0/3] Add and use seprintf() instead of less ergonomic APIs
Posted by Alejandro Colomar 3 months ago
Hi Rasmus,

On Tue, Jul 08, 2025 at 03:51:10PM +0200, Rasmus Villemoes wrote:
> > However, there's the early return due to size>INT_MAX || size==0,
> > which
> 
> First of all, there's no early return for size==0, that's absolutely
> supported and the standard way for the caller to figure out how much to
> allocate before redoing the formatting - as userspace asprintf() and
> kernel kasprintf() does. And one of the primary reasons for me to write
> the kernel's printf test suite in the first place, as a number of the %p
> extensions weren't conforming to that requirement.

Yup, sorry, I was talking from memory, and forgot about the size==0.
I've introduced the check of size==0 for seprintf(), but snprintf(3) is
okay with it.  My bad.  The issue with INT_MAX holds.

> > results in no string at all, and there's not an error code for this.
> > A user might think that the string is reliable after a vsprintf(3) call,
> > as it returned 0 --as if it had written ""--, but it didn't write
> > anything.
> 
> No, because when passed invalid/bogus input we cannot trust that we can
> write anything at all to the buffer. We don't return a negative value,
> true, but it's not exactly silent - there's a WARN_ON to help find such
> bogus callers.

Yup, I know.  It's silent to the caller, I meant.

> So no, there's "no string at all", but nothing vsnprint() could do in
> that situation could help - there's a bug in the caller, we point it out
> loudly. Returning -Ewhatever would not remove that bug and would only
> make a difference if the caller checked for that.
> 
> We don't want to force everybody to check the return value of snprintf()
> for errors, and having an interface that says "you have to check for
> errors if your code might be buggy", well...
> 
> In fact, returning -Ewhatever is more likely to make the problem worse;
> the caller mismanages buffer/size computations, so probably he's likely
> to just be adding the return value to some size_t or char* variable,
> making a subsequent use of that variable point to some completely
> out-of-bounds memory.

That's why seprintf() controls that addition, and gives a pointer
directly to the user, which doesn't need to add anything.  I think this
is easier to handle.  There, I can report NULL for bad input, instead of
adding 0.


Have a lovely day!
Alex

> 
> Rasmus

-- 
<https://www.alejandro-colomar.es/>