[v3] RE: [PATCH v3 0/4] Make sscanf() stricter

RE: [PATCH v3 0/4] Make sscanf() stricter

Posted by David Laight 2 years, 7 months ago

From: Demi Marie Obenour
> Sent: 13 June 2023 16:35
> 
> On Tue, Jun 13, 2023 at 01:02:59PM +0000, David Laight wrote:
> > From: Demi Marie Obenour
> > > Sent: 12 June 2023 22:23
> > ....
> > > sscanf(), except to the extent that -Werror=format can keep working.
> > > Userspace sscanf() is almost useless: it has undefined behavior on
> > > integer overflow and swallows spaces that should usually be rejected.
> >
> > scanf() is designed for parsing space separated data.
> > Eating spaces it part of its job description.
> >
> > 	David
> 
> In this case I would prefer to have two versions: one that eats spaces
> and one that does not.  For instance, I don’t think any user of
> xenbus_scanf() wants the space-swallowing behavior.  This can be worked
> around in xenbus_scanf(), of course, by having it reject strings with
> spaces (as determened by isspace()) before calling vsscanf().

What sort of formats and data are being used?
The "%s" format terminates on whitespace.
Even stroul() (and friends) will skip leading whitespace.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Re: [PATCH v3 0/4] Make sscanf() stricter

Posted by Demi Marie Obenour 2 years, 7 months ago

On Wed, Jun 14, 2023 at 08:23:56AM +0000, David Laight wrote:
> From: Demi Marie Obenour
> > Sent: 13 June 2023 16:35
> > 
> > On Tue, Jun 13, 2023 at 01:02:59PM +0000, David Laight wrote:
> > > From: Demi Marie Obenour
> > > > Sent: 12 June 2023 22:23
> > > ....
> > > > sscanf(), except to the extent that -Werror=format can keep working.
> > > > Userspace sscanf() is almost useless: it has undefined behavior on
> > > > integer overflow and swallows spaces that should usually be rejected.
> > >
> > > scanf() is designed for parsing space separated data.
> > > Eating spaces it part of its job description.
> > >
> > > 	David
> > 
> > In this case I would prefer to have two versions: one that eats spaces
> > and one that does not.  For instance, I don’t think any user of
> > xenbus_scanf() wants the space-swallowing behavior.  This can be worked
> > around in xenbus_scanf(), of course, by having it reject strings with
> > spaces (as determened by isspace()) before calling vsscanf().
> 
> What sort of formats and data are being used?

Base-10 or base-16 integers, with whitespace never being valid.

> The "%s" format terminates on whitespace.
> Even stroul() (and friends) will skip leading whitespace.

Yes, which is a reason that strto*l() are just broken IMO.  I’m trying
to replace their uses in Xen with custom parsing code.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

RE: [PATCH v3 0/4] Make sscanf() stricter

Posted by David Laight 2 years, 7 months ago

From: Demi Marie Obenour
> Sent: 14 June 2023 21:09
....
> > What sort of formats and data are being used?
> 
> Base-10 or base-16 integers, with whitespace never being valid.

In which case sscanf() really isn't what you are looking for.

> > The "%s" format terminates on whitespace.
> > Even stroul() (and friends) will skip leading whitespace.
> 
> Yes, which is a reason that strto*l() are just broken IMO.

They are not 'broken', that is what is useful most of the time.
The usual problem is that "020" is treated as octal.

> I’m trying to replace their uses in Xen with custom parsing code.

Then write a custom parser :-)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Re: [PATCH v3 0/4] Make sscanf() stricter

Posted by Andy Shevchenko 2 years, 7 months ago

On Thu, Jun 15, 2023 at 08:06:46AM +0000, David Laight wrote:
> From: Demi Marie Obenour
> > Sent: 14 June 2023 21:09

...

> > > What sort of formats and data are being used?
> > 
> > Base-10 or base-16 integers, with whitespace never being valid.
> 
> In which case sscanf() really isn't what you are looking for.
> 
> > > The "%s" format terminates on whitespace.
> > > Even stroul() (and friends) will skip leading whitespace.
> > 
> > Yes, which is a reason that strto*l() are just broken IMO.
> 
> They are not 'broken', that is what is useful most of the time.
> The usual problem is that "020" is treated as octal.
> 
> > I’m trying to replace their uses in Xen with custom parsing code.
> 
> Then write a custom parser :-)

Hmm... Usually we are against zillion implementations of the same with zillion
bugs hidden (each buggy implementation with its own bugs).

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v3 0/4] Make sscanf() stricter

Posted by Petr Mladek 2 years, 7 months ago

On Thu 2023-06-15 14:23:59, Andy Shevchenko wrote:
> On Thu, Jun 15, 2023 at 08:06:46AM +0000, David Laight wrote:
> > From: Demi Marie Obenour
> > > Sent: 14 June 2023 21:09
> 
> ...
> 
> > > > What sort of formats and data are being used?
> > > 
> > > Base-10 or base-16 integers, with whitespace never being valid.
> > 
> > In which case sscanf() really isn't what you are looking for.
> > 
> > > > The "%s" format terminates on whitespace.
> > > > Even stroul() (and friends) will skip leading whitespace.
> > > 
> > > Yes, which is a reason that strto*l() are just broken IMO.
> >
> > They are not 'broken', that is what is useful most of the time.
> > The usual problem is that "020" is treated as octal.

I do not know how many users depend on this behavior. But I believe
that there are such users. And breaking compatibility with userspace
implementation would make more harm then good in this case.

> > > I’m trying to replace their uses in Xen with custom parsing code.
> > 
> > Then write a custom parser :-)

Honestly, I dislike any sscanf() modification which have been suggested
so far:

  + %!d is not acceptable because it produces compiler errors

  + %d! is not acceptable because "use 64!" is a realistic string.
    We could not be sure that "<number>!" will never be parsed
    in kernel.

  + %d%[!] produces compiler error either. It is hard to parse by eyes.
    Also the meaning of such a format would be far from obvious.

  + %pj or another %p modifiers would be hard to understand either.

    Yes, we have %pe but I think that only few people really use it.
    And it is kind of self-explanatory because it is typically
    used together with ERR_PTR() and with variables called
    "err" or "ret".

> Hmm... Usually we are against zillion implementations of the same with zillion
> bugs hidden (each buggy implementation with its own bugs).

I would really like to see the code depending on it. The cover letter
suggests that there already is a patch with such a custom parser.
I am sorry if it has already been mentioned. There were so many threads.

Sure, we do not want two full featured sscanf() implementations. But a
wrapper checking for leading whitespace and using kstrto<foo>
family does not sound too complex.

There should always be a good reason to introduce an incompatibility
between the kernel and the userspace implementation of a commonly
used API.

Best Regards,
Petr

Re: [PATCH v3 0/4] Make sscanf() stricter

Posted by Demi Marie Obenour 2 years, 7 months ago

On Tue, Jun 20, 2023 at 03:34:09PM +0200, Petr Mladek wrote:
> On Thu 2023-06-15 14:23:59, Andy Shevchenko wrote:
> > On Thu, Jun 15, 2023 at 08:06:46AM +0000, David Laight wrote:
> > > From: Demi Marie Obenour
> > > > Sent: 14 June 2023 21:09
> > 
> > ...
> > 
> > > > > What sort of formats and data are being used?
> > > > 
> > > > Base-10 or base-16 integers, with whitespace never being valid.
> > > 
> > > In which case sscanf() really isn't what you are looking for.
> > > 
> > > > > The "%s" format terminates on whitespace.
> > > > > Even stroul() (and friends) will skip leading whitespace.
> > > > 
> > > > Yes, which is a reason that strto*l() are just broken IMO.
> > >
> > > They are not 'broken', that is what is useful most of the time.
> > > The usual problem is that "020" is treated as octal.
> 
> I do not know how many users depend on this behavior. But I believe
> that there are such users. And breaking compatibility with userspace
> implementation would make more harm then good in this case.
> 
> > > > I’m trying to replace their uses in Xen with custom parsing code.
> > > 
> > > Then write a custom parser :-)
> 
> Honestly, I dislike any sscanf() modification which have been suggested
> so far:
> 
>   + %!d is not acceptable because it produces compiler errors
> 
>   + %d! is not acceptable because "use 64!" is a realistic string.
>     We could not be sure that "<number>!" will never be parsed
>     in kernel.
> 
>   + %d%[!] produces compiler error either. It is hard to parse by eyes.
>     Also the meaning of such a format would be far from obvious.
> 
>   + %pj or another %p modifiers would be hard to understand either.
> 
>     Yes, we have %pe but I think that only few people really use it.
>     And it is kind of self-explanatory because it is typically
>     used together with ERR_PTR() and with variables called
>     "err" or "ret".
> 
> 
> > Hmm... Usually we are against zillion implementations of the same with zillion
> > bugs hidden (each buggy implementation with its own bugs).
> 
> I would really like to see the code depending on it. The cover letter
> suggests that there already is a patch with such a custom parser.
> I am sorry if it has already been mentioned. There were so many threads.
> 
> Sure, we do not want two full featured sscanf() implementations. But a
> wrapper checking for leading whitespace and using kstrto<foo>
> family does not sound too complex.
> 
> There should always be a good reason to introduce an incompatibility
> between the kernel and the userspace implementation of a commonly
> used API.
> 
> Best Regards,
> Petr

I strongly believe that overflow should be forbidden by default, but it
turns out that I do not have time to advance this patch further.  My
understanding is that Xen never wants to allow spaces in Xenstore
entries, but that is easy to ensure via an explicit check prior to
calling vsscanf().
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

Re: [PATCH v3 0/4] Make sscanf() stricter

Posted by Andy Shevchenko 2 years, 7 months ago

On Tue, Jun 20, 2023 at 03:34:09PM +0200, Petr Mladek wrote:
> On Thu 2023-06-15 14:23:59, Andy Shevchenko wrote:
> > On Thu, Jun 15, 2023 at 08:06:46AM +0000, David Laight wrote:

...

>   + %pj or another %p modifiers would be hard to understand either.
> 
>     Yes, we have %pe but I think that only few people really use it.
>     And it is kind of self-explanatory because it is typically
>     used together with ERR_PTR() and with variables called
>     "err" or "ret".

j, besides the luck of no (yet) use in the kernel's printf(), is
described for printf(3)

  j   A  following integer conversion corresponds to an intmax_t or uintmax_t
      argument, or a following n conversion corresponds to a pointer to an
      intmax_t argument.

So, I this among all proposals, this one is the best (while all of them may
sound not good).

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v3 0/4] Make sscanf() stricter

Posted by Petr Mladek 2 years, 7 months ago

On Tue 2023-06-20 16:52:42, Andy Shevchenko wrote:
> On Tue, Jun 20, 2023 at 03:34:09PM +0200, Petr Mladek wrote:
> > On Thu 2023-06-15 14:23:59, Andy Shevchenko wrote:
> > > On Thu, Jun 15, 2023 at 08:06:46AM +0000, David Laight wrote:
> 
> ...
> 
> >   + %pj or another %p modifiers would be hard to understand either.
> > 
> >     Yes, we have %pe but I think that only few people really use it.
> >     And it is kind of self-explanatory because it is typically
> >     used together with ERR_PTR() and with variables called
> >     "err" or "ret".
> 
> j, besides the luck of no (yet) use in the kernel's printf(), is
> described for printf(3)
> 
>   j   A  following integer conversion corresponds to an intmax_t or uintmax_t
>       argument, or a following n conversion corresponds to a pointer to an
>       intmax_t argument.

I see, I have missed this coincidence. And we would really need to use %pj.
%jd requires intmax_t variable. Otherwise, the compiler produces:

kernel/lib/test.c:10:17: error: format ‘%jd’ expects argument of type ‘intmax_t *’, but argument 3 has type ‘int *’ [-Werror=format=]
  sscanf(str, "%jd hello.", &tmp);

Hmm, %pj might even make some sense for sscanf() which requires pointers anyway.
But still, we would lose the compiler check of the size of the passed
buffer.

This is actually my concern with many other %p modifiers. The compiler
is not able to check that we pass the right pointer. I know that this
might happen even with wrong buffer passed to %s or so. But number
types is another category.

> So, I think among all proposals, this one is the best (while all of them may
> sound not good).

I still prefer the custom handler when it is not too complex.

Or if there are many users, we could create sscanf_strict() or so.

Best Regards,
Petr

Re: [PATCH v3 0/4] Make sscanf() stricter

Posted by Andy Shevchenko 2 years, 7 months ago

On Tue, Jun 20, 2023 at 04:57:55PM +0200, Petr Mladek wrote:
> On Tue 2023-06-20 16:52:42, Andy Shevchenko wrote:
> > On Tue, Jun 20, 2023 at 03:34:09PM +0200, Petr Mladek wrote:
> > > On Thu 2023-06-15 14:23:59, Andy Shevchenko wrote:
> > > > On Thu, Jun 15, 2023 at 08:06:46AM +0000, David Laight wrote:

...

> > >   + %pj or another %p modifiers would be hard to understand either.
> > > 
> > >     Yes, we have %pe but I think that only few people really use it.
> > >     And it is kind of self-explanatory because it is typically
> > >     used together with ERR_PTR() and with variables called
> > >     "err" or "ret".
> > 
> > j, besides the luck of no (yet) use in the kernel's printf(), is
> > described for printf(3)
> > 
> >   j   A  following integer conversion corresponds to an intmax_t or uintmax_t
> >       argument, or a following n conversion corresponds to a pointer to an
> >       intmax_t argument.
> 
> I see, I have missed this coincidence. And we would really need to use %pj.
> %jd requires intmax_t variable. Otherwise, the compiler produces:
> 
> kernel/lib/test.c:10:17: error: format ‘%jd’ expects argument of type ‘intmax_t *’, but argument 3 has type ‘int *’ [-Werror=format=]
>   sscanf(str, "%jd hello.", &tmp);
> 
> Hmm, %pj might even make some sense for sscanf() which requires pointers anyway.
> But still, we would lose the compiler check of the size of the passed
> buffer.
> 
> This is actually my concern with many other %p modifiers. The compiler
> is not able to check that we pass the right pointer. I know that this
> might happen even with wrong buffer passed to %s or so. But number
> types is another category.

Yeah, it was a discussion IIRC for the compiler plugin to support %p
extensions, but I have no idea where it's now.

> > So, I think among all proposals, this one is the best (while all of them may
> > sound not good).
> 
> I still prefer the custom handler when it is not too complex.
> 
> Or if there are many users, we could create sscanf_strict() or so.

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v3 0/4] Make sscanf() stricter

Posted by Andy Shevchenko 2 years, 7 months ago

On Tue, Jun 20, 2023 at 04:52:43PM +0300, Andy Shevchenko wrote:
> On Tue, Jun 20, 2023 at 03:34:09PM +0200, Petr Mladek wrote:
> > On Thu 2023-06-15 14:23:59, Andy Shevchenko wrote:
> > > On Thu, Jun 15, 2023 at 08:06:46AM +0000, David Laight wrote:

...

> >   + %pj or another %p modifiers would be hard to understand either.
> > 
> >     Yes, we have %pe but I think that only few people really use it.
> >     And it is kind of self-explanatory because it is typically
> >     used together with ERR_PTR() and with variables called
> >     "err" or "ret".
> 
> j, besides the luck of no (yet) use in the kernel's printf(), is
> described for printf(3)
> 
>   j   A  following integer conversion corresponds to an intmax_t or uintmax_t
>       argument, or a following n conversion corresponds to a pointer to an
>       intmax_t argument.
> 
> So, I this among all proposals, this one is the best (while all of them may

s/this/think

> sound not good).

-- 
With Best Regards,
Andy Shevchenko

RE: [PATCH v3 0/4] Make sscanf() stricter

Posted by David Laight 2 years, 7 months ago

From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Sent: 15 June 2023 12:24
> ...
> 
> > > > What sort of formats and data are being used?
> > >
> > > Base-10 or base-16 integers, with whitespace never being valid.
> >
> > In which case sscanf() really isn't what you are looking for.
> >
> > > > The "%s" format terminates on whitespace.
> > > > Even stroul() (and friends) will skip leading whitespace.
> > >
> > > Yes, which is a reason that strto*l() are just broken IMO.
> >
> > They are not 'broken', that is what is useful most of the time.
> > The usual problem is that "020" is treated as octal.
> >
> > > I’m trying to replace their uses in Xen with custom parsing code.
> >
> > Then write a custom parser :-)
> 
> Hmm... Usually we are against zillion implementations of the same with zillion
> bugs hidden (each buggy implementation with its own bugs).

Using strtoull() for the actual numeric conversion removes most of that.

I am intrigued about why you want to error leading whitespace.
I bet you'll find something that is passing space-padded input.
It might be against the letter of the spec - but it doesn't mean
it doesn't happen.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)