[PATCH v2] exec: make printable macro more concise

Nir Lichtman posted 1 patch 5 days, 22 hours ago
fs/exec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH v2] exec: make printable macro more concise
Posted by Nir Lichtman 5 days, 22 hours ago
Problem: The printable macro in exec.c uses custom logic
to determine if a character is printable, even though
the kernel supplies existing facilities.

Solution: Refactor to use isprint and isspace.

Signed-off-by: Nir Lichtman <nir@lichtman.org>
---

v2: fix to also consider space characters as printables
Side-note: I called the previous version "refactor an invalid
executable check to use isprint"

 fs/exec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/exec.c b/fs/exec.c
index 6c53920795c2..3b4c7548427f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1723,7 +1723,7 @@ int remove_arg_zero(struct linux_binprm *bprm)
 }
 EXPORT_SYMBOL(remove_arg_zero);
 
-#define printable(c) (((c)=='\t') || ((c)=='\n') || (0x20<=(c) && (c)<=0x7e))
+#define printable(c) (isprint(c) || isspace(c))
 /*
  * cycle the list of binary formats handler, until one recognizes the image
  */
-- 
2.39.2
Re: [PATCH v2] exec: make printable macro more concise
Posted by Al Viro 5 days, 20 hours ago
On Sat, Nov 16, 2024 at 06:12:58AM +0000, Nir Lichtman wrote:

> -#define printable(c) (((c)=='\t') || ((c)=='\n') || (0x20<=(c) && (c)<=0x7e))
> +#define printable(c) (isprint(c) || isspace(c))

cat >a.c <<'EOF'
#include <ctype.h>
#include <stdio.h>

#define printable(c) (((c)=='\t') || ((c)=='\n') || (0x20<=(c) && (c)<=0x7e))

int main()
{
        for (int i = 0; i < 255; i++)
		if (printable(i) != (isspace(i) || isprint(i)))
			printf("%02x\n", i);
	return 0;
}
EOF
cc a.c
LANG=C ./a.out

and watch the output.  Now, whether that logics makes sense is a separate story;
that's before my time (1.3.60), so...

AFAICS, it's a filter for autoloading a binfmt module by way of an alias;
we do _not_ want to step on a random text file (back then #! had been
handled directly, prior to that area; by the time when it was made a module,
everyone probably forgot about the entire mess - that was in 2013).

So it picked the 3rd and 4th bytes of the binary and tries to modprobe
binfmt-<number>.  Ho-hum...  AFAICS, there hadn't been such aliases
in any binfmt module back then (or now, for that matter), so it would
have to come from userland config...

OK, 1.3.61 is when fs/binfmt_aout.c got separated from fs/exec.c and became
modular, so that's almost certainly what it had been about.  Setups with
modular aout support wanting to autoload it when finally hitting an
aout binary...

These days aout support is gone.  And while that stuff might have been
useful once upon a time (e.g. for binfmt_java, back when it existed),
but these days anything of that sort would be handled by binfmt_misc-based
setups.

ISTR seeing a bunch of stale binfmt_aout aliases of that form way back,
but e.g. Debian had gotten rid of those quite a while ago...

<after a bit more software coproarchaeology>
Looks like module-init-tools used to put (completely useless by that time,
due to the printf format change from %hd to %04x) aliases for binfmt_aout
into /etc/modprobe.d/aliases.conf:
	alias binfmt-204 binfmt_aout
	alias binfmt-263 binfmt_aout
	alias binfmt-264 binfmt_aout
	alias binfmt-267 binfmt_aout
	alias binfmt-387 binfmt_aout
until 2011, when it had been put out of its misery.

Anyway, looks like nothing had been using that other than for aout,
aout32 and java.  All of those are gone, so... is there any point
keeping that crap around?

Linus, could we simply bury that request_module("binfmt-%04x",...) in
search_binary_handler(), along with 'retry' logics in there?  What
could possibly use it?
Re: [PATCH v2] exec: make printable macro more concise
Posted by Linus Torvalds 5 days, 11 hours ago
On Fri, 15 Nov 2024 at 23:28, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Now, whether that logics makes sense is a separate story;
> that's before my time (1.3.60), so...

Bah. The whole ctype stuff is a mess, partly because it's
fundamentally a broken concept and depends on locale.

The original ctype array was US-ASCII only, and at some point in the
random past it got changed to be based on Latin1. Maybe indeed 1.3.60
as you say, I didn't go digging around.

And Latin1 is not only what I used to use, it's the "low range of
unicode". So it makes *some* sense, but not a whole lot.

It might be good to go back to US-ASCII just as a true lowest common
denominator, because people who use the ctype macros almost certainly
don't actually do it on unicode characters, they do it on bytes, and
then UTF-8 will not actually DTRT with anything but US-ASCII anyway.

And this fs/exec.c code is really confused. It explicitly doesn't use
our ctype, probably exactly *because* of high-bit issues, but then it
calls '\t' and '\n' is "printable", makes little sense.

What that code seems to basically want to do is to check that
'bprm->buf' is some binary data, not text, and using "printable()" as
a name for that is actively misleading and wrong.

So I really think that all it does is try to protect from even
*trying* to execute a text-file or script as a binary.

Anyway, the fs/exec.c "printable()" code most definitely shouldn't use
the ctype stuff. I'm not sure it should exist at all, and if it should
exist it probably should be renamed. Because it has *nothing* to do
with "isprint()".

> Looks like module-init-tools used to put (completely useless by that time,
> due to the printf format change from %hd to %04x) aliases for binfmt_aout
> into /etc/modprobe.d/aliases.conf:
>         alias binfmt-204 binfmt_aout
>         alias binfmt-263 binfmt_aout
>         alias binfmt-264 binfmt_aout
>         alias binfmt-267 binfmt_aout
>         alias binfmt-387 binfmt_aout
> until 2011, when it had been put out of its misery.

What a crock. Yeah, that code seems to be entirely dead.

The "two first bytes" thing only makes sense for a.out, and those
numbers are odd. The first four ones make sense (QMAGIC, OMAGIC,
NMAGIC and ZMAGIC respectively), but 387 doesn't even seem to match
any a.out magic number

Oh, it's CMAGIC. Which is a core file. That you should never try to
execute. So it's just insane.

Very strange.

Anyway, a.out support is dead, so I think this code is pure historical
leftovers and should be removed.

           Linus
Re: [PATCH v2] exec: make printable macro more concise
Posted by Rasmus Villemoes 3 days, 16 hours ago
On Sat, Nov 16 2024, Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, 15 Nov 2024 at 23:28, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> Now, whether that logics makes sense is a separate story;
>> that's before my time (1.3.60), so...
>
> Bah. The whole ctype stuff is a mess, partly because it's
> fundamentally a broken concept and depends on locale.
>
> The original ctype array was US-ASCII only, and at some point in the
> random past it got changed to be based on Latin1. Maybe indeed 1.3.60
> as you say, I didn't go digging around.
>
> And Latin1 is not only what I used to use, it's the "low range of
> unicode". So it makes *some* sense, but not a whole lot.

Yes, but the kernel's ctype is almost-but-not-quite latin1...

> It might be good to go back to US-ASCII just as a true lowest common
> denominator, because people who use the ctype macros almost certainly
> don't actually do it on unicode characters, they do it on bytes, and
> then UTF-8 will not actually DTRT with anything but US-ASCII anyway.

Exactly. But you said otherwise two years ago:
https://lore.kernel.org/lkml/3a2fa7c1-2e31-0479-761f-9c189f8ed8c3@rasmusvillemoes.dk/

Rasmus
Re: [PATCH v2] exec: make printable macro more concise
Posted by Linus Torvalds 3 days, 11 hours ago
On Mon, 18 Nov 2024 at 03:45, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:
>
> Exactly. But you said otherwise two years ago:
> https://lore.kernel.org/lkml/3a2fa7c1-2e31-0479-761f-9c189f8ed8c3@rasmusvillemoes.dk/

I do tend to stand by the "anybody who uses ctype deserves to get what
they asked for" thing.

But yeah, I think the time has come to admit that Latin1 isn't what we
want. And I'm looking at some of the users we _do_ have, and (ignoring
tools/ and script/) the most common ones seem to be

     14 isascii
     32 isalpha
     46 isxdigit
     56 isprint
     59 isalnum
    158 isspace
    187 isdigit

and none of them would really care if we just limited it to ASCII
again. In fact, the isprint() ones would seem to generally be much
improved (looking at the ACPI uses).

I suspect we could make them be range-based instead of based on that
table lookup while at it.

So if somebody sends me a tested patch...

               Linus
Re: [PATCH v2] exec: make printable macro more concise
Posted by Al Viro 5 days, 6 hours ago
On Sat, Nov 16, 2024 at 08:49:39AM -0800, Linus Torvalds wrote:

> The original ctype array was US-ASCII only, and at some point in the
> random past it got changed to be based on Latin1. Maybe indeed 1.3.60
> as you say, I didn't go digging around.

Almost certainly unrelated.
0.10:
	initial support of #! in do_execve()
0.99.12:
	binfmt added; array, with aout as the hardwired first entry.
	#! handling happens prior to looking through that array.
0.99.13:
	binfmt_elf added; goes into the second slot of array if
	enabled.
0.99.14g:
	binfmt_coff added.
1.1.14:
	formats is a linked list instead of an array now, aout is
	still the hardwired first entry.  Anything else
	should call register_binfmt().  Nobody does, though, so
	elf and coff support got broken?  Lovely...
	At the same point binfmt_elf.c got copied from fs/ to ibcs/,
	along with fs/binfmt_coff.c (originals gone in 1.1.23)
1.1.54:
	fs/binfmt_elf.c returns (ibcs copy sticks around)
	elf_format is added back to the list - explicitly kludged
	into initializer of aout_format if non-modular and
	handled by register_binfmt() if modular.
1.1.77:
	ibcs moved to arch/i386/ibcs, stuff in there still not
	reachable?
1.3.5:
	arch/i386/ibcs finally gone; exists out of tree, with
	varying degrees of rot.
1.3.60:
	binfmt autoload if no match found, setting aliases up is
	up to userland (decimal aliases, at that).  Check for "printable"
	first 4 bytes added, apparently to filter out text files (recall
	that msdosfs marked everything executable).
	Aliases potentially useful for iBCS2 binaries (with out of tree
	module).
1.3.61:
	aout made modular, both elf and aout use register_binfmt()
	both in modular and built-in cases (earlier kludge would
	be hopeless, so it's gone and good riddance).
	These two (aout and elf) are the only binfmt in the tree.
	Aliases are theoretically useful for both, probably set up only
	for aout - insmod(8) itself is ELF by that point.  Again,
	that's up to distros - we are decades before MODULE_ALIAS
	machinery, so the kernel build is not setting those up.

1.3.60/1.3.61 look like a transition to modular a.out, on systems with
already mostly ELF userland - arranging for autoload of modular aout
and allowing it to become modular.  Filter hadn't been there before
that - it appeared along with autoload, so ctype changes are irrelevant;
there hadn't been an earlier stage of that thing anyway.

1.3.71:
	binfmt_script added (unconditional, nominally may be modular,
	in reality always built-in).  Special-casing of #! is gone.
	Never going to be autoloaded (and filter would reject it
	anyway).
1.3.100:
	binfmt_java added.  Two formats (java binary and crapplet);
	the former might be subject to autoload (and I've seen
	such aliases in examples of /etc/module* on the net),
	the latter couldn't - text files.
2.1.23:
	aliases went from binfmt-%hd to binfmt-%04x; userland either
	adapts, or it hadn't needed those in the first place...
2.1.32:
	binfmt_em86 added (alpha emulation of i386 ELF); aliases
	are not going to be useful for that - the first 4 bytes are
	identical to those for native ELF.
2.1.34:
	binfmt_elf32 added (sparc64 compat ELF); same story wrt
	aliases usefulness.
2.1.43pre1:
	binfmt_misc added; at that point aliases have become useless -
	in effect, that's a better replacement for those.
	another binfmt_elf32 (itanic compat)
2.1.44pre3:
	binfmt_irixelf added (irix compat); ELF binaries, so aliases
	are useless.
2.1.44:
	binfmt_aout32 added (sparc64 compat aout)
2.3.5:
	binfmt_java gone

Past that point there's really not much happening - and by now all
traces of aout are gone, which leaves this autoload logics pretty
much useless.

I mean, all ELF binaries are indistinguishable to it, scripts are
out of consideration anyway, aliases for binfmt_misc are pretty
much insane...  What does it leave?  binfmt_flat?  Sorry, the first
4 bytes are "bFLT", so it won't pass the filter...

All alias examples I had been able to find were for
	* a.out variants
	* iBCS2
	* java
Oh, and binfmt-0000 aliased to /bin/true, so that trying to exec a
zero-filled file wouldn't trigger whining from the modprobe when
we hit that autoload.

IMO we should simply take it out.
Re: [PATCH v2] exec: make printable macro more concise
Posted by Nir Lichtman 5 days, 11 hours ago
On Sat, Nov 16, 2024 at 08:49:39AM -0800, Linus Torvalds wrote:
> On Fri, 15 Nov 2024 at 23:28, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Now, whether that logics makes sense is a separate story;
> > that's before my time (1.3.60), so...
> 
[...]
> 
> Anyway, the fs/exec.c "printable()" code most definitely shouldn't use
> the ctype stuff. I'm not sure it should exist at all, and if it should
> exist it probably should be renamed. Because it has *nothing* to do
> with "isprint()".
> 
[...]
> 
> Anyway, a.out support is dead, so I think this code is pure historical
> leftovers and should be removed.
> 
>            Linus

Thanks for answering Al and Linus.
Al, continuing forward, to work on a new version of the patch removing the
support for dynamically loading binfmt kernel modules or you'll take it
from here?

Thanks,
Nir
Re: [PATCH v2] exec: make printable macro more concise
Posted by Al Viro 5 days, 6 hours ago
On Sat, Nov 16, 2024 at 05:23:01PM +0000, Nir Lichtman wrote:
> On Sat, Nov 16, 2024 at 08:49:39AM -0800, Linus Torvalds wrote:
> > On Fri, 15 Nov 2024 at 23:28, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > Now, whether that logics makes sense is a separate story;
> > > that's before my time (1.3.60), so...
> > 
> [...]
> > 
> > Anyway, the fs/exec.c "printable()" code most definitely shouldn't use
> > the ctype stuff. I'm not sure it should exist at all, and if it should
> > exist it probably should be renamed. Because it has *nothing* to do
> > with "isprint()".
> > 
> [...]
> > 
> > Anyway, a.out support is dead, so I think this code is pure historical
> > leftovers and should be removed.
> > 
> >            Linus
> 
> Thanks for answering Al and Linus.
> Al, continuing forward, to work on a new version of the patch removing the
> support for dynamically loading binfmt kernel modules or you'll take it
> from here?

Just kill it off, since you are poking in the area anyway...  No point
coordinating patches, etc. - removal is completely straightforward,
with something along the lines of "that was an ancient leftover from
a.out-to-ELF transition, left without a single valid use after removal
of a.out support; anyone who might find future uses for it (currently
there's none) would be better off using binfmt_misc to trigger whatever
module loading they might need - would be more flexible that way"
Re: [PATCH v2] exec: make printable macro more concise
Posted by Nir Lichtman 5 days, 5 hours ago
On Sat, Nov 16, 2024 at 10:06:02PM +0000, Al Viro wrote:
> On Sat, Nov 16, 2024 at 05:23:01PM +0000, Nir Lichtman wrote:
> > On Sat, Nov 16, 2024 at 08:49:39AM -0800, Linus Torvalds wrote:
> > > On Fri, 15 Nov 2024 at 23:28, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > Anyway, a.out support is dead, so I think this code is pure historical
> > > leftovers and should be removed.
> > > 
> > >            Linus
> > 
> > Thanks for answering Al and Linus.
> > Al, continuing forward, to work on a new version of the patch removing the
> > support for dynamically loading binfmt kernel modules or you'll take it
> > from here?
> 
> Just kill it off, since you are poking in the area anyway...  No point

Roger, will send out a new patch shortly.

> coordinating patches, etc. - removal is completely straightforward,
> with something along the lines of "that was an ancient leftover from
> a.out-to-ELF transition, left without a single valid use after removal
> of a.out support; anyone who might find future uses for it (currently
> there's none) would be better off using binfmt_misc to trigger whatever
> module loading they might need - would be more flexible that way"