fs/exec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Problem: The printable macro in exec.c uses custom logic
to determine if a character is printable, even though
the kernel supplies existing facilities.
Solution: Refactor to use isprint and isspace.
Signed-off-by: Nir Lichtman <nir@lichtman.org>
---
v2: fix to also consider space characters as printables
Side-note: I called the previous version "refactor an invalid
executable check to use isprint"
fs/exec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/exec.c b/fs/exec.c
index 6c53920795c2..3b4c7548427f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1723,7 +1723,7 @@ int remove_arg_zero(struct linux_binprm *bprm)
}
EXPORT_SYMBOL(remove_arg_zero);
-#define printable(c) (((c)=='\t') || ((c)=='\n') || (0x20<=(c) && (c)<=0x7e))
+#define printable(c) (isprint(c) || isspace(c))
/*
* cycle the list of binary formats handler, until one recognizes the image
*/
--
2.39.2
On Sat, Nov 16, 2024 at 06:12:58AM +0000, Nir Lichtman wrote: > -#define printable(c) (((c)=='\t') || ((c)=='\n') || (0x20<=(c) && (c)<=0x7e)) > +#define printable(c) (isprint(c) || isspace(c)) cat >a.c <<'EOF' #include <ctype.h> #include <stdio.h> #define printable(c) (((c)=='\t') || ((c)=='\n') || (0x20<=(c) && (c)<=0x7e)) int main() { for (int i = 0; i < 255; i++) if (printable(i) != (isspace(i) || isprint(i))) printf("%02x\n", i); return 0; } EOF cc a.c LANG=C ./a.out and watch the output. Now, whether that logics makes sense is a separate story; that's before my time (1.3.60), so... AFAICS, it's a filter for autoloading a binfmt module by way of an alias; we do _not_ want to step on a random text file (back then #! had been handled directly, prior to that area; by the time when it was made a module, everyone probably forgot about the entire mess - that was in 2013). So it picked the 3rd and 4th bytes of the binary and tries to modprobe binfmt-<number>. Ho-hum... AFAICS, there hadn't been such aliases in any binfmt module back then (or now, for that matter), so it would have to come from userland config... OK, 1.3.61 is when fs/binfmt_aout.c got separated from fs/exec.c and became modular, so that's almost certainly what it had been about. Setups with modular aout support wanting to autoload it when finally hitting an aout binary... These days aout support is gone. And while that stuff might have been useful once upon a time (e.g. for binfmt_java, back when it existed), but these days anything of that sort would be handled by binfmt_misc-based setups. ISTR seeing a bunch of stale binfmt_aout aliases of that form way back, but e.g. Debian had gotten rid of those quite a while ago... <after a bit more software coproarchaeology> Looks like module-init-tools used to put (completely useless by that time, due to the printf format change from %hd to %04x) aliases for binfmt_aout into /etc/modprobe.d/aliases.conf: alias binfmt-204 binfmt_aout alias binfmt-263 binfmt_aout alias binfmt-264 binfmt_aout alias binfmt-267 binfmt_aout alias binfmt-387 binfmt_aout until 2011, when it had been put out of its misery. Anyway, looks like nothing had been using that other than for aout, aout32 and java. All of those are gone, so... is there any point keeping that crap around? Linus, could we simply bury that request_module("binfmt-%04x",...) in search_binary_handler(), along with 'retry' logics in there? What could possibly use it?
On Fri, 15 Nov 2024 at 23:28, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Now, whether that logics makes sense is a separate story; > that's before my time (1.3.60), so... Bah. The whole ctype stuff is a mess, partly because it's fundamentally a broken concept and depends on locale. The original ctype array was US-ASCII only, and at some point in the random past it got changed to be based on Latin1. Maybe indeed 1.3.60 as you say, I didn't go digging around. And Latin1 is not only what I used to use, it's the "low range of unicode". So it makes *some* sense, but not a whole lot. It might be good to go back to US-ASCII just as a true lowest common denominator, because people who use the ctype macros almost certainly don't actually do it on unicode characters, they do it on bytes, and then UTF-8 will not actually DTRT with anything but US-ASCII anyway. And this fs/exec.c code is really confused. It explicitly doesn't use our ctype, probably exactly *because* of high-bit issues, but then it calls '\t' and '\n' is "printable", makes little sense. What that code seems to basically want to do is to check that 'bprm->buf' is some binary data, not text, and using "printable()" as a name for that is actively misleading and wrong. So I really think that all it does is try to protect from even *trying* to execute a text-file or script as a binary. Anyway, the fs/exec.c "printable()" code most definitely shouldn't use the ctype stuff. I'm not sure it should exist at all, and if it should exist it probably should be renamed. Because it has *nothing* to do with "isprint()". > Looks like module-init-tools used to put (completely useless by that time, > due to the printf format change from %hd to %04x) aliases for binfmt_aout > into /etc/modprobe.d/aliases.conf: > alias binfmt-204 binfmt_aout > alias binfmt-263 binfmt_aout > alias binfmt-264 binfmt_aout > alias binfmt-267 binfmt_aout > alias binfmt-387 binfmt_aout > until 2011, when it had been put out of its misery. What a crock. Yeah, that code seems to be entirely dead. The "two first bytes" thing only makes sense for a.out, and those numbers are odd. The first four ones make sense (QMAGIC, OMAGIC, NMAGIC and ZMAGIC respectively), but 387 doesn't even seem to match any a.out magic number Oh, it's CMAGIC. Which is a core file. That you should never try to execute. So it's just insane. Very strange. Anyway, a.out support is dead, so I think this code is pure historical leftovers and should be removed. Linus
On Sat, Nov 16 2024, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Fri, 15 Nov 2024 at 23:28, Al Viro <viro@zeniv.linux.org.uk> wrote: >> >> Now, whether that logics makes sense is a separate story; >> that's before my time (1.3.60), so... > > Bah. The whole ctype stuff is a mess, partly because it's > fundamentally a broken concept and depends on locale. > > The original ctype array was US-ASCII only, and at some point in the > random past it got changed to be based on Latin1. Maybe indeed 1.3.60 > as you say, I didn't go digging around. > > And Latin1 is not only what I used to use, it's the "low range of > unicode". So it makes *some* sense, but not a whole lot. Yes, but the kernel's ctype is almost-but-not-quite latin1... > It might be good to go back to US-ASCII just as a true lowest common > denominator, because people who use the ctype macros almost certainly > don't actually do it on unicode characters, they do it on bytes, and > then UTF-8 will not actually DTRT with anything but US-ASCII anyway. Exactly. But you said otherwise two years ago: https://lore.kernel.org/lkml/3a2fa7c1-2e31-0479-761f-9c189f8ed8c3@rasmusvillemoes.dk/ Rasmus
On Mon, 18 Nov 2024 at 03:45, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: > > Exactly. But you said otherwise two years ago: > https://lore.kernel.org/lkml/3a2fa7c1-2e31-0479-761f-9c189f8ed8c3@rasmusvillemoes.dk/ I do tend to stand by the "anybody who uses ctype deserves to get what they asked for" thing. But yeah, I think the time has come to admit that Latin1 isn't what we want. And I'm looking at some of the users we _do_ have, and (ignoring tools/ and script/) the most common ones seem to be 14 isascii 32 isalpha 46 isxdigit 56 isprint 59 isalnum 158 isspace 187 isdigit and none of them would really care if we just limited it to ASCII again. In fact, the isprint() ones would seem to generally be much improved (looking at the ACPI uses). I suspect we could make them be range-based instead of based on that table lookup while at it. So if somebody sends me a tested patch... Linus
On Sat, Nov 16, 2024 at 08:49:39AM -0800, Linus Torvalds wrote: > The original ctype array was US-ASCII only, and at some point in the > random past it got changed to be based on Latin1. Maybe indeed 1.3.60 > as you say, I didn't go digging around. Almost certainly unrelated. 0.10: initial support of #! in do_execve() 0.99.12: binfmt added; array, with aout as the hardwired first entry. #! handling happens prior to looking through that array. 0.99.13: binfmt_elf added; goes into the second slot of array if enabled. 0.99.14g: binfmt_coff added. 1.1.14: formats is a linked list instead of an array now, aout is still the hardwired first entry. Anything else should call register_binfmt(). Nobody does, though, so elf and coff support got broken? Lovely... At the same point binfmt_elf.c got copied from fs/ to ibcs/, along with fs/binfmt_coff.c (originals gone in 1.1.23) 1.1.54: fs/binfmt_elf.c returns (ibcs copy sticks around) elf_format is added back to the list - explicitly kludged into initializer of aout_format if non-modular and handled by register_binfmt() if modular. 1.1.77: ibcs moved to arch/i386/ibcs, stuff in there still not reachable? 1.3.5: arch/i386/ibcs finally gone; exists out of tree, with varying degrees of rot. 1.3.60: binfmt autoload if no match found, setting aliases up is up to userland (decimal aliases, at that). Check for "printable" first 4 bytes added, apparently to filter out text files (recall that msdosfs marked everything executable). Aliases potentially useful for iBCS2 binaries (with out of tree module). 1.3.61: aout made modular, both elf and aout use register_binfmt() both in modular and built-in cases (earlier kludge would be hopeless, so it's gone and good riddance). These two (aout and elf) are the only binfmt in the tree. Aliases are theoretically useful for both, probably set up only for aout - insmod(8) itself is ELF by that point. Again, that's up to distros - we are decades before MODULE_ALIAS machinery, so the kernel build is not setting those up. 1.3.60/1.3.61 look like a transition to modular a.out, on systems with already mostly ELF userland - arranging for autoload of modular aout and allowing it to become modular. Filter hadn't been there before that - it appeared along with autoload, so ctype changes are irrelevant; there hadn't been an earlier stage of that thing anyway. 1.3.71: binfmt_script added (unconditional, nominally may be modular, in reality always built-in). Special-casing of #! is gone. Never going to be autoloaded (and filter would reject it anyway). 1.3.100: binfmt_java added. Two formats (java binary and crapplet); the former might be subject to autoload (and I've seen such aliases in examples of /etc/module* on the net), the latter couldn't - text files. 2.1.23: aliases went from binfmt-%hd to binfmt-%04x; userland either adapts, or it hadn't needed those in the first place... 2.1.32: binfmt_em86 added (alpha emulation of i386 ELF); aliases are not going to be useful for that - the first 4 bytes are identical to those for native ELF. 2.1.34: binfmt_elf32 added (sparc64 compat ELF); same story wrt aliases usefulness. 2.1.43pre1: binfmt_misc added; at that point aliases have become useless - in effect, that's a better replacement for those. another binfmt_elf32 (itanic compat) 2.1.44pre3: binfmt_irixelf added (irix compat); ELF binaries, so aliases are useless. 2.1.44: binfmt_aout32 added (sparc64 compat aout) 2.3.5: binfmt_java gone Past that point there's really not much happening - and by now all traces of aout are gone, which leaves this autoload logics pretty much useless. I mean, all ELF binaries are indistinguishable to it, scripts are out of consideration anyway, aliases for binfmt_misc are pretty much insane... What does it leave? binfmt_flat? Sorry, the first 4 bytes are "bFLT", so it won't pass the filter... All alias examples I had been able to find were for * a.out variants * iBCS2 * java Oh, and binfmt-0000 aliased to /bin/true, so that trying to exec a zero-filled file wouldn't trigger whining from the modprobe when we hit that autoload. IMO we should simply take it out.
On Sat, Nov 16, 2024 at 08:49:39AM -0800, Linus Torvalds wrote: > On Fri, 15 Nov 2024 at 23:28, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Now, whether that logics makes sense is a separate story; > > that's before my time (1.3.60), so... > [...] > > Anyway, the fs/exec.c "printable()" code most definitely shouldn't use > the ctype stuff. I'm not sure it should exist at all, and if it should > exist it probably should be renamed. Because it has *nothing* to do > with "isprint()". > [...] > > Anyway, a.out support is dead, so I think this code is pure historical > leftovers and should be removed. > > Linus Thanks for answering Al and Linus. Al, continuing forward, to work on a new version of the patch removing the support for dynamically loading binfmt kernel modules or you'll take it from here? Thanks, Nir
On Sat, Nov 16, 2024 at 05:23:01PM +0000, Nir Lichtman wrote: > On Sat, Nov 16, 2024 at 08:49:39AM -0800, Linus Torvalds wrote: > > On Fri, 15 Nov 2024 at 23:28, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > > > Now, whether that logics makes sense is a separate story; > > > that's before my time (1.3.60), so... > > > [...] > > > > Anyway, the fs/exec.c "printable()" code most definitely shouldn't use > > the ctype stuff. I'm not sure it should exist at all, and if it should > > exist it probably should be renamed. Because it has *nothing* to do > > with "isprint()". > > > [...] > > > > Anyway, a.out support is dead, so I think this code is pure historical > > leftovers and should be removed. > > > > Linus > > Thanks for answering Al and Linus. > Al, continuing forward, to work on a new version of the patch removing the > support for dynamically loading binfmt kernel modules or you'll take it > from here? Just kill it off, since you are poking in the area anyway... No point coordinating patches, etc. - removal is completely straightforward, with something along the lines of "that was an ancient leftover from a.out-to-ELF transition, left without a single valid use after removal of a.out support; anyone who might find future uses for it (currently there's none) would be better off using binfmt_misc to trigger whatever module loading they might need - would be more flexible that way"
On Sat, Nov 16, 2024 at 10:06:02PM +0000, Al Viro wrote: > On Sat, Nov 16, 2024 at 05:23:01PM +0000, Nir Lichtman wrote: > > On Sat, Nov 16, 2024 at 08:49:39AM -0800, Linus Torvalds wrote: > > > On Fri, 15 Nov 2024 at 23:28, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > > > > Anyway, a.out support is dead, so I think this code is pure historical > > > leftovers and should be removed. > > > > > > Linus > > > > Thanks for answering Al and Linus. > > Al, continuing forward, to work on a new version of the patch removing the > > support for dynamically loading binfmt kernel modules or you'll take it > > from here? > > Just kill it off, since you are poking in the area anyway... No point Roger, will send out a new patch shortly. > coordinating patches, etc. - removal is completely straightforward, > with something along the lines of "that was an ancient leftover from > a.out-to-ELF transition, left without a single valid use after removal > of a.out support; anyone who might find future uses for it (currently > there's none) would be better off using binfmt_misc to trigger whatever > module loading they might need - would be more flexible that way"
© 2016 - 2024 Red Hat, Inc.