[PATCH] prctl: allow prctl_set_mm_exe_file without unmapping old exe

Theodore Dubois posted 1 patch 1 year, 3 months ago
kernel/fork.c | 22 ----------------------
1 file changed, 22 deletions(-)
[PATCH] prctl: allow prctl_set_mm_exe_file without unmapping old exe
Posted by Theodore Dubois 1 year, 3 months ago
As far as I can tell, the original purpose of this check was simply as
the easiest way to work with a quirk of /proc/self/exe at the time. From
the original patch[1]:

    Note it allows to change /proc/$pid/exe iif there
    are no VM_EXECUTABLE vmas present for current process,
    simply because this feature is a special to C/R
    and mm::num_exe_file_vmas become meaningless after
    that.

num_exe_file_vmas was created to preserve a quirk of the original
/proc/self/exe implementation: if you unmapped all executable VMAs,
/proc/self/exe would disappear (because it worked by scanning the
address space for the first executable VMA.) Keeping the quirk after
switching to just saving the executable on the mm worked by keeping a
count of executable VMAs in num_exe_file_vmas, and zeroing exe_file when
it reached zero. You can probably see how it would have been annoying to
handle both num_exe_file_vmas and this prctl intending to change
exe_file, and it's easier to only allow changing exe_file after
num_exe_file_vmas has already gone to 0 and made exe_file null.

However, num_exe_file_vmas no longer exists[2]. This quirk was taken out
because it would save a bit in the vma flags, and it seems clear by now
that nobody was relying on it. These days you can simply update exe_file
with no interference.

Recently a use case for this prctl has come up outside of
checkpoint/restore, namely binfmt_misc based emulators such as FEX[3].
Any program that uses /proc/self/exe will, of course, expect it to point
to its own executable. But when executed through binfmt_misc, it will be
the emulator, resulting in compatibility issues. Emulators currently
have to attempting to intercept syscalls targeting /proc/self/exe to
redirect the path, and this is not possible in the general case
considering how flexible path resolution is. For more detail on this see
[3].

The above seems to me like a solid case for simply dropping the check.
It's also worth noting that it is already possible to achieve the same
result by the laborious and complex process of just unmapping all your
code and remapping it again after the switch (just remember to put the
code that does this in a .so!), so this is not strictly allowing
anything that wasn't allowed before. It's just cutting red tape.

[1]: https://lore.kernel.org/lkml/20120316210343.925446961@openvz.org/
[2]: https://lore.kernel.org/lkml/20120731104230.20515.72416.stgit@zurg/
[3]: https://lore.kernel.org/lkml/CABnRqDdzqfB1_ixd-2JnfSocKvXNM+9ivM1hhd1C=ejLQyen8g@mail.gmail.com/

Signed-off-by: Theodore Dubois <tblodt@icloud.com>
Cc: Ryan Houdek <sonicadvance1@gmail.com>
Cc: Guilherme G. Piccoli <gpiccoli@igalia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
---
 kernel/fork.c | 22 ----------------------
 1 file changed, 22 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index cc760491f..407e515b9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1430,30 +1430,8 @@ int set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
  */
 int replace_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
 {
-	struct vm_area_struct *vma;
-	struct file *old_exe_file;
 	int ret = 0;
 
-	/* Forbid mm->exe_file change if old file still mapped. */
-	old_exe_file = get_mm_exe_file(mm);
-	if (old_exe_file) {
-		VMA_ITERATOR(vmi, mm, 0);
-		mmap_read_lock(mm);
-		for_each_vma(vmi, vma) {
-			if (!vma->vm_file)
-				continue;
-			if (path_equal(&vma->vm_file->f_path,
-				       &old_exe_file->f_path)) {
-				ret = -EBUSY;
-				break;
-			}
-		}
-		mmap_read_unlock(mm);
-		fput(old_exe_file);
-		if (ret)
-			return ret;
-	}
-
 	get_file(new_exe_file);
 
 	/* set the new file */
-- 
2.46.0
Re: [PATCH] prctl: allow prctl_set_mm_exe_file without unmapping old exe
Posted by kernel test robot 1 year, 3 months ago
Hi Theodore,

kernel test robot noticed the following build errors:

[auto build test ERROR on kees/for-next/pstore]
[also build test ERROR on kees/for-next/kspp linus/master v6.11-rc5 next-20240829]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Theodore-Dubois/prctl-allow-prctl_set_mm_exe_file-without-unmapping-old-exe/20240828-060019
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/pstore
patch link:    https://lore.kernel.org/r/20240827215930.24703-1-tblodt%40icloud.com
patch subject: [PATCH] prctl: allow prctl_set_mm_exe_file without unmapping old exe
config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20240830/202408301005.IiGdSjk2-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240830/202408301005.IiGdSjk2-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408301005.IiGdSjk2-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   kernel/fork.c: In function 'replace_mm_exe_file':
>> kernel/fork.c:1439:9: error: 'old_exe_file' undeclared (first use in this function); did you mean 'new_exe_file'?
    1439 |         old_exe_file = rcu_dereference_raw(mm->exe_file);
         |         ^~~~~~~~~~~~
         |         new_exe_file
   kernel/fork.c:1439:9: note: each undeclared identifier is reported only once for each function it appears in
>> kernel/fork.c:1433:13: warning: unused variable 'ret' [-Wunused-variable]
    1433 |         int ret = 0;
         |             ^~~


vim +1439 kernel/fork.c

3864601387cf41 Jiri Slaby              2011-05-26  1421  
35d7bdc86031a2 David Hildenbrand       2021-04-23  1422  /**
35d7bdc86031a2 David Hildenbrand       2021-04-23  1423   * replace_mm_exe_file - replace a reference to the mm's executable file
ff0712ea71f173 Matthew Wilcox (Oracle  2023-08-24  1424)  * @mm: The mm to change.
ff0712ea71f173 Matthew Wilcox (Oracle  2023-08-24  1425)  * @new_exe_file: The new file to use.
35d7bdc86031a2 David Hildenbrand       2021-04-23  1426   *
a7031f14525751 Mateusz Guzik           2023-08-14  1427   * This changes mm's executable file (shown as symlink /proc/[pid]/exe).
35d7bdc86031a2 David Hildenbrand       2021-04-23  1428   *
35d7bdc86031a2 David Hildenbrand       2021-04-23  1429   * Main user is sys_prctl(PR_SET_MM_MAP/EXE_FILE).
35d7bdc86031a2 David Hildenbrand       2021-04-23  1430   */
35d7bdc86031a2 David Hildenbrand       2021-04-23  1431  int replace_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
35d7bdc86031a2 David Hildenbrand       2021-04-23  1432  {
35d7bdc86031a2 David Hildenbrand       2021-04-23 @1433  	int ret = 0;
35d7bdc86031a2 David Hildenbrand       2021-04-23  1434  
35d7bdc86031a2 David Hildenbrand       2021-04-23  1435  	get_file(new_exe_file);
fe69d560b5bd9e David Hildenbrand       2021-04-23  1436  
a7031f14525751 Mateusz Guzik           2023-08-14  1437  	/* set the new file */
a7031f14525751 Mateusz Guzik           2023-08-14  1438  	mmap_write_lock(mm);
a7031f14525751 Mateusz Guzik           2023-08-14 @1439  	old_exe_file = rcu_dereference_raw(mm->exe_file);
a7031f14525751 Mateusz Guzik           2023-08-14  1440  	rcu_assign_pointer(mm->exe_file, new_exe_file);
a7031f14525751 Mateusz Guzik           2023-08-14  1441  	mmap_write_unlock(mm);
a7031f14525751 Mateusz Guzik           2023-08-14  1442  
2a010c41285345 Christian Brauner       2024-05-31  1443  	if (old_exe_file)
35d7bdc86031a2 David Hildenbrand       2021-04-23  1444  		fput(old_exe_file);
35d7bdc86031a2 David Hildenbrand       2021-04-23  1445  	return 0;
35d7bdc86031a2 David Hildenbrand       2021-04-23  1446  }
3864601387cf41 Jiri Slaby              2011-05-26  1447  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH] prctl: allow prctl_set_mm_exe_file without unmapping old exe
Posted by tblodt@icloud.com 1 year, 3 months ago
> On Aug 29, 2024, at 7:55 PM, kernel test robot <lkp@intel.com> wrote:
> 
> Hi Theodore,
> 
> kernel test robot noticed the following build errors:

Good bot
Re: [PATCH] prctl: allow prctl_set_mm_exe_file without unmapping old exe
Posted by Eric W. Biederman 1 year, 3 months ago
Theodore Dubois <tblodt@icloud.com> writes:

> As far as I can tell, the original purpose of this check was simply as
> the easiest way to work with a quirk of /proc/self/exe at the time. From
> the original patch[1]:
>
>     Note it allows to change /proc/$pid/exe iif there
>     are no VM_EXECUTABLE vmas present for current process,
>     simply because this feature is a special to C/R
>     and mm::num_exe_file_vmas become meaningless after
>     that.
>
> num_exe_file_vmas was created to preserve a quirk of the original
> /proc/self/exe implementation: if you unmapped all executable VMAs,
> /proc/self/exe would disappear (because it worked by scanning the
> address space for the first executable VMA.) Keeping the quirk after
> switching to just saving the executable on the mm worked by keeping a
> count of executable VMAs in num_exe_file_vmas, and zeroing exe_file when
> it reached zero. You can probably see how it would have been annoying to
> handle both num_exe_file_vmas and this prctl intending to change
> exe_file, and it's easier to only allow changing exe_file after
> num_exe_file_vmas has already gone to 0 and made exe_file null.
>
> However, num_exe_file_vmas no longer exists[2]. This quirk was taken out
> because it would save a bit in the vma flags, and it seems clear by now
> that nobody was relying on it. These days you can simply update exe_file
> with no interference.
>
> Recently a use case for this prctl has come up outside of
> checkpoint/restore, namely binfmt_misc based emulators such as FEX[3].
> Any program that uses /proc/self/exe will, of course, expect it to point
> to its own executable. But when executed through binfmt_misc, it will be
> the emulator, resulting in compatibility issues. Emulators currently
> have to attempting to intercept syscalls targeting /proc/self/exe to
> redirect the path, and this is not possible in the general case
> considering how flexible path resolution is. For more detail on this see
> [3].
>
> The above seems to me like a solid case for simply dropping the check.
> It's also worth noting that it is already possible to achieve the same
> result by the laborious and complex process of just unmapping all your
> code and remapping it again after the switch (just remember to put the
> code that does this in a .so!), so this is not strictly allowing
> anything that wasn't allowed before. It's just cutting red tape.

One of my original concerns is that allowing changing the /proc/self/exe
has the potential to make /proc/self/exe unreliable and specifically it
has the potential for a rouge program to hide itself by setting a false
/proc/self/exe.

That is part of the reason for the red tape.

Maybe I am wrong but I am concerned that your change may be making it
too easy to change /proc/self/exe, and would too easily allow setting a
false /proc/self/exe.

So it may make better sense to have a special case for interpreters,
so we don't have to worry about people setting a false /proc/self/exe.

Looking at the code I am a bit perplexed at the moment as I don't see
a check currently to ensure the new exe_file is actually mapped.

...

Beyond that your change reduces replace_mm_exe_file to set_mm_exe_file
so it would probably make sense to combine the two of them if you
are going to go this far.

Eric


> [1]: https://lore.kernel.org/lkml/20120316210343.925446961@openvz.org/
> [2]: https://lore.kernel.org/lkml/20120731104230.20515.72416.stgit@zurg/
> [3]: https://lore.kernel.org/lkml/CABnRqDdzqfB1_ixd-2JnfSocKvXNM+9ivM1hhd1C=ejLQyen8g@mail.gmail.com/
>
> Signed-off-by: Theodore Dubois <tblodt@icloud.com>
> Cc: Ryan Houdek <sonicadvance1@gmail.com>
> Cc: Guilherme G. Piccoli <gpiccoli@igalia.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Kees Cook <keescook@chromium.org>
> ---
>  kernel/fork.c | 22 ----------------------
>  1 file changed, 22 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index cc760491f..407e515b9 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1430,30 +1430,8 @@ int set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
>   */
>  int replace_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
>  {
> -	struct vm_area_struct *vma;
> -	struct file *old_exe_file;
>  	int ret = 0;
>  
> -	/* Forbid mm->exe_file change if old file still mapped. */
> -	old_exe_file = get_mm_exe_file(mm);
> -	if (old_exe_file) {
> -		VMA_ITERATOR(vmi, mm, 0);
> -		mmap_read_lock(mm);
> -		for_each_vma(vmi, vma) {
> -			if (!vma->vm_file)
> -				continue;
> -			if (path_equal(&vma->vm_file->f_path,
> -				       &old_exe_file->f_path)) {
> -				ret = -EBUSY;
> -				break;
> -			}
> -		}
> -		mmap_read_unlock(mm);
> -		fput(old_exe_file);
> -		if (ret)
> -			return ret;
> -	}
> -
>  	get_file(new_exe_file);
>  
>  	/* set the new file */
Re: [PATCH] prctl: allow prctl_set_mm_exe_file without unmapping old exe
Posted by tblodt@icloud.com 1 year, 3 months ago
> On Aug 28, 2024, at 6:32 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> 
> One of my original concerns is that allowing changing the /proc/self/exe
> has the potential to make /proc/self/exe unreliable and specifically it
> has the potential for a rouge program to hide itself by setting a false
> /proc/self/exe.
> 
> That is part of the reason for the red tape.
> 
> Maybe I am wrong but I am concerned that your change may be making it
> too easy to change /proc/self/exe, and would too easily allow setting a
> false /proc/self/exe.
> 
> So it may make better sense to have a special case for interpreters,
> so we don't have to worry about people setting a false /proc/self/exe.

After sending this it was pointed out to me that the ioctl actually requires CAP_SYS_RESOURCE. On one hand, this resolves the problem of a rogue program - it would already have to be privileged. On the other hand, this makes it that much harder for FEX to use because it would have to rely on getting setcapped during installation, and we'd like to be able to make use of sandboxed binfmt_misc[1] to install FEX in an unprivileged user namespace.

I would prefer resurrecting the last proposal for a special case for interpreters in some form, given the above and also that the previous proposal would always allow you to see the interpreter from a different /proc file, *somewhat* reducing the security concerns...

A thought: Being registered in binfmt_misc already means you're privileged, so it would seem to be ok to allow the interpreter this extra privilege. In the case of unprivileged/sandboxed binfmt_misc, it would be sufficient to only change the view of /proc/self/exe for processes in the user namespace with the interpreter registered. But is it ok for a view of procfs to be dependent on the user namespace and not just the pid namespace..? is there prior art?

> Looking at the code I am a bit perplexed at the moment as I don't see
> a check currently to ensure the new exe_file is actually mapped.

Yes - since the check was just to avoid confusion with num_exe_file_vmas, not for any serious reason, it seems...

> 
> ...
> 
> Beyond that your change reduces replace_mm_exe_file to set_mm_exe_file
> so it would probably make sense to combine the two of them if you
> are going to go this far.
> 
> Eric

[1]: https://lore.kernel.org/all/20211216112659.310979-2-brauner@kernel.org/
Re: [PATCH] prctl: allow prctl_set_mm_exe_file without unmapping old exe
Posted by David Hildenbrand 1 year, 3 months ago
On 27.08.24 23:59, Theodore Dubois wrote:


> As far as I can tell, the original purpose of this check was simply as
> the easiest way to work with a quirk of /proc/self/exe at the time. From
> the original patch[1]:
> 
>      Note it allows to change /proc/$pid/exe iif there
>      are no VM_EXECUTABLE vmas present for current process,
>      simply because this feature is a special to C/R
>      and mm::num_exe_file_vmas become meaningless after
>      that.
> 
> num_exe_file_vmas was created to preserve a quirk of the original
> /proc/self/exe implementation: if you unmapped all executable VMAs,
> /proc/self/exe would disappear (because it worked by scanning the
> address space for the first executable VMA.) Keeping the quirk after
> switching to just saving the executable on the mm worked by keeping a
> count of executable VMAs in num_exe_file_vmas, and zeroing exe_file when
> it reached zero. You can probably see how it would have been annoying to
> handle both num_exe_file_vmas and this prctl intending to change
> exe_file, and it's easier to only allow changing exe_file after
> num_exe_file_vmas has already gone to 0 and made exe_file null.
> 
> However, num_exe_file_vmas no longer exists[2]. This quirk was taken out
> because it would save a bit in the vma flags, and it seems clear by now
> that nobody was relying on it. These days you can simply update exe_file
> with no interference.
> 
> Recently a use case for this prctl has come up outside of
> checkpoint/restore, namely binfmt_misc based emulators such as FEX[3].
> Any program that uses /proc/self/exe will, of course, expect it to point
> to its own executable. But when executed through binfmt_misc, it will be
> the emulator, resulting in compatibility issues. Emulators currently
> have to attempting to intercept syscalls targeting /proc/self/exe to
> redirect the path, and this is not possible in the general case
> considering how flexible path resolution is. For more detail on this see
> [3].
> 
> The above seems to me like a solid case for simply dropping the check.

Interestingly, the man page states:

"You can even type /proc/pid/exe to run another copy of the same 
executable that is being run by process pid."

Is that still true (with that binfmt_misc magic) once we change 
/proc/self/exe? Or does it with changing /proc/self/exe to point at the 
non-emulator exe even work as expected regarding this documentation?

It's a good question what will change if processes start setting random 
other stuff while they are still executing part of the original binary.

commit 4229fb1dc6843c49a14bb098719f8a696cdc44f8
Author: Konstantin Khlebnikov <khlebnikov@openvz.org>
Date:   Wed Jul 11 14:02:11 2012 -0700

     c/r: prctl: less paranoid prctl_set_mm_exe_file()

temporarily switch to checking that not other files besides the 
executable are still mapped.


I agree that b32dfe3771 reads like this check was added primarily " 
because this feature is a special to C/R and mm::num_exe_file_vmas 
become meaningless after that"

Reading mailing list discussions, there was a discussion regarding 
security risks [1] with the conclusion being "We must not trust 
/proc/pid/exe in anyway. An attacker can always execute another binary 
without calling execve()." [2].


So with that in mind, no objection. Clarifying which effect this has on 
what's stated in the man page would be interesting.


[1] 
https://lore.kernel.org/lkml/CAFLxGvxEDs=RG7tX+j6XEUx2+wEvuCGipUzh2vSp3rj15Rq6zA@mail.gmail.com/
[2] 
https://lore.kernel.org/lkml/CAFLxGvyCRAq6t8_ni+VFUVpOGJ4-iz0i=PRFEFpVJ+ZaPEb3-g@mail.gmail.com/


-- 
Cheers,

David / dhildenb
Re: [PATCH] prctl: allow prctl_set_mm_exe_file without unmapping old exe
Posted by tblodt@icloud.com 1 year, 3 months ago
> Interestingly, the man page states:
> 
> "You can even type /proc/pid/exe to run another copy of the same executable that is being run by process pid."
> 
> Is that still true (with that binfmt_misc magic) once we change /proc/self/exe? Or does it with changing /proc/self/exe to point at the non-emulator exe even work as expected regarding this documentation?

This is actually one of the reasons for an emulator to want to update /proc/self/exe. If it points to the interpreter, running /proc/pid/exe starts a copy of the emulator, but without any idea of what program it was supposed to be running. If it points to the emulated program, running /proc/pid/exe will still start the emulator because it's registered in binfmt_misc, but with the emulated program. The intended result is for references to /proc/self/exe to function the same way they would without the emulator.

~Theodore
Re: [PATCH] prctl: allow prctl_set_mm_exe_file without unmapping old exe
Posted by tblodt@icloud.com 1 year, 3 months ago
PS:

1. I'm not fully sure of my research here, please contradict me if there's
   some non-obvious reason to keep this check.

2. While writing this I discovered that someone else had submitted
   essentially the same patch independently about three years ago and
   gotten no reply. Hopefully this one will have better luck.

~Theodore