[PATCH 0/7] fork: Make init and umh ordinary tasks

Eric W. Biederman posted 7 patches 4 years, 1 month ago
[PATCH 0/7] fork: Make init and umh ordinary tasks
Posted by Eric W. Biederman 4 years, 1 month ago
Commit 40966e316f86 ("kthread: Ensure struct kthread is present for
all kthreads") caused init and the user mode helper threads that call
kernel_execve to have struct kthread allocated for them.

I believe my first patch in this series is enough to fix the bug
and is simple enough and obvious enough to be backportable.

The rest of the changes pass struct kernel_clone_args to clean things
up and cause the code to make sense.

There is one rough spot in this change.  In the init process, before the
user space init process is exec'd, there is a lot going on.  I have found
that when async_schedule_domain is low on memory or has more than 32K
callers, do_populate_rootfs will now run in a user space thread, making
flush_delayed_fput meaningless and __fput_sync unusable.  I solved
this as I did in usermode_driver.c, with an added explicit task_work_run.
I point this out as I have seen some talk about making flushing file
handles more explicit.
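
The fallback described above can be modelled in user space.  The sketch
below is not kernel code: every model_* name is hypothetical, the queue
limit is shrunk from the roughly 32K mentioned above to 4, and memory
pressure is simulated with a flag.  It only illustrates the point that,
when the work cannot be queued, the function runs inline in the caller's
own context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Tiny stand-in for the "more than 32K callers" limit (assumption). */
#define MODEL_MAX_WORK 4

typedef void (*model_func_t)(void *data);

struct model_entry {
	model_func_t func;
	void *data;
};

static struct model_entry model_queue[MODEL_MAX_WORK];
static size_t model_pending;

/*
 * Queue func for later execution, or run it inline when the simulated
 * allocation fails or the queue is full.  Returns true when func ran
 * inline, i.e. in the caller's context.
 */
static bool model_async_schedule(model_func_t func, void *data, bool alloc_fails)
{
	assert(func != NULL);

	if (alloc_fails || model_pending >= MODEL_MAX_WORK) {
		func(data);
		return true;
	}

	model_queue[model_pending].func = func;
	model_queue[model_pending].data = data;
	model_pending++;
	return false;
}

/* Stand-in for a worker draining the queue in its own context. */
static void model_run_queued(void)
{
	for (size_t i = 0; i < model_pending; i++)
		model_queue[i].func(model_queue[i].data);
	model_pending = 0;
}

/* Example payload: counts how many times it has been executed. */
static void model_count_call(void *data)
{
	(*(int *)data)++;
}
```

In this model the caller learns from the return value whether the payload
already ran in its own context, which is the case where the caller has to
flush its own pending work explicitly.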

Eric W. Biederman (7):
      kthread: Don't allocate kthread_struct for init and umh
      fork: Pass struct kernel_clone_args into copy_thread
      fork: Explicitly test for idle tasks in copy_thread
      fork: Generalize PF_IO_WORKER handling
      init: Deal with the init process being a user mode process
      fork: Explicitly set PF_KTHREAD
      fork: Stop allowing kthreads to call execve

 arch/alpha/kernel/process.c      | 13 ++++++------
 arch/arc/kernel/process.c        | 13 ++++++------
 arch/arm/kernel/process.c        | 12 ++++++-----
 arch/arm64/kernel/process.c      | 12 ++++++-----
 arch/csky/kernel/process.c       | 15 ++++++-------
 arch/h8300/kernel/process.c      | 10 ++++-----
 arch/hexagon/kernel/process.c    | 12 ++++++-----
 arch/ia64/kernel/process.c       | 15 +++++++------
 arch/m68k/kernel/process.c       | 12 ++++++-----
 arch/microblaze/kernel/process.c | 12 ++++++-----
 arch/mips/kernel/process.c       | 13 ++++++------
 arch/nios2/kernel/process.c      | 12 ++++++-----
 arch/openrisc/kernel/process.c   | 12 ++++++-----
 arch/parisc/kernel/process.c     | 18 +++++++++-------
 arch/powerpc/kernel/process.c    | 15 +++++++------
 arch/riscv/kernel/process.c      | 12 ++++++-----
 arch/s390/kernel/process.c       | 12 ++++++-----
 arch/sh/kernel/process_32.c      | 12 ++++++-----
 arch/sparc/kernel/process_32.c   | 12 ++++++-----
 arch/sparc/kernel/process_64.c   | 12 ++++++-----
 arch/um/kernel/process.c         | 15 +++++++------
 arch/x86/include/asm/fpu/sched.h |  2 +-
 arch/x86/include/asm/switch_to.h |  8 +++----
 arch/x86/kernel/fpu/core.c       |  4 ++--
 arch/x86/kernel/process.c        | 18 +++++++++-------
 arch/xtensa/kernel/process.c     | 17 ++++++++-------
 fs/exec.c                        |  8 ++++---
 include/linux/sched/task.h       |  8 +++++--
 init/initramfs.c                 |  2 ++
 init/main.c                      |  2 +-
 kernel/fork.c                    | 46 +++++++++++++++++++++++++++++++++-------
 kernel/umh.c                     |  6 +++---
 32 files changed, 233 insertions(+), 159 deletions(-)

Eric
Re: [PATCH 0/7] fork: Make init and umh ordinary tasks
Posted by Qian Cai 4 years, 1 month ago
On Fri, May 06, 2022 at 09:11:36AM -0500, Eric W. Biederman wrote:
> 
> Commit 40966e316f86 ("kthread: Ensure struct kthread is present for
> all kthreads") caused init and the user mode helper threads that call
> kernel_execve to have struct kthread allocated for them.
> 
> I believe my first patch in this series is enough to fix the bug
> and is simple enough and obvious enough to be backportable.
> 
> The rest of the changes pass struct kernel_clone_args to clean things
> up and cause the code to make sense.
> 
> There is one rough spot in this change.  In the init process, before the
> user space init process is exec'd, there is a lot going on.  I have found
> that when async_schedule_domain is low on memory or has more than 32K
> callers, do_populate_rootfs will now run in a user space thread, making
> flush_delayed_fput meaningless and __fput_sync unusable.  I solved
> this as I did in usermode_driver.c, with an added explicit task_work_run.
> I point this out as I have seen some talk about making flushing file
> handles more explicit.

Reverting the last 3 commits of the series fixed a boot crash.

1b2552cbdbe0 fork: Stop allowing kthreads to call execve
753550eb0ce1 fork: Explicitly set PF_KTHREAD
68d85f0a33b0 init: Deal with the init process being a user mode process

 BUG: KASAN: null-ptr-deref in task_nr_scan_windows.isra.0
 arch_atomic_long_read at ./include/linux/atomic/atomic-long.h:29
 (inlined by) atomic_long_read at ./include/linux/atomic/atomic-instrumented.h:1266
 (inlined by) get_mm_counter at ./include/linux/mm.h:1996
 (inlined by) get_mm_rss at ./include/linux/mm.h:2049
 (inlined by) task_nr_scan_windows at kernel/sched/fair.c:1123
 Read of size 8 at addr 00000000000003d0 by task swapper/0/1

 CPU: 72 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc6-next-20220509-dirty #29
 Call trace:
  dump_backtrace
  show_stack
  dump_stack_lvl
  print_report
  kasan_report
  kasan_check_range
  __kasan_check_read
  task_nr_scan_windows.isra.0
  task_scan_start
  task_scan_min at /home/user/linux/kernel/sched/fair.c:1144
  (inlined by) task_scan_start at /home/user/linux/kernel/sched/fair.c:1150
  task_tick_fair
  task_tick_numa at /home/user/linux/kernel/sched/fair.c:2944
  (inlined by) task_tick_fair at /home/user/linux/kernel/sched/fair.c:11186
  scheduler_tick
  update_process_times
  tick_periodic
  tick_handle_periodic
  arch_timer_handler_phys
  handle_percpu_devid_irq
  generic_handle_domain_irq
  gic_handle_irq
  call_on_irq_stack
  do_interrupt_handler
  el1_interrupt
  el1h_64_irq_handler
  el1h_64_irq
  split_page
  make_alloc_exact
  alloc_pages_exact_nid
  init_section_page_ext
  page_ext_init
  kernel_init_freeable
  kernel_init
  ret_from_fork
 ==================================================================
 Disabling lock debugging due to kernel taint
 Unable to handle kernel paging request at virtual address dfff80000000007a
 KASAN: null-ptr-deref in range [0x00000000000003d0-0x00000000000003d7]
 Mem abort info:
   ESR = 0x0000000096000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 [dfff80000000007a] address between user and kernel address ranges
 Internal error: Oops: 96000004 [#1] PREEMPT SMP
 Modules linked in:
 CPU: 72 PID: 1 Comm: swapper/0 Tainted: G    B             5.18.0-rc6-next-20220509-dirty #29
 pstate: 404000c9 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : task_nr_scan_windows.isra.0
 lr : task_nr_scan_windows.isra.0
 sp : ffff800008487cb0
 x29: ffff800008487cb0 x28: ffff07ff89728040 x27: 000000003bc47ee0
 x26: ffff08367f088980 x25: 1fffe0fff12e525f x24: ffff07ff897292f8
 x23: ffff07ff89728040 x22: 1fffe0fff12e5262 x21: 0000000000010000
 x20: 00000000000003d0 x19: 0000000000000000 x18: ffffdd41783f7d1c
 x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 3d3d3d3d3d3d3d3d
 x14: 3d3d3d3d3d3d3d3d x13: 746e696174206c65 x12: ffff7ba82f3b98b5
 x11: 1ffffba82f3b98b4 x10: ffff7ba82f3b98b4 x9 : dfff800000000000
 x8 : ffffdd4179dcc5a7 x7 : 0000000000000001 x6 : ffff7ba82f3b98b4
 x5 : ffffdd4179dcc5a0 x4 : ffff7ba82f3b98b5 x3 : ffffdd4171de2b14
 x2 : 0000000000000001 x1 : 000000000000007a x0 : dfff800000000000
 Call trace:
  task_nr_scan_windows.isra.0
  task_scan_start
  task_tick_fair
  scheduler_tick
  update_process_times
  tick_periodic
  tick_handle_periodic
  arch_timer_handler_phys
  handle_percpu_devid_irq
  generic_handle_domain_irq
  gic_handle_irq
  call_on_irq_stack
  do_interrupt_handler
  el1_interrupt
  el1h_64_irq_handler
  el1h_64_irq
  split_page
  make_alloc_exact
  alloc_pages_exact_nid
  init_section_page_ext
  page_ext_init
  kernel_init_freeable
  kernel_init
  ret_from_fork
 Code: d343fe81 d2d00000 f2fbffe0 53185eb5 (38e06820)
Re: [PATCH 0/7] fork: Make init and umh ordinary tasks
Posted by Eric W. Biederman 4 years, 1 month ago
Qian Cai <quic_qiancai@quicinc.com> writes:

> On Fri, May 06, 2022 at 09:11:36AM -0500, Eric W. Biederman wrote:
>> 
>> Commit 40966e316f86 ("kthread: Ensure struct kthread is present for
>> all kthreads") caused init and the user mode helper threads that call
>> kernel_execve to have struct kthread allocated for them.
>> 
>> I believe my first patch in this series is enough to fix the bug
>> and is simple enough and obvious enough to be backportable.
>> 
>> The rest of the changes pass struct kernel_clone_args to clean things
>> up and cause the code to make sense.
>> 
>> There is one rough spot in this change.  In the init process, before the
>> user space init process is exec'd, there is a lot going on.  I have found
>> that when async_schedule_domain is low on memory or has more than 32K
>> callers, do_populate_rootfs will now run in a user space thread, making
>> flush_delayed_fput meaningless and __fput_sync unusable.  I solved
>> this as I did in usermode_driver.c, with an added explicit task_work_run.
>> I point this out as I have seen some talk about making flushing file
>> handles more explicit.
>
> Reverting the last 3 commits of the series fixed a boot crash.
>
> 1b2552cbdbe0 fork: Stop allowing kthreads to call execve
> 753550eb0ce1 fork: Explicitly set PF_KTHREAD
> 68d85f0a33b0 init: Deal with the init process being a user mode process

Hmm.  It looks like I missed a little detail.

task_tick_fair
  task_tick_numa
    task_scan_start
      task_scan_min
        task_nr_scan_windows
          p->mm

If I read this code right, task_tick_numa assumes that only tasks with
PF_KTHREAD set lack an mm.

The patch below should fix the failure.  For init we could possibly
populate .mm and not just .active_mm.  For user mode helpers cloned from
kernel threads I don't think that is a realistic option.  So I think this
is going to be the proper fix.

I believe this only happens when numa rebalancing happens at an
unfortunate moment.

Qian Cai can you test this?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d4bd299d67ab..db6f0df9d43e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2915,7 +2915,7 @@ static void task_tick_numa(struct rq *rq, struct task_struct *curr)
        /*
         * We don't care about NUMA placement if we don't have memory.
         */
-       if ((curr->flags & (PF_EXITING | PF_KTHREAD)) || work->next != work)
+       if (!curr->mm || (curr->flags & (PF_EXITING | PF_KTHREAD)) || work->next != work)
                return;
 
        /*
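
The effect of the added !curr->mm test can be shown with a small user
space model.  This is only a sketch under stated assumptions: the
model_* types and the MODEL_PF_* bit values are hypothetical stand-ins
rather than the kernel's definitions, and the work->next != work part of
the condition is left out.  It contrasts the old guard, which trusts the
flags alone, with the new one, which also checks the mm pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; not the kernel's actual values. */
#define MODEL_PF_EXITING 0x1u
#define MODEL_PF_KTHREAD 0x2u

struct model_mm {
	unsigned long rss_pages;
};

struct model_task {
	unsigned int flags;
	struct model_mm *mm;	/* NULL when the task has no address space */
};

/* Old guard: skip NUMA work based on the task flags only. */
static bool old_guard_skips(const struct model_task *t)
{
	return (t->flags & (MODEL_PF_EXITING | MODEL_PF_KTHREAD)) != 0;
}

/* New guard: also skip any task that has no mm at all. */
static bool new_guard_skips(const struct model_task *t)
{
	return t->mm == NULL || old_guard_skips(t);
}

/*
 * Stand-in for the RSS read reached through p->mm in the call chain
 * above.  It must only be called for tasks the guard did not skip.
 */
static unsigned long model_read_rss(const struct model_task *t)
{
	assert(t->mm != NULL);
	return t->mm->rss_pages;
}
```

A task modelled with neither flag set and mm == NULL, like init before
its exec in this series, passes the old guard and would reach
model_read_rss with a NULL mm; the new guard skips it.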


Eric
Re: [PATCH 0/7] fork: Make init and umh ordinary tasks
Posted by Qian Cai 4 years, 1 month ago
On Mon, May 09, 2022 at 04:52:07PM -0500, Eric W. Biederman wrote:
> I believe this only happens when numa rebalancing happens at an
> unfortunate moment.
> 
> Qian Cai can you test this?

Yes, this works fine so far.