[PATCH -v5 0/8] sched: Make task_struct::thread constant size, x86/fpu: Remove thread::fpu

Ingo Molnar posted 8 patches 10 months ago
arch/x86/include/asm/fpu/api.h   |  2 +-
arch/x86/include/asm/fpu/sched.h |  4 +-
arch/x86/include/asm/processor.h | 23 ++++++------
arch/x86/kernel/fpu/context.h    |  4 +-
arch/x86/kernel/fpu/core.c       | 80 +++++++++++++++++++++++-----------------
arch/x86/kernel/fpu/init.c       | 21 ++++++-----
arch/x86/kernel/fpu/regset.c     | 22 +++++------
arch/x86/kernel/fpu/signal.c     | 18 ++++-----
arch/x86/kernel/fpu/xstate.c     | 27 ++++++--------
arch/x86/kernel/fpu/xstate.h     |  6 +--
arch/x86/kernel/process.c        |  9 ++---
arch/x86/kernel/signal.c         |  6 +--
arch/x86/kernel/traps.c          |  2 +-
arch/x86/math-emu/fpu_aux.c      |  2 +-
arch/x86/math-emu/fpu_entry.c    |  4 +-
arch/x86/math-emu/fpu_system.h   |  2 +-
arch/x86/mm/extable.c            |  2 +-
include/linux/sched.h            | 15 ++------
18 files changed, 126 insertions(+), 123 deletions(-)
[PATCH -v5 0/8] sched: Make task_struct::thread constant size, x86/fpu: Remove thread::fpu
Posted by Ingo Molnar 10 months ago
This series is one of the dependencies of the fast-headers work,
which aims to reduce header complexity by removing <asm/processor.h>
from the <linux/sched.h> dependency chain, which headers are headers
are fat enough already even if we do not combine them.

To achieve that decoupling, one of the key steps is to not embedd any
C types from <asm/processor.h> into task_struct.

The only architecture that relies on that in a serious fashion is x86,
via the 'struct thread::fpu' variable size structure. The series below
attempts to resolve it by using a calculated fpu context area address
value via the x86_task_fpu() helper. The allocation layout of
task_struct + fpu-save-area doesn't change.

The -v5 version is a refresh of the -v4 series to v6.15-rc1.

The Git tree of these commits can also be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/fpu

Thanks,

	Ingo

===============>
Ingo Molnar (8):
  x86/fpu: Introduce the x86_task_fpu() helper method
  x86/fpu: Convert task_struct::thread.fpu accesses to use x86_task_fpu()
  x86/fpu: Make task_struct::thread constant size
  x86/fpu: Remove the thread::fpu pointer
  x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call
  x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD|PF_USER_WORKER tasks during exit
  x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks
  x86/fpu: Use 'fpstate' variable names consistently

 arch/x86/include/asm/fpu/api.h   |  2 +-
 arch/x86/include/asm/fpu/sched.h |  4 +-
 arch/x86/include/asm/processor.h | 23 ++++++------
 arch/x86/kernel/fpu/context.h    |  4 +-
 arch/x86/kernel/fpu/core.c       | 80 +++++++++++++++++++++++-----------------
 arch/x86/kernel/fpu/init.c       | 21 ++++++-----
 arch/x86/kernel/fpu/regset.c     | 22 +++++------
 arch/x86/kernel/fpu/signal.c     | 18 ++++-----
 arch/x86/kernel/fpu/xstate.c     | 27 ++++++--------
 arch/x86/kernel/fpu/xstate.h     |  6 +--
 arch/x86/kernel/process.c        |  9 ++---
 arch/x86/kernel/signal.c         |  6 +--
 arch/x86/kernel/traps.c          |  2 +-
 arch/x86/math-emu/fpu_aux.c      |  2 +-
 arch/x86/math-emu/fpu_entry.c    |  4 +-
 arch/x86/math-emu/fpu_system.h   |  2 +-
 arch/x86/mm/extable.c            |  2 +-
 include/linux/sched.h            | 15 ++------
 18 files changed, 126 insertions(+), 123 deletions(-)

-- 
2.45.2
Re: [PATCH -v5 0/8] sched: Make task_struct::thread constant size, x86/fpu: Remove thread::fpu
Posted by Oleg Nesterov 9 months, 2 weeks ago
Ingo, sorry for delay.

So just in case, the whole series looks good to me. I am going to send a
couple of minor cleanups on top of it, but let me ask first if I missed
something or not.

- x86_init_fpu is not really used after 4/8, it can be killed

- DEFINE_EVENT(x86_fpu, x86_fpu_copy_src) can be killed after 7/8

- arch_dup_task_struct() still does

	/* init_task is not dynamically sized (incomplete FPU state) */
	if (unlikely(src == &init_task))
		memcpy_and_pad(dst, arch_task_struct_size, src, sizeof(init_task), 0);
	else
		memcpy(dst, src, arch_task_struct_size);

  and I don't understand why do we need to check src == &init_task. It seems
  that we can always do

	memcpy_and_pad(dst, arch_task_struct_size, src, sizeof(struct task_struct), 0);

  or even just

	memcpy(dst, src, sizeof(struct task_struct));

  fpu_clone() will initialize the "dst_fpu" memory correctly.

- fpu__drop() does

	/* PF_KTHREAD tasks do not use the FPU context area: */
	if (tsk->flags & (PF_KTHREAD | PF_USER_WORKER))
		return;

  and this is correct. But perhaps

	if (test_tsk_thread_flag(tsk, TIF_NEED_FPU_LOAD))
		return;

  makes more sense? PF_KTHREAD's should never clear TIF_NEED_FPU_LOAD,
  and this way we can avoid the unnecessary "fwait" if, say, the exiting
  task does context_switch() at least once on its way to exit_thread().

- Finally, with or without these changes, it seems that the
  switch_fpu_prepare() + switch_fpu_finish() logic can be simplified,
  I'll write another email.
Re: [PATCH -v5 0/8] sched: Make task_struct::thread constant size, x86/fpu: Remove thread::fpu
Posted by Ingo Molnar 9 months, 2 weeks ago
* Oleg Nesterov <oleg@redhat.com> wrote:

> Ingo, sorry for delay.
> 
> So just in case, the whole series looks good to me. I am going to send a
> couple of minor cleanups on top of it,

Great, please do!

> [...] but let me ask first if I missed something or not.

You probably didn't. :-)

> - x86_init_fpu is not really used after 4/8, it can be killed

Indeed!

> 
> - DEFINE_EVENT(x86_fpu, x86_fpu_copy_src) can be killed after 7/8

Agreed.

> - arch_dup_task_struct() still does
> 
> 	/* init_task is not dynamically sized (incomplete FPU state) */
> 	if (unlikely(src == &init_task))
> 		memcpy_and_pad(dst, arch_task_struct_size, src, sizeof(init_task), 0);
> 	else
> 		memcpy(dst, src, arch_task_struct_size);
> 
>   and I don't understand why do we need to check src == &init_task. It seems
>   that we can always do
> 
> 	memcpy_and_pad(dst, arch_task_struct_size, src, sizeof(struct task_struct), 0);
> 
>   or even just
> 
> 	memcpy(dst, src, sizeof(struct task_struct));
>
>   fpu_clone() will initialize the "dst_fpu" memory correctly.

Unecessary paranoia on my part, please send a patch to simplify this.

> - fpu__drop() does
> 
> 	/* PF_KTHREAD tasks do not use the FPU context area: */
> 	if (tsk->flags & (PF_KTHREAD | PF_USER_WORKER))
> 		return;
> 
>   and this is correct. But perhaps
> 
> 	if (test_tsk_thread_flag(tsk, TIF_NEED_FPU_LOAD))
> 		return;
> 
>   makes more sense? PF_KTHREAD's should never clear TIF_NEED_FPU_LOAD,
>   and this way we can avoid the unnecessary "fwait" if, say, the exiting
>   task does context_switch() at least once on its way to exit_thread().

I think you are right here as well.

> - Finally, with or without these changes, it seems that the
>   switch_fpu_prepare() + switch_fpu_finish() logic can be simplified,
>   I'll write another email.

Thank you!

	Ingo