[PATCH RESEND v4 0/7] futex: Create set_robust_list2

André Almeida posted 7 patches 3 months, 3 weeks ago
arch/alpha/kernel/syscalls/syscall.tbl             |   1 +
arch/arm/tools/syscall.tbl                         |   1 +
arch/m68k/kernel/syscalls/syscall.tbl              |   1 +
arch/microblaze/kernel/syscalls/syscall.tbl        |   1 +
arch/mips/kernel/syscalls/syscall_n32.tbl          |   1 +
arch/mips/kernel/syscalls/syscall_n64.tbl          |   1 +
arch/mips/kernel/syscalls/syscall_o32.tbl          |   1 +
arch/parisc/kernel/syscalls/syscall.tbl            |   1 +
arch/powerpc/kernel/syscalls/syscall.tbl           |   1 +
arch/s390/kernel/syscalls/syscall.tbl              |   1 +
arch/sh/kernel/syscalls/syscall.tbl                |   1 +
arch/sparc/kernel/syscalls/syscall.tbl             |   1 +
arch/x86/entry/syscalls/syscall_32.tbl             |   1 +
arch/x86/entry/syscalls/syscall_64.tbl             |   1 +
arch/xtensa/kernel/syscalls/syscall.tbl            |   1 +
include/linux/compat.h                             |  12 +-
include/linux/futex.h                              |  16 +-
include/linux/sched.h                              |   5 +-
include/uapi/asm-generic/unistd.h                  |   2 +
include/uapi/linux/futex.h                         |  24 +
kernel/futex/core.c                                | 165 ++++-
kernel/futex/futex.h                               |   5 +
kernel/futex/syscalls.c                            |  85 ++-
kernel/sys_ni.c                                    |   1 +
scripts/syscall.tbl                                |   1 +
.../testing/selftests/futex/functional/.gitignore  |   1 +
tools/testing/selftests/futex/functional/Makefile  |   3 +-
.../selftests/futex/functional/robust_list.c       | 706 +++++++++++++++++++++
tools/testing/selftests/futex/include/logging.h    |  38 ++
29 files changed, 1026 insertions(+), 53 deletions(-)
[PATCH RESEND v4 0/7] futex: Create set_robust_list2
Posted by André Almeida 3 months, 3 weeks ago
This patch adds a new robust_list() syscall. The current syscall
can't be expanded to cover the following use case, so a new one is
needed. This new syscall allows users to set multiple robust lists per
process and to have either 32bit or 64bit pointers in the list.

* Use case

FEX-Emu[1] is an application that runs x86 and x86-64 binaries on an
AArch64 Linux host. One of the tasks of FEX-Emu is to translate syscalls
from one platform to another. The existing set_robust_list() can't easily be
translated because of two limitations:

1) x86 apps can have robust lists with 32bit pointers. On an x86-64 kernel
   this is not a problem, because of the compat entry point. But there's
   no such compat entry point for AArch64, so the kernel would do the
   pointer arithmetic wrongly. It is also not viable for userspace to keep
   track of every addition/removal to the robust list and keep a 64bit
   copy of it somewhere else to feed the kernel. Thus, the new interface
   has an option to tell the kernel whether the list is filled with 32bit
   or 64bit pointers.

2) Apps can set just one robust list (in theory, an x86-64 app can set two
   if it also uses the compat entry point). That means that when an x86 app
   asks FEX-Emu to call set_robust_list(), FEX has two options: overwrite
   its own robust list pointer and make the app robust, or ignore the app's
   robust list and keep the emulator robust. The new interface allows
   multiple robust lists per application, solving this.
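
To make limitation 1) concrete, here is a small userspace sketch (not code from this series) of what goes wrong when a list written by a 32bit task is walked with 64bit pointer arithmetic. The structs `head32`/`head64` and the helpers are hypothetical fixed-width mirrors of the compat and native `robust_list_head` layouts (`list.next`, `futex_offset`, `list_op_pending`). A walker that assumes 64bit pointers reads 8 bytes for `next`, so it folds `futex_offset` into the address it follows.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirrors of the robust list head, for illustration only:
 * head32 is the layout a 32bit (compat) task writes, head64 the native
 * 64bit one. Field order follows struct robust_list_head. */
struct head32 {
	uint32_t next;            /* user pointer to the first entry */
	int32_t  futex_offset;    /* offset from an entry to its futex word */
	uint32_t list_op_pending; /* entry being acquired/released */
};

struct head64 {
	uint64_t next;
	int64_t  futex_offset;
	uint64_t list_op_pending;
};

_Static_assert(sizeof(struct head32) == 12, "32bit head is 12 bytes");
_Static_assert(sizeof(struct head64) == 24, "64bit head is 24 bytes");

/* How a walker assuming 64bit pointers would read 'next' from a list
 * written by a 32bit task: it consumes 8 bytes, swallowing futex_offset. */
uint64_t misread_next(const struct head32 *h)
{
	uint64_t next;

	assert(h != NULL);
	memcpy(&next, h, sizeof(next));
	return next;
}

/* Correct read when the walker knows the list holds 32bit pointers. */
uint64_t read_next32(const struct head32 *h)
{
	assert(h != NULL);
	return h->next;
}
```

With `next = 0x1000` and `futex_offset = -8`, `read_next32()` returns `0x1000`, while `misread_next()` returns an address that also carries the bits of `futex_offset`, which is the kind of wrong pointer arithmetic the list-type option is meant to avoid.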

* Interface

This is the proposed interface:

	long set_robust_list2(void *head, int index, unsigned int flags)

`head` points to the userspace struct robust_list_head, just as in the old
set_robust_list(). It needs to be a void pointer since it can point to either a
normal robust_list_head or a compat_robust_list_head.

`flags` can be used for defining the list type:

	enum robust_list_type {
		ROBUST_LIST_32BIT,
		ROBUST_LIST_64BIT,
	};

`index` is the index in the kernel's internal linked list of robust lists (the
naming starts to get confusing, I reckon). If `index == -1`, the user wants to
set a new robust list: the kernel appends it to the end of the internal list,
assigns it a new index and returns that index to the user. If `index >= 0`, the
user wants to reset `*head` of an already existing list (similar to what happens
when you call set_robust_list() twice with different `*head` values).

If `index` is out of range, points to a non-existing robust list, or the
internal list is full, an error is returned.

* Implementation

The implementation reuses as much of the existing robust list code as possible.
The new task_struct member `struct list_head robust_list2` is just a linked list
to which new lists are appended as the user requests them, and in
futex_cleanup() the kernel walks the internal list, feeding each robust list to
exit_robust_list().
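
As a rough model of that exit path, the sketch below walks a list of registered heads and hands each one to a walker that matches its pointer width. All names here are hypothetical: the real code keeps a `struct list_head` in task_struct and calls exit_robust_list() (and, we assume, the compat variant reworked in "futex: Use explicit sizes for compat_exit_robust_list" for 32bit lists). The stand-in walkers only count how often they are called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the exit path, not kernel code. */
enum robust_list_type { ROBUST_LIST_32BIT, ROBUST_LIST_64BIT };

struct robust_list_entry {
	void *head;                     /* user robust_list_head */
	enum robust_list_type type;     /* pointer width of that list */
	struct robust_list_entry *next; /* NULL-terminated for simplicity */
};

struct walk_stats {
	int walked32;
	int walked64;
};

/* Stand-ins for the 32bit and 64bit robust list walkers. */
static void walk_list32(void *head, struct walk_stats *st)
{
	(void)head;
	st->walked32++;
}

static void walk_list64(void *head, struct walk_stats *st)
{
	(void)head;
	st->walked64++;
}

/* Model of futex_cleanup(): hand every registered list, exactly once,
 * to the walker matching its pointer width. */
void model_futex_cleanup(struct robust_list_entry *first,
			 struct walk_stats *st)
{
	for (struct robust_list_entry *e = first; e; e = e->next) {
		if (e->type == ROBUST_LIST_32BIT)
			walk_list32(e->head, st);
		else
			walk_list64(e->head, st);
	}
}
```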

This implementation supports up to 10 lists (defined by ROBUST_LISTS_PER_TASK),
but that was an arbitrary number chosen for the RFC. For the use case described
above, 4 should be enough; I'm not sure what the limit should be.

It doesn't support list removal (should it?). It also doesn't have a proper
get_robust_list2() yet, but I can add one in the next revision. We could also
have a generic robust_list() syscall that can be used to set/get and be
controlled by flags.

The new interface has an `unsigned int flags` argument, making it
extensible for future use cases as well.

It refuses unaligned `head` addresses. It doesn't limit the number of elements
in a single list (as ROBUST_LIST_LIMIT does for the old interface); instead, it
destroys the list as it is parsed, to be safe against circular lists.
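
The cover letter does not spell out how "destroying the list as it is parsed" works, so the following is only one plausible illustration, under our own assumptions, of why it bounds the walk: each visited entry has its `next` pointer overwritten to point back at the head, so if a cycle leads back to an already-visited entry, the walk ends on the following step. In the kernel the entries live in user memory and would be accessed with get_user()/put_user(); this model uses plain pointers, and `struct node` and `walk_destructive()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal model of a singly linked robust list: the head's 'next' points
 * to the first entry and a well-formed list's last entry points back to
 * the head. */
struct node {
	struct node *next;
};

/* Walk from head, optionally calling visit() on each entry, and unlink
 * each entry as it is visited by pointing its 'next' back at the head.
 * If a cycle leads back to an already-visited entry, that entry's 'next'
 * is now the head, so the walk stops. Returns the number of visits. */
size_t walk_destructive(struct node *head, void (*visit)(struct node *))
{
	size_t visits = 0;
	struct node *cur;

	assert(head != NULL);
	cur = head->next;
	while (cur && cur != head) {
		struct node *next = cur->next;

		cur->next = head; /* destroy the link before moving on */
		if (visit)
			visit(cur);
		visits++;
		cur = next;
	}
	return visits;
}
```

With n entries, the walk takes at most n + 1 steps even when the list is circular; the price is that one entry may be visited twice, which a real walker would have to tolerate.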

* Testing

This patchset has a selftest patch that expands on this one:
https://lore.kernel.org/lkml/20250212131123.37431-1-andrealmeid@igalia.com/

Also, FEX-Emu added support for this interface to validate it:
https://github.com/FEX-Emu/FEX/pull/3966

Feedback is very welcome!

Thanks,
	André

[1] https://github.com/FEX-Emu/FEX

Changelog:
- Rebased on top of new futex work (private hash)
v4: https://lore.kernel.org/lkml/20250225183531.682556-1-andrealmeid@igalia.com/

- Refuse unaligned head pointers
- Ignore ROBUST_LIST_LIMIT for lists created with this interface and make it
  robust against circular lists
- Fix a get_robust_list() syscall bug for getting the list from another thread
- Adapt selftest to use the new interface
v3: https://lore.kernel.org/lkml/20241217174958.477692-1-andrealmeid@igalia.com/

- The old set_robust_list() syscall now adds the new head to the internal linked
  list of robust list pointers, instead of having a dedicated field for it.
  Remove tsk->robust_list and use only tsk->robust_list2
v2: https://lore.kernel.org/lkml/20241101162147.284993-1-andrealmeid@igalia.com/

- Added a patch to properly deal with exit_robust_list() in 64bit vs 32bit
- Wired-up syscall for all archs
- Added more of the cover letter to the commit message
v1: https://lore.kernel.org/lkml/20241024145735.162090-1-andrealmeid@igalia.com/

---
André Almeida (7):
      selftests/futex: Add ASSERT_ macros
      selftests/futex: Create test for robust list
      futex: Use explicit sizes for compat_exit_robust_list
      futex: Create set_robust_list2
      futex: Wire up set_robust_list2 syscall
      futex: Remove the limit of elements for sys_set_robust_list2 lists
      selftests: futex: Expand robust list test for the new interface

---
base-commit: 3ee84e3dd88e39b55b534e17a7b9a181f1d46809
change-id: 20250225-tonyk-robust_futex-60adeedac695

Best regards,
-- 
André Almeida <andrealmeid@igalia.com>

Re: [PATCH RESEND v4 0/7] futex: Create set_robust_list2
Posted by Sebastian Andrzej Siewior 3 months, 3 weeks ago
On 2025-06-17 15:34:17 [-0300], André Almeida wrote:
> This patch adds a new robust_list() syscall. The current syscall
> can't be expanded to cover the following use case, so a new one is
> needed. This new syscall allows users to set multiple robust lists per
> process and to have either 32bit or 64bit pointers in the list.

Thank you for the reminder. It was on my list, it slipped. Two
questions:
- there was a bot warning for v3, but this v4 is a RESEND. Is the warning
  addressed in any way?

- You say x86-64 does not have the problem due to the compat syscall.
  Arm64 has this problem. Newer arm64 CPUs do not provide the arm32
  facility. You introduce a new syscall here. Why not introduce the
  compat syscall instead? I'm sorry if this has been answered somewhere
  below, but this was one question I had while I initially skimmed over
  the patches.

Sebastian
Re: [PATCH RESEND v4 0/7] futex: Create set_robust_list2
Posted by André Almeida 3 months, 3 weeks ago
Hi Sebastian,

Thanks for the feedback!

On 18/06/2025 04:08, Sebastian Andrzej Siewior wrote:
> On 2025-06-17 15:34:17 [-0300], André Almeida wrote:
>> This patch adds a new robust_list() syscall. The current syscall
>> can't be expanded to cover the following use case, so a new one is
>> needed. This new syscall allows users to set multiple robust lists per
>> process and to have either 32bit or 64bit pointers in the list.
> 
> Thank you for the reminder. It was on my list, it slipped. Two
> questions:
> - there was a bot warning for v3, but this v4 is a RESEND. Is the warning
>    addressed in any way?
> 

Oops, I forgot to address them. I will do so for v5.

> - You say x86-64 does not have the problem due to the compat syscall.
>    Arm64 has this problem. Newer arm64 CPUs do not provide the arm32
>    facility. You introduce a new syscall here. Why not introduce the
>    compat syscall instead? I'm sorry if this has been answered somewhere
>    below, but this was one question I had while I initially skimmed over
>    the patches.
> 

The main target for this new syscall is Arm64, which can't handle 32bit
pointers with the current syscall, so this new interface allows the robust
list mechanism to know whether it needs to do 64bit or 32bit pointer
arithmetic to walk the list.

Introducing a compat syscall won't fix this, given that it only works
on x86-64. We need an entry point for Arm64 that can handle 32bit pointers.

I hope that it's clear now, let me know if you have more questions :)

> Sebastian
Re: [PATCH RESEND v4 0/7] futex: Create set_robust_list2
Posted by Sebastian Andrzej Siewior 3 months, 3 weeks ago
On 2025-06-18 13:39:46 [-0300], André Almeida wrote:
> 
> Oops, I forgot to address them. I will do so for v5.
> 
> > - You say x86-64 does not have the problem due to the compat syscall.
> >    Arm64 has this problem. Newer arm64 CPUs do not provide the arm32
> >    facility. You introduce a new syscall here. Why not introduce the
> >    compat syscall instead? I'm sorry if this has been answered somewhere
> >    below, but this was one question I had while I initially skimmed over
> >    the patches.
> > 
> 
> The main target for this new syscall is Arm64, which can't handle 32bit
> pointers with the current syscall, so this new interface allows the robust
> list mechanism to know whether it needs to do 64bit or 32bit pointer
> arithmetic to walk the list.
>
> Introducing a compat syscall won't fix this, given that it only works
> on x86-64. We need an entry point for Arm64 that can handle 32bit pointers.

I would need to dig into the details to figure out why it won't work for
arm64 and works only for x86-64.
There is the compat set_robust_list syscall, which sets
::compat_robust_list, and the non-compat one sets ::robust_list. A 32bit
application on a 64bit kernel should set ::compat_robust_list, which is
what your syscall provides.
That is why I don't understand the need for it so far. Maybe I am
missing a detail.
We have other architectures with a 64bit kernel and a possible 32bit
userland, such as mips, s390 or powerpc, which would then have the same
issue. Or is there something special about arm64 in this case that makes
it unique?

Sebastian
Re: [PATCH RESEND v4 0/7] futex: Create set_robust_list2
Posted by Arnd Bergmann 3 months, 3 weeks ago
On Wed, Jun 18, 2025, at 18:56, Sebastian Andrzej Siewior wrote:
> On 2025-06-18 13:39:46 [-0300], André Almeida wrote:
>> 
>> Oops, I forgot to address them. I will do so for v5.
>> 
>> > - You say x86-64 does not have the problem due to the compat syscall.
>> >    Arm64 has this problem. Newer arm64 CPUs do not provide the arm32
>> >    facility. You introduce a new syscall here. Why not introduce the
>> >    compat syscall instead? I'm sorry if this has been answered somewhere
>> >    below, but this was one question I had while I initially skimmed over
>> >    the patches.
>> > 
>> 
>> The main target for this new syscall is Arm64, which can't handle 32bit
>> pointers with the current syscall, so this new interface allows the robust
>> list mechanism to know whether it needs to do 64bit or 32bit pointer
>> arithmetic to walk the list.
>>
>> Introducing a compat syscall won't fix this, given that it only works
>> on x86-64. We need an entry point for Arm64 that can handle 32bit pointers.
>
> I would need to dig into the details to figure out why it won't work for
> arm64 and works only for x86-64.
> There is the compat set_robust_list syscall, which sets
> ::compat_robust_list, and the non-compat one sets ::robust_list. A 32bit
> application on a 64bit kernel should set ::compat_robust_list, which is
> what your syscall provides.
> That is why I don't understand the need for it so far. Maybe I am
> missing a detail.
> We have other architectures with a 64bit kernel and a possible 32bit
> userland, such as mips, s390 or powerpc, which would then have the same
> issue. Or is there something special about arm64 in this case that makes
> it unique?

x86 is the special case here, since it allows applications to
call both the 32-bit (compat) and 64-bit syscalls directly on
a 64-bit kernel. I think MIPS may do that as well, but the other
architectures only allow a process to call syscalls for its native
ABI, so the only way to call a compat syscall is from a 32-bit
task. On Arm and RISC-V it's also common to have CPUs that cannot
run 32-bit tasks at all, so even running your x86-32 emulator as
an arm32 or rv32 task won't work.

      Arnd