[PATCH v4 01/25] rseq: Introduce feature size and alignment ELF auxiliary vector entries

Mathieu Desnoyers posted 25 patches 3 years, 6 months ago
There is a newer version of this series
Posted by Mathieu Desnoyers 3 years, 6 months ago
Export the rseq feature size supported by the kernel as well as the
required allocation alignment for the rseq per-thread area to user-space
through ELF auxiliary vector entries.

This is part of the extensible rseq ABI.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 fs/binfmt_elf.c             | 5 +++++
 include/uapi/linux/auxvec.h | 2 ++
 include/uapi/linux/rseq.h   | 5 +++++
 3 files changed, 12 insertions(+)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 63c7ebb0da89..04fca1e4cbd2 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -46,6 +46,7 @@
 #include <linux/cred.h>
 #include <linux/dax.h>
 #include <linux/uaccess.h>
+#include <linux/rseq.h>
 #include <asm/param.h>
 #include <asm/page.h>
 
@@ -288,6 +289,10 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
 	if (bprm->have_execfd) {
 		NEW_AUX_ENT(AT_EXECFD, bprm->execfd);
 	}
+#ifdef CONFIG_RSEQ
+	NEW_AUX_ENT(AT_RSEQ_FEATURE_SIZE, offsetof(struct rseq, end));
+	NEW_AUX_ENT(AT_RSEQ_ALIGN, __alignof__(struct rseq));
+#endif
 #undef NEW_AUX_ENT
 	/* AT_NULL is zero; clear the rest too */
 	memset(elf_info, 0, (char *)mm->saved_auxv +
diff --git a/include/uapi/linux/auxvec.h b/include/uapi/linux/auxvec.h
index c7e502bf5a6f..6991c4b8ab18 100644
--- a/include/uapi/linux/auxvec.h
+++ b/include/uapi/linux/auxvec.h
@@ -30,6 +30,8 @@
 				 * differ from AT_PLATFORM. */
 #define AT_RANDOM 25	/* address of 16 random bytes */
 #define AT_HWCAP2 26	/* extension of AT_HWCAP */
+#define AT_RSEQ_FEATURE_SIZE	27	/* rseq supported feature size */
+#define AT_RSEQ_ALIGN		28	/* rseq allocation alignment */
 
 #define AT_EXECFN  31	/* filename of program */
 
diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
index 77ee207623a9..05d3c4cdeb40 100644
--- a/include/uapi/linux/rseq.h
+++ b/include/uapi/linux/rseq.h
@@ -130,6 +130,11 @@ struct rseq {
 	 *     this thread.
 	 */
 	__u32 flags;
+
+	/*
+	 * Flexible array member at end of structure, after last feature field.
+	 */
+	char end[];
 } __attribute__((aligned(4 * sizeof(__u64))));
 
 #endif /* _UAPI_LINUX_RSEQ_H */
-- 
2.25.1
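
For readers following along from user-space, here is a minimal sketch of how the two new auxv entries might be consumed. The helper name rseq_query_layout(), the struct rseq_layout type and the fallback values are hypothetical and not part of the patch. The fallbacks assume that a kernel without these entries uses the layout shown above, where the last feature field (flags) ends at offset 20 and the structure is aligned on 32 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/auxv.h>

/* Values proposed by this patch; guarded in case libc already defines them. */
#ifndef AT_RSEQ_FEATURE_SIZE
#define AT_RSEQ_FEATURE_SIZE 27
#endif
#ifndef AT_RSEQ_ALIGN
#define AT_RSEQ_ALIGN 28
#endif

/*
 * Assumed fallbacks for kernels that do not export the entries:
 * cpu_id_start (4) + cpu_id (4) + rseq_cs (8) + flags (4) = 20 bytes,
 * and the aligned(4 * sizeof(__u64)) attribute gives 32.
 */
#define RSEQ_FALLBACK_FEATURE_SIZE 20
#define RSEQ_FALLBACK_ALIGN 32

/* Hypothetical helper type, for illustration only. */
struct rseq_layout {
	size_t feature_size;	/* bytes of struct rseq the kernel populates */
	size_t align;		/* required alignment of the per-thread area */
};

static struct rseq_layout rseq_query_layout(void)
{
	struct rseq_layout layout;
	/* getauxval() returns 0 when the entry is absent. */
	unsigned long size = getauxval(AT_RSEQ_FEATURE_SIZE);
	unsigned long align = getauxval(AT_RSEQ_ALIGN);

	layout.feature_size = size ? size : RSEQ_FALLBACK_FEATURE_SIZE;
	layout.align = align ? align : RSEQ_FALLBACK_ALIGN;
	/* An alignment is expected to be a power of two. */
	assert((layout.align & (layout.align - 1)) == 0);
	return layout;
}
```

A caller would then allocate the per-thread area with at least feature_size bytes on an align boundary before registering it.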
Re: [PATCH v4 01/25] rseq: Introduce feature size and alignment ELF auxiliary vector entries
Posted by Florian Weimer 3 years, 5 months ago
* Mathieu Desnoyers:

> Export the rseq feature size supported by the kernel as well as the
> required allocation alignment for the rseq per-thread area to user-space
> through ELF auxiliary vector entries.
>
> This is part of the extensible rseq ABI.
>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
>  fs/binfmt_elf.c             | 5 +++++
>  include/uapi/linux/auxvec.h | 2 ++
>  include/uapi/linux/rseq.h   | 5 +++++
>  3 files changed, 12 insertions(+)
>
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 63c7ebb0da89..04fca1e4cbd2 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -46,6 +46,7 @@
>  #include <linux/cred.h>
>  #include <linux/dax.h>
>  #include <linux/uaccess.h>
> +#include <linux/rseq.h>
>  #include <asm/param.h>
>  #include <asm/page.h>
>  
> @@ -288,6 +289,10 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
>  	if (bprm->have_execfd) {
>  		NEW_AUX_ENT(AT_EXECFD, bprm->execfd);
>  	}
> +#ifdef CONFIG_RSEQ
> +	NEW_AUX_ENT(AT_RSEQ_FEATURE_SIZE, offsetof(struct rseq, end));
> +	NEW_AUX_ENT(AT_RSEQ_ALIGN, __alignof__(struct rseq));
> +#endif
>  #undef NEW_AUX_ENT
>  	/* AT_NULL is zero; clear the rest too */
>  	memset(elf_info, 0, (char *)mm->saved_auxv +
> diff --git a/include/uapi/linux/auxvec.h b/include/uapi/linux/auxvec.h
> index c7e502bf5a6f..6991c4b8ab18 100644
> --- a/include/uapi/linux/auxvec.h
> +++ b/include/uapi/linux/auxvec.h
> @@ -30,6 +30,8 @@
>  				 * differ from AT_PLATFORM. */
>  #define AT_RANDOM 25	/* address of 16 random bytes */
>  #define AT_HWCAP2 26	/* extension of AT_HWCAP */
> +#define AT_RSEQ_FEATURE_SIZE	27	/* rseq supported feature size */
> +#define AT_RSEQ_ALIGN		28	/* rseq allocation alignment */
>  
>  #define AT_EXECFN  31	/* filename of program */

Do we need the alignment?  Or can we keep it perpetually at 32?  Or we
could steal some bits from AT_RSEQ_FEATURE_SIZE?  (Not the lower
bits—they aren't unused due to the way the feature size works.)

Thanks,
Florian
Re: [PATCH v4 01/25] rseq: Introduce feature size and alignment ELF auxiliary vector entries
Posted by Mathieu Desnoyers 3 years, 5 months ago
On 2022-10-10 08:42, Florian Weimer wrote:
> * Mathieu Desnoyers:
> 
>> Export the rseq feature size supported by the kernel as well as the
>> required allocation alignment for the rseq per-thread area to user-space
>> through ELF auxiliary vector entries.
>>
>> This is part of the extensible rseq ABI.
>>
>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> ---
>>   fs/binfmt_elf.c             | 5 +++++
>>   include/uapi/linux/auxvec.h | 2 ++
>>   include/uapi/linux/rseq.h   | 5 +++++
>>   3 files changed, 12 insertions(+)
>>
>> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
>> index 63c7ebb0da89..04fca1e4cbd2 100644
>> --- a/fs/binfmt_elf.c
>> +++ b/fs/binfmt_elf.c
>> @@ -46,6 +46,7 @@
>>   #include <linux/cred.h>
>>   #include <linux/dax.h>
>>   #include <linux/uaccess.h>
>> +#include <linux/rseq.h>
>>   #include <asm/param.h>
>>   #include <asm/page.h>
>>   
>> @@ -288,6 +289,10 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
>>   	if (bprm->have_execfd) {
>>   		NEW_AUX_ENT(AT_EXECFD, bprm->execfd);
>>   	}
>> +#ifdef CONFIG_RSEQ
>> +	NEW_AUX_ENT(AT_RSEQ_FEATURE_SIZE, offsetof(struct rseq, end));
>> +	NEW_AUX_ENT(AT_RSEQ_ALIGN, __alignof__(struct rseq));
>> +#endif
>>   #undef NEW_AUX_ENT
>>   	/* AT_NULL is zero; clear the rest too */
>>   	memset(elf_info, 0, (char *)mm->saved_auxv +
>> diff --git a/include/uapi/linux/auxvec.h b/include/uapi/linux/auxvec.h
>> index c7e502bf5a6f..6991c4b8ab18 100644
>> --- a/include/uapi/linux/auxvec.h
>> +++ b/include/uapi/linux/auxvec.h
>> @@ -30,6 +30,8 @@
>>   				 * differ from AT_PLATFORM. */
>>   #define AT_RANDOM 25	/* address of 16 random bytes */
>>   #define AT_HWCAP2 26	/* extension of AT_HWCAP */
>> +#define AT_RSEQ_FEATURE_SIZE	27	/* rseq supported feature size */
>> +#define AT_RSEQ_ALIGN		28	/* rseq allocation alignment */
>>   
>>   #define AT_EXECFN  31	/* filename of program */
> 
> Do we need the alignment?  Or can we keep it perpetually at 32?  Or we
> could steal some bits from AT_RSEQ_FEATURE_SIZE?  (Not the lower
> bits—they aren't unused due to the way the feature size works.)

I cannot imagine a use-case that would require us to bump the alignment
requirement beyond 32 bytes, so we may very well leave it at 32. But
perhaps someone else has a better imagination than mine?

Thanks,

Mathieu

> 
> Thanks,
> Florian
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

Re: [PATCH v4 01/25] rseq: Introduce feature size and alignment ELF auxiliary vector entries
Posted by Mathieu Desnoyers 3 years, 5 months ago
On 2022-10-17 12:09, Mathieu Desnoyers wrote:
> On 2022-10-10 08:42, Florian Weimer wrote:
>> * Mathieu Desnoyers:
>>
>>> Export the rseq feature size supported by the kernel as well as the
>>> required allocation alignment for the rseq per-thread area to user-space
>>> through ELF auxiliary vector entries.
>>>
>>> This is part of the extensible rseq ABI.
>>>
>>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>> ---
>>>   fs/binfmt_elf.c             | 5 +++++
>>>   include/uapi/linux/auxvec.h | 2 ++
>>>   include/uapi/linux/rseq.h   | 5 +++++
>>>   3 files changed, 12 insertions(+)
>>>
>>> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
>>> index 63c7ebb0da89..04fca1e4cbd2 100644
>>> --- a/fs/binfmt_elf.c
>>> +++ b/fs/binfmt_elf.c
>>> @@ -46,6 +46,7 @@
>>>   #include <linux/cred.h>
>>>   #include <linux/dax.h>
>>>   #include <linux/uaccess.h>
>>> +#include <linux/rseq.h>
>>>   #include <asm/param.h>
>>>   #include <asm/page.h>
>>> @@ -288,6 +289,10 @@ create_elf_tables(struct linux_binprm *bprm, 
>>> const struct elfhdr *exec,
>>>       if (bprm->have_execfd) {
>>>           NEW_AUX_ENT(AT_EXECFD, bprm->execfd);
>>>       }
>>> +#ifdef CONFIG_RSEQ
>>> +    NEW_AUX_ENT(AT_RSEQ_FEATURE_SIZE, offsetof(struct rseq, end));
>>> +    NEW_AUX_ENT(AT_RSEQ_ALIGN, __alignof__(struct rseq));
>>> +#endif
>>>   #undef NEW_AUX_ENT
>>>       /* AT_NULL is zero; clear the rest too */
>>>       memset(elf_info, 0, (char *)mm->saved_auxv +
>>> diff --git a/include/uapi/linux/auxvec.h b/include/uapi/linux/auxvec.h
>>> index c7e502bf5a6f..6991c4b8ab18 100644
>>> --- a/include/uapi/linux/auxvec.h
>>> +++ b/include/uapi/linux/auxvec.h
>>> @@ -30,6 +30,8 @@
>>>                    * differ from AT_PLATFORM. */
>>>   #define AT_RANDOM 25    /* address of 16 random bytes */
>>>   #define AT_HWCAP2 26    /* extension of AT_HWCAP */
>>> +#define AT_RSEQ_FEATURE_SIZE    27    /* rseq supported feature size */
>>> +#define AT_RSEQ_ALIGN        28    /* rseq allocation alignment */
>>>   #define AT_EXECFN  31    /* filename of program */
>>
>> Do we need the alignment?  Or can we keep it perpetually at 32?  Or we
>> could steal some bits from AT_RSEQ_FEATURE_SIZE?  (Not the lower
>> bits—they aren't unused due to the way the feature size works.)
> 
> I cannot imagine a use-case that would require us to bump the alignment
> requirement beyond 32 bytes, so we may very well leave it at 32. But
> perhaps someone else has a better imagination than mine?

Actually, here is a scenario that warrants exposing the required alignment:

Note that struct rseq is *not* packed.

If we extend struct rseq to a size that makes the compiler use an 
alignment larger than 32 bytes in the future, and if the compiler uses 
that larger alignment knowledge to issue instructions that require the 
larger alignment, then it would be incorrect for user-space to allocate 
the struct rseq on an alignment lower than the required alignment.

Indeed, on rseq registration, we have the following check:

if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq))
[...]
    return -EINVAL;

Which would break if the size of struct rseq is large enough that the 
alignment grows larger than 32 bytes.
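
To make the failure mode concrete, here is a small user-space stand-in for the alignment test. It is a sketch only: is_aligned() is a hypothetical helper that mirrors what IS_ALIGNED() computes for power-of-two alignments, and the value 64 is just an example of an alignment larger than 32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space equivalent of the kernel's IS_ALIGNED() for
 * power-of-two alignments: true when addr is a multiple of align.
 */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}
```

With this helper, an address such as 0x1020 passes is_aligned(addr, 32) but fails is_aligned(addr, 64). An area that user-space allocated on a 32-byte boundary would therefore be rejected if __alignof__(*rseq) ever grew to 64.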

You mentioned we could steal some high bits from AT_RSEQ_FEATURE_SIZE to
hold the alignment. What is the issue with exposing an explicit
AT_RSEQ_ALIGN? It's just an auxv entry, so I don't see it as a huge
performance concern to access 2 entries rather than one.

Thanks,

Mathieu

> 
> Thanks,
> 
> Mathieu
> 
>>
>> Thanks,
>> Florian
>>
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

Re: [PATCH v4 01/25] rseq: Introduce feature size and alignment ELF auxiliary vector entries
Posted by Florian Weimer 3 years, 5 months ago
* Mathieu Desnoyers:

> If we extend struct rseq to a size that makes the compiler use an
> alignment larger than 32 bytes in the future, and if the compiler uses 
> that larger alignment knowledge to issue instructions that require the
> larger alignment, then it would be incorrect for user-space to
> allocate the struct rseq on an alignment lower than the required
> alignment.
>
> Indeed, on rseq registration, we have the following check:
>
> if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq))
> [...]
>    return -EINVAL;
>
> Which would break if the size of struct rseq is large enough that the
> alignment grows larger than 32 bytes.

I never quite understood the reason for that check, it certainly made
the glibc implementation more complicated.  But to support variable
sizes internally, we'll have to put in some extra effort anyway, so that
it won't matter much in the end.  As long as the required alignment
isn't larger than the page size. 8-/

> You mentioned we could steal some high bits from AT_RSEQ_FEATURE_SIZE
> to hold the alignment. What is the issue with exposing an explicit
> AT_RSEQ_ALIGN? It's just an auxv entry, so I don't see it as a huge
> performance concern to access 2 entries rather than one.

I don't mind too much, we already have a large on-stack array in the
loader so that we can decode the auxiliary vector without a humongous
switch statement.  But eventually that approach will stop working if the
set of interesting AT_* values becomes too large and discontinuous.
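
For context, the array-based decoding can be pictured roughly as below. This is only an illustrative sketch, not the actual loader code: struct auxv_pair, AUXV_TABLE_SIZE and auxv_decode() are made-up names, and the table size of 32 is an arbitrary assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical (type, value) pair as found in the auxiliary vector. */
struct auxv_pair {
	unsigned long type;
	unsigned long val;
};

#define AUXV_TABLE_SIZE 32	/* assumed bound on the AT_* values of interest */

/*
 * Walk the vector once and store each value at table[type], so later
 * lookups are plain array reads instead of a switch statement.
 * Types at or above AUXV_TABLE_SIZE are dropped, which is where the
 * approach stops scaling once the interesting types become sparse.
 */
static void auxv_decode(const struct auxv_pair *av,
			unsigned long table[AUXV_TABLE_SIZE])
{
	assert(av != NULL && table != NULL);
	memset(table, 0, AUXV_TABLE_SIZE * sizeof(table[0]));
	for (; av->type != 0 /* AT_NULL */; av++) {
		if (av->type < AUXV_TABLE_SIZE)
			table[av->type] = av->val;
	}
}
```

Types 27 and 28 fit in such a table without growing it, whereas a high-numbered type would force either a larger on-stack array or a special case.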

Thanks,
Florian
Re: [PATCH v4 01/25] rseq: Introduce feature size and alignment ELF auxiliary vector entries
Posted by Mathieu Desnoyers 3 years, 5 months ago
On 2022-10-18 11:34, Florian Weimer wrote:
> * Mathieu Desnoyers:
> 
>> If we extend struct rseq to a size that makes the compiler use an
>> alignment larger than 32 bytes in the future, and if the compiler uses
>> that larger alignment knowledge to issue instructions that require the
>> larger alignment, then it would be incorrect for user-space to
>> allocate the struct rseq on an alignment lower than the required
>> alignment.
>>
>> Indeed, on rseq registration, we have the following check:
>>
>> if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq))
>> [...]
>>     return -EINVAL;
>>
>> Which would break if the size of struct rseq is large enough that the
>> alignment grows larger than 32 bytes.
> 
> I never quite understood the reason for that check, it certainly made
> the glibc implementation more complicated.  But to support variable
> sizes internally, we'll have to put in some extra effort anyway, so that
> it won't matter much in the end.  As long as the required alignment
> isn't larger than the page size. 8-/

I don't expect it to grow so large.

There is one more reason why increasing the alignment of struct rseq may 
become useful as the structure grows: it would guarantee that it fits in 
a single lower level cache line as its size increases. It's not 
something I expect would break if not properly aligned, but it's a nice 
optimization.

I see two possible approaches here:

1) We expose the rseq alignment explicitly through auxv, and we can keep 
the IS_ALIGNED validation on rseq registration. This "IS_ALIGNED" check 
would probably have to be tweaked though, because if the registered
rseq size is 32, then an alignment of 32 is all we require. It's only if 
the rseq_len is different from 32 that we need to validate that the 
alignment matches the alignment of struct rseq.

2) We don't expose the rseq alignment through auxv, effectively fixing 
it at 32. We would need to modify the IS_ALIGNED check on rseq 
registration so it validates an alignment of 32 rather than using the 
alignment of struct rseq.
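
As a rough illustration of the tweak described in approach 1, the check could take the following shape. This is a user-space sketch under stated assumptions, not the kernel code: rseq_check_alignment() and ORIG_RSEQ_SIZE are hypothetical names, and struct_align stands in for __alignof__(struct rseq).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define ORIG_RSEQ_SIZE 32	/* size and alignment of the original ABI */

/*
 * Hypothetical variant of the registration check: a 32-byte registration
 * only needs 32-byte alignment, any other length must match the
 * alignment of struct rseq (passed in as struct_align).
 */
static int rseq_check_alignment(uintptr_t addr, uint32_t rseq_len,
				uintptr_t struct_align)
{
	uintptr_t required = (rseq_len == ORIG_RSEQ_SIZE) ?
			     ORIG_RSEQ_SIZE : struct_align;

	assert(required != 0 && (required & (required - 1)) == 0);
	return (addr & (required - 1)) ? -EINVAL : 0;
}
```

Approach 2 would correspond to always using ORIG_RSEQ_SIZE as the required alignment, regardless of rseq_len.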

> 
>> You mentioned we could steal some high bits from AT_RSEQ_FEATURE_SIZE
>> to hold the alignment. What is the issue with exposing an explicit
>> AT_RSEQ_ALIGN? It's just an auxv entry, so I don't see it as a huge
>> performance concern to access 2 entries rather than one.
> 
> I don't mind too much, we already have a large on-stack array in the
> loader so that we can decode the auxiliary vector without a humongous
> switch statement.  But eventually that approach will stop working if the
> set of interesting AT_* values becomes too large and discontinuous.

OK. So I guess the main question here is whether we want a fixed 32-byte
alignment, or whether we want to be able to increase the mandated
alignment in the future as struct rseq expands.

The possible reasons for increasing the alignment beyond 32 bytes would be:

- Unforeseen compiler requirement on a structure alignment larger than
32 bytes as we extend the size of struct rseq.
- Optimization to fit within a single LLC cache line as struct rseq grows.
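
The cache-line argument can be checked with a little arithmetic. The sketch below uses a hypothetical helper, fits_in_one_line(), and assumes a 64-byte line purely as an example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when an object of 'size' bytes starting at
 * 'addr' lies entirely within one cache line of 'line_size' bytes
 * (line_size must be a power of two).
 */
static bool fits_in_one_line(uintptr_t addr, size_t size, size_t line_size)
{
	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
	return (addr & (line_size - 1)) + size <= line_size;
}
```

A 32-byte structure at a 32-byte-aligned address always fits in a 64-byte line. A 40-byte structure at 0x1020 (32-byte aligned) straddles two lines, while the same structure at a 64-byte-aligned address does not.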

Thoughts?

Thanks,

Mathieu

> 
> Thanks,
> Florian
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com