[PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys

Dmitry Vyukov posted 4 patches 11 months, 2 weeks ago
There is a newer version of this series
[PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
Posted by Dmitry Vyukov 11 months, 2 weeks ago
Add a test that ensures that PKEY-protected struct rseq_cs
works and does not lead to process kills.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")

---
Changes in v5:
 - Use static for variables/functions
 - Use RSEQ_READ/WRITE_ONCE instead of volatile

Changes in v4:
 - Added Fixes tag

Changes in v3:
 - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
 - rework the test to work when only pkey 0 is supported for rseq

Changes in v2:
 - change test to install protected rseq_cs instead of rseq
---
 tools/testing/selftests/rseq/Makefile    |  2 +-
 tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
 tools/testing/selftests/rseq/rseq.h      |  1 +
 3 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb586..9111d25fea3af 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
 
 TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
 		param_test_benchmark param_test_compare_twice param_test_mm_cid \
-		param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
+		param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
 
 TEST_GEN_PROGS_EXTENDED = librseq.so
 
diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
new file mode 100644
index 0000000000000..cc4dd98190942
--- /dev/null
+++ b/tools/testing/selftests/rseq/pkey_test.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: LGPL-2.1
+/*
+ * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
+ */
+
+#define _GNU_SOURCE
+#include <err.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <ucontext.h>
+#include <unistd.h>
+
+#include "rseq.h"
+#include "rseq-abi.h"
+
+static int pkey;
+static ucontext_t ucp0, ucp1;
+
+static void coroutine(void)
+{
+	int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
+	/*
+	 * When we disable access to pkey 0, globals and TLS become
+	 * inaccessible too, so we need to tread carefully.
+	 * Pkey is global so we need to copy it onto the stack.
+	 */
+	int pk = RSEQ_READ_ONCE(pkey);
+	struct timespec ts;
+
+	orig_pk0 = pkey_get(0);
+	if (pkey_set(0, PKEY_DISABLE_ACCESS))
+		err(1, "pkey_set failed");
+	old_pk0 = pkey_get(0);
+	old_pk1 = pkey_get(pk);
+
+	/*
+	 * Prevent compiler from initializing it by loading a 16-global.
+	 */
+	RSEQ_WRITE_ONCE(ts.tv_sec, 0);
+	RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
+	/*
+	 * If the kernel misbehaves, context switches in the following loop
+	 * will terminate the process with SIGSEGV.
+	 * Trigger preemption w/o accessing TLS.
+	 * Note that glibc's usleep touches errno always.
+	 */
+	for (i = 0; i < 10; i++)
+		syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
+
+	pk0 = pkey_get(0);
+	pk1 = pkey_get(pk);
+	if (pkey_set(0, orig_pk0))
+		err(1, "pkey_set failed");
+
+	/*
+	 * Ensure that the kernel has restored the previous value of pkeys
+	 * register after changing them.
+	 */
+	if (old_pk0 != pk0)
+		errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
+	if (old_pk1 != pk1)
+		errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
+
+	swapcontext(&ucp1, &ucp0);
+	abort();
+}
+
+int main(int argc, char **argv)
+{
+	pkey = pkey_alloc(0, 0);
+	if (pkey == -1) {
+		printf("[SKIP]\tKernel does not support PKEYs: %s\n",
+			strerror(errno));
+		return 0;
+	}
+
+	if (rseq_register_current_thread())
+		err(1, "rseq_register_current_thread failed");
+
+	if (getcontext(&ucp1))
+		err(1, "getcontext failed");
+	ucp1.uc_stack.ss_size = getpagesize() * 4;
+	ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
+		PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
+	if (ucp1.uc_stack.ss_sp == MAP_FAILED)
+		err(1, "mmap failed");
+	if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
+			PROT_READ | PROT_WRITE, pkey))
+		err(1, "pkey_mprotect failed");
+	makecontext(&ucp1, coroutine, 0);
+	if (swapcontext(&ucp0, &ucp1))
+		err(1, "swapcontext failed");
+	return 0;
+}
diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
index ba424ce80a719..65da4a727c550 100644
--- a/tools/testing/selftests/rseq/rseq.h
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -8,6 +8,7 @@
 #ifndef RSEQ_H
 #define RSEQ_H
 
+#include <assert.h>
 #include <stdint.h>
 #include <stdbool.h>
 #include <pthread.h>
-- 
2.48.1.658.g4767266eb4-goog
Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
Posted by Mathieu Desnoyers 11 months ago
On 2025-02-27 09:03, Dmitry Vyukov wrote:
> Add a test that ensures that PKEY-protected struct rseq_cs
> works and does not lead to process kills.
> 
> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
> 
> ---
> Changes in v5:
>   - Use static for variables/functions
>   - Use RSEQ_READ/WRITE_ONCE instead of volatile
> 
> Changes in v4:
>   - Added Fixes tag
> 
> Changes in v3:
>   - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
>   - rework the test to work when only pkey 0 is supported for rseq
> 
> Changes in v2:
>   - change test to install protected rseq_cs instead of rseq
> ---
>   tools/testing/selftests/rseq/Makefile    |  2 +-
>   tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
>   tools/testing/selftests/rseq/rseq.h      |  1 +
>   3 files changed, 100 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
> index 5a3432fceb586..9111d25fea3af 100644
> --- a/tools/testing/selftests/rseq/Makefile
> +++ b/tools/testing/selftests/rseq/Makefile
> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
>   
>   TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
>   		param_test_benchmark param_test_compare_twice param_test_mm_cid \
> -		param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
> +		param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
>   
>   TEST_GEN_PROGS_EXTENDED = librseq.so
>   
> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
> new file mode 100644
> index 0000000000000..cc4dd98190942
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/pkey_test.c
> @@ -0,0 +1,98 @@
> +// SPDX-License-Identifier: LGPL-2.1
> +/*
> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
> + */
> +
> +#define _GNU_SOURCE
> +#include <err.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/mman.h>
> +#include <sys/syscall.h>
> +#include <ucontext.h>
> +#include <unistd.h>
> +
> +#include "rseq.h"
> +#include "rseq-abi.h"
> +
> +static int pkey;
> +static ucontext_t ucp0, ucp1;
> +
> +static void coroutine(void)
> +{
> +	int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
> +	/*
> +	 * When we disable access to pkey 0, globals and TLS become
> +	 * inaccessible too, so we need to tread carefully.
> +	 * Pkey is global so we need to copy it onto the stack.
> +	 */
> +	int pk = RSEQ_READ_ONCE(pkey);
> +	struct timespec ts;
> +
> +	orig_pk0 = pkey_get(0);
> +	if (pkey_set(0, PKEY_DISABLE_ACCESS))
> +		err(1, "pkey_set failed");
> +	old_pk0 = pkey_get(0);
> +	old_pk1 = pkey_get(pk);
> +
> +	/*
> +	 * Prevent compiler from initializing it by loading a 16-global.
> +	 */
> +	RSEQ_WRITE_ONCE(ts.tv_sec, 0);
> +	RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
> +	/*
> +	 * If the kernel misbehaves, context switches in the following loop
> +	 * will terminate the process with SIGSEGV.
> +	 * Trigger preemption w/o accessing TLS.
> +	 * Note that glibc's usleep touches errno always.
> +	 */
> +	for (i = 0; i < 10; i++)
> +		syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
> +
> +	pk0 = pkey_get(0);
> +	pk1 = pkey_get(pk);
> +	if (pkey_set(0, orig_pk0))
> +		err(1, "pkey_set failed");
> +
> +	/*
> +	 * Ensure that the kernel has restored the previous value of pkeys
> +	 * register after changing them.
> +	 */
> +	if (old_pk0 != pk0)
> +		errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
> +	if (old_pk1 != pk1)
> +		errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
> +
> +	swapcontext(&ucp1, &ucp0);
> +	abort();
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	pkey = pkey_alloc(0, 0);
> +	if (pkey == -1) {
> +		printf("[SKIP]\tKernel does not support PKEYs: %s\n",
> +			strerror(errno));
> +		return 0;
> +	}
> +
> +	if (rseq_register_current_thread())
> +		err(1, "rseq_register_current_thread failed");
> +
> +	if (getcontext(&ucp1))
> +		err(1, "getcontext failed");
> +	ucp1.uc_stack.ss_size = getpagesize() * 4;
> +	ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
> +		PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
> +	if (ucp1.uc_stack.ss_sp == MAP_FAILED)
> +		err(1, "mmap failed");
> +	if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
> +			PROT_READ | PROT_WRITE, pkey))
> +		err(1, "pkey_mprotect failed");
> +	makecontext(&ucp1, coroutine, 0);
> +	if (swapcontext(&ucp0, &ucp1))
> +		err(1, "swapcontext failed");

Should the rseq register be paired with a rseq unregister here ?

Thanks,

Mathieu

> +	return 0;
> +}
> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
> index ba424ce80a719..65da4a727c550 100644
> --- a/tools/testing/selftests/rseq/rseq.h
> +++ b/tools/testing/selftests/rseq/rseq.h
> @@ -8,6 +8,7 @@
>   #ifndef RSEQ_H
>   #define RSEQ_H
>   
> +#include <assert.h>
>   #include <stdint.h>
>   #include <stdbool.h>
>   #include <pthread.h>


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
Posted by Dmitry Vyukov 11 months ago
On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> On 2025-02-27 09:03, Dmitry Vyukov wrote:
> > Add a test that ensures that PKEY-protected struct rseq_cs
> > works and does not lead to process kills.
> >
> > Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: Boqun Feng <boqun.feng@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> > Cc: x86@kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
> >
> > ---
> > Changes in v5:
> >   - Use static for variables/functions
> >   - Use RSEQ_READ/WRITE_ONCE instead of volatile
> >
> > Changes in v4:
> >   - Added Fixes tag
> >
> > Changes in v3:
> >   - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> >   - rework the test to work when only pkey 0 is supported for rseq
> >
> > Changes in v2:
> >   - change test to install protected rseq_cs instead of rseq
> > ---
> >   tools/testing/selftests/rseq/Makefile    |  2 +-
> >   tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
> >   tools/testing/selftests/rseq/rseq.h      |  1 +
> >   3 files changed, 100 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
> > index 5a3432fceb586..9111d25fea3af 100644
> > --- a/tools/testing/selftests/rseq/Makefile
> > +++ b/tools/testing/selftests/rseq/Makefile
> > @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
> >
> >   TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
> >               param_test_benchmark param_test_compare_twice param_test_mm_cid \
> > -             param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
> > +             param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
> >
> >   TEST_GEN_PROGS_EXTENDED = librseq.so
> >
> > diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
> > new file mode 100644
> > index 0000000000000..cc4dd98190942
> > --- /dev/null
> > +++ b/tools/testing/selftests/rseq/pkey_test.c
> > @@ -0,0 +1,98 @@
> > +// SPDX-License-Identifier: LGPL-2.1
> > +/*
> > + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
> > + */
> > +
> > +#define _GNU_SOURCE
> > +#include <err.h>
> > +#include <errno.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <sys/mman.h>
> > +#include <sys/syscall.h>
> > +#include <ucontext.h>
> > +#include <unistd.h>
> > +
> > +#include "rseq.h"
> > +#include "rseq-abi.h"
> > +
> > +static int pkey;
> > +static ucontext_t ucp0, ucp1;
> > +
> > +static void coroutine(void)
> > +{
> > +     int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
> > +     /*
> > +      * When we disable access to pkey 0, globals and TLS become
> > +      * inaccessible too, so we need to tread carefully.
> > +      * Pkey is global so we need to copy it onto the stack.
> > +      */
> > +     int pk = RSEQ_READ_ONCE(pkey);
> > +     struct timespec ts;
> > +
> > +     orig_pk0 = pkey_get(0);
> > +     if (pkey_set(0, PKEY_DISABLE_ACCESS))
> > +             err(1, "pkey_set failed");
> > +     old_pk0 = pkey_get(0);
> > +     old_pk1 = pkey_get(pk);
> > +
> > +     /*
> > +      * Prevent compiler from initializing it by loading a 16-global.
> > +      */
> > +     RSEQ_WRITE_ONCE(ts.tv_sec, 0);
> > +     RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
> > +     /*
> > +      * If the kernel misbehaves, context switches in the following loop
> > +      * will terminate the process with SIGSEGV.
> > +      * Trigger preemption w/o accessing TLS.
> > +      * Note that glibc's usleep touches errno always.
> > +      */
> > +     for (i = 0; i < 10; i++)
> > +             syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
> > +
> > +     pk0 = pkey_get(0);
> > +     pk1 = pkey_get(pk);
> > +     if (pkey_set(0, orig_pk0))
> > +             err(1, "pkey_set failed");
> > +
> > +     /*
> > +      * Ensure that the kernel has restored the previous value of pkeys
> > +      * register after changing them.
> > +      */
> > +     if (old_pk0 != pk0)
> > +             errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
> > +     if (old_pk1 != pk1)
> > +             errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
> > +
> > +     swapcontext(&ucp1, &ucp0);
> > +     abort();
> > +}
> > +
> > +int main(int argc, char **argv)
> > +{
> > +     pkey = pkey_alloc(0, 0);
> > +     if (pkey == -1) {
> > +             printf("[SKIP]\tKernel does not support PKEYs: %s\n",
> > +                     strerror(errno));
> > +             return 0;
> > +     }
> > +
> > +     if (rseq_register_current_thread())
> > +             err(1, "rseq_register_current_thread failed");
> > +
> > +     if (getcontext(&ucp1))
> > +             err(1, "getcontext failed");
> > +     ucp1.uc_stack.ss_size = getpagesize() * 4;
> > +     ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
> > +             PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
> > +     if (ucp1.uc_stack.ss_sp == MAP_FAILED)
> > +             err(1, "mmap failed");
> > +     if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
> > +                     PROT_READ | PROT_WRITE, pkey))
> > +             err(1, "pkey_mprotect failed");
> > +     makecontext(&ucp1, coroutine, 0);
> > +     if (swapcontext(&ucp0, &ucp1))
> > +             err(1, "swapcontext failed");
>
> Should the rseq register be paired with a rseq unregister here ?

I dunno. It's necessary for this test and in general. Tcmalloc won't
unregister on thread exit. Even for a libc it may be a bad idea due to
signals.

> Thanks,
>
> Mathieu
>
> > +     return 0;
> > +}
> > diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
> > index ba424ce80a719..65da4a727c550 100644
> > --- a/tools/testing/selftests/rseq/rseq.h
> > +++ b/tools/testing/selftests/rseq/rseq.h
> > @@ -8,6 +8,7 @@
> >   #ifndef RSEQ_H
> >   #define RSEQ_H
> >
> > +#include <assert.h>
> >   #include <stdint.h>
> >   #include <stdbool.h>
> >   #include <pthread.h>
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
Posted by Mathieu Desnoyers 11 months ago
On 2025-03-10 10:36, Dmitry Vyukov wrote:
> On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>>
>> On 2025-02-27 09:03, Dmitry Vyukov wrote:
>>> Add a test that ensures that PKEY-protected struct rseq_cs
>>> works and does not lead to process kills.
>>>
>>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Cc: "Paul E. McKenney" <paulmck@kernel.org>
>>> Cc: Boqun Feng <boqun.feng@gmail.com>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: Borislav Petkov <bp@alien8.de>
>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
>>> Cc: x86@kernel.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
>>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
>>>
>>> ---
>>> Changes in v5:
>>>    - Use static for variables/functions
>>>    - Use RSEQ_READ/WRITE_ONCE instead of volatile
>>>
>>> Changes in v4:
>>>    - Added Fixes tag
>>>
>>> Changes in v3:
>>>    - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
>>>    - rework the test to work when only pkey 0 is supported for rseq
>>>
>>> Changes in v2:
>>>    - change test to install protected rseq_cs instead of rseq
>>> ---
>>>    tools/testing/selftests/rseq/Makefile    |  2 +-
>>>    tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
>>>    tools/testing/selftests/rseq/rseq.h      |  1 +
>>>    3 files changed, 100 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
>>> index 5a3432fceb586..9111d25fea3af 100644
>>> --- a/tools/testing/selftests/rseq/Makefile
>>> +++ b/tools/testing/selftests/rseq/Makefile
>>> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
>>>
>>>    TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
>>>                param_test_benchmark param_test_compare_twice param_test_mm_cid \
>>> -             param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
>>> +             param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
>>>
>>>    TEST_GEN_PROGS_EXTENDED = librseq.so
>>>
>>> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
>>> new file mode 100644
>>> index 0000000000000..cc4dd98190942
>>> --- /dev/null
>>> +++ b/tools/testing/selftests/rseq/pkey_test.c
>>> @@ -0,0 +1,98 @@
>>> +// SPDX-License-Identifier: LGPL-2.1
>>> +/*
>>> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
>>> + */
>>> +
>>> +#define _GNU_SOURCE
>>> +#include <err.h>
>>> +#include <errno.h>
>>> +#include <stdio.h>
>>> +#include <stdlib.h>
>>> +#include <string.h>
>>> +#include <sys/mman.h>
>>> +#include <sys/syscall.h>
>>> +#include <ucontext.h>
>>> +#include <unistd.h>
>>> +
>>> +#include "rseq.h"
>>> +#include "rseq-abi.h"
>>> +
>>> +static int pkey;
>>> +static ucontext_t ucp0, ucp1;
>>> +
>>> +static void coroutine(void)
>>> +{
>>> +     int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
>>> +     /*
>>> +      * When we disable access to pkey 0, globals and TLS become
>>> +      * inaccessible too, so we need to tread carefully.
>>> +      * Pkey is global so we need to copy it onto the stack.
>>> +      */
>>> +     int pk = RSEQ_READ_ONCE(pkey);
>>> +     struct timespec ts;
>>> +
>>> +     orig_pk0 = pkey_get(0);
>>> +     if (pkey_set(0, PKEY_DISABLE_ACCESS))
>>> +             err(1, "pkey_set failed");
>>> +     old_pk0 = pkey_get(0);
>>> +     old_pk1 = pkey_get(pk);
>>> +
>>> +     /*
>>> +      * Prevent compiler from initializing it by loading a 16-global.
>>> +      */
>>> +     RSEQ_WRITE_ONCE(ts.tv_sec, 0);
>>> +     RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
>>> +     /*
>>> +      * If the kernel misbehaves, context switches in the following loop
>>> +      * will terminate the process with SIGSEGV.
>>> +      * Trigger preemption w/o accessing TLS.
>>> +      * Note that glibc's usleep touches errno always.
>>> +      */
>>> +     for (i = 0; i < 10; i++)
>>> +             syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
>>> +
>>> +     pk0 = pkey_get(0);
>>> +     pk1 = pkey_get(pk);
>>> +     if (pkey_set(0, orig_pk0))
>>> +             err(1, "pkey_set failed");
>>> +
>>> +     /*
>>> +      * Ensure that the kernel has restored the previous value of pkeys
>>> +      * register after changing them.
>>> +      */
>>> +     if (old_pk0 != pk0)
>>> +             errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
>>> +     if (old_pk1 != pk1)
>>> +             errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
>>> +
>>> +     swapcontext(&ucp1, &ucp0);
>>> +     abort();
>>> +}
>>> +
>>> +int main(int argc, char **argv)
>>> +{
>>> +     pkey = pkey_alloc(0, 0);
>>> +     if (pkey == -1) {
>>> +             printf("[SKIP]\tKernel does not support PKEYs: %s\n",
>>> +                     strerror(errno));
>>> +             return 0;
>>> +     }
>>> +
>>> +     if (rseq_register_current_thread())
>>> +             err(1, "rseq_register_current_thread failed");
>>> +
>>> +     if (getcontext(&ucp1))
>>> +             err(1, "getcontext failed");
>>> +     ucp1.uc_stack.ss_size = getpagesize() * 4;
>>> +     ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
>>> +             PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
>>> +     if (ucp1.uc_stack.ss_sp == MAP_FAILED)
>>> +             err(1, "mmap failed");
>>> +     if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
>>> +                     PROT_READ | PROT_WRITE, pkey))
>>> +             err(1, "pkey_mprotect failed");
>>> +     makecontext(&ucp1, coroutine, 0);
>>> +     if (swapcontext(&ucp0, &ucp1))
>>> +             err(1, "swapcontext failed");
>>
>> Should the rseq register be paired with a rseq unregister here ?
> 
> I dunno. It's necessary for this test and in general. Tcmalloc won't
> unregister on thread exit. Even for a libc it may be a bad idea due to
> signals.

The rseq register/unregister is only for the case where libc does not
support rseq. But if you want to use rseq_register_current_thread(),
then you'll want to pair it with unregister.

Thanks,

Mathieu

> 
>> Thanks,
>>
>> Mathieu
>>
>>> +     return 0;
>>> +}
>>> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
>>> index ba424ce80a719..65da4a727c550 100644
>>> --- a/tools/testing/selftests/rseq/rseq.h
>>> +++ b/tools/testing/selftests/rseq/rseq.h
>>> @@ -8,6 +8,7 @@
>>>    #ifndef RSEQ_H
>>>    #define RSEQ_H
>>>
>>> +#include <assert.h>
>>>    #include <stdint.h>
>>>    #include <stdbool.h>
>>>    #include <pthread.h>
>>
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
Posted by Dmitry Vyukov 11 months ago
On Mon, 10 Mar 2025 at 15:38, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> On 2025-03-10 10:36, Dmitry Vyukov wrote:
> > On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
> > <mathieu.desnoyers@efficios.com> wrote:
> >>
> >> On 2025-02-27 09:03, Dmitry Vyukov wrote:
> >>> Add a test that ensures that PKEY-protected struct rseq_cs
> >>> works and does not lead to process kills.
> >>>
> >>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> >>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >>> Cc: Peter Zijlstra <peterz@infradead.org>
> >>> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> >>> Cc: Boqun Feng <boqun.feng@gmail.com>
> >>> Cc: Thomas Gleixner <tglx@linutronix.de>
> >>> Cc: Ingo Molnar <mingo@redhat.com>
> >>> Cc: Borislav Petkov <bp@alien8.de>
> >>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> >>> Cc: "H. Peter Anvin" <hpa@zytor.com>
> >>> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> >>> Cc: x86@kernel.org
> >>> Cc: linux-kernel@vger.kernel.org
> >>> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> >>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
> >>>
> >>> ---
> >>> Changes in v5:
> >>>    - Use static for variables/functions
> >>>    - Use RSEQ_READ/WRITE_ONCE instead of volatile
> >>>
> >>> Changes in v4:
> >>>    - Added Fixes tag
> >>>
> >>> Changes in v3:
> >>>    - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> >>>    - rework the test to work when only pkey 0 is supported for rseq
> >>>
> >>> Changes in v2:
> >>>    - change test to install protected rseq_cs instead of rseq
> >>> ---
> >>>    tools/testing/selftests/rseq/Makefile    |  2 +-
> >>>    tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
> >>>    tools/testing/selftests/rseq/rseq.h      |  1 +
> >>>    3 files changed, 100 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
> >>> index 5a3432fceb586..9111d25fea3af 100644
> >>> --- a/tools/testing/selftests/rseq/Makefile
> >>> +++ b/tools/testing/selftests/rseq/Makefile
> >>> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
> >>>
> >>>    TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
> >>>                param_test_benchmark param_test_compare_twice param_test_mm_cid \
> >>> -             param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
> >>> +             param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
> >>>
> >>>    TEST_GEN_PROGS_EXTENDED = librseq.so
> >>>
> >>> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
> >>> new file mode 100644
> >>> index 0000000000000..cc4dd98190942
> >>> --- /dev/null
> >>> +++ b/tools/testing/selftests/rseq/pkey_test.c
> >>> @@ -0,0 +1,98 @@
> >>> +// SPDX-License-Identifier: LGPL-2.1
> >>> +/*
> >>> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
> >>> + */
> >>> +
> >>> +#define _GNU_SOURCE
> >>> +#include <err.h>
> >>> +#include <errno.h>
> >>> +#include <stdio.h>
> >>> +#include <stdlib.h>
> >>> +#include <string.h>
> >>> +#include <sys/mman.h>
> >>> +#include <sys/syscall.h>
> >>> +#include <ucontext.h>
> >>> +#include <unistd.h>
> >>> +
> >>> +#include "rseq.h"
> >>> +#include "rseq-abi.h"
> >>> +
> >>> +static int pkey;
> >>> +static ucontext_t ucp0, ucp1;
> >>> +
> >>> +static void coroutine(void)
> >>> +{
> >>> +     int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
> >>> +     /*
> >>> +      * When we disable access to pkey 0, globals and TLS become
> >>> +      * inaccessible too, so we need to tread carefully.
> >>> +      * Pkey is global so we need to copy it onto the stack.
> >>> +      */
> >>> +     int pk = RSEQ_READ_ONCE(pkey);
> >>> +     struct timespec ts;
> >>> +
> >>> +     orig_pk0 = pkey_get(0);
> >>> +     if (pkey_set(0, PKEY_DISABLE_ACCESS))
> >>> +             err(1, "pkey_set failed");
> >>> +     old_pk0 = pkey_get(0);
> >>> +     old_pk1 = pkey_get(pk);
> >>> +
> >>> +     /*
> >>> +      * Prevent compiler from initializing it by loading a 16-global.
> >>> +      */
> >>> +     RSEQ_WRITE_ONCE(ts.tv_sec, 0);
> >>> +     RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
> >>> +     /*
> >>> +      * If the kernel misbehaves, context switches in the following loop
> >>> +      * will terminate the process with SIGSEGV.
> >>> +      * Trigger preemption w/o accessing TLS.
> >>> +      * Note that glibc's usleep touches errno always.
> >>> +      */
> >>> +     for (i = 0; i < 10; i++)
> >>> +             syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
> >>> +
> >>> +     pk0 = pkey_get(0);
> >>> +     pk1 = pkey_get(pk);
> >>> +     if (pkey_set(0, orig_pk0))
> >>> +             err(1, "pkey_set failed");
> >>> +
> >>> +     /*
> >>> +      * Ensure that the kernel has restored the previous value of pkeys
> >>> +      * register after changing them.
> >>> +      */
> >>> +     if (old_pk0 != pk0)
> >>> +             errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
> >>> +     if (old_pk1 != pk1)
> >>> +             errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
> >>> +
> >>> +     swapcontext(&ucp1, &ucp0);
> >>> +     abort();
> >>> +}
> >>> +
> >>> +int main(int argc, char **argv)
> >>> +{
> >>> +     pkey = pkey_alloc(0, 0);
> >>> +     if (pkey == -1) {
> >>> +             printf("[SKIP]\tKernel does not support PKEYs: %s\n",
> >>> +                     strerror(errno));
> >>> +             return 0;
> >>> +     }
> >>> +
> >>> +     if (rseq_register_current_thread())
> >>> +             err(1, "rseq_register_current_thread failed");
> >>> +
> >>> +     if (getcontext(&ucp1))
> >>> +             err(1, "getcontext failed");
> >>> +     ucp1.uc_stack.ss_size = getpagesize() * 4;
> >>> +     ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
> >>> +             PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
> >>> +     if (ucp1.uc_stack.ss_sp == MAP_FAILED)
> >>> +             err(1, "mmap failed");
> >>> +     if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
> >>> +                     PROT_READ | PROT_WRITE, pkey))
> >>> +             err(1, "pkey_mprotect failed");
> >>> +     makecontext(&ucp1, coroutine, 0);
> >>> +     if (swapcontext(&ucp0, &ucp1))
> >>> +             err(1, "swapcontext failed");
> >>
> >> Should the rseq register be paired with a rseq unregister here ?
> >
> > I dunno. It's necessary for this test and in general. Tcmalloc won't
> > unregister on thread exit. Even for a libc it may be a bad idea due to
> > signals.
>
> The rseq register/unregister is only for the case where libc does not
> support rseq. But if you want to use rseq_register_current_thread(),
> then you'll want to pair it with unregister.

Why? Isn't it better to have tests more realitic?


> Thanks,
>
> Mathieu
>
> >
> >> Thanks,
> >>
> >> Mathieu
> >>
> >>> +     return 0;
> >>> +}
> >>> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
> >>> index ba424ce80a719..65da4a727c550 100644
> >>> --- a/tools/testing/selftests/rseq/rseq.h
> >>> +++ b/tools/testing/selftests/rseq/rseq.h
> >>> @@ -8,6 +8,7 @@
> >>>    #ifndef RSEQ_H
> >>>    #define RSEQ_H
> >>>
> >>> +#include <assert.h>
> >>>    #include <stdint.h>
> >>>    #include <stdbool.h>
> >>>    #include <pthread.h>
> >>
> >>
> >> --
> >> Mathieu Desnoyers
> >> EfficiOS Inc.
> >> https://www.efficios.com
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
Posted by Mathieu Desnoyers 11 months ago
On 2025-03-10 10:43, Dmitry Vyukov wrote:
> On Mon, 10 Mar 2025 at 15:38, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>>
>> On 2025-03-10 10:36, Dmitry Vyukov wrote:
>>> On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
>>> <mathieu.desnoyers@efficios.com> wrote:
>>>>
>>>> On 2025-02-27 09:03, Dmitry Vyukov wrote:
>>>>> Add a test that ensures that PKEY-protected struct rseq_cs
>>>>> works and does not lead to process kills.
>>>>>
>>>>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
>>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>>> Cc: "Paul E. McKenney" <paulmck@kernel.org>
>>>>> Cc: Boqun Feng <boqun.feng@gmail.com>
>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>>> Cc: Borislav Petkov <bp@alien8.de>
>>>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
>>>>> Cc: x86@kernel.org
>>>>> Cc: linux-kernel@vger.kernel.org
>>>>> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
>>>>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
>>>>>
>>>>> ---
>>>>> Changes in v5:
>>>>>     - Use static for variables/functions
>>>>>     - Use RSEQ_READ/WRITE_ONCE instead of volatile
>>>>>
>>>>> Changes in v4:
>>>>>     - Added Fixes tag
>>>>>
>>>>> Changes in v3:
>>>>>     - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
>>>>>     - rework the test to work when only pkey 0 is supported for rseq
>>>>>
>>>>> Changes in v2:
>>>>>     - change test to install protected rseq_cs instead of rseq
>>>>> ---
>>>>>     tools/testing/selftests/rseq/Makefile    |  2 +-
>>>>>     tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
>>>>>     tools/testing/selftests/rseq/rseq.h      |  1 +
>>>>>     3 files changed, 100 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
>>>>> index 5a3432fceb586..9111d25fea3af 100644
>>>>> --- a/tools/testing/selftests/rseq/Makefile
>>>>> +++ b/tools/testing/selftests/rseq/Makefile
>>>>> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
>>>>>
>>>>>     TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
>>>>>                 param_test_benchmark param_test_compare_twice param_test_mm_cid \
>>>>> -             param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
>>>>> +             param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
>>>>>
>>>>>     TEST_GEN_PROGS_EXTENDED = librseq.so
>>>>>
>>>>> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
>>>>> new file mode 100644
>>>>> index 0000000000000..cc4dd98190942
>>>>> --- /dev/null
>>>>> +++ b/tools/testing/selftests/rseq/pkey_test.c
>>>>> @@ -0,0 +1,98 @@
>>>>> +// SPDX-License-Identifier: LGPL-2.1
>>>>> +/*
>>>>> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
>>>>> + */
>>>>> +
>>>>> +#define _GNU_SOURCE
>>>>> +#include <err.h>
>>>>> +#include <errno.h>
>>>>> +#include <stdio.h>
>>>>> +#include <stdlib.h>
>>>>> +#include <string.h>
>>>>> +#include <sys/mman.h>
>>>>> +#include <sys/syscall.h>
>>>>> +#include <ucontext.h>
>>>>> +#include <unistd.h>
>>>>> +
>>>>> +#include "rseq.h"
>>>>> +#include "rseq-abi.h"
>>>>> +
>>>>> +static int pkey;
>>>>> +static ucontext_t ucp0, ucp1;
>>>>> +
>>>>> +static void coroutine(void)
>>>>> +{
>>>>> +     int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
>>>>> +     /*
>>>>> +      * When we disable access to pkey 0, globals and TLS become
>>>>> +      * inaccessible too, so we need to tread carefully.
>>>>> +      * Pkey is global so we need to copy it onto the stack.
>>>>> +      */
>>>>> +     int pk = RSEQ_READ_ONCE(pkey);
>>>>> +     struct timespec ts;
>>>>> +
>>>>> +     orig_pk0 = pkey_get(0);
>>>>> +     if (pkey_set(0, PKEY_DISABLE_ACCESS))
>>>>> +             err(1, "pkey_set failed");
>>>>> +     old_pk0 = pkey_get(0);
>>>>> +     old_pk1 = pkey_get(pk);
>>>>> +
>>>>> +     /*
>>>>> +      * Prevent compiler from initializing it by loading a 16-global.
>>>>> +      */
>>>>> +     RSEQ_WRITE_ONCE(ts.tv_sec, 0);
>>>>> +     RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
>>>>> +     /*
>>>>> +      * If the kernel misbehaves, context switches in the following loop
>>>>> +      * will terminate the process with SIGSEGV.
>>>>> +      * Trigger preemption w/o accessing TLS.
>>>>> +      * Note that glibc's usleep touches errno always.
>>>>> +      */
>>>>> +     for (i = 0; i < 10; i++)
>>>>> +             syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
>>>>> +
>>>>> +     pk0 = pkey_get(0);
>>>>> +     pk1 = pkey_get(pk);
>>>>> +     if (pkey_set(0, orig_pk0))
>>>>> +             err(1, "pkey_set failed");
>>>>> +
>>>>> +     /*
>>>>> +      * Ensure that the kernel has restored the previous value of pkeys
>>>>> +      * register after changing them.
>>>>> +      */
>>>>> +     if (old_pk0 != pk0)
>>>>> +             errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
>>>>> +     if (old_pk1 != pk1)
>>>>> +             errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
>>>>> +
>>>>> +     swapcontext(&ucp1, &ucp0);
>>>>> +     abort();
>>>>> +}
>>>>> +
>>>>> +int main(int argc, char **argv)
>>>>> +{
>>>>> +     pkey = pkey_alloc(0, 0);
>>>>> +     if (pkey == -1) {
>>>>> +             printf("[SKIP]\tKernel does not support PKEYs: %s\n",
>>>>> +                     strerror(errno));
>>>>> +             return 0;
>>>>> +     }
>>>>> +
>>>>> +     if (rseq_register_current_thread())
>>>>> +             err(1, "rseq_register_current_thread failed");
>>>>> +
>>>>> +     if (getcontext(&ucp1))
>>>>> +             err(1, "getcontext failed");
>>>>> +     ucp1.uc_stack.ss_size = getpagesize() * 4;
>>>>> +     ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
>>>>> +             PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
>>>>> +     if (ucp1.uc_stack.ss_sp == MAP_FAILED)
>>>>> +             err(1, "mmap failed");
>>>>> +     if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
>>>>> +                     PROT_READ | PROT_WRITE, pkey))
>>>>> +             err(1, "pkey_mprotect failed");
>>>>> +     makecontext(&ucp1, coroutine, 0);
>>>>> +     if (swapcontext(&ucp0, &ucp1))
>>>>> +             err(1, "swapcontext failed");
>>>>
>>>> Should the rseq register be paired with a rseq unregister here ?
>>>
>>> I dunno. It's necessary for this test and in general. Tcmalloc won't
>>> unregister on thread exit. Even for a libc it may be a bad idea due to
>>> signals.
>>
>> The rseq register/unregister is only for the case where libc does not
>> support rseq. But if you want to use rseq_register_current_thread(),
>> then you'll want to pair it with unregister.
> 
> Why? Isn't it better to have tests more realitic?

If you use rseq.c rseq_register_current_thread without
rseq_unregister_current_thread, then you rely on implicit
unregistration done by the kernel at thread exit.

Then you need to ensure your userspace runtime keep the TLS
that holds the rseq area allocated for the entire execution
of the thread until it exits in the kernel. Note that
disabling signals is not sufficient to prevent the kernel
from accessing the rseq area.

GNU libc gets away with automatic unregistration at
thread exit because it completely controls the lifetime
of the registered rseq area, keeping it allocated for as
long as the thread is executing.

So in order to minimize the dependency on specific libc
behavior in the kernel sefltests, the selftests rseq.h
requires explicit registration *and* unregistration.

Thanks,

Mathieu


> 
> 
>> Thanks,
>>
>> Mathieu
>>
>>>
>>>> Thanks,
>>>>
>>>> Mathieu
>>>>
>>>>> +     return 0;
>>>>> +}
>>>>> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
>>>>> index ba424ce80a719..65da4a727c550 100644
>>>>> --- a/tools/testing/selftests/rseq/rseq.h
>>>>> +++ b/tools/testing/selftests/rseq/rseq.h
>>>>> @@ -8,6 +8,7 @@
>>>>>     #ifndef RSEQ_H
>>>>>     #define RSEQ_H
>>>>>
>>>>> +#include <assert.h>
>>>>>     #include <stdint.h>
>>>>>     #include <stdbool.h>
>>>>>     #include <pthread.h>
>>>>
>>>>
>>>> --
>>>> Mathieu Desnoyers
>>>> EfficiOS Inc.
>>>> https://www.efficios.com
>>
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
Posted by Dmitry Vyukov 11 months ago
On Mon, 10 Mar 2025 at 16:41, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> On 2025-03-10 10:43, Dmitry Vyukov wrote:
> > On Mon, 10 Mar 2025 at 15:38, Mathieu Desnoyers
> > <mathieu.desnoyers@efficios.com> wrote:
> >>
> >> On 2025-03-10 10:36, Dmitry Vyukov wrote:
> >>> On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
> >>> <mathieu.desnoyers@efficios.com> wrote:
> >>>>
> >>>> On 2025-02-27 09:03, Dmitry Vyukov wrote:
> >>>>> Add a test that ensures that PKEY-protected struct rseq_cs
> >>>>> works and does not lead to process kills.
> >>>>>
> >>>>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> >>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >>>>> Cc: Peter Zijlstra <peterz@infradead.org>
> >>>>> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> >>>>> Cc: Boqun Feng <boqun.feng@gmail.com>
> >>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
> >>>>> Cc: Ingo Molnar <mingo@redhat.com>
> >>>>> Cc: Borislav Petkov <bp@alien8.de>
> >>>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> >>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
> >>>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> >>>>> Cc: x86@kernel.org
> >>>>> Cc: linux-kernel@vger.kernel.org
> >>>>> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> >>>>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
> >>>>>
> >>>>> ---
> >>>>> Changes in v5:
> >>>>>     - Use static for variables/functions
> >>>>>     - Use RSEQ_READ/WRITE_ONCE instead of volatile
> >>>>>
> >>>>> Changes in v4:
> >>>>>     - Added Fixes tag
> >>>>>
> >>>>> Changes in v3:
> >>>>>     - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> >>>>>     - rework the test to work when only pkey 0 is supported for rseq
> >>>>>
> >>>>> Changes in v2:
> >>>>>     - change test to install protected rseq_cs instead of rseq
> >>>>> ---
> >>>>>     tools/testing/selftests/rseq/Makefile    |  2 +-
> >>>>>     tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
> >>>>>     tools/testing/selftests/rseq/rseq.h      |  1 +
> >>>>>     3 files changed, 100 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
> >>>>> index 5a3432fceb586..9111d25fea3af 100644
> >>>>> --- a/tools/testing/selftests/rseq/Makefile
> >>>>> +++ b/tools/testing/selftests/rseq/Makefile
> >>>>> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
> >>>>>
> >>>>>     TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
> >>>>>                 param_test_benchmark param_test_compare_twice param_test_mm_cid \
> >>>>> -             param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
> >>>>> +             param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
> >>>>>
> >>>>>     TEST_GEN_PROGS_EXTENDED = librseq.so
> >>>>>
> >>>>> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
> >>>>> new file mode 100644
> >>>>> index 0000000000000..cc4dd98190942
> >>>>> --- /dev/null
> >>>>> +++ b/tools/testing/selftests/rseq/pkey_test.c
> >>>>> @@ -0,0 +1,98 @@
> >>>>> +// SPDX-License-Identifier: LGPL-2.1
> >>>>> +/*
> >>>>> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
> >>>>> + */
> >>>>> +
> >>>>> +#define _GNU_SOURCE
> >>>>> +#include <err.h>
> >>>>> +#include <errno.h>
> >>>>> +#include <stdio.h>
> >>>>> +#include <stdlib.h>
> >>>>> +#include <string.h>
> >>>>> +#include <sys/mman.h>
> >>>>> +#include <sys/syscall.h>
> >>>>> +#include <ucontext.h>
> >>>>> +#include <unistd.h>
> >>>>> +
> >>>>> +#include "rseq.h"
> >>>>> +#include "rseq-abi.h"
> >>>>> +
> >>>>> +static int pkey;
> >>>>> +static ucontext_t ucp0, ucp1;
> >>>>> +
> >>>>> +static void coroutine(void)
> >>>>> +{
> >>>>> +     int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
> >>>>> +     /*
> >>>>> +      * When we disable access to pkey 0, globals and TLS become
> >>>>> +      * inaccessible too, so we need to tread carefully.
> >>>>> +      * Pkey is global so we need to copy it onto the stack.
> >>>>> +      */
> >>>>> +     int pk = RSEQ_READ_ONCE(pkey);
> >>>>> +     struct timespec ts;
> >>>>> +
> >>>>> +     orig_pk0 = pkey_get(0);
> >>>>> +     if (pkey_set(0, PKEY_DISABLE_ACCESS))
> >>>>> +             err(1, "pkey_set failed");
> >>>>> +     old_pk0 = pkey_get(0);
> >>>>> +     old_pk1 = pkey_get(pk);
> >>>>> +
> >>>>> +     /*
> >>>>> +      * Prevent compiler from initializing it by loading a 16-global.
> >>>>> +      */
> >>>>> +     RSEQ_WRITE_ONCE(ts.tv_sec, 0);
> >>>>> +     RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
> >>>>> +     /*
> >>>>> +      * If the kernel misbehaves, context switches in the following loop
> >>>>> +      * will terminate the process with SIGSEGV.
> >>>>> +      * Trigger preemption w/o accessing TLS.
> >>>>> +      * Note that glibc's usleep touches errno always.
> >>>>> +      */
> >>>>> +     for (i = 0; i < 10; i++)
> >>>>> +             syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
> >>>>> +
> >>>>> +     pk0 = pkey_get(0);
> >>>>> +     pk1 = pkey_get(pk);
> >>>>> +     if (pkey_set(0, orig_pk0))
> >>>>> +             err(1, "pkey_set failed");
> >>>>> +
> >>>>> +     /*
> >>>>> +      * Ensure that the kernel has restored the previous value of pkeys
> >>>>> +      * register after changing them.
> >>>>> +      */
> >>>>> +     if (old_pk0 != pk0)
> >>>>> +             errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
> >>>>> +     if (old_pk1 != pk1)
> >>>>> +             errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
> >>>>> +
> >>>>> +     swapcontext(&ucp1, &ucp0);
> >>>>> +     abort();
> >>>>> +}
> >>>>> +
> >>>>> +int main(int argc, char **argv)
> >>>>> +{
> >>>>> +     pkey = pkey_alloc(0, 0);
> >>>>> +     if (pkey == -1) {
> >>>>> +             printf("[SKIP]\tKernel does not support PKEYs: %s\n",
> >>>>> +                     strerror(errno));
> >>>>> +             return 0;
> >>>>> +     }
> >>>>> +
> >>>>> +     if (rseq_register_current_thread())
> >>>>> +             err(1, "rseq_register_current_thread failed");
> >>>>> +
> >>>>> +     if (getcontext(&ucp1))
> >>>>> +             err(1, "getcontext failed");
> >>>>> +     ucp1.uc_stack.ss_size = getpagesize() * 4;
> >>>>> +     ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
> >>>>> +             PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
> >>>>> +     if (ucp1.uc_stack.ss_sp == MAP_FAILED)
> >>>>> +             err(1, "mmap failed");
> >>>>> +     if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
> >>>>> +                     PROT_READ | PROT_WRITE, pkey))
> >>>>> +             err(1, "pkey_mprotect failed");
> >>>>> +     makecontext(&ucp1, coroutine, 0);
> >>>>> +     if (swapcontext(&ucp0, &ucp1))
> >>>>> +             err(1, "swapcontext failed");
> >>>>
> >>>> Should the rseq register be paired with a rseq unregister here ?
> >>>
> >>> I dunno. It's necessary for this test and in general. Tcmalloc won't
> >>> unregister on thread exit. Even for a libc it may be a bad idea due to
> >>> signals.
> >>
> >> The rseq register/unregister is only for the case where libc does not
> >> support rseq. But if you want to use rseq_register_current_thread(),
> >> then you'll want to pair it with unregister.
> >
> > Why? Isn't it better to have tests more realitic?
>
> If you use rseq.c rseq_register_current_thread without
> rseq_unregister_current_thread, then you rely on implicit
> unregistration done by the kernel at thread exit.
>
> Then you need to ensure your userspace runtime keep the TLS
> that holds the rseq area allocated for the entire execution
> of the thread until it exits in the kernel. Note that
> disabling signals is not sufficient to prevent the kernel
> from accessing the rseq area.
>
> GNU libc gets away with automatic unregistration at
> thread exit because it completely controls the lifetime
> of the registered rseq area, keeping it allocated for as
> long as the thread is executing.
>
> So in order to minimize the dependency on specific libc
> behavior in the kernel sefltests, the selftests rseq.h
> requires explicit registration *and* unregistration.

If libc registers rseq (!rseq_ownership), then
rseq_unregister_current_thread is a no-op. And if libc has not
registered rseq, then we are not relying on any libc behavior. I don't
see how calling rseq_unregister_current_thread helps to rely less on
libc. Am I missing something?




> Thanks,
>
> Mathieu
>
>
> >
> >
> >> Thanks,
> >>
> >> Mathieu
> >>
> >>>
> >>>> Thanks,
> >>>>
> >>>> Mathieu
> >>>>
> >>>>> +     return 0;
> >>>>> +}
> >>>>> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
> >>>>> index ba424ce80a719..65da4a727c550 100644
> >>>>> --- a/tools/testing/selftests/rseq/rseq.h
> >>>>> +++ b/tools/testing/selftests/rseq/rseq.h
> >>>>> @@ -8,6 +8,7 @@
> >>>>>     #ifndef RSEQ_H
> >>>>>     #define RSEQ_H
> >>>>>
> >>>>> +#include <assert.h>
> >>>>>     #include <stdint.h>
> >>>>>     #include <stdbool.h>
> >>>>>     #include <pthread.h>
> >>>>
> >>>>
> >>>> --
> >>>> Mathieu Desnoyers
> >>>> EfficiOS Inc.
> >>>> https://www.efficios.com
> >>
> >>
> >> --
> >> Mathieu Desnoyers
> >> EfficiOS Inc.
> >> https://www.efficios.com
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
Posted by Mathieu Desnoyers 11 months ago
On 2025-03-10 12:31, Dmitry Vyukov wrote:
> On Mon, 10 Mar 2025 at 16:41, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>>
>> On 2025-03-10 10:43, Dmitry Vyukov wrote:
>>> On Mon, 10 Mar 2025 at 15:38, Mathieu Desnoyers
>>> <mathieu.desnoyers@efficios.com> wrote:
>>>>
>>>> On 2025-03-10 10:36, Dmitry Vyukov wrote:
>>>>> On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
>>>>> <mathieu.desnoyers@efficios.com> wrote:
>>>>>>
>>>>>> On 2025-02-27 09:03, Dmitry Vyukov wrote:
>>>>>>> Add a test that ensures that PKEY-protected struct rseq_cs
>>>>>>> works and does not lead to process kills.
>>>>>>>
>>>>>>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
>>>>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>>>>> Cc: "Paul E. McKenney" <paulmck@kernel.org>
>>>>>>> Cc: Boqun Feng <boqun.feng@gmail.com>
>>>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>>>>> Cc: Borislav Petkov <bp@alien8.de>
>>>>>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
>>>>>>> Cc: x86@kernel.org
>>>>>>> Cc: linux-kernel@vger.kernel.org
>>>>>>> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
>>>>>>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
>>>>>>>
>>>>>>> ---
>>>>>>> Changes in v5:
>>>>>>>      - Use static for variables/functions
>>>>>>>      - Use RSEQ_READ/WRITE_ONCE instead of volatile
>>>>>>>
>>>>>>> Changes in v4:
>>>>>>>      - Added Fixes tag
>>>>>>>
>>>>>>> Changes in v3:
>>>>>>>      - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
>>>>>>>      - rework the test to work when only pkey 0 is supported for rseq
>>>>>>>
>>>>>>> Changes in v2:
>>>>>>>      - change test to install protected rseq_cs instead of rseq
>>>>>>> ---
>>>>>>>      tools/testing/selftests/rseq/Makefile    |  2 +-
>>>>>>>      tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
>>>>>>>      tools/testing/selftests/rseq/rseq.h      |  1 +
>>>>>>>      3 files changed, 100 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
>>>>>>> index 5a3432fceb586..9111d25fea3af 100644
>>>>>>> --- a/tools/testing/selftests/rseq/Makefile
>>>>>>> +++ b/tools/testing/selftests/rseq/Makefile
>>>>>>> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
>>>>>>>
>>>>>>>      TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
>>>>>>>                  param_test_benchmark param_test_compare_twice param_test_mm_cid \
>>>>>>> -             param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
>>>>>>> +             param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
>>>>>>>
>>>>>>>      TEST_GEN_PROGS_EXTENDED = librseq.so
>>>>>>>
>>>>>>> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
>>>>>>> new file mode 100644
>>>>>>> index 0000000000000..cc4dd98190942
>>>>>>> --- /dev/null
>>>>>>> +++ b/tools/testing/selftests/rseq/pkey_test.c
>>>>>>> @@ -0,0 +1,98 @@
>>>>>>> +// SPDX-License-Identifier: LGPL-2.1
>>>>>>> +/*
>>>>>>> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
>>>>>>> + */
>>>>>>> +
>>>>>>> +#define _GNU_SOURCE
>>>>>>> +#include <err.h>
>>>>>>> +#include <errno.h>
>>>>>>> +#include <stdio.h>
>>>>>>> +#include <stdlib.h>
>>>>>>> +#include <string.h>
>>>>>>> +#include <sys/mman.h>
>>>>>>> +#include <sys/syscall.h>
>>>>>>> +#include <ucontext.h>
>>>>>>> +#include <unistd.h>
>>>>>>> +
>>>>>>> +#include "rseq.h"
>>>>>>> +#include "rseq-abi.h"
>>>>>>> +
>>>>>>> +static int pkey;
>>>>>>> +static ucontext_t ucp0, ucp1;
>>>>>>> +
>>>>>>> +static void coroutine(void)
>>>>>>> +{
>>>>>>> +     int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
>>>>>>> +     /*
>>>>>>> +      * When we disable access to pkey 0, globals and TLS become
>>>>>>> +      * inaccessible too, so we need to tread carefully.
>>>>>>> +      * Pkey is global so we need to copy it onto the stack.
>>>>>>> +      */
>>>>>>> +     int pk = RSEQ_READ_ONCE(pkey);
>>>>>>> +     struct timespec ts;
>>>>>>> +
>>>>>>> +     orig_pk0 = pkey_get(0);
>>>>>>> +     if (pkey_set(0, PKEY_DISABLE_ACCESS))
>>>>>>> +             err(1, "pkey_set failed");
>>>>>>> +     old_pk0 = pkey_get(0);
>>>>>>> +     old_pk1 = pkey_get(pk);
>>>>>>> +
>>>>>>> +     /*
>>>>>>> +      * Prevent compiler from initializing it by loading a 16-global.
>>>>>>> +      */
>>>>>>> +     RSEQ_WRITE_ONCE(ts.tv_sec, 0);
>>>>>>> +     RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
>>>>>>> +     /*
>>>>>>> +      * If the kernel misbehaves, context switches in the following loop
>>>>>>> +      * will terminate the process with SIGSEGV.
>>>>>>> +      * Trigger preemption w/o accessing TLS.
>>>>>>> +      * Note that glibc's usleep touches errno always.
>>>>>>> +      */
>>>>>>> +     for (i = 0; i < 10; i++)
>>>>>>> +             syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
>>>>>>> +
>>>>>>> +     pk0 = pkey_get(0);
>>>>>>> +     pk1 = pkey_get(pk);
>>>>>>> +     if (pkey_set(0, orig_pk0))
>>>>>>> +             err(1, "pkey_set failed");
>>>>>>> +
>>>>>>> +     /*
>>>>>>> +      * Ensure that the kernel has restored the previous value of pkeys
>>>>>>> +      * register after changing them.
>>>>>>> +      */
>>>>>>> +     if (old_pk0 != pk0)
>>>>>>> +             errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
>>>>>>> +     if (old_pk1 != pk1)
>>>>>>> +             errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
>>>>>>> +
>>>>>>> +     swapcontext(&ucp1, &ucp0);
>>>>>>> +     abort();
>>>>>>> +}
>>>>>>> +
>>>>>>> +int main(int argc, char **argv)
>>>>>>> +{
>>>>>>> +     pkey = pkey_alloc(0, 0);
>>>>>>> +     if (pkey == -1) {
>>>>>>> +             printf("[SKIP]\tKernel does not support PKEYs: %s\n",
>>>>>>> +                     strerror(errno));
>>>>>>> +             return 0;
>>>>>>> +     }
>>>>>>> +
>>>>>>> +     if (rseq_register_current_thread())
>>>>>>> +             err(1, "rseq_register_current_thread failed");
>>>>>>> +
>>>>>>> +     if (getcontext(&ucp1))
>>>>>>> +             err(1, "getcontext failed");
>>>>>>> +     ucp1.uc_stack.ss_size = getpagesize() * 4;
>>>>>>> +     ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
>>>>>>> +             PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
>>>>>>> +     if (ucp1.uc_stack.ss_sp == MAP_FAILED)
>>>>>>> +             err(1, "mmap failed");
>>>>>>> +     if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
>>>>>>> +                     PROT_READ | PROT_WRITE, pkey))
>>>>>>> +             err(1, "pkey_mprotect failed");
>>>>>>> +     makecontext(&ucp1, coroutine, 0);
>>>>>>> +     if (swapcontext(&ucp0, &ucp1))
>>>>>>> +             err(1, "swapcontext failed");
>>>>>>
>>>>>> Should the rseq register be paired with a rseq unregister here ?
>>>>>
>>>>> I dunno. It's necessary for this test and in general. Tcmalloc won't
>>>>> unregister on thread exit. Even for a libc it may be a bad idea due to
>>>>> signals.
>>>>
>>>> The rseq register/unregister is only for the case where libc does not
>>>> support rseq. But if you want to use rseq_register_current_thread(),
>>>> then you'll want to pair it with unregister.
>>>
>>> Why? Isn't it better to have tests more realitic?
>>
>> If you use rseq.c rseq_register_current_thread without
>> rseq_unregister_current_thread, then you rely on implicit
>> unregistration done by the kernel at thread exit.
>>
>> Then you need to ensure your userspace runtime keep the TLS
>> that holds the rseq area allocated for the entire execution
>> of the thread until it exits in the kernel. Note that
>> disabling signals is not sufficient to prevent the kernel
>> from accessing the rseq area.
>>
>> GNU libc gets away with automatic unregistration at
>> thread exit because it completely controls the lifetime
>> of the registered rseq area, keeping it allocated for as
>> long as the thread is executing.
>>
>> So in order to minimize the dependency on specific libc
>> behavior in the kernel sefltests, the selftests rseq.h
>> requires explicit registration *and* unregistration.
> 
> If libc registers rseq (!rseq_ownership), then
> rseq_unregister_current_thread is a no-op. And if libc has not
> registered rseq, then we are not relying on any libc behavior. I don't
> see how calling rseq_unregister_current_thread helps to rely less on
> libc. Am I missing something?

When libc does not support rseq (either because of the
glibc tunable, or having an old glibc, or running another
libc), rseq.c in the selftests registers an initial-exec
TLS rseq area. This TLS rseq area's lifetime is handled
by the libc. I don't want to depend on implementation-specific
libc behavior, unless they are documented and part of the
ABI.

In your case the area won't be re-used because the program
exits, but I'd rather use the same approach everywhere,
which is to unregister explicitly.

Note that in the librseq project, we are currently planning
to remove the explicit thread registration/unregistration
from the API, and will exclusively depend on having rseq support
provided by the libc. It simplifies the implementation, the API,
and will make it OK to link a dlopen'd .so against librseq.
There has been no official release of librseq yet, so now is
a good time to simplify the API and aim at what is becoming
the primary use-case.

Thanks,

Mathieu

> 
> 
> 
> 
>> Thanks,
>>
>> Mathieu
>>
>>
>>>
>>>
>>>> Thanks,
>>>>
>>>> Mathieu
>>>>
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Mathieu
>>>>>>
>>>>>>> +     return 0;
>>>>>>> +}
>>>>>>> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
>>>>>>> index ba424ce80a719..65da4a727c550 100644
>>>>>>> --- a/tools/testing/selftests/rseq/rseq.h
>>>>>>> +++ b/tools/testing/selftests/rseq/rseq.h
>>>>>>> @@ -8,6 +8,7 @@
>>>>>>>      #ifndef RSEQ_H
>>>>>>>      #define RSEQ_H
>>>>>>>
>>>>>>> +#include <assert.h>
>>>>>>>      #include <stdint.h>
>>>>>>>      #include <stdbool.h>
>>>>>>>      #include <pthread.h>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Mathieu Desnoyers
>>>>>> EfficiOS Inc.
>>>>>> https://www.efficios.com
>>>>
>>>>
>>>> --
>>>> Mathieu Desnoyers
>>>> EfficiOS Inc.
>>>> https://www.efficios.com
>>
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
Posted by Dmitry Vyukov 8 months, 3 weeks ago
On Mon, 10 Mar 2025 at 18:26, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> On 2025-03-10 12:31, Dmitry Vyukov wrote:
> > On Mon, 10 Mar 2025 at 16:41, Mathieu Desnoyers
> > <mathieu.desnoyers@efficios.com> wrote:
> >>
> >> On 2025-03-10 10:43, Dmitry Vyukov wrote:
> >>> On Mon, 10 Mar 2025 at 15:38, Mathieu Desnoyers
> >>> <mathieu.desnoyers@efficios.com> wrote:
> >>>>
> >>>> On 2025-03-10 10:36, Dmitry Vyukov wrote:
> >>>>> On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
> >>>>> <mathieu.desnoyers@efficios.com> wrote:
> >>>>>>
> >>>>>> On 2025-02-27 09:03, Dmitry Vyukov wrote:
> >>>>>>> Add a test that ensures that PKEY-protected struct rseq_cs
> >>>>>>> works and does not lead to process kills.
> >>>>>>>
> >>>>>>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> >>>>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >>>>>>> Cc: Peter Zijlstra <peterz@infradead.org>
> >>>>>>> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> >>>>>>> Cc: Boqun Feng <boqun.feng@gmail.com>
> >>>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
> >>>>>>> Cc: Ingo Molnar <mingo@redhat.com>
> >>>>>>> Cc: Borislav Petkov <bp@alien8.de>
> >>>>>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> >>>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
> >>>>>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> >>>>>>> Cc: x86@kernel.org
> >>>>>>> Cc: linux-kernel@vger.kernel.org
> >>>>>>> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> >>>>>>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
> >>>>>>>
> >>>>>>> ---
> >>>>>>> Changes in v5:
> >>>>>>>      - Use static for variables/functions
> >>>>>>>      - Use RSEQ_READ/WRITE_ONCE instead of volatile
> >>>>>>>
> >>>>>>> Changes in v4:
> >>>>>>>      - Added Fixes tag
> >>>>>>>
> >>>>>>> Changes in v3:
> >>>>>>>      - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> >>>>>>>      - rework the test to work when only pkey 0 is supported for rseq
> >>>>>>>
> >>>>>>> Changes in v2:
> >>>>>>>      - change test to install protected rseq_cs instead of rseq
> >>>>>>> ---
> >>>>>>>      tools/testing/selftests/rseq/Makefile    |  2 +-
> >>>>>>>      tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
> >>>>>>>      tools/testing/selftests/rseq/rseq.h      |  1 +
> >>>>>>>      3 files changed, 100 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
> >>>>>>> index 5a3432fceb586..9111d25fea3af 100644
> >>>>>>> --- a/tools/testing/selftests/rseq/Makefile
> >>>>>>> +++ b/tools/testing/selftests/rseq/Makefile
> >>>>>>> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
> >>>>>>>
> >>>>>>>      TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
> >>>>>>>                  param_test_benchmark param_test_compare_twice param_test_mm_cid \
> >>>>>>> -             param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
> >>>>>>> +             param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
> >>>>>>>
> >>>>>>>      TEST_GEN_PROGS_EXTENDED = librseq.so
> >>>>>>>
> >>>>>>> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
> >>>>>>> new file mode 100644
> >>>>>>> index 0000000000000..cc4dd98190942
> >>>>>>> --- /dev/null
> >>>>>>> +++ b/tools/testing/selftests/rseq/pkey_test.c
> >>>>>>> @@ -0,0 +1,98 @@
> >>>>>>> +// SPDX-License-Identifier: LGPL-2.1
> >>>>>>> +/*
> >>>>>>> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
> >>>>>>> + */
> >>>>>>> +
> >>>>>>> +#define _GNU_SOURCE
> >>>>>>> +#include <err.h>
> >>>>>>> +#include <errno.h>
> >>>>>>> +#include <stdio.h>
> >>>>>>> +#include <stdlib.h>
> >>>>>>> +#include <string.h>
> >>>>>>> +#include <sys/mman.h>
> >>>>>>> +#include <sys/syscall.h>
> >>>>>>> +#include <ucontext.h>
> >>>>>>> +#include <unistd.h>
> >>>>>>> +
> >>>>>>> +#include "rseq.h"
> >>>>>>> +#include "rseq-abi.h"
> >>>>>>> +
> >>>>>>> +static int pkey;
> >>>>>>> +static ucontext_t ucp0, ucp1;
> >>>>>>> +
> >>>>>>> +static void coroutine(void)
> >>>>>>> +{
> >>>>>>> +     int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
> >>>>>>> +     /*
> >>>>>>> +      * When we disable access to pkey 0, globals and TLS become
> >>>>>>> +      * inaccessible too, so we need to tread carefully.
> >>>>>>> +      * Pkey is global so we need to copy it onto the stack.
> >>>>>>> +      */
> >>>>>>> +     int pk = RSEQ_READ_ONCE(pkey);
> >>>>>>> +     struct timespec ts;
> >>>>>>> +
> >>>>>>> +     orig_pk0 = pkey_get(0);
> >>>>>>> +     if (pkey_set(0, PKEY_DISABLE_ACCESS))
> >>>>>>> +             err(1, "pkey_set failed");
> >>>>>>> +     old_pk0 = pkey_get(0);
> >>>>>>> +     old_pk1 = pkey_get(pk);
> >>>>>>> +
> >>>>>>> +     /*
> >>>>>>> +      * Prevent compiler from initializing it by loading a 16-global.
> >>>>>>> +      */
> >>>>>>> +     RSEQ_WRITE_ONCE(ts.tv_sec, 0);
> >>>>>>> +     RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
> >>>>>>> +     /*
> >>>>>>> +      * If the kernel misbehaves, context switches in the following loop
> >>>>>>> +      * will terminate the process with SIGSEGV.
> >>>>>>> +      * Trigger preemption w/o accessing TLS.
> >>>>>>> +      * Note that glibc's usleep touches errno always.
> >>>>>>> +      */
> >>>>>>> +     for (i = 0; i < 10; i++)
> >>>>>>> +             syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
> >>>>>>> +
> >>>>>>> +     pk0 = pkey_get(0);
> >>>>>>> +     pk1 = pkey_get(pk);
> >>>>>>> +     if (pkey_set(0, orig_pk0))
> >>>>>>> +             err(1, "pkey_set failed");
> >>>>>>> +
> >>>>>>> +     /*
> >>>>>>> +      * Ensure that the kernel has restored the previous value of pkeys
> >>>>>>> +      * register after changing them.
> >>>>>>> +      */
> >>>>>>> +     if (old_pk0 != pk0)
> >>>>>>> +             errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
> >>>>>>> +     if (old_pk1 != pk1)
> >>>>>>> +             errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
> >>>>>>> +
> >>>>>>> +     swapcontext(&ucp1, &ucp0);
> >>>>>>> +     abort();
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +int main(int argc, char **argv)
> >>>>>>> +{
> >>>>>>> +     pkey = pkey_alloc(0, 0);
> >>>>>>> +     if (pkey == -1) {
> >>>>>>> +             printf("[SKIP]\tKernel does not support PKEYs: %s\n",
> >>>>>>> +                     strerror(errno));
> >>>>>>> +             return 0;
> >>>>>>> +     }
> >>>>>>> +
> >>>>>>> +     if (rseq_register_current_thread())
> >>>>>>> +             err(1, "rseq_register_current_thread failed");
> >>>>>>> +
> >>>>>>> +     if (getcontext(&ucp1))
> >>>>>>> +             err(1, "getcontext failed");
> >>>>>>> +     ucp1.uc_stack.ss_size = getpagesize() * 4;
> >>>>>>> +     ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
> >>>>>>> +             PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
> >>>>>>> +     if (ucp1.uc_stack.ss_sp == MAP_FAILED)
> >>>>>>> +             err(1, "mmap failed");
> >>>>>>> +     if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
> >>>>>>> +                     PROT_READ | PROT_WRITE, pkey))
> >>>>>>> +             err(1, "pkey_mprotect failed");
> >>>>>>> +     makecontext(&ucp1, coroutine, 0);
> >>>>>>> +     if (swapcontext(&ucp0, &ucp1))
> >>>>>>> +             err(1, "swapcontext failed");
> >>>>>>
> >>>>>> Should the rseq register be paired with a rseq unregister here ?
> >>>>>
> >>>>> I dunno. It's necessary for this test and in general. Tcmalloc won't
> >>>>> unregister on thread exit. Even for a libc it may be a bad idea due to
> >>>>> signals.
> >>>>
> >>>> The rseq register/unregister is only for the case where libc does not
> >>>> support rseq. But if you want to use rseq_register_current_thread(),
> >>>> then you'll want to pair it with unregister.
> >>>
> >>> Why? Isn't it better to have tests more realitic?
> >>
> >> If you use rseq.c rseq_register_current_thread without
> >> rseq_unregister_current_thread, then you rely on implicit
> >> unregistration done by the kernel at thread exit.
> >>
> >> Then you need to ensure your userspace runtime keep the TLS
> >> that holds the rseq area allocated for the entire execution
> >> of the thread until it exits in the kernel. Note that
> >> disabling signals is not sufficient to prevent the kernel
> >> from accessing the rseq area.
> >>
> >> GNU libc gets away with automatic unregistration at
> >> thread exit because it completely controls the lifetime
> >> of the registered rseq area, keeping it allocated for as
> >> long as the thread is executing.
> >>
> >> So in order to minimize the dependency on specific libc
> >> behavior in the kernel sefltests, the selftests rseq.h
> >> requires explicit registration *and* unregistration.
> >
> > If libc registers rseq (!rseq_ownership), then
> > rseq_unregister_current_thread is a no-op. And if libc has not
> > registered rseq, then we are not relying on any libc behavior. I don't
> > see how calling rseq_unregister_current_thread helps to rely less on
> > libc. Am I missing something?
>
> When libc does not support rseq (either because of the
> glibc tunable, or having an old glibc, or running another
> libc), rseq.c in the selftests registers an initial-exec
> TLS rseq area. This TLS rseq area's lifetime is handled
> by the libc. I don't want to depend on implementation-specific
> libc behavior, unless they are documented and part of the
> ABI.
>
> In your case the area won't be re-used because the program
> exits, but I'd rather use the same approach everywhere,
> which is to unregister explicitly.
>
> Note that in the librseq project, we are currently planning
> to remove the explicit thread registration/unregistration
> from the API, and will exclusively depend on having rseq support
> provided by the libc. It simplifies the implementation, the API,
> and will make it OK to link a dlopen'd .so against librseq.
> There has been no official release of librseq yet, so now is
> a good time to simplify the API and aim at what is becoming
> the primary use-case.

Hi,

I was distracted by other things for a while.
But now resent rebased series v7 with this fix.




> >
> >> Thanks,
> >>
> >> Mathieu
> >>
> >>
> >>>
> >>>
> >>>> Thanks,
> >>>>
> >>>> Mathieu
> >>>>
> >>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Mathieu
> >>>>>>
> >>>>>>> +     return 0;
> >>>>>>> +}
> >>>>>>> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
> >>>>>>> index ba424ce80a719..65da4a727c550 100644
> >>>>>>> --- a/tools/testing/selftests/rseq/rseq.h
> >>>>>>> +++ b/tools/testing/selftests/rseq/rseq.h
> >>>>>>> @@ -8,6 +8,7 @@
> >>>>>>>      #ifndef RSEQ_H
> >>>>>>>      #define RSEQ_H
> >>>>>>>
> >>>>>>> +#include <assert.h>
> >>>>>>>      #include <stdint.h>
> >>>>>>>      #include <stdbool.h>
> >>>>>>>      #include <pthread.h>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Mathieu Desnoyers
> >>>>>> EfficiOS Inc.
> >>>>>> https://www.efficios.com
> >>>>
> >>>>
> >>>> --
> >>>> Mathieu Desnoyers
> >>>> EfficiOS Inc.
> >>>> https://www.efficios.com
> >>
> >>
> >> --
> >> Mathieu Desnoyers
> >> EfficiOS Inc.
> >> https://www.efficios.com
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com