[PATCH 00/35] Shadow stacks for userspace

Rick Edgecombe posted 35 patches 3 years, 10 months ago
There is a newer version of this series
.../admin-guide/kernel-parameters.txt         |   4 +
Documentation/filesystems/proc.rst            |   1 +
Documentation/x86/cet.rst                     | 145 ++++++
Documentation/x86/index.rst                   |   1 +
arch/arm/kernel/signal.c                      |   2 +-
arch/arm64/kernel/signal.c                    |   2 +-
arch/arm64/kernel/signal32.c                  |   2 +-
arch/sparc/kernel/signal32.c                  |   2 +-
arch/sparc/kernel/signal_64.c                 |   2 +-
arch/x86/Kconfig                              |  22 +
arch/x86/Kconfig.assembler                    |   5 +
arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
arch/x86/ia32/ia32_signal.c                   |  25 +-
arch/x86/include/asm/cet.h                    |  54 +++
arch/x86/include/asm/cpufeatures.h            |   1 +
arch/x86/include/asm/disabled-features.h      |   8 +-
arch/x86/include/asm/fpu/api.h                |   8 +
arch/x86/include/asm/fpu/types.h              |  23 +-
arch/x86/include/asm/fpu/xstate.h             |   6 +-
arch/x86/include/asm/idtentry.h               |   4 +
arch/x86/include/asm/mman.h                   |  24 +
arch/x86/include/asm/mmu_context.h            |   2 +
arch/x86/include/asm/msr-index.h              |  20 +
arch/x86/include/asm/page_types.h             |   7 +
arch/x86/include/asm/pgtable.h                | 302 ++++++++++--
arch/x86/include/asm/pgtable_types.h          |  48 +-
arch/x86/include/asm/processor.h              |   6 +
arch/x86/include/asm/special_insns.h          |  30 ++
arch/x86/include/asm/trap_pf.h                |   2 +
arch/x86/include/uapi/asm/mman.h              |   8 +-
arch/x86/include/uapi/asm/prctl.h             |  10 +
arch/x86/include/uapi/asm/processor-flags.h   |   2 +
arch/x86/kernel/Makefile                      |   1 +
arch/x86/kernel/cpu/common.c                  |  20 +
arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
arch/x86/kernel/elf_feature_prctl.c           |  72 +++
arch/x86/kernel/fpu/xstate.c                  | 167 ++++++-
arch/x86/kernel/idt.c                         |   4 +
arch/x86/kernel/process.c                     |  17 +-
arch/x86/kernel/process_64.c                  |   2 +
arch/x86/kernel/shstk.c                       | 446 ++++++++++++++++++
arch/x86/kernel/signal.c                      |  13 +
arch/x86/kernel/signal_compat.c               |   2 +-
arch/x86/kernel/traps.c                       |  62 +++
arch/x86/mm/fault.c                           |  19 +
arch/x86/mm/mmap.c                            |  48 ++
arch/x86/mm/pat/set_memory.c                  |   2 +-
arch/x86/mm/pgtable.c                         |  25 +
drivers/gpu/drm/i915/gvt/gtt.c                |   2 +-
fs/aio.c                                      |   2 +-
fs/proc/task_mmu.c                            |   3 +
include/linux/mm.h                            |  19 +-
include/linux/pgtable.h                       |   8 +
include/linux/syscalls.h                      |   1 +
include/uapi/asm-generic/siginfo.h            |   3 +-
include/uapi/asm-generic/unistd.h             |   2 +-
ipc/shm.c                                     |   2 +-
kernel/sys_ni.c                               |   1 +
mm/gup.c                                      |  16 +-
mm/huge_memory.c                              |  27 +-
mm/memory.c                                   |   5 +-
mm/migrate.c                                  |   3 +-
mm/mmap.c                                     |  15 +-
mm/mprotect.c                                 |   9 +-
mm/nommu.c                                    |   4 +-
mm/util.c                                     |   2 +-
tools/testing/selftests/x86/Makefile          |   9 +-
.../selftests/x86/test_map_shadow_stack.c     |  75 +++
69 files changed, 1797 insertions(+), 92 deletions(-)
create mode 100644 Documentation/x86/cet.rst
create mode 100644 arch/x86/include/asm/cet.h
create mode 100644 arch/x86/include/asm/mman.h
create mode 100644 arch/x86/kernel/elf_feature_prctl.c
create mode 100644 arch/x86/kernel/shstk.c
create mode 100644 tools/testing/selftests/x86/test_map_shadow_stack.c
[PATCH 00/35] Shadow stacks for userspace
Posted by Rick Edgecombe 3 years, 10 months ago
Hi,

This is a slight reboot of the userspace CET series. I will be taking over the 
series from Yu-cheng. Per some internal recommendations, I’ve reset the version
number and am calling it a new series. Hopefully, it doesn’t cause confusion.

The new plan is to upstream only userspace Shadow Stack support at this point. 
IBT can follow later, but for now I’ll focus solely on the most in-demand and
widely available (with the feature on AMD CPUs now) part of CET.

I thought as part of this reset, it might be useful to more fully write-up the 
design and summarize the history of the previous CET series. So this slightly
long cover letter does that. The "Updates" section has the changes, if anyone
doesn't want the history.


Why is Shadow Stack Wanted
==========================
The main use case for userspace shadow stack is providing protection against 
return oriented programming attacks. Fedora and Ubuntu already have many/most 
packages enabled for shadow stack. The main missing piece is Linux kernel 
support and there seems to be a high amount of interest in the ecosystem for
getting this feature supported. Besides security, Google has also done some
work on using shadow stack to improve performance and reliability of tracing.


Userspace Shadow Stack Implementation
=====================================
Shadow stack works by maintaining a secondary (shadow) stack that cannot be 
directly modified by applications. When executing a CALL instruction, the 
processor pushes the return address to both the normal stack and to the special 
permissioned shadow stack. Upon ret, the processor pops the shadow stack copy 
and compares it to the normal stack copy. If the two differ, the processor 
raises a control protection fault. This implementation supports shadow stack on 
64 bit kernels only, with support for 32 bit only via IA32 emulation.

	Shadow Stack Memory
	-------------------
	The majority of this series deals with changes for handling the special 
	shadow stack memory permissions. This memory is specified by the 
	Dirty+RO PTE bits. A tricky aspect of this is that this combination was 
	previously used to specify COW memory. So Linux needs to handle COW 
	differently when shadow stack is in use. The solution is to use a 
	software PTE bit to denote COW memory, and take care to clear the dirty
	bit when setting the memory RO.

	Setup and Upkeep of HW Registers
	--------------------------------
	Using userspace CET requires a CR4 bit set, and also the manipulation 
	of two xsave managed MSRs. The kernel needs to modify these registers 
	during various operations like clone and signal handling. These 
	operations may happen when the registers are restored to the CPU, or 
	saved in an xsave buffer. Since the recent AMX triggered FPU overhaul 
	removed direct access to the xsave buffer, this series adds an 
	interface to operate on the supervisor xstate.

	New ABIs
	--------
	This series introduces some new ABIs. The primary one is the shadow 
	stack itself. Since it is readable and the shadow stack pointer is 
	exposed to user space, applications can easily read and process the 
	shadow stack. And in fact the tracing usages plan to do exactly that.

	Most of the shadow stack contents are written by HW, but some of the 
	entries are added by the kernel. The main place for this is signals. As 
	part of handling the signal the kernel does some manual adjustment of 
	the shadow stack that userspace depends on.

	In addition to the contents of the shadow stack there is also user 
	visible behavior around when new shadow stacks are created and set in 
	the shadow stack pointer (SSP) register. This is relatively 
	straightforward – shadow stacks are created when new stacks are created 
	(thread creation, fork, etc). It is more or less what is required to 
	keep apps working.

	For situations when userspace creates a new stack (i.e. makecontext(), 
	fibers, etc), a new syscall is provided for creating shadow stack 
	memory. To make the shadow stack usable, it needs to have a restore 
	token written to the protected memory. So the syscall provides a way to 
	specificity this should be done by the kernel.

	When a shadow stack violation happens (when the return address of stack 
	not matching return address in shadow stack), a segfault is generated 
	with a new si_code specific to CET violations.

	Lastly, a new arch_prctl interface is created for controlling the 
	enablement of CET-like features. It is intended to also be used for 
	LAM. It operates on the feature status per-thread, so for process wide 
	enabling it is intended to be used early in things like dynamic 
	linker/loaders. However, it can be used later for per-thread enablement 
	of features like WRSS.

	WRSS
	----
	WRSS is an instruction that can write to shadow stacks. The HW provides 
	a way to enable this instruction for userspace use. Since shadow 
	stack’s are created initially protected, enabling WRSS allows any apps 
	that want to do unusual things with their stacks to have a way to 
	weaken protection and make things more flexible. A new feature bit is 
	defined to control enabling/disabling of WRSS.


History
=======
The branding “CET” really consists of two features: “Shadow Stack” and 
“Indirect Branch Tracking”. They both restrict previously allowed, but rarely 
valid behaviors and require userspace to change to avoid these behaviors before 
enabling the protection. These raw HW features need to be assembled into a 
software solution across userspace and kernel in order to add security value.
The kernel part of this solution has evolved iteratively starting with a lengthy
RFC period. 

Until now, the enabling effort was trying to support both Shadow Stack and IBT. 
This history will focus on a few areas of the shadow stack development history 
that I thought stood out.

	Signals
	-------
	Originally signals placed the location of the shadow stack restore 
	token inside the saved state on the stack. This was problematic from a 
	past ABI promises perspective. So the restore location was instead just 
	assumed from the shadow stack pointer. This works because in normal 
	allowed cases of calling sigreturn, the shadow stack pointer should be 
	right at the restore token at that time. There is no alternate shadow 
	stack support. If an alt shadow stack is added later we would need to 
	find a place to store the regular shadow stack token location. Options 
	could be to push something on the alt shadow stack, or to keep 
	something on the kernel side. So the current design keeps things simple 
	while slightly kicking the can down the road if alt shadow stacks 
	become a thing later. Siglongjmp is handled in glibc, using the incssp 
	instruction to unwind the shadow stack over the token.

	Shadow Stack Allocation
	-----------------------
	makecontext() implementations need a way to create new shadow stacks 
	with restore token’s such that they can be pivoted to from userspace. 
	The first interface to do this was an arch_prctl(). It created a shadow 
	stack with a restore token pre-setup, since the kernel has an 
	instruction that can write to user shadow stacks. However, this 
	interface was abandoned for being strange.

	The next version created PROT_SHADOW_STACK. This interface had two 
	problems. One, it left no options but for userspace to create writable 
	memory, write a restore token, then mproctect() it PROT_SHADOW_STACK. 
	The writable window left the shadow stack exposed, weakening the 
	security. Second, it caused problems with the guard pages. Since the 
	memory was initially created writable it did not have a guard page, but 
	then was mprotected later to a type of memory that should have one. 
	This resulted in missing guard pages and confused rb_subtree_gap’s.

	This version introduces a new syscall that behaves similarly to the 
	initial arch_prctl() interface in that it has the kernel write the 
	restore token.

	Enabling Interface
	------------------
	For the entire history of the original CET series, the design was to 
	enable shadow stack automatically if the feature bit was detected in 
	the elf header. Then it was userspace’s responsibility to turn it off 
	via an arch_prctl() if it was not desired, and this was handled by the 
	glibc dynamic loader. Glibc’s standard behavior (when CET if configured 
	is to leave shadow stack enabled if the executable and all linked 
	libraries are marked with shadow stacks.

	Many distros (Fedora and others) have binaries already marked with 
	shadow stack, waiting for kernel support. Unfortunately their glibc 
	binaries expect the original arch_prctl() interface for allocating 
	shadow stacks, as those changes were pushed ahead of kernel support. 
	The net result of it all is, when updating to a kernel with shadow 
	stack these binaries would suddenly get shadow stack enabled and expect 
	the arch_prctl() interface to be there. And so calls to makecontext() 
	will fail, resulting in visible breakages. This series deals with this 
	problem as described below in "Updates".


Updates
=======
These updates were mostly driven by public comments, but a lot of the design 
elements are new. I would like some extra scrutiny on the updates.

	New syscall for Shadow Stack Allocation
	---------------------------------------
	A new syscall is added for allocating shadow stacks to replace 
	PROT_SHADOW_STACK. Several options were considered, as described in the 
	“x86/cet/shstk: Introduce map_shadow_stack syscall”.

	Xsave Managed Supervisor State Modifications
	--------------------------------------------
	The shadow stack feature requires the kernel to modify xsaves managed 
	state. On one of the last versions of Yu-cheng’s series Boris had 
	commented on the pattern it was using to do this not necessarily being 
	ideal. The pattern was to force a restore to the registers and always 
	do the modification there. Then Thomas did an overhaul of the fpu code, 
	part of which consisted of making raw access to the xsave buffer 
	private to the fpu code. So this series tries to expose access again, 
	and in a way that addresses Boris’ comments.

	The method is to provide functions like wmsrl/rdmsrl, but that can 
	direct the operation to the correct location (registers or buffer), 
	while giving the proper notice to the fpu subsystem so things don’t get 
	clobbered or corrupted.

	In the past a solution like this was discussed as part of the PASID 
	series, and Thomas was not in favor. In CET’s case there is a more 
	logic around the CET MSR’s than in PASID's, and wrapping this logic 
	minimizes near identical open coded logic needed to do this more 
	efficiently. In addition it resolves the above described problem of 
	having no access to the xsave buffer. So it is being put forward here 
	under the supposition that CET’s usage may lead to a different 
	conclusion, not to try to ignore past direction.

	The user interrupt series has similar needs as CET, and will also use
	this internal interface if it’s found acceptable.

	Support for WRSS
	----------------
	Andy Lutomirski had asked if we change the shadow stack allocation API 
	such that userspace cannot create arbitrary shadow stacks, then we look 
	at exposing an interface to enable the WRSS instruction for userspace. 
	This way app’s that want to do unexpected things with shadow stacks 
	would still have the option to create shadow stacks with arbitrary 
	data.

	Switch Enabling Interface
	-------------------------
	As described above there is a problem with userspace binaries waiting 
	to break as soon as the kernel supports CET. This needs to be prevented 
	by changing the interface such that the old binaries will not enable 
	shadow stack AND behave as if shadow stack is not enabled. They should 
	run normally without shadow stack protection. Creating a new feature 
	(SHSTK2) for shadow stack was explored. SHSTK would never be supported 
	by the kernel, and all the userspace build tools would be updated to 
	target SHSTK2 instead of SHSTK. So old SHSTK binaries would be cleanly
	disabled.

	But there are existing downsides to automatic elf header processing 
	based enabling. The elf header feature spec is not defined by the 
	kernel and there are proposals to expand it to describe additional 
	logic. A simpler interface where the kernel is simply told what to 
	enable, and leaves all the decision making to userspace, is more 
	flexible for userspace and simpler for the kernel. There also already 
	needs to be an ARCH_X86_FEATURE_ENABLE arch_prctl() for WRSS (and 
	likely LAM will use it too), so it avoids there being two ways to turn 
	on these types of features. The only tricky part for shadow stack, is 
	that it has to be enabled very early. Wherever the shadow stack is 
	enabled, the app cannot return from that point, otherwise there will be 
	a shadow stack violation. It turns out glibc can enable shadow stack 
	this early, so it works nicely. So not automatically enabling any 
	features in the elf header will cleanly disable all old binaries, which 
	expect the kernel to enable CET features automatically. Then after the 
	kernel changes are upstream, glibc can be updated to use the new
	interface. This is the solution implemented in this series.

	Expand Commit Logs
	------------------
	As part of spinning up on this series, I found some of the commit logs 
	did not describe the changes in enough detail for me understand their 
	purpose. I tried to expand the logs and comments, where I had to go 
	digging. Hopefully it’s useful.
	
	Limit to only Intel Processors
	------------------------------
	Shadow stack is supported on some AMD processors, but this revision 
	(with expanded HW usage and xsaves changes) has only has been tested on 
	Intel ones. So this series has a patch to limit shadow stack support to 
	Intel processors. Ideally the patch would not even make it to mainline, 
	and should be dropped as soon as this testing is done. It's included 
	just in case.


Future Work
===========
Even though this is now exclusively a shadow stack series, there is still some 
remaining shadow stack work to be done.

	Ptrace
	------
	Early in the series, there was a patch to allow IA32_U_CET and
	IA32_PL3_SSP to be set. This patch was dropped and planned as a follow
	up to basic support, and it remains the plan. It will be needed for
	in-progress gdb support.

	CRIU Support
	------------
	In the past there was some speculation on the mailing list about 
	whether CRIU would need to be taught about CET. It turns out, it does. 
	The first issue hit is that CRIU calls sigreturn directly from its 
	“parasite code” that it injects into the dumper process. This violates
	this shadow stack implementation’s protection that intends to prevent
	attackers from doing this.

	With so many packages already enabled with shadow stack, there is 
	probably desire to make it work seamlessly. But in the meantime if 
	distros want to support shadow stack and CRIU, users could manually 
	disabled shadow stack via “GLIBC_TUNABLES=glibc.cpu.x86_shstk=off” for 
	a process they will wants to dump. It’s not ideal.

	I’d like to hear what people think about having shadow stack in the 
	kernel without this resolved. Nothing would change for any users until 
	they enable shadow stack in the kernel and update to a glibc configured
	with CET. Should CRIU userspace be solved before kernel support?

	Selftests
	---------
	There are some CET selftests being worked on and they are not included
	here.

Thanks,

Rick

Rick Edgecombe (7):
  x86/mm: Prevent VM_WRITE shadow stacks
  x86/fpu: Add helpers for modifying supervisor xstate
  x86/fpu: Add unsafe xsave buffer helpers
  x86/cet/shstk: Introduce map_shadow_stack syscall
  selftests/x86: Add map_shadow_stack syscall test
  x86/cet/shstk: Support wrss for userspace
  x86/cpufeatures: Limit shadow stack to Intel CPUs

Yu-cheng Yu (28):
  Documentation/x86: Add CET description
  x86/cet/shstk: Add Kconfig option for Shadow Stack
  x86/cpufeatures: Add CET CPU feature flags for Control-flow
    Enforcement Technology (CET)
  x86/cpufeatures: Introduce CPU setup and option parsing for CET
  x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
  x86/cet: Add control-protection fault handler
  x86/mm: Remove _PAGE_DIRTY from kernel RO pages
  x86/mm: Move pmd_write(), pud_write() up in the file
  x86/mm: Introduce _PAGE_COW
  drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS
  x86/mm: Update pte_modify for _PAGE_COW
  x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for
    transition from _PAGE_DIRTY to _PAGE_COW
  mm: Move VM_UFFD_MINOR_BIT from 37 to 38
  mm: Introduce VM_SHADOW_STACK for shadow stack memory
  x86/mm: Check Shadow Stack page fault errors
  x86/mm: Update maybe_mkwrite() for shadow stack
  mm: Fixup places that call pte_mkwrite() directly
  mm: Add guard pages around a shadow stack.
  mm/mmap: Add shadow stack pages to memory accounting
  mm: Update can_follow_write_pte() for shadow stack
  mm/mprotect: Exclude shadow stack from preserve_write
  mm: Re-introduce vm_flags to do_mmap()
  x86/cet/shstk: Add user-mode shadow stack support
  x86/process: Change copy_thread() argument 'arg' to 'stack_size'
  x86/cet/shstk: Handle thread shadow stack
  x86/cet/shstk: Introduce shadow stack token setup/verify routines
  x86/cet/shstk: Handle signals for shadow stack
  x86/cet/shstk: Add arch_prctl elf feature functions

 .../admin-guide/kernel-parameters.txt         |   4 +
 Documentation/filesystems/proc.rst            |   1 +
 Documentation/x86/cet.rst                     | 145 ++++++
 Documentation/x86/index.rst                   |   1 +
 arch/arm/kernel/signal.c                      |   2 +-
 arch/arm64/kernel/signal.c                    |   2 +-
 arch/arm64/kernel/signal32.c                  |   2 +-
 arch/sparc/kernel/signal32.c                  |   2 +-
 arch/sparc/kernel/signal_64.c                 |   2 +-
 arch/x86/Kconfig                              |  22 +
 arch/x86/Kconfig.assembler                    |   5 +
 arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
 arch/x86/ia32/ia32_signal.c                   |  25 +-
 arch/x86/include/asm/cet.h                    |  54 +++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/fpu/api.h                |   8 +
 arch/x86/include/asm/fpu/types.h              |  23 +-
 arch/x86/include/asm/fpu/xstate.h             |   6 +-
 arch/x86/include/asm/idtentry.h               |   4 +
 arch/x86/include/asm/mman.h                   |  24 +
 arch/x86/include/asm/mmu_context.h            |   2 +
 arch/x86/include/asm/msr-index.h              |  20 +
 arch/x86/include/asm/page_types.h             |   7 +
 arch/x86/include/asm/pgtable.h                | 302 ++++++++++--
 arch/x86/include/asm/pgtable_types.h          |  48 +-
 arch/x86/include/asm/processor.h              |   6 +
 arch/x86/include/asm/special_insns.h          |  30 ++
 arch/x86/include/asm/trap_pf.h                |   2 +
 arch/x86/include/uapi/asm/mman.h              |   8 +-
 arch/x86/include/uapi/asm/prctl.h             |  10 +
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/kernel/Makefile                      |   1 +
 arch/x86/kernel/cpu/common.c                  |  20 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
 arch/x86/kernel/elf_feature_prctl.c           |  72 +++
 arch/x86/kernel/fpu/xstate.c                  | 167 ++++++-
 arch/x86/kernel/idt.c                         |   4 +
 arch/x86/kernel/process.c                     |  17 +-
 arch/x86/kernel/process_64.c                  |   2 +
 arch/x86/kernel/shstk.c                       | 446 ++++++++++++++++++
 arch/x86/kernel/signal.c                      |  13 +
 arch/x86/kernel/signal_compat.c               |   2 +-
 arch/x86/kernel/traps.c                       |  62 +++
 arch/x86/mm/fault.c                           |  19 +
 arch/x86/mm/mmap.c                            |  48 ++
 arch/x86/mm/pat/set_memory.c                  |   2 +-
 arch/x86/mm/pgtable.c                         |  25 +
 drivers/gpu/drm/i915/gvt/gtt.c                |   2 +-
 fs/aio.c                                      |   2 +-
 fs/proc/task_mmu.c                            |   3 +
 include/linux/mm.h                            |  19 +-
 include/linux/pgtable.h                       |   8 +
 include/linux/syscalls.h                      |   1 +
 include/uapi/asm-generic/siginfo.h            |   3 +-
 include/uapi/asm-generic/unistd.h             |   2 +-
 ipc/shm.c                                     |   2 +-
 kernel/sys_ni.c                               |   1 +
 mm/gup.c                                      |  16 +-
 mm/huge_memory.c                              |  27 +-
 mm/memory.c                                   |   5 +-
 mm/migrate.c                                  |   3 +-
 mm/mmap.c                                     |  15 +-
 mm/mprotect.c                                 |   9 +-
 mm/nommu.c                                    |   4 +-
 mm/util.c                                     |   2 +-
 tools/testing/selftests/x86/Makefile          |   9 +-
 .../selftests/x86/test_map_shadow_stack.c     |  75 +++
 69 files changed, 1797 insertions(+), 92 deletions(-)
 create mode 100644 Documentation/x86/cet.rst
 create mode 100644 arch/x86/include/asm/cet.h
 create mode 100644 arch/x86/include/asm/mman.h
 create mode 100644 arch/x86/kernel/elf_feature_prctl.c
 create mode 100644 arch/x86/kernel/shstk.c
 create mode 100644 tools/testing/selftests/x86/test_map_shadow_stack.c


base-commit: e783362eb54cd99b2cac8b3a9aeac942e6f6ac07
-- 
2.17.1
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Thomas Gleixner 3 years, 10 months ago
Rick,

On Sun, Jan 30 2022 at 13:18, Rick Edgecombe wrote:
> This is a slight reboot of the userspace CET series. I will be taking over the 
> series from Yu-cheng. Per some internal recommendations, I’ve reset the version
> number and am calling it a new series. Hopefully, it doesn’t cause
> confusion.

That's fine as it seems to be a major change in course, so a reset to V1
is justified. Don't worry about confusion, we can easily confuse ourself
with minor things than that version reset :)

> The new plan is to upstream only userspace Shadow Stack support at this point. 
> IBT can follow later, but for now I’ll focus solely on the most in-demand and
> widely available (with the feature on AMD CPUs now) part of CET.

We just have to keep in IBT mind so that we don't add roadblocks which
we regret some time later.

> I thought as part of this reset, it might be useful to more fully write-up the 
> design and summarize the history of the previous CET series. So this slightly
> long cover letter does that. The "Updates" section has the changes, if anyone
> doesn't want the history.

Thanks for that lengthy writeup. It's appreciated. There is too much
confusion already so a coherent summary is helpful.

> Why is Shadow Stack Wanted
> ==========================
> The main use case for userspace shadow stack is providing protection against 
> return oriented programming attacks. Fedora and Ubuntu already have many/most 
> packages enabled for shadow stack.

Which is unfortunately part of the overall problem ...

> History
> =======
> The branding “CET” really consists of two features: “Shadow Stack” and 
> “Indirect Branch Tracking”. They both restrict previously allowed, but rarely 
> valid behaviors and require userspace to change to avoid these behaviors before 
> enabling the protection. These raw HW features need to be assembled into a 
> software solution across userspace and kernel in order to add security value.
> The kernel part of this solution has evolved iteratively starting with a lengthy
> RFC period. 
>
> Until now, the enabling effort was trying to support both Shadow Stack and IBT. 
> This history will focus on a few areas of the shadow stack development history 
> that I thought stood out.
>
> 	Signals
> 	-------
> 	Originally signals placed the location of the shadow stack restore 
> 	token inside the saved state on the stack. This was problematic from a 
> 	past ABI promises perspective. So the restore location was instead just 
> 	assumed from the shadow stack pointer. This works because in normal 
> 	allowed cases of calling sigreturn, the shadow stack pointer should be 
> 	right at the restore token at that time. There is no alternate shadow 
> 	stack support. If an alt shadow stack is added later we would
> 	need to

So how is that going to work? altstack is not an esoteric corner case.

> 	Enabling Interface
> 	------------------
> 	For the entire history of the original CET series, the design was to 
> 	enable shadow stack automatically if the feature bit was detected in 
> 	the elf header. Then it was userspace’s responsibility to turn it off 
> 	via an arch_prctl() if it was not desired, and this was handled by the 
> 	glibc dynamic loader. Glibc’s standard behavior (when CET if configured 
> 	is to leave shadow stack enabled if the executable and all linked 
> 	libraries are marked with shadow stacks.
>
> 	Many distros (Fedora and others) have binaries already marked with 
> 	shadow stack, waiting for kernel support. Unfortunately their glibc 
> 	binaries expect the original arch_prctl() interface for allocating 
> 	shadow stacks, as those changes were pushed ahead of kernel support. 
> 	The net result of it all is, when updating to a kernel with shadow 
> 	stack these binaries would suddenly get shadow stack enabled and expect 
> 	the arch_prctl() interface to be there. And so calls to makecontext() 
> 	will fail, resulting in visible breakages. This series deals with this 
> 	problem as described below in "Updates".

I'm really impressed by the well thought out coordination on the glibc and
distro side. Designed by committee never worked ...

> Updates
> =======
> These updates were mostly driven by public comments, but a lot of the design 
> elements are new. I would like some extra scrutiny on the updates.
>
> 	New syscall for Shadow Stack Allocation
> 	---------------------------------------
> 	A new syscall is added for allocating shadow stacks to replace 
> 	PROT_SHADOW_STACK. Several options were considered, as described in the 
> 	“x86/cet/shstk: Introduce map_shadow_stack syscall”.
>
> 	Xsave Managed Supervisor State Modifications
> 	--------------------------------------------
> 	The shadow stack feature requires the kernel to modify xsaves managed 
> 	state. On one of the last versions of Yu-cheng’s series Boris had 
> 	commented on the pattern it was using to do this not necessarily being 
> 	ideal. The pattern was to force a restore to the registers and always 
> 	do the modification there. Then Thomas did an overhaul of the fpu code, 
> 	part of which consisted of making raw access to the xsave buffer 
> 	private to the fpu code. So this series tries to expose access again, 
> 	and in a way that addresses Boris’ comments.
>
> 	The method is to provide functions like wmsrl/rdmsrl, but that can 
> 	direct the operation to the correct location (registers or buffer), 
> 	while giving the proper notice to the fpu subsystem so things don’t get 
> 	clobbered or corrupted.
>
> 	In the past a solution like this was discussed as part of the PASID 
> 	series, and Thomas was not in favor. In CET’s case there is a more 
> 	logic around the CET MSR’s than in PASID's, and wrapping this logic 
> 	minimizes near identical open coded logic needed to do this more 
> 	efficiently. In addition it resolves the above described problem of 
> 	having no access to the xsave buffer. So it is being put forward here 
> 	under the supposition that CET’s usage may lead to a different 
> 	conclusion, not to try to ignore past direction.
>
> 	The user interrupt series has similar needs as CET, and will also use
> 	this internal interface if it’s found acceptable.

I'll have a look.

> 	Switch Enabling Interface
> 	-------------------------
> 	But there are existing downsides to automatic elf header processing 
> 	based enabling. The elf header feature spec is not defined by the 
> 	kernel and there are proposals to expand it to describe additional 
> 	logic. A simpler interface where the kernel is simply told what to 
> 	enable, and leaves all the decision making to userspace, is more 
> 	flexible for userspace and simpler for the kernel. There also already 
> 	needs to be an ARCH_X86_FEATURE_ENABLE arch_prctl() for WRSS (and 
> 	likely LAM will use it too), so it avoids there being two ways to turn 
> 	on these types of features. The only tricky part for shadow stack, is 
> 	that it has to be enabled very early. Wherever the shadow stack is 
> 	enabled, the app cannot return from that point, otherwise there will be 
> 	a shadow stack violation. It turns out glibc can enable shadow stack 
> 	this early, so it works nicely. So not automatically enabling any 
> 	features in the elf header will cleanly disable all old binaries, which 
> 	expect the kernel to enable CET features automatically. Then after the 
> 	kernel changes are upstream, glibc can be updated to use the new
> 	interface. This is the solution implemented in this series.

Makes sense.

> 	Expand Commit Logs
> 	------------------
> 	As part of spinning up on this series, I found some of the commit logs 
> 	did not describe the changes in enough detail for me understand their 
> 	purpose. I tried to expand the logs and comments, where I had to go 
> 	digging. Hopefully it’s useful.

Proper changelogs are always appreciated.
	
> 	Limit to only Intel Processors
> 	------------------------------
> 	Shadow stack is supported on some AMD processors, but this revision 
> 	(with expanded HW usage and xsaves changes) has only has been tested on 
> 	Intel ones. So this series has a patch to limit shadow stack support to 
> 	Intel processors. Ideally the patch would not even make it to mainline, 
> 	and should be dropped as soon as this testing is done. It's included 
> 	just in case.

Ha. I can give you access to an AMD machine with CET SS supported :)

> Future Work
> ===========
> Even though this is now exclusively a shadow stack series, there is still some 
> remaining shadow stack work to be done.
>
> 	Ptrace
> 	------
> 	Early in the series, there was a patch to allow IA32_U_CET and
> 	IA32_PL3_SSP to be set. This patch was dropped and planned as a follow
> 	up to basic support, and it remains the plan. It will be needed for
> 	in-progress gdb support.

It's pretty much a prerequisite for enabling it, right?

> 	CRIU Support
> 	------------
> 	In the past there was some speculation on the mailing list about 
> 	whether CRIU would need to be taught about CET. It turns out, it does. 
> 	The first issue hit is that CRIU calls sigreturn directly from its 
> 	“parasite code” that it injects into the dumper process. This violates
> 	this shadow stack implementation’s protection that intends to prevent
> 	attackers from doing this.
>
> 	With so many packages already enabled with shadow stack, there is 
> 	probably desire to make it work seamlessly. But in the meantime if 
> 	distros want to support shadow stack and CRIU, users could manually 
> 	disabled shadow stack via “GLIBC_TUNABLES=glibc.cpu.x86_shstk=off” for 
> 	a process they will wants to dump. It’s not ideal.
>
> 	I’d like to hear what people think about having shadow stack in the 
> 	kernel without this resolved. Nothing would change for any users until 
> 	they enable shadow stack in the kernel and update to a glibc configured
> 	with CET. Should CRIU userspace be solved before kernel support?

Definitely yes. Making CRIU users add a glibc tunable is not really an
option. We can't break CRIU systems with a kernel upgrade.

Thanks,

        tglx


Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Edgecombe, Rick P 3 years, 10 months ago
Hi Thomas,

Thanks for feedback on the plan.

On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
> > Until now, the enabling effort was trying to support both Shadow
> > Stack and IBT. 
> > This history will focus on a few areas of the shadow stack
> > development history 
> > that I thought stood out.
> > 
> >        Signals
> >        -------
> >        Originally signals placed the location of the shadow stack
> > restore 
> >        token inside the saved state on the stack. This was
> > problematic from a 
> >        past ABI promises perspective. So the restore location was
> > instead just 
> >        assumed from the shadow stack pointer. This works because in
> > normal 
> >        allowed cases of calling sigreturn, the shadow stack pointer
> > should be 
> >        right at the restore token at that time. There is no
> > alternate shadow 
> >        stack support. If an alt shadow stack is added later we
> > would
> >        need to
> 
> So how is that going to work? altstack is not an esoteric corner
> case.

My understanding is that the main usages for the signal stack were
handling stack overflows and corruption. Since the shadow stack only
contains return addresses rather than large stack allocations, and is
not generally writable or pivotable, I thought there was a good
possibility an alt shadow stack would not end up being especially
useful. Does it seem like reasonable guesswork?

If it does seems likely, I'll give it some more thought than that hand
wavy plan.

>         
> >        Limit to only Intel Processors
> >        ------------------------------
> >        Shadow stack is supported on some AMD processors, but this
> > revision 
> >        (with expanded HW usage and xsaves changes) has only has
> > been tested on 
> >        Intel ones. So this series has a patch to limit shadow stack
> > support to 
> >        Intel processors. Ideally the patch would not even make it
> > to mainline, 
> >        and should be dropped as soon as this testing is done. It's
> > included 
> >        just in case.
> 
> Ha. I can give you access to an AMD machine with CET SS supported :)

Thanks for the offer. It sounds like John Allen can do this testing.

> 
> > Future Work
> > ===========
> > Even though this is now exclusively a shadow stack series, there is
> > still some 
> > remaining shadow stack work to be done.
> > 
> >        Ptrace
> >        ------
> >        Early in the series, there was a patch to allow IA32_U_CET
> > and
> >        IA32_PL3_SSP to be set. This patch was dropped and planned
> > as a follow
> >        up to basic support, and it remains the plan. It will be
> > needed for
> >        in-progress gdb support.
> 
> It's pretty much a prerequisite for enabling it, right?

Yes.

> 
> >        CRIU Support
> >        ------------
> >        In the past there was some speculation on the mailing list
> > about 
> >        whether CRIU would need to be taught about CET. It turns
> > out, it does. 
> >        The first issue hit is that CRIU calls sigreturn directly
> > from its 
> >        “parasite code” that it injects into the dumper process.
> > This violates
> >        this shadow stack implementation’s protection that intends
> > to prevent
> >        attackers from doing this.
> > 
> >        With so many packages already enabled with shadow stack,
> > there is 
> >        probably desire to make it work seamlessly. But in the
> > meantime if 
> >        distros want to support shadow stack and CRIU, users could
> > manually 
> >        disabled shadow stack via
> > “GLIBC_TUNABLES=glibc.cpu.x86_shstk=off” for 
> >        a process they will wants to dump. It’s not ideal.
> > 
> >        I’d like to hear what people think about having shadow stack
> > in the 
> >        kernel without this resolved. Nothing would change for any
> > users until 
> >        they enable shadow stack in the kernel and update to a glibc
> > configured
> >        with CET. Should CRIU userspace be solved before kernel
> > support?
> 
> Definitely yes. Making CRIU users add a glibc tunable is not really
> an
> option. We can't break CRIU systems with a kernel upgrade.

Ok got it, thanks. Just to be clear though, existing distros/binaries
out there will not have shadow stack enabled with just an updated
kernel (due to the enabling changes). So the CRIU tools would only
break after future glibc binaries enable CET, which users/distros would
have to do specifically (glibc doesn't even enable cet by default).

Since the purpose of this feature is to restrict previously allowed
behaviors, and it’s apparently getting enabled by default in some
distro's packages, I guess there is a decent chance that once a system
is updated with a future glibc some app somewhere will break. I was
under the impression that as long as there were no breakages under a
current set of binaries (including glibc), this was not considered a
kernel regression. Please correct me if this is wrong. I think there
are other options if we want to make this softer.

Of course none of that prevents known breakages from being fixed for
normal reasons and I’ll look into that for CRIU.
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Andy Lutomirski 3 years, 10 months ago

On Thu, Feb 3, 2022, at 5:08 PM, Edgecombe, Rick P wrote:
> Hi Thomas,

>> >        Signals
>> >        -------
>> >        Originally signals placed the location of the shadow stack
>> > restore 
>> >        token inside the saved state on the stack. This was
>> > problematic from a 
>> >        past ABI promises perspective.

What was the actual problem?

>> > So the restore location was
>> > instead just 
>> >        assumed from the shadow stack pointer. This works because in
>> > normal 
>> >        allowed cases of calling sigreturn, the shadow stack pointer
>> > should be 
>> >        right at the restore token at that time. There is no
>> > alternate shadow 
>> >        stack support. If an alt shadow stack is added later we
>> > would
>> >        need to
>> 
>> So how is that going to work? altstack is not an esoteric corner
>> case.
>
> My understanding is that the main usages for the signal stack were
> handling stack overflows and corruption. Since the shadow stack only
> contains return addresses rather than large stack allocations, and is
> not generally writable or pivotable, I thought there was a good
> possibility an alt shadow stack would not end up being especially
> useful. Does it seem like reasonable guesswork?

It's also used for things like DOSEMU that execute in a weird context and then trap back out to the outer program using a signal handler and an altstack.  Also, imagine someone writing a SIGSEGV handler specifically intended to handle shadow stack overflow.

The shadow stack can be pivoted using RSTORSSP.
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Edgecombe, Rick P 3 years, 10 months ago
On Thu, 2022-02-03 at 21:20 -0800, Andy Lutomirski wrote:
> On Thu, Feb 3, 2022, at 5:08 PM, Edgecombe, Rick P wrote:
> > Hi Thomas,
> > > >         Signals
> > > >         -------
> > > >         Originally signals placed the location of the shadow
> > > > stack
> > > > restore 
> > > >         token inside the saved state on the stack. This was
> > > > problematic from a 
> > > >         past ABI promises perspective.
> 
> What was the actual problem?

The meat of the discussion that I saw was this thread:

https://lore.kernel.org/lkml/CALCETrVTeYfzO-XWh+VwTuKCyPyp-oOMGH=QR_msG9tPQ4xPmA@mail.gmail.com/

The problem was that the saved shadow stack pointer was placed in a
location unexpected by userspace (at the end of the floating point
state), which apparently can be relocated by userspace that would not
be aware of the shadow stack state at the end. I think an earlier
version was not extendable either.

It does not seem to be a fully dead end, but I have to admit I didn’t
fully internalize the limits imposed by the userspace expectations of
the sigframe because the current solution seemed good. I’ll have to dig
into it a little more because alt stack, CRIU and these expectations
all seem intertwined.

Here is a question for you guys though – is a general ucontext
extension solution a nice-to-have on its own?

I was thinking that it would be better to keep CET stuff out of the
sigframe related structures if it can be avoided. One reason is that it
keeps security related register values out of writable locations. I’m
not thinking of any specific problem, but just a general idea of not
opening that stuff up if it’s not needed. An example is in the IBT
series, the wait-for-endbranch state was saved in a ucontext flag. Some
usages may want to keep it from being cleared in a signal and the
endbranch check skipped. So for shadow stack, just the general notion
that this is not ideal state to open up.

On where to keep the wait-for-endbranch state, PeterZ had suggested the
possibility of having a per-mm hashmap of (userspace stack addresses)-> 
(kernel side saved state), limited to some sensible limit of items.
This could be extendable to other state besides CET stuff. I was
thinking to look in that direction if it’s needed for the alt shadow
stack.

But, is it swimming against the place where saved state is "supposed"
to be? There is some optimum of compatibility (more apps able to opt-
in) and security. I guess it's probably not good to have the kernel
bend over backwards trying to get both.

> > > > So the restore location was
> > > > instead just 
> > > >         assumed from the shadow stack pointer. This works
> > > > because in
> > > > normal 
> > > >         allowed cases of calling sigreturn, the shadow stack
> > > > pointer
> > > > should be 
> > > >         right at the restore token at that time. There is no
> > > > alternate shadow 
> > > >         stack support. If an alt shadow stack is added later we
> > > > would
> > > >         need to
> > > 
> > > So how is that going to work? altstack is not an esoteric corner
> > > case.
> > 
> > My understanding is that the main usages for the signal stack were
> > handling stack overflows and corruption. Since the shadow stack
> > only
> > contains return addresses rather than large stack allocations, and
> > is
> > not generally writable or pivotable, I thought there was a good
> > possibility an alt shadow stack would not end up being especially
> > useful. Does it seem like reasonable guesswork?
> 
> It's also used for things like DOSEMU that execute in a weird context
> and then trap back out to the outer program using a signal handler
> and an altstack.  Also, imagine someone writing a SIGSEGV handler
> specifically intended to handle shadow stack overflow.

Interesting, thanks. I had been thinking that an alt shadow stack would
require a new interface that would mostly just sit dormant taking up
space. But probably an (obvious) better way would be to just have the
signalstack() syscall automatically create a new shadow stack, like the
rest of the series does automatically for new threads. I’ll think I’ll
see how that would look.

> 
> The shadow stack can be pivoted using RSTORSSP.

Yes, I just meant that the ability to pivot or modify is restricted (in
RSTORSSP's case by restore token checks) and so with less ability to
interact with it, it could be less likely for there to be corruptions.
This if of course just speculation.
RE: [PATCH 00/35] Shadow stacks for userspace
Posted by David Laight 3 years, 10 months ago
From: Edgecombe, Rick P
> Sent: 04 February 2022 01:08
> Hi Thomas,
> 
> Thanks for feedback on the plan.
> 
> On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
> > > Until now, the enabling effort was trying to support both Shadow
> > > Stack and IBT.
> > > This history will focus on a few areas of the shadow stack
> > > development history
> > > that I thought stood out.
> > >
> > >        Signals
> > >        -------
> > >        Originally signals placed the location of the shadow stack
> > > restore
> > >        token inside the saved state on the stack. This was
> > > problematic from a
> > >        past ABI promises perspective. So the restore location was
> > > instead just
> > >        assumed from the shadow stack pointer. This works because in
> > > normal
> > >        allowed cases of calling sigreturn, the shadow stack pointer
> > > should be
> > >        right at the restore token at that time. There is no
> > > alternate shadow
> > >        stack support. If an alt shadow stack is added later we
> > > would
> > >        need to
> >
> > So how is that going to work? altstack is not an esoteric corner
> > case.
> 
> My understanding is that the main usages for the signal stack were
> handling stack overflows and corruption. Since the shadow stack only
> contains return addresses rather than large stack allocations, and is
> not generally writable or pivotable, I thought there was a good
> possibility an alt shadow stack would not end up being especially
> useful. Does it seem like reasonable guesswork?

The other 'problem' is that it is valid to longjump out of a signal handler.
These days you have to use siglongjmp() not longjmp() but it is still used.

It is probably also valid to use siglongjmp() to jump from a nested
signal handler into the outer handler.
Given both signal handlers can have their own stack, there can be three
stacks involved.

I think the shadow stack pointer has to be in ucontext - which also
means the application can change it before returning from a signal.
In much the same way as all the segment registers can be changed
leading to all the nasty bugs when the final 'return to user' code
traps in kernel when loading invalid segment registers or executing iret.

Hmmm... do shadow stacks mean that longjmp() has to be a system call?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by H.J. Lu 3 years, 10 months ago
On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Edgecombe, Rick P
> > Sent: 04 February 2022 01:08
> > Hi Thomas,
> >
> > Thanks for feedback on the plan.
> >
> > On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
> > > > Until now, the enabling effort was trying to support both Shadow
> > > > Stack and IBT.
> > > > This history will focus on a few areas of the shadow stack
> > > > development history
> > > > that I thought stood out.
> > > >
> > > >        Signals
> > > >        -------
> > > >        Originally signals placed the location of the shadow stack
> > > > restore
> > > >        token inside the saved state on the stack. This was
> > > > problematic from a
> > > >        past ABI promises perspective. So the restore location was
> > > > instead just
> > > >        assumed from the shadow stack pointer. This works because in
> > > > normal
> > > >        allowed cases of calling sigreturn, the shadow stack pointer
> > > > should be
> > > >        right at the restore token at that time. There is no
> > > > alternate shadow
> > > >        stack support. If an alt shadow stack is added later we
> > > > would
> > > >        need to
> > >
> > > So how is that going to work? altstack is not an esoteric corner
> > > case.
> >
> > My understanding is that the main usages for the signal stack were
> > handling stack overflows and corruption. Since the shadow stack only
> > contains return addresses rather than large stack allocations, and is
> > not generally writable or pivotable, I thought there was a good
> > possibility an alt shadow stack would not end up being especially
> > useful. Does it seem like reasonable guesswork?
>
> The other 'problem' is that it is valid to longjump out of a signal handler.
> These days you have to use siglongjmp() not longjmp() but it is still used.
>
> It is probably also valid to use siglongjmp() to jump from a nested
> signal handler into the outer handler.
> Given both signal handlers can have their own stack, there can be three
> stacks involved.
>
> I think the shadow stack pointer has to be in ucontext - which also
> means the application can change it before returning from a signal.
> In much the same way as all the segment registers can be changed
> leading to all the nasty bugs when the final 'return to user' code
> traps in kernel when loading invalid segment registers or executing iret.
>
> Hmmm... do shadow stacks mean that longjmp() has to be a system call?

No.  setjmp/longjmp save and restore shadow stack pointer.

--
H.J.
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Edgecombe, Rick P 3 years, 10 months ago
On Sat, 2022-02-05 at 05:29 -0800, H.J. Lu wrote:
> On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com>
> wrote:
> > 
> > From: Edgecombe, Rick P
> > > Sent: 04 February 2022 01:08
> > > Hi Thomas,
> > > 
> > > Thanks for feedback on the plan.
> > > 
> > > On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
> > > > > Until now, the enabling effort was trying to support both
> > > > > Shadow
> > > > > Stack and IBT.
> > > > > This history will focus on a few areas of the shadow stack
> > > > > development history
> > > > > that I thought stood out.
> > > > > 
> > > > >        Signals
> > > > >        -------
> > > > >        Originally signals placed the location of the shadow
> > > > > stack
> > > > > restore
> > > > >        token inside the saved state on the stack. This was
> > > > > problematic from a
> > > > >        past ABI promises perspective. So the restore location
> > > > > was
> > > > > instead just
> > > > >        assumed from the shadow stack pointer. This works
> > > > > because in
> > > > > normal
> > > > >        allowed cases of calling sigreturn, the shadow stack
> > > > > pointer
> > > > > should be
> > > > >        right at the restore token at that time. There is no
> > > > > alternate shadow
> > > > >        stack support. If an alt shadow stack is added later
> > > > > we
> > > > > would
> > > > >        need to
> > > > 
> > > > So how is that going to work? altstack is not an esoteric
> > > > corner
> > > > case.
> > > 
> > > My understanding is that the main usages for the signal stack
> > > were
> > > handling stack overflows and corruption. Since the shadow stack
> > > only
> > > contains return addresses rather than large stack allocations,
> > > and is
> > > not generally writable or pivotable, I thought there was a good
> > > possibility an alt shadow stack would not end up being especially
> > > useful. Does it seem like reasonable guesswork?
> > 
> > The other 'problem' is that it is valid to longjump out of a signal
> > handler.
> > These days you have to use siglongjmp() not longjmp() but it is
> > still used.
> > 
> > It is probably also valid to use siglongjmp() to jump from a nested
> > signal handler into the outer handler.
> > Given both signal handlers can have their own stack, there can be
> > three
> > stacks involved.

So the scenario is?

1. Handle signal 1
2. sigsetjmp()
3. signalstack()
4. Handle signal 2 on alt stack
5. siglongjmp()

I'll check that it is covered by the tests, but I think it should work
in this series that has no alt shadow stack. I have only done a high
level overview of how the shadow stack stuff, that doesn't involve the
kernel, works in glibc. Sounds like I'll need to do a deeper dive.

> > 
> > I think the shadow stack pointer has to be in ucontext - which also
> > means the application can change it before returning from a signal.

Yes we might need to change it to support alt shadow stacks. Can you
elaborate why you think it has to be in ucontext? I was thinking of
looking at three options for storing the ssp:
 - Stored in the shadow stack like a token using WRUSS from the kernel.
 - Stored on the kernel side using a hashmap that maps ucontext or
   sigframe userspace address to ssp (this is of course similar to 
   storing in ucontext, except that the user can’t change the ssp).
 - Stored writable in userspace in ucontext.

But in this version, without alt shadow stacks, the shadow stack
pointer is not stored in ucontext. This causes the limitation that
userspace can only call sigreturn when it has returned back to a point
where there is a restore token on the shadow stack (which was placed
there by the kernel). This doesn’t mean it can’t switch to a different
shadow stack or handle a nested signal, but it limits the possibility
for calling sigreturn with a totally different sigframe (like CRIU and
SROP attacks do). It should hopefully be a helpful, protective
limitation for most apps and I'm hoping CRIU can be fixed without
removing it.

I am not aware of other limitations to signals (besides normal shadow
stack enforcement), but I could be missing it. And people's skepticism
is making me want to go back over it with more scrutiny.

> > In much the same way as all the segment registers can be changed
> > leading to all the nasty bugs when the final 'return to user' code
> > traps in kernel when loading invalid segment registers or executing
> > iret.

I don't think this is as difficult to avoid because userspace ssp has
its own register that should not be accessed at that point, but I have
not given this aspect enough analysis. Thanks for bringing it up.

> > 
> > Hmmm... do shadow stacks mean that longjmp() has to be a system
> > call?
> 
> No.  setjmp/longjmp save and restore shadow stack pointer.
> 

It sounds like it would help to write up in a lot more detail exactly
how all the signal and specialer stack manipulation scenarios work in
glibc.

Re: [PATCH 00/35] Shadow stacks for userspace
Posted by H.J. Lu 3 years, 10 months ago
On Sat, Feb 5, 2022 at 12:15 PM Edgecombe, Rick P
<rick.p.edgecombe@intel.com> wrote:
>
> On Sat, 2022-02-05 at 05:29 -0800, H.J. Lu wrote:
> > On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com>
> > wrote:
> > >
> > > From: Edgecombe, Rick P
> > > > Sent: 04 February 2022 01:08
> > > > Hi Thomas,
> > > >
> > > > Thanks for feedback on the plan.
> > > >
> > > > On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
> > > > > > Until now, the enabling effort was trying to support both
> > > > > > Shadow
> > > > > > Stack and IBT.
> > > > > > This history will focus on a few areas of the shadow stack
> > > > > > development history
> > > > > > that I thought stood out.
> > > > > >
> > > > > >        Signals
> > > > > >        -------
> > > > > >        Originally signals placed the location of the shadow
> > > > > > stack
> > > > > > restore
> > > > > >        token inside the saved state on the stack. This was
> > > > > > problematic from a
> > > > > >        past ABI promises perspective. So the restore location
> > > > > > was
> > > > > > instead just
> > > > > >        assumed from the shadow stack pointer. This works
> > > > > > because in
> > > > > > normal
> > > > > >        allowed cases of calling sigreturn, the shadow stack
> > > > > > pointer
> > > > > > should be
> > > > > >        right at the restore token at that time. There is no
> > > > > > alternate shadow
> > > > > >        stack support. If an alt shadow stack is added later
> > > > > > we
> > > > > > would
> > > > > >        need to
> > > > >
> > > > > So how is that going to work? altstack is not an esoteric
> > > > > corner
> > > > > case.
> > > >
> > > > My understanding is that the main usages for the signal stack
> > > > were
> > > > handling stack overflows and corruption. Since the shadow stack
> > > > only
> > > > contains return addresses rather than large stack allocations,
> > > > and is
> > > > not generally writable or pivotable, I thought there was a good
> > > > possibility an alt shadow stack would not end up being especially
> > > > useful. Does it seem like reasonable guesswork?
> > >
> > > The other 'problem' is that it is valid to longjump out of a signal
> > > handler.
> > > These days you have to use siglongjmp() not longjmp() but it is
> > > still used.
> > >
> > > It is probably also valid to use siglongjmp() to jump from a nested
> > > signal handler into the outer handler.
> > > Given both signal handlers can have their own stack, there can be
> > > three
> > > stacks involved.
>
> So the scenario is?
>
> 1. Handle signal 1
> 2. sigsetjmp()
> 3. signalstack()
> 4. Handle signal 2 on alt stack
> 5. siglongjmp()
>
> I'll check that it is covered by the tests, but I think it should work
> in this series that has no alt shadow stack. I have only done a high
> level overview of how the shadow stack stuff, that doesn't involve the
> kernel, works in glibc. Sounds like I'll need to do a deeper dive.
>
> > >
> > > I think the shadow stack pointer has to be in ucontext - which also
> > > means the application can change it before returning from a signal.
>
> Yes we might need to change it to support alt shadow stacks. Can you
> elaborate why you think it has to be in ucontext? I was thinking of
> looking at three options for storing the ssp:
>  - Stored in the shadow stack like a token using WRUSS from the kernel.
>  - Stored on the kernel side using a hashmap that maps ucontext or
>    sigframe userspace address to ssp (this is of course similar to
>    storing in ucontext, except that the user can’t change the ssp).
>  - Stored writable in userspace in ucontext.
>
> But in this version, without alt shadow stacks, the shadow stack
> pointer is not stored in ucontext. This causes the limitation that
> userspace can only call sigreturn when it has returned back to a point
> where there is a restore token on the shadow stack (which was placed
> there by the kernel). This doesn’t mean it can’t switch to a different
> shadow stack or handle a nested signal, but it limits the possibility
> for calling sigreturn with a totally different sigframe (like CRIU and
> SROP attacks do). It should hopefully be a helpful, protective
> limitation for most apps and I'm hoping CRIU can be fixed without
> removing it.
>
> I am not aware of other limitations to signals (besides normal shadow
> stack enforcement), but I could be missing it. And people's skepticism
> is making me want to go back over it with more scrutiny.
>
> > > In much the same way as all the segment registers can be changed
> > > leading to all the nasty bugs when the final 'return to user' code
> > > traps in kernel when loading invalid segment registers or executing
> > > iret.
>
> I don't think this is as difficult to avoid because userspace ssp has
> its own register that should not be accessed at that point, but I have
> not given this aspect enough analysis. Thanks for bringing it up.
>
> > >
> > > Hmmm... do shadow stacks mean that longjmp() has to be a system
> > > call?
> >
> > No.  setjmp/longjmp save and restore shadow stack pointer.
> >
>
> It sounds like it would help to write up in a lot more detail exactly
> how all the signal and specialer stack manipulation scenarios work in
> glibc.
>

setjmp/longjmp work on the same sigjmp_buf.  Shadow stack pointer
is saved and restored, just like any other callee-saved registers.


-- 
H.J.
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Andy Lutomirski 3 years, 10 months ago
On 2/5/22 12:15, Edgecombe, Rick P wrote:
> On Sat, 2022-02-05 at 05:29 -0800, H.J. Lu wrote:
>> On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com>
>> wrote:
>>>
>>> From: Edgecombe, Rick P
>>>> Sent: 04 February 2022 01:08
>>>> Hi Thomas,
>>>>
>>>> Thanks for feedback on the plan.
>>>>
>>>> On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
>>>>>> Until now, the enabling effort was trying to support both
>>>>>> Shadow
>>>>>> Stack and IBT.
>>>>>> This history will focus on a few areas of the shadow stack
>>>>>> development history
>>>>>> that I thought stood out.
>>>>>>
>>>>>>         Signals
>>>>>>         -------
>>>>>>         Originally signals placed the location of the shadow
>>>>>> stack
>>>>>> restore
>>>>>>         token inside the saved state on the stack. This was
>>>>>> problematic from a
>>>>>>         past ABI promises perspective. So the restore location
>>>>>> was
>>>>>> instead just
>>>>>>         assumed from the shadow stack pointer. This works
>>>>>> because in
>>>>>> normal
>>>>>>         allowed cases of calling sigreturn, the shadow stack
>>>>>> pointer
>>>>>> should be
>>>>>>         right at the restore token at that time. There is no
>>>>>> alternate shadow
>>>>>>         stack support. If an alt shadow stack is added later
>>>>>> we
>>>>>> would
>>>>>>         need to
>>>>>
>>>>> So how is that going to work? altstack is not an esoteric
>>>>> corner
>>>>> case.
>>>>
>>>> My understanding is that the main usages for the signal stack
>>>> were
>>>> handling stack overflows and corruption. Since the shadow stack
>>>> only
>>>> contains return addresses rather than large stack allocations,
>>>> and is
>>>> not generally writable or pivotable, I thought there was a good
>>>> possibility an alt shadow stack would not end up being especially
>>>> useful. Does it seem like reasonable guesswork?
>>>
>>> The other 'problem' is that it is valid to longjump out of a signal
>>> handler.
>>> These days you have to use siglongjmp() not longjmp() but it is
>>> still used.
>>>
>>> It is probably also valid to use siglongjmp() to jump from a nested
>>> signal handler into the outer handler.
>>> Given both signal handlers can have their own stack, there can be
>>> three
>>> stacks involved.
> 
> So the scenario is?
> 
> 1. Handle signal 1
> 2. sigsetjmp()
> 3. signalstack()
> 4. Handle signal 2 on alt stack
> 5. siglongjmp()
> 
> I'll check that it is covered by the tests, but I think it should work
> in this series that has no alt shadow stack. I have only done a high
> level overview of how the shadow stack stuff, that doesn't involve the
> kernel, works in glibc. Sounds like I'll need to do a deeper dive.
> 
>>>
>>> I think the shadow stack pointer has to be in ucontext - which also
>>> means the application can change it before returning from a signal.
> 
> Yes we might need to change it to support alt shadow stacks. Can you
> elaborate why you think it has to be in ucontext? I was thinking of
> looking at three options for storing the ssp:
>   - Stored in the shadow stack like a token using WRUSS from the kernel.
>   - Stored on the kernel side using a hashmap that maps ucontext or
>     sigframe userspace address to ssp (this is of course similar to
>     storing in ucontext, except that the user can’t change the ssp).
>   - Stored writable in userspace in ucontext.
> 
> But in this version, without alt shadow stacks, the shadow stack
> pointer is not stored in ucontext. This causes the limitation that
> userspace can only call sigreturn when it has returned back to a point
> where there is a restore token on the shadow stack (which was placed
> there by the kernel).



I'll reply here and maybe cover multiple things.


User code already needs to rewind the regular stack to call sigreturn -- 
sigreturn find the signal frame based on ESP/RSP.  So if you call it 
from the wrong place, you go boom.  I think that the Linux SHSTK ABI 
should have the property that no amount of tampering with just the 
ucontext and associated structures can cause sigreturn to redirect to 
the wrong IP -- there should be something on the shadow stack that also 
gets verified in sigreturn.  IIRC the series does this, but it's been a 
while.  The post-sigreturn SSP should be entirely implied by 
pre-sigreturn SSP (or perhaps something on the shadow stack), so, in the 
absence of an altshadowstack feature, no ucontext changes should be needed.

We can also return from a signal or from more than one signal at once, 
as above, using siglongjmp.  It seems like this should Just Work (tm), 
at least in the absence of altshadowstack.

So this leaves altshadowstack.  If we want to allow userspace to handle 
a shstk overflow, I think we need altshadowstack.  And I can easily 
imagine signal handling in a coroutine or user-threading evironment (Go? 
UMCG or whatever it's called?) wanting this.  As noted, this obnoxious 
Andy person didn't like putting any shstk-related extensions in the FPU 
state.

For better or for worse, altshadowstack is (I think) fundamentally a new 
API.  No amount of ucontext magic is going to materialize an entire 
shadow stack out of nowhere when someone calls sigaltstack().  So the 
questions are: should we support altshadowstack from day one and, if so, 
what should it look like?

If we want to be clever, we could attempt to make altstadowstack 
compatible with RSTORSSP.  Signal delivery pushes a restore token to the 
old stack (hah!  what if the old stack is full?) and pushes the RSTORSSP 
busy magic to the new stack, and sigreturn inverts it.  Code that wants 
to return without sigreturn does it manually with RSTORSSP.  (Assuming 
that I've understood the arcane RSTORSSP sequence right.  Intel wins 
major points for documentation quality here.)  Or we could invent our 
own scheme.  In either case, I don't immediately see any reason that the 
ucontext needs to contain a shadow stack pointer.

There's a delightful wart to consider, though.  siglongjmp, at least as 
currently envisioned, can't return off an altshadowstack: the whole 
point of the INCSSP distance restrictions to to avoid incrementing right 
off the top of the current stack, but siglongjmp off an altshadowstack 
fundamentally switches stacks.  So either siglongjmp off an 
altshadowstack needs to be illegal or it needs to work differently.  (By 
incssp-ing to the top of the altshadowstack, then switching, then 
incssp-ing some more?  How does it even find the top of the current 
altshadowstack?)  And the plot thickens if one tries to siglongjmp off 
two nested altshadowstack-using signals in a single call.   Fortunately, 
since altshadowstack is a new API, it's not entirely crazy to have 
different rules.

So I don't have a complete or even almost complete design in mind, but I 
think we do need to make a conscious decision either to design this 
right or to skip it for v1.

As for CRIU, I don't think anyone really expects a new kernel, running 
new userspace that takes advantage of features in the new kernel, to 
work with old CRIU.  Upgrading to a SHSTK kernel should still allow 
using CRIU with non-SHSTK userspace, but I don't see how it's possible 
for CRIU to handle SHSTK without updates.  We should certainly do our 
best to make CRIU's life easy, though.

  This doesn’t mean it can’t switch to a different
> shadow stack or handle a nested signal, but it limits the possibility
> for calling sigreturn with a totally different sigframe (like CRIU and
> SROP attacks do). It should hopefully be a helpful, protective
> limitation for most apps and I'm hoping CRIU can be fixed without
> removing it.
> 
> I am not aware of other limitations to signals (besides normal shadow
> stack enforcement), but I could be missing it. And people's skepticism
> is making me want to go back over it with more scrutiny.
> 
>>> In much the same way as all the segment registers can be changed
>>> leading to all the nasty bugs when the final 'return to user' code
>>> traps in kernel when loading invalid segment registers or executing
>>> iret.
> 
> I don't think this is as difficult to avoid because userspace ssp has
> its own register that should not be accessed at that point, but I have
> not given this aspect enough analysis. Thanks for bringing it up.
> 
>>>
>>> Hmmm... do shadow stacks mean that longjmp() has to be a system
>>> call?
>>
>> No.  setjmp/longjmp save and restore shadow stack pointer.
>>
> 
> It sounds like it would help to write up in a lot more detail exactly
> how all the signal and specialer stack manipulation scenarios work in
> glibc.
> 

Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Thomas Gleixner 3 years, 10 months ago
On Mon, Feb 07 2022 at 17:31, Andy Lutomirski wrote:
> So this leaves altshadowstack.  If we want to allow userspace to handle 
> a shstk overflow, I think we need altshadowstack.  And I can easily 
> imagine signal handling in a coroutine or user-threading evironment (Go? 
> UMCG or whatever it's called?) wanting this.  As noted, this obnoxious 
> Andy person didn't like putting any shstk-related extensions in the FPU 
> state.
>
> For better or for worse, altshadowstack is (I think) fundamentally a new 
> API.  No amount of ucontext magic is going to materialize an entire 
> shadow stack out of nowhere when someone calls sigaltstack().  So the 
> questions are: should we support altshadowstack from day one and, if so, 
> what should it look like?

I think we should support them from day one.

> So I don't have a complete or even almost complete design in mind, but I 
> think we do need to make a conscious decision either to design this 
> right or to skip it for v1.

Skipping it might create a fundamental design fail situation as it might
require changes to the shadow stack signal handling in general which
becomes a nightmare once a non-altstack API is exposed.

> As for CRIU, I don't think anyone really expects a new kernel, running 
> new userspace that takes advantage of features in the new kernel, to 
> work with old CRIU.

Yes, CRIU needs updates, but what ensures that CRIU managed user space
does not use SHSTK if CRIU is not updated yet?

> Upgrading to a SHSTK kernel should still allow using CRIU with
> non-SHSTK userspace, but I don't see how it's possible for CRIU to
> handle SHSTK without updates.  We should certainly do our best to make
> CRIU's life easy, though.

Handling CRIU with SHSTK enabled has to be part of the overall design
otherwise we'll either end up with horrible hacks or with a requirement
to change the V1 UAPI....

Thanks,

        tglx
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Andy Lutomirski 3 years, 10 months ago
On Tue, Feb 8, 2022, at 1:31 AM, Thomas Gleixner wrote:
> On Mon, Feb 07 2022 at 17:31, Andy Lutomirski wrote:
>> So this leaves altshadowstack.  If we want to allow userspace to handle 
>> a shstk overflow, I think we need altshadowstack.  And I can easily 
>> imagine signal handling in a coroutine or user-threading evironment (Go? 
>> UMCG or whatever it's called?) wanting this.  As noted, this obnoxious 
>> Andy person didn't like putting any shstk-related extensions in the FPU 
>> state.
>>
>> For better or for worse, altshadowstack is (I think) fundamentally a new 
>> API.  No amount of ucontext magic is going to materialize an entire 
>> shadow stack out of nowhere when someone calls sigaltstack().  So the 
>> questions are: should we support altshadowstack from day one and, if so, 
>> what should it look like?
>
> I think we should support them from day one.
>
>> So I don't have a complete or even almost complete design in mind, but I 
>> think we do need to make a conscious decision either to design this 
>> right or to skip it for v1.
>
> Skipping it might create a fundamental design fail situation as it might
> require changes to the shadow stack signal handling in general which
> becomes a nightmare once a non-altstack API is exposed.

It would also expose a range of kernels in which shstk is on but programs that want altshadowstack don't have it.  That would be annoying.

>
>> As for CRIU, I don't think anyone really expects a new kernel, running 
>> new userspace that takes advantage of features in the new kernel, to 
>> work with old CRIU.
>
> Yes, CRIU needs updates, but what ensures that CRIU managed user space
> does not use SHSTK if CRIU is not updated yet?

In some sense this is like any other feature.  If a program uses timerfd but CRIU doesn't support timerfd, then it won't work.  SHSTK is a bit unique because it's likely that all programs on a system will start using it all at once.
RE: [PATCH 00/35] Shadow stacks for userspace
Posted by David Laight 3 years, 10 months ago
From: Edgecombe, Rick P
> Sent: 05 February 2022 20:15
> 
> On Sat, 2022-02-05 at 05:29 -0800, H.J. Lu wrote:
> > On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com>
> > wrote:
> > >
> > > From: Edgecombe, Rick P
> > > > Sent: 04 February 2022 01:08
> > > > Hi Thomas,
> > > >
> > > > Thanks for feedback on the plan.
> > > >
> > > > On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
> > > > > > Until now, the enabling effort was trying to support both
> > > > > > Shadow
> > > > > > Stack and IBT.
> > > > > > This history will focus on a few areas of the shadow stack
> > > > > > development history
> > > > > > that I thought stood out.
> > > > > >
> > > > > >        Signals
> > > > > >        -------
> > > > > >        Originally signals placed the location of the shadow
> > > > > > stack
> > > > > > restore
> > > > > >        token inside the saved state on the stack. This was
> > > > > > problematic from a
> > > > > >        past ABI promises perspective. So the restore location
> > > > > > was
> > > > > > instead just
> > > > > >        assumed from the shadow stack pointer. This works
> > > > > > because in
> > > > > > normal
> > > > > >        allowed cases of calling sigreturn, the shadow stack
> > > > > > pointer
> > > > > > should be
> > > > > >        right at the restore token at that time. There is no
> > > > > > alternate shadow
> > > > > >        stack support. If an alt shadow stack is added later
> > > > > > we
> > > > > > would
> > > > > >        need to
> > > > >
> > > > > So how is that going to work? altstack is not an esoteric
> > > > > corner
> > > > > case.
> > > >
> > > > My understanding is that the main usages for the signal stack
> > > > were
> > > > handling stack overflows and corruption. Since the shadow stack
> > > > only
> > > > contains return addresses rather than large stack allocations,
> > > > and is
> > > > not generally writable or pivotable, I thought there was a good
> > > > possibility an alt shadow stack would not end up being especially
> > > > useful. Does it seem like reasonable guesswork?
> > >
> > > The other 'problem' is that it is valid to longjump out of a signal
> > > handler.
> > > These days you have to use siglongjmp() not longjmp() but it is
> > > still used.
> > >
> > > It is probably also valid to use siglongjmp() to jump from a nested
> > > signal handler into the outer handler.
> > > Given both signal handlers can have their own stack, there can be
> > > three
> > > stacks involved.
> 
> So the scenario is?
> 
> 1. Handle signal 1
> 2. sigsetjmp()
> 3. signalstack()
> 4. Handle signal 2 on alt stack
> 5. siglongjmp()
> 
> I'll check that it is covered by the tests, but I think it should work
> in this series that has no alt shadow stack. I have only done a high
> level overview of how the shadow stack stuff, that doesn't involve the
> kernel, works in glibc. Sounds like I'll need to do a deeper dive.

The posix/xopen definition for setjmp/longjmp doesn't require such
longjmp requests to work.

Although they still have to do something that doesn't break badly.
Aborting the process is probably fine!

> > > I think the shadow stack pointer has to be in ucontext - which also
> > > means the application can change it before returning from a signal.
> 
> Yes we might need to change it to support alt shadow stacks. Can you
> elaborate why you think it has to be in ucontext? I was thinking of
> looking at three options for storing the ssp:
>  - Stored in the shadow stack like a token using WRUSS from the kernel.
>  - Stored on the kernel side using a hashmap that maps ucontext or
>    sigframe userspace address to ssp (this is of course similar to
>    storing in ucontext, except that the user can’t change the ssp).
>  - Stored writable in userspace in ucontext.
> 
> But in this version, without alt shadow stacks, the shadow stack
> pointer is not stored in ucontext. This causes the limitation that
> userspace can only call sigreturn when it has returned back to a point
> where there is a restore token on the shadow stack (which was placed
> there by the kernel). This doesn’t mean it can’t switch to a different
> shadow stack or handle a nested signal, but it limits the possibility
> for calling sigreturn with a totally different sigframe (like CRIU and
> SROP attacks do). It should hopefully be a helpful, protective
> limitation for most apps and I'm hoping CRIU can be fixed without
> removing it.
> 
> I am not aware of other limitations to signals (besides normal shadow
> stack enforcement), but I could be missing it. And people's skepticism
> is making me want to go back over it with more scrutiny.
> 
> > > In much the same way as all the segment registers can be changed
> > > leading to all the nasty bugs when the final 'return to user' code
> > > traps in kernel when loading invalid segment registers or executing
> > > iret.
> 
> I don't think this is as difficult to avoid because userspace ssp has
> its own register that should not be accessed at that point, but I have
> not given this aspect enough analysis. Thanks for bringing it up.

So the user ssp isn't saved (or restored) by the trap entry/exit.
So it needs to be saved by the context switch code?
Much like the user segment registers?
So you are likely to get the same problems if restoring it can fault
in kernel (eg for a non-canonical address).

> > > Hmmm... do shadow stacks mean that longjmp() has to be a system
> > > call?
> >
> > No.  setjmp/longjmp save and restore shadow stack pointer.

Ok, I was thinking that direct access to the user ssp would be
a privileged operation.
If it can be written you don't really have to worry about what code
is trying to do - it can actually do what it likes.
It just catches unintentional operations (like buffer overflows).

Was there any 'spare' space in struct jmpbuf ?
Otherwise you can only enable shadow stacks if everything has been
recompiled - including any shared libraries the might be dlopen()ed.
(or does the compiler invent an alloca() call somehow for a
size that comes back from glibc?)

I've never really considered how setjmp/longjmp handle callee saved
register variables (apart from it being hard).
The original pdp11 implementation probably only needed to save r6 and r7.

What does happen to all the 'extended state' that XSAVE handles?
IIRC all the AVX registers are caller saved (so should probably
be zerod), but some of the SSE ones are callee saved, and one or
two of the fpu flags are sticky and annoying enough the save/restore
at the best of times.

> It sounds like it would help to write up in a lot more detail exactly
> how all the signal and specialer stack manipulation scenarios work in
> glibc.

Some cross references might have made people notice that the ucontext
extensions for AVX512 (if not earlier ones) broke the minimal/default
signal stack size.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by H.J. Lu 3 years, 10 months ago
On Sun, Feb 6, 2022 at 5:42 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Edgecombe, Rick P
> > Sent: 05 February 2022 20:15
> >
> > On Sat, 2022-02-05 at 05:29 -0800, H.J. Lu wrote:
> > > On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com>
> > > wrote:
> > > >
> > > > From: Edgecombe, Rick P
> > > > > Sent: 04 February 2022 01:08
> > > > > Hi Thomas,
> > > > >
> > > > > Thanks for feedback on the plan.
> > > > >
> > > > > On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
> > > > > > > Until now, the enabling effort was trying to support both
> > > > > > > Shadow
> > > > > > > Stack and IBT.
> > > > > > > This history will focus on a few areas of the shadow stack
> > > > > > > development history
> > > > > > > that I thought stood out.
> > > > > > >
> > > > > > >        Signals
> > > > > > >        -------
> > > > > > >        Originally signals placed the location of the shadow
> > > > > > > stack
> > > > > > > restore
> > > > > > >        token inside the saved state on the stack. This was
> > > > > > > problematic from a
> > > > > > >        past ABI promises perspective. So the restore location
> > > > > > > was
> > > > > > > instead just
> > > > > > >        assumed from the shadow stack pointer. This works
> > > > > > > because in
> > > > > > > normal
> > > > > > >        allowed cases of calling sigreturn, the shadow stack
> > > > > > > pointer
> > > > > > > should be
> > > > > > >        right at the restore token at that time. There is no
> > > > > > > alternate shadow
> > > > > > >        stack support. If an alt shadow stack is added later
> > > > > > > we
> > > > > > > would
> > > > > > >        need to
> > > > > >
> > > > > > So how is that going to work? altstack is not an esoteric
> > > > > > corner
> > > > > > case.
> > > > >
> > > > > My understanding is that the main usages for the signal stack
> > > > > were
> > > > > handling stack overflows and corruption. Since the shadow stack
> > > > > only
> > > > > contains return addresses rather than large stack allocations,
> > > > > and is
> > > > > not generally writable or pivotable, I thought there was a good
> > > > > possibility an alt shadow stack would not end up being especially
> > > > > useful. Does it seem like reasonable guesswork?
> > > >
> > > > The other 'problem' is that it is valid to longjump out of a signal
> > > > handler.
> > > > These days you have to use siglongjmp() not longjmp() but it is
> > > > still used.
> > > >
> > > > It is probably also valid to use siglongjmp() to jump from a nested
> > > > signal handler into the outer handler.
> > > > Given both signal handlers can have their own stack, there can be
> > > > three
> > > > stacks involved.
> >
> > So the scenario is?
> >
> > 1. Handle signal 1
> > 2. sigsetjmp()
> > 3. signalstack()
> > 4. Handle signal 2 on alt stack
> > 5. siglongjmp()
> >
> > I'll check that it is covered by the tests, but I think it should work
> > in this series that has no alt shadow stack. I have only done a high
> > level overview of how the shadow stack stuff, that doesn't involve the
> > kernel, works in glibc. Sounds like I'll need to do a deeper dive.
>
> The posix/xopen definition for setjmp/longjmp doesn't require such
> longjmp requests to work.
>
> Although they still have to do something that doesn't break badly.
> Aborting the process is probably fine!
>
> > > > I think the shadow stack pointer has to be in ucontext - which also
> > > > means the application can change it before returning from a signal.
> >
> > Yes we might need to change it to support alt shadow stacks. Can you
> > elaborate why you think it has to be in ucontext? I was thinking of
> > looking at three options for storing the ssp:
> >  - Stored in the shadow stack like a token using WRUSS from the kernel.
> >  - Stored on the kernel side using a hashmap that maps ucontext or
> >    sigframe userspace address to ssp (this is of course similar to
> >    storing in ucontext, except that the user can’t change the ssp).
> >  - Stored writable in userspace in ucontext.
> >
> > But in this version, without alt shadow stacks, the shadow stack
> > pointer is not stored in ucontext. This causes the limitation that
> > userspace can only call sigreturn when it has returned back to a point
> > where there is a restore token on the shadow stack (which was placed
> > there by the kernel). This doesn’t mean it can’t switch to a different
> > shadow stack or handle a nested signal, but it limits the possibility
> > for calling sigreturn with a totally different sigframe (like CRIU and
> > SROP attacks do). It should hopefully be a helpful, protective
> > limitation for most apps and I'm hoping CRIU can be fixed without
> > removing it.
> >
> > I am not aware of other limitations to signals (besides normal shadow
> > stack enforcement), but I could be missing it. And people's skepticism
> > is making me want to go back over it with more scrutiny.
> >
> > > > In much the same way as all the segment registers can be changed
> > > > leading to all the nasty bugs when the final 'return to user' code
> > > > traps in kernel when loading invalid segment registers or executing
> > > > iret.
> >
> > I don't think this is as difficult to avoid because userspace ssp has
> > its own register that should not be accessed at that point, but I have
> > not given this aspect enough analysis. Thanks for bringing it up.
>
> So the user ssp isn't saved (or restored) by the trap entry/exit.
> So it needs to be saved by the context switch code?
> Much like the user segment registers?
> So you are likely to get the same problems if restoring it can fault
> in kernel (eg for a non-canonical address).
>
> > > > Hmmm... do shadow stacks mean that longjmp() has to be a system
> > > > call?
> > >
> > > No.  setjmp/longjmp save and restore shadow stack pointer.
>
> Ok, I was thinking that direct access to the user ssp would be
> a privileged operation.

User space can only pop shadow stack.  longjmp does

#ifdef SHADOW_STACK_POINTER_OFFSET
# if IS_IN (libc) && defined SHARED && defined FEATURE_1_OFFSET
/* Check if Shadow Stack is enabled.  */
testl $X86_FEATURE_1_SHSTK, %fs:FEATURE_1_OFFSET
jz L(skip_ssp)
# else
xorl %eax, %eax
# endif
/* Check and adjust the Shadow-Stack-Pointer.  */
/* Get the current ssp.  */
rdsspq %rax
/* And compare it with the saved ssp value.  */
subq SHADOW_STACK_POINTER_OFFSET(%rdi), %rax
je L(skip_ssp)
/* Count the number of frames to adjust and adjust it
   with incssp instruction.  The instruction can adjust
   the ssp by [0..255] value only thus use a loop if
   the number of frames is bigger than 255.  */
negq %rax
shrq $3, %rax
/* NB: We saved Shadow-Stack-Pointer of setjmp.  Since we are
       restoring Shadow-Stack-Pointer of setjmp's caller, we
       need to unwind shadow stack by one more frame.  */
addq $1, %rax

movl $255, %ebx
L(loop):
cmpq %rbx, %rax
cmovb %rax, %rbx
incsspq %rbx
subq %rbx, %rax
ja L(loop)

L(skip_ssp):
#endif

> If it can be written you don't really have to worry about what code
> is trying to do - it can actually do what it likes.
> It just catches unintentional operations (like buffer overflows).
>
> Was there any 'spare' space in struct jmpbuf ?

By pure luck, we have ONE spare space in sigjmp_buf.

> Otherwise you can only enable shadow stacks if everything has been
> recompiled - including any shared libraries the might be dlopen()ed.
> (or does the compiler invent an alloca() call somehow for a
> size that comes back from glibc?)
>
> I've never really considered how setjmp/longjmp handle callee saved
> register variables (apart from it being hard).
> The original pdp11 implementation probably only needed to save r6 and r7.
>
> What does happen to all the 'extended state' that XSAVE handles?
> IIRC all the AVX registers are caller saved (so should probably
> be zerod), but some of the SSE ones are callee saved, and one or
> two of the fpu flags are sticky and annoying enough the save/restore
> at the best of times.
>
> > It sounds like it would help to write up in a lot more detail exactly
> > how all the signal and specialer stack manipulation scenarios work in
> > glibc.
>
> Some cross references might have made people notice that the ucontext
> extensions for AVX512 (if not earlier ones) broke the minimal/default
> signal stack size.
>
>         David
>

-- 
H.J.
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Florian Weimer 3 years, 10 months ago
* David Laight:

> Was there any 'spare' space in struct jmpbuf ?

jmp_buf in glibc looks like this:

(gdb) ptype/o jmp_buf
type = struct __jmp_buf_tag {
/*      0      |      64 */    __jmp_buf __jmpbuf;
/*     64      |       4 */    int __mask_was_saved;
/* XXX  4-byte hole      */
/*     72      |     128 */    __sigset_t __saved_mask;

                               /* total size (bytes):  200 */
                             } [1]
(gdb) ptype/o __jmp_buf
type = long [8]

The glibc ABI reserves space for 1024 signals, something that Linux is
never going to implement.  We can use that space to store a few extra
registers in __save_mask.  There is a complication because the
pthread_cancel unwinding allocates only space for the __jmpbuf member.
Fortunately, we do not need to unwind the shadow stack for thread
cancellation, so we don't need that extra space in that case.

Thanks,
Florian

Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Edgecombe, Rick P 3 years, 10 months ago
On Sun, 2022-02-06 at 13:42 +0000, David Laight wrote:
> > I don't think this is as difficult to avoid because userspace ssp
> > has
> > its own register that should not be accessed at that point, but I
> > have
> > not given this aspect enough analysis. Thanks for bringing it up.
> 
> So the user ssp isn't saved (or restored) by the trap entry/exit.
> So it needs to be saved by the context switch code?
> Much like the user segment registers?
> So you are likely to get the same problems if restoring it can fault
> in kernel (eg for a non-canonical address).

PL3_SSP is lazily saved and restored by the FPU supervisor xsave code,
which has its buffer in kernel memory. For the most part it is
userspace instructions that use this register and they can only modify
it in limited ways.

It does look like IRET can cause a #CP if the PL3 SSP is not aligned,
but only after RIP and CPL are set back to userspace. I'm not confident
enough interpreting the specs to assert the specific behavior and will
follow up internally to clarify.

Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Mike Rapoport 3 years, 10 months ago
(added more CRIU people)

On Sun, Jan 30, 2022 at 01:18:03PM -0800, Rick Edgecombe wrote:
> Hi,
> 
> This is a slight reboot of the userspace CET series. I will be taking over the 
> series from Yu-cheng. Per some internal recommendations, I’ve reset the version
> number and am calling it a new series. Hopefully, it doesn’t cause confusion.
> 
> The new plan is to upstream only userspace Shadow Stack support at this point. 
> IBT can follow later, but for now I’ll focus solely on the most in-demand and
> widely available (with the feature on AMD CPUs now) part of CET.
> 
> I thought as part of this reset, it might be useful to more fully write-up the 
> design and summarize the history of the previous CET series. So this slightly
> long cover letter does that. The "Updates" section has the changes, if anyone
> doesn't want the history.
> 
> 
> Why is Shadow Stack Wanted
> ==========================
> The main use case for userspace shadow stack is providing protection against 
> return oriented programming attacks. Fedora and Ubuntu already have many/most 
> packages enabled for shadow stack. The main missing piece is Linux kernel 
> support and there seems to be a high amount of interest in the ecosystem for
> getting this feature supported. Besides security, Google has also done some
> work on using shadow stack to improve performance and reliability of tracing.
> 
> 
> Userspace Shadow Stack Implementation
> =====================================
> Shadow stack works by maintaining a secondary (shadow) stack that cannot be 
> directly modified by applications. When executing a CALL instruction, the 
> processor pushes the return address to both the normal stack and to the special 
> permissioned shadow stack. Upon ret, the processor pops the shadow stack copy 
> and compares it to the normal stack copy. If the two differ, the processor 
> raises a control protection fault. This implementation supports shadow stack on 
> 64 bit kernels only, with support for 32 bit only via IA32 emulation.
> 
> 	Shadow Stack Memory
> 	-------------------
> 	The majority of this series deals with changes for handling the special 
> 	shadow stack memory permissions. This memory is specified by the 
> 	Dirty+RO PTE bits. A tricky aspect of this is that this combination was 
> 	previously used to specify COW memory. So Linux needs to handle COW 
> 	differently when shadow stack is in use. The solution is to use a 
> 	software PTE bit to denote COW memory, and take care to clear the dirty
> 	bit when setting the memory RO.
> 
> 	Setup and Upkeep of HW Registers
> 	--------------------------------
> 	Using userspace CET requires a CR4 bit set, and also the manipulation 
> 	of two xsave managed MSRs. The kernel needs to modify these registers 
> 	during various operations like clone and signal handling. These 
> 	operations may happen when the registers are restored to the CPU, or 
> 	saved in an xsave buffer. Since the recent AMX triggered FPU overhaul 
> 	removed direct access to the xsave buffer, this series adds an 
> 	interface to operate on the supervisor xstate.
> 
> 	New ABIs
> 	--------
> 	This series introduces some new ABIs. The primary one is the shadow 
> 	stack itself. Since it is readable and the shadow stack pointer is 
> 	exposed to user space, applications can easily read and process the 
> 	shadow stack. And in fact the tracing usages plan to do exactly that.
> 
> 	Most of the shadow stack contents are written by HW, but some of the 
> 	entries are added by the kernel. The main place for this is signals. As 
> 	part of handling the signal the kernel does some manual adjustment of 
> 	the shadow stack that userspace depends on.
> 
> 	In addition to the contents of the shadow stack there is also user 
> 	visible behavior around when new shadow stacks are created and set in 
> 	the shadow stack pointer (SSP) register. This is relatively 
> 	straightforward – shadow stacks are created when new stacks are created 
> 	(thread creation, fork, etc). It is more or less what is required to 
> 	keep apps working.
> 
> 	For situations when userspace creates a new stack (i.e. makecontext(), 
> 	fibers, etc), a new syscall is provided for creating shadow stack 
> 	memory. To make the shadow stack usable, it needs to have a restore 
> 	token written to the protected memory. So the syscall provides a way to 
> 	specificity this should be done by the kernel.
> 
> 	When a shadow stack violation happens (when the return address of stack 
> 	not matching return address in shadow stack), a segfault is generated 
> 	with a new si_code specific to CET violations.
> 
> 	Lastly, a new arch_prctl interface is created for controlling the 
> 	enablement of CET-like features. It is intended to also be used for 
> 	LAM. It operates on the feature status per-thread, so for process wide 
> 	enabling it is intended to be used early in things like dynamic 
> 	linker/loaders. However, it can be used later for per-thread enablement 
> 	of features like WRSS.
> 
> 	WRSS
> 	----
> 	WRSS is an instruction that can write to shadow stacks. The HW provides 
> 	a way to enable this instruction for userspace use. Since shadow 
> 	stack’s are created initially protected, enabling WRSS allows any apps 
> 	that want to do unusual things with their stacks to have a way to 
> 	weaken protection and make things more flexible. A new feature bit is 
> 	defined to control enabling/disabling of WRSS.
> 
> 
> History
> =======
> The branding “CET” really consists of two features: “Shadow Stack” and 
> “Indirect Branch Tracking”. They both restrict previously allowed, but rarely 
> valid behaviors and require userspace to change to avoid these behaviors before 
> enabling the protection. These raw HW features need to be assembled into a 
> software solution across userspace and kernel in order to add security value.
> The kernel part of this solution has evolved iteratively starting with a lengthy
> RFC period. 
> 
> Until now, the enabling effort was trying to support both Shadow Stack and IBT. 
> This history will focus on a few areas of the shadow stack development history 
> that I thought stood out.
> 
> 	Signals
> 	-------
> 	Originally signals placed the location of the shadow stack restore 
> 	token inside the saved state on the stack. This was problematic from a 
> 	past ABI promises perspective. So the restore location was instead just 
> 	assumed from the shadow stack pointer. This works because in normal 
> 	allowed cases of calling sigreturn, the shadow stack pointer should be 
> 	right at the restore token at that time. There is no alternate shadow 
> 	stack support. If an alt shadow stack is added later we would need to 
> 	find a place to store the regular shadow stack token location. Options 
> 	could be to push something on the alt shadow stack, or to keep 
> 	something on the kernel side. So the current design keeps things simple 
> 	while slightly kicking the can down the road if alt shadow stacks 
> 	become a thing later. Siglongjmp is handled in glibc, using the incssp 
> 	instruction to unwind the shadow stack over the token.
> 
> 	Shadow Stack Allocation
> 	-----------------------
> 	makecontext() implementations need a way to create new shadow stacks 
> 	with restore token’s such that they can be pivoted to from userspace. 
> 	The first interface to do this was an arch_prctl(). It created a shadow 
> 	stack with a restore token pre-setup, since the kernel has an 
> 	instruction that can write to user shadow stacks. However, this 
> 	interface was abandoned for being strange.
> 
> 	The next version created PROT_SHADOW_STACK. This interface had two 
> 	problems. One, it left no options but for userspace to create writable 
> 	memory, write a restore token, then mproctect() it PROT_SHADOW_STACK. 
> 	The writable window left the shadow stack exposed, weakening the 
> 	security. Second, it caused problems with the guard pages. Since the 
> 	memory was initially created writable it did not have a guard page, but 
> 	then was mprotected later to a type of memory that should have one. 
> 	This resulted in missing guard pages and confused rb_subtree_gap’s.
> 
> 	This version introduces a new syscall that behaves similarly to the 
> 	initial arch_prctl() interface in that it has the kernel write the 
> 	restore token.
> 
> 	Enabling Interface
> 	------------------
> 	For the entire history of the original CET series, the design was to 
> 	enable shadow stack automatically if the feature bit was detected in 
> 	the elf header. Then it was userspace’s responsibility to turn it off 
> 	via an arch_prctl() if it was not desired, and this was handled by the 
> 	glibc dynamic loader. Glibc’s standard behavior (when CET if configured 
> 	is to leave shadow stack enabled if the executable and all linked 
> 	libraries are marked with shadow stacks.
> 
> 	Many distros (Fedora and others) have binaries already marked with 
> 	shadow stack, waiting for kernel support. Unfortunately their glibc 
> 	binaries expect the original arch_prctl() interface for allocating 
> 	shadow stacks, as those changes were pushed ahead of kernel support. 
> 	The net result of it all is, when updating to a kernel with shadow 
> 	stack these binaries would suddenly get shadow stack enabled and expect 
> 	the arch_prctl() interface to be there. And so calls to makecontext() 
> 	will fail, resulting in visible breakages. This series deals with this 
> 	problem as described below in "Updates".
> 
> 
> Updates
> =======
> These updates were mostly driven by public comments, but a lot of the design 
> elements are new. I would like some extra scrutiny on the updates.
> 
> 	New syscall for Shadow Stack Allocation
> 	---------------------------------------
> 	A new syscall is added for allocating shadow stacks to replace 
> 	PROT_SHADOW_STACK. Several options were considered, as described in the 
> 	“x86/cet/shstk: Introduce map_shadow_stack syscall”.
> 
> 	Xsave Managed Supervisor State Modifications
> 	--------------------------------------------
> 	The shadow stack feature requires the kernel to modify xsaves managed 
> 	state. On one of the last versions of Yu-cheng’s series Boris had 
> 	commented on the pattern it was using to do this not necessarily being 
> 	ideal. The pattern was to force a restore to the registers and always 
> 	do the modification there. Then Thomas did an overhaul of the fpu code, 
> 	part of which consisted of making raw access to the xsave buffer 
> 	private to the fpu code. So this series tries to expose access again, 
> 	and in a way that addresses Boris’ comments.
> 
> 	The method is to provide functions like wmsrl/rdmsrl, but that can 
> 	direct the operation to the correct location (registers or buffer), 
> 	while giving the proper notice to the fpu subsystem so things don’t get 
> 	clobbered or corrupted.
> 
> 	In the past a solution like this was discussed as part of the PASID 
> 	series, and Thomas was not in favor. In CET’s case there is a more 
> 	logic around the CET MSR’s than in PASID's, and wrapping this logic 
> 	minimizes near identical open coded logic needed to do this more 
> 	efficiently. In addition it resolves the above described problem of 
> 	having no access to the xsave buffer. So it is being put forward here 
> 	under the supposition that CET’s usage may lead to a different 
> 	conclusion, not to try to ignore past direction.
> 
> 	The user interrupt series has similar needs as CET, and will also use
> 	this internal interface if it’s found acceptable.
> 
> 	Support for WRSS
> 	----------------
> 	Andy Lutomirski had asked if we change the shadow stack allocation API 
> 	such that userspace cannot create arbitrary shadow stacks, then we look 
> 	at exposing an interface to enable the WRSS instruction for userspace. 
> 	This way app’s that want to do unexpected things with shadow stacks 
> 	would still have the option to create shadow stacks with arbitrary 
> 	data.
> 
> 	Switch Enabling Interface
> 	-------------------------
> 	As described above there is a problem with userspace binaries waiting 
> 	to break as soon as the kernel supports CET. This needs to be prevented 
> 	by changing the interface such that the old binaries will not enable 
> 	shadow stack AND behave as if shadow stack is not enabled. They should 
> 	run normally without shadow stack protection. Creating a new feature 
> 	(SHSTK2) for shadow stack was explored. SHSTK would never be supported 
> 	by the kernel, and all the userspace build tools would be updated to 
> 	target SHSTK2 instead of SHSTK. So old SHSTK binaries would be cleanly
> 	disabled.
> 
> 	But there are existing downsides to automatic elf header processing 
> 	based enabling. The elf header feature spec is not defined by the 
> 	kernel and there are proposals to expand it to describe additional 
> 	logic. A simpler interface where the kernel is simply told what to 
> 	enable, and leaves all the decision making to userspace, is more 
> 	flexible for userspace and simpler for the kernel. There also already 
> 	needs to be an ARCH_X86_FEATURE_ENABLE arch_prctl() for WRSS (and 
> 	likely LAM will use it too), so it avoids there being two ways to turn 
> 	on these types of features. The only tricky part for shadow stack, is 
> 	that it has to be enabled very early. Wherever the shadow stack is 
> 	enabled, the app cannot return from that point, otherwise there will be 
> 	a shadow stack violation. It turns out glibc can enable shadow stack 
> 	this early, so it works nicely. So not automatically enabling any 
> 	features in the elf header will cleanly disable all old binaries, which 
> 	expect the kernel to enable CET features automatically. Then after the 
> 	kernel changes are upstream, glibc can be updated to use the new
> 	interface. This is the solution implemented in this series.
> 
> 	Expand Commit Logs
> 	------------------
> 	As part of spinning up on this series, I found some of the commit logs 
> 	did not describe the changes in enough detail for me understand their 
> 	purpose. I tried to expand the logs and comments, where I had to go 
> 	digging. Hopefully it’s useful.
> 	
> 	Limit to only Intel Processors
> 	------------------------------
> 	Shadow stack is supported on some AMD processors, but this revision 
> 	(with expanded HW usage and xsaves changes) has only has been tested on 
> 	Intel ones. So this series has a patch to limit shadow stack support to 
> 	Intel processors. Ideally the patch would not even make it to mainline, 
> 	and should be dropped as soon as this testing is done. It's included 
> 	just in case.
> 
> 
> Future Work
> ===========
> Even though this is now exclusively a shadow stack series, there is still some 
> remaining shadow stack work to be done.
> 
> 	Ptrace
> 	------
> 	Early in the series, there was a patch to allow IA32_U_CET and
> 	IA32_PL3_SSP to be set. This patch was dropped and planned as a follow
> 	up to basic support, and it remains the plan. It will be needed for
> 	in-progress gdb support.
> 
> 	CRIU Support
> 	------------
> 	In the past there was some speculation on the mailing list about 
> 	whether CRIU would need to be taught about CET. It turns out, it does. 
> 	The first issue hit is that CRIU calls sigreturn directly from its 
> 	“parasite code” that it injects into the dumper process. This violates
> 	this shadow stack implementation’s protection that intends to prevent
> 	attackers from doing this.
> 
> 	With so many packages already enabled with shadow stack, there is 
> 	probably desire to make it work seamlessly. But in the meantime if 
> 	distros want to support shadow stack and CRIU, users could manually 
> 	disabled shadow stack via “GLIBC_TUNABLES=glibc.cpu.x86_shstk=off” for 
> 	a process they will wants to dump. It’s not ideal.
> 
> 	I’d like to hear what people think about having shadow stack in the 
> 	kernel without this resolved. Nothing would change for any users until 
> 	they enable shadow stack in the kernel and update to a glibc configured
> 	with CET. Should CRIU userspace be solved before kernel support?
> 
> 	Selftests
> 	---------
> 	There are some CET selftests being worked on and they are not included
> 	here.
> 
> Thanks,
> 
> Rick
> 
> Rick Edgecombe (7):
>   x86/mm: Prevent VM_WRITE shadow stacks
>   x86/fpu: Add helpers for modifying supervisor xstate
>   x86/fpu: Add unsafe xsave buffer helpers
>   x86/cet/shstk: Introduce map_shadow_stack syscall
>   selftests/x86: Add map_shadow_stack syscall test
>   x86/cet/shstk: Support wrss for userspace
>   x86/cpufeatures: Limit shadow stack to Intel CPUs
> 
> Yu-cheng Yu (28):
>   Documentation/x86: Add CET description
>   x86/cet/shstk: Add Kconfig option for Shadow Stack
>   x86/cpufeatures: Add CET CPU feature flags for Control-flow
>     Enforcement Technology (CET)
>   x86/cpufeatures: Introduce CPU setup and option parsing for CET
>   x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
>   x86/cet: Add control-protection fault handler
>   x86/mm: Remove _PAGE_DIRTY from kernel RO pages
>   x86/mm: Move pmd_write(), pud_write() up in the file
>   x86/mm: Introduce _PAGE_COW
>   drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS
>   x86/mm: Update pte_modify for _PAGE_COW
>   x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for
>     transition from _PAGE_DIRTY to _PAGE_COW
>   mm: Move VM_UFFD_MINOR_BIT from 37 to 38
>   mm: Introduce VM_SHADOW_STACK for shadow stack memory
>   x86/mm: Check Shadow Stack page fault errors
>   x86/mm: Update maybe_mkwrite() for shadow stack
>   mm: Fixup places that call pte_mkwrite() directly
>   mm: Add guard pages around a shadow stack.
>   mm/mmap: Add shadow stack pages to memory accounting
>   mm: Update can_follow_write_pte() for shadow stack
>   mm/mprotect: Exclude shadow stack from preserve_write
>   mm: Re-introduce vm_flags to do_mmap()
>   x86/cet/shstk: Add user-mode shadow stack support
>   x86/process: Change copy_thread() argument 'arg' to 'stack_size'
>   x86/cet/shstk: Handle thread shadow stack
>   x86/cet/shstk: Introduce shadow stack token setup/verify routines
>   x86/cet/shstk: Handle signals for shadow stack
>   x86/cet/shstk: Add arch_prctl elf feature functions
> 
>  .../admin-guide/kernel-parameters.txt         |   4 +
>  Documentation/filesystems/proc.rst            |   1 +
>  Documentation/x86/cet.rst                     | 145 ++++++
>  Documentation/x86/index.rst                   |   1 +
>  arch/arm/kernel/signal.c                      |   2 +-
>  arch/arm64/kernel/signal.c                    |   2 +-
>  arch/arm64/kernel/signal32.c                  |   2 +-
>  arch/sparc/kernel/signal32.c                  |   2 +-
>  arch/sparc/kernel/signal_64.c                 |   2 +-
>  arch/x86/Kconfig                              |  22 +
>  arch/x86/Kconfig.assembler                    |   5 +
>  arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
>  arch/x86/ia32/ia32_signal.c                   |  25 +-
>  arch/x86/include/asm/cet.h                    |  54 +++
>  arch/x86/include/asm/cpufeatures.h            |   1 +
>  arch/x86/include/asm/disabled-features.h      |   8 +-
>  arch/x86/include/asm/fpu/api.h                |   8 +
>  arch/x86/include/asm/fpu/types.h              |  23 +-
>  arch/x86/include/asm/fpu/xstate.h             |   6 +-
>  arch/x86/include/asm/idtentry.h               |   4 +
>  arch/x86/include/asm/mman.h                   |  24 +
>  arch/x86/include/asm/mmu_context.h            |   2 +
>  arch/x86/include/asm/msr-index.h              |  20 +
>  arch/x86/include/asm/page_types.h             |   7 +
>  arch/x86/include/asm/pgtable.h                | 302 ++++++++++--
>  arch/x86/include/asm/pgtable_types.h          |  48 +-
>  arch/x86/include/asm/processor.h              |   6 +
>  arch/x86/include/asm/special_insns.h          |  30 ++
>  arch/x86/include/asm/trap_pf.h                |   2 +
>  arch/x86/include/uapi/asm/mman.h              |   8 +-
>  arch/x86/include/uapi/asm/prctl.h             |  10 +
>  arch/x86/include/uapi/asm/processor-flags.h   |   2 +
>  arch/x86/kernel/Makefile                      |   1 +
>  arch/x86/kernel/cpu/common.c                  |  20 +
>  arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
>  arch/x86/kernel/elf_feature_prctl.c           |  72 +++
>  arch/x86/kernel/fpu/xstate.c                  | 167 ++++++-
>  arch/x86/kernel/idt.c                         |   4 +
>  arch/x86/kernel/process.c                     |  17 +-
>  arch/x86/kernel/process_64.c                  |   2 +
>  arch/x86/kernel/shstk.c                       | 446 ++++++++++++++++++
>  arch/x86/kernel/signal.c                      |  13 +
>  arch/x86/kernel/signal_compat.c               |   2 +-
>  arch/x86/kernel/traps.c                       |  62 +++
>  arch/x86/mm/fault.c                           |  19 +
>  arch/x86/mm/mmap.c                            |  48 ++
>  arch/x86/mm/pat/set_memory.c                  |   2 +-
>  arch/x86/mm/pgtable.c                         |  25 +
>  drivers/gpu/drm/i915/gvt/gtt.c                |   2 +-
>  fs/aio.c                                      |   2 +-
>  fs/proc/task_mmu.c                            |   3 +
>  include/linux/mm.h                            |  19 +-
>  include/linux/pgtable.h                       |   8 +
>  include/linux/syscalls.h                      |   1 +
>  include/uapi/asm-generic/siginfo.h            |   3 +-
>  include/uapi/asm-generic/unistd.h             |   2 +-
>  ipc/shm.c                                     |   2 +-
>  kernel/sys_ni.c                               |   1 +
>  mm/gup.c                                      |  16 +-
>  mm/huge_memory.c                              |  27 +-
>  mm/memory.c                                   |   5 +-
>  mm/migrate.c                                  |   3 +-
>  mm/mmap.c                                     |  15 +-
>  mm/mprotect.c                                 |   9 +-
>  mm/nommu.c                                    |   4 +-
>  mm/util.c                                     |   2 +-
>  tools/testing/selftests/x86/Makefile          |   9 +-
>  .../selftests/x86/test_map_shadow_stack.c     |  75 +++
>  69 files changed, 1797 insertions(+), 92 deletions(-)
>  create mode 100644 Documentation/x86/cet.rst
>  create mode 100644 arch/x86/include/asm/cet.h
>  create mode 100644 arch/x86/include/asm/mman.h
>  create mode 100644 arch/x86/kernel/elf_feature_prctl.c
>  create mode 100644 arch/x86/kernel/shstk.c
>  create mode 100644 tools/testing/selftests/x86/test_map_shadow_stack.c
> 
> 
> base-commit: e783362eb54cd99b2cac8b3a9aeac942e6f6ac07
> -- 
> 2.17.1

-- 
Sincerely yours,
Mike.
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Adrian Reber 3 years, 10 months ago
On Sun, Feb 06, 2022 at 08:42:03PM +0200, Mike Rapoport wrote:
> (added more CRIU people)

Thanks, Mike.

> On Sun, Jan 30, 2022 at 01:18:03PM -0800, Rick Edgecombe wrote:
> > This is a slight reboot of the userspace CET series. I will be taking over the 
> > series from Yu-cheng. Per some internal recommendations, I’ve reset the version
> > number and am calling it a new series. Hopefully, it doesn’t cause confusion.
> > 
> > The new plan is to upstream only userspace Shadow Stack support at this point. 
> > IBT can follow later, but for now I’ll focus solely on the most in-demand and
> > widely available (with the feature on AMD CPUs now) part of CET.
> > 
> > I thought as part of this reset, it might be useful to more fully write-up the 
> > design and summarize the history of the previous CET series. So this slightly
> > long cover letter does that. The "Updates" section has the changes, if anyone
> > doesn't want the history.

[...]

> > 	CRIU Support
> > 	------------
> > 	In the past there was some speculation on the mailing list about 
> > 	whether CRIU would need to be taught about CET. It turns out, it does. 
> > 	The first issue hit is that CRIU calls sigreturn directly from its 
> > 	“parasite code” that it injects into the dumper process. This violates
> > 	this shadow stack implementation’s protection that intends to prevent
> > 	attackers from doing this.
> > 
> > 	With so many packages already enabled with shadow stack, there is 
> > 	probably desire to make it work seamlessly. But in the meantime if 
> > 	distros want to support shadow stack and CRIU, users could manually 
> > 	disabled shadow stack via “GLIBC_TUNABLES=glibc.cpu.x86_shstk=off” for 
> > 	a process they will wants to dump. It’s not ideal.
> > 
> > 	I’d like to hear what people think about having shadow stack in the 
> > 	kernel without this resolved. Nothing would change for any users until 
> > 	they enable shadow stack in the kernel and update to a glibc configured
> > 	with CET. Should CRIU userspace be solved before kernel support?

From the CRIU side I can say that I would definitely like to see this
resolved. CRIU just went through a similar exercise with rseq() being
enabled in glibc and CI broke all around for us and other projects
relying on CRIU. Although rseq() was around for a long time we were not
aware of it but luckily 5.13 introduced a way to handle it for CRIU with
ptrace. An environment variable existed but did not really help when
CRIU is called somewhere in the middle of the container software stack.

From my point of view a solution not involving an environment variable
would definitely be preferred.

		Adrian
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Dave Hansen 3 years, 10 months ago
On 2/6/22 23:20, Adrian Reber wrote:
>>> 	CRIU Support
>>> 	------------
>>> 	In the past there was some speculation on the mailing list about 
>>> 	whether CRIU would need to be taught about CET. It turns out, it does. 
>>> 	The first issue hit is that CRIU calls sigreturn directly from its 
>>> 	“parasite code” that it injects into the dumper process. This violates
>>> 	this shadow stack implementation’s protection that intends to prevent
>>> 	attackers from doing this.
...
>>From the CRIU side I can say that I would definitely like to see this
> resolved. CRIU just went through a similar exercise with rseq() being
> enabled in glibc and CI broke all around for us and other projects
> relying on CRIU. Although rseq() was around for a long time we were not
> aware of it but luckily 5.13 introduced a way to handle it for CRIU with
> ptrace. An environment variable existed but did not really help when
> CRIU is called somewhere in the middle of the container software stack.
> 
>>From my point of view a solution not involving an environment variable
> would definitely be preferred.

Have there been things like this for CRIU in the past?  Something where
CRIU needs control but that's also security-sensitive?

Any thoughts on how you would _like_ to see this resolved?
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Mike Rapoport 3 years, 10 months ago
On Mon, Feb 07, 2022 at 08:30:50AM -0800, Dave Hansen wrote:
> On 2/6/22 23:20, Adrian Reber wrote:
> >>> 	CRIU Support
> >>> 	------------
> >>> 	In the past there was some speculation on the mailing list about 
> >>> 	whether CRIU would need to be taught about CET. It turns out, it does. 
> >>> 	The first issue hit is that CRIU calls sigreturn directly from its 
> >>> 	“parasite code” that it injects into the dumper process. This violates
> >>> 	this shadow stack implementation’s protection that intends to prevent
> >>> 	attackers from doing this.
> ...
> >>From the CRIU side I can say that I would definitely like to see this
> > resolved. CRIU just went through a similar exercise with rseq() being
> > enabled in glibc and CI broke all around for us and other projects
> > relying on CRIU. Although rseq() was around for a long time we were not
> > aware of it but luckily 5.13 introduced a way to handle it for CRIU with
> > ptrace. An environment variable existed but did not really help when
> > CRIU is called somewhere in the middle of the container software stack.
> > 
> >>From my point of view a solution not involving an environment variable
> > would definitely be preferred.
> 
> Have there been things like this for CRIU in the past?  Something where
> CRIU needs control but that's also security-sensitive?

Generally CRIU requires (almost) root privileges to work, but I don't think
it handles something as security sensitive and restrictive as shadow stacks. 
 
> Any thoughts on how you would _like_ to see this resolved?

Ideally, CRIU will need a knob that will tell the kernel/CET machinery
where the next RET will jump, along the lines of
restore_signal_shadow_stack() AFAIU.

But such a knob will immediately reduce the security value of the entire
thing, and I don't have good ideas how to deal with it :(

-- 
Sincerely yours,
Mike.
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Cyrill Gorcunov 3 years, 10 months ago
On Tue, Feb 08, 2022 at 11:16:51AM +0200, Mike Rapoport wrote:
>  
> > Any thoughts on how you would _like_ to see this resolved?
> 
> Ideally, CRIU will need a knob that will tell the kernel/CET machinery
> where the next RET will jump, along the lines of
> restore_signal_shadow_stack() AFAIU.
> 
> But such a knob will immediately reduce the security value of the entire
> thing, and I don't have good ideas how to deal with it :(

Probably a kind of latch in the task_struct which would trigger off once
returt to a different address happened, thus we would be able to jump inside
paratite code. Of course such trigger should be available under proper
capability only.
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Andy Lutomirski 3 years, 10 months ago

On Tue, Feb 8, 2022, at 1:29 AM, Cyrill Gorcunov wrote:
> On Tue, Feb 08, 2022 at 11:16:51AM +0200, Mike Rapoport wrote:
>>  
>> > Any thoughts on how you would _like_ to see this resolved?
>> 
>> Ideally, CRIU will need a knob that will tell the kernel/CET machinery
>> where the next RET will jump, along the lines of
>> restore_signal_shadow_stack() AFAIU.
>> 
>> But such a knob will immediately reduce the security value of the entire
>> thing, and I don't have good ideas how to deal with it :(
>
> Probably a kind of latch in the task_struct which would trigger off once
> returt to a different address happened, thus we would be able to jump inside
> paratite code. Of course such trigger should be available under proper
> capability only.

I'm not fully in touch with how parasite, etc works.  Are we talking about save or restore?  If it's restore, what exactly does CRIU need to do?  Is it just that CRIU needs to return out from its resume code into the to-be-resumed program without tripping CET?  Would it be acceptable for CRIU to require that at least one shstk slot be free at save time?  Or do we need a mechanism to atomically switch to a completely full shadow stack at resume?

Off the top of my head, a sigreturn (or sigreturn-like mechanism) that is intended for use for altshadowstack could safely verify a token on the altshdowstack, possibly compare to something in ucontext (or not -- this isn't clearly necessary) and switch back to the previous stack.  CRIU could use that too.  Obviously CRIU will need a way to populate the relevant stacks, but WRUSS can be used for that, and I think this is a fundamental requirement for CRIU -- CRIU restore absolutely needs a way to write the saved shadow stack data into the shadow stack.

So I think the only special capability that CRIU really needs is WRUSS, and we need to wire that up anyway.
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Cyrill Gorcunov 3 years, 10 months ago
On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote:
> >> But such a knob will immediately reduce the security value of the entire
> >> thing, and I don't have good ideas how to deal with it :(
> >
> > Probably a kind of latch in the task_struct which would trigger off once
> > returt to a different address happened, thus we would be able to jump inside
> > paratite code. Of course such trigger should be available under proper
> > capability only.
> 
> I'm not fully in touch with how parasite, etc works.  Are we talking about save or restore?

We use parasite code in question during checkpoint phase as far as I remember.
push addr/lret trick is used to run "injected" code (code injection itself is
done via ptrace) in compat mode at least. Dima, Andrei, I didn't look into this code
for years already, do we still need to support compat mode at all?

> If it's restore, what exactly does CRIU need to do?  Is it just that CRIU needs to return
> out from its resume code into the to-be-resumed program without tripping CET?  Would it
> be acceptable for CRIU to require that at least one shstk slot be free at save time?
> Or do we need a mechanism to atomically switch to a completely full shadow stack at resume?
> 
> Off the top of my head, a sigreturn (or sigreturn-like mechanism) that is intended for
> use for altshadowstack could safely verify a token on the altshdowstack, possibly
> compare to something in ucontext (or not -- this isn't clearly necessary) and switch
> back to the previous stack.  CRIU could use that too.  Obviously CRIU will need a way
> to populate the relevant stacks, but WRUSS can be used for that, and I think this
> is a fundamental requirement for CRIU -- CRIU restore absolutely needs a way to write
> the saved shadow stack data into the shadow stack.
> 
> So I think the only special capability that CRIU really needs is WRUSS, and
> we need to wire that up anyway.

Thanks for these notes, Andy! I can't provide any sane answer here since didn't
read tech spec for this feature yet :-)
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Dmitry Safonov 3 years, 10 months ago
[un-Cc'ed a lot of people, as the question is highly off-topic, so I
don't feel like the answer is of big interest to them, keeping x86
maintainer in]

On 2/8/22 17:02, Cyrill Gorcunov wrote:
> On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote:
>>>> But such a knob will immediately reduce the security value of the entire
>>>> thing, and I don't have good ideas how to deal with it :(
>>>
>>> Probably a kind of latch in the task_struct which would trigger off once
>>> returt to a different address happened, thus we would be able to jump inside
>>> paratite code. Of course such trigger should be available under proper
>>> capability only.
>>
>> I'm not fully in touch with how parasite, etc works.  Are we talking about save or restore?
> 
> We use parasite code in question during checkpoint phase as far as I remember.
> push addr/lret trick is used to run "injected" code (code injection itself is
> done via ptrace) in compat mode at least. Dima, Andrei, I didn't look into this code
> for years already, do we still need to support compat mode at all?

Cyrill, I haven't been working on/with Virtuozzo people last 5 years, so
I don't know. As you're more connected to Vz, your question seems to
imply that ia32 C/R is no longer needed by Vz customers. If it's not
needed anymore - I'm all for stopping testing of it in CRIU.

The only thing I ask before you go and remove that is to ping the person
who paid some substantial amount of money on bugsbounty to get ia32
support in CRIU. Albeit, in the end I didn't get a cent out of it (VZ
managers insisted on receiving all of the money), I still feel
responsible to that person as the amount he paid was the biggest bounty
at that moment and I was the person, who presented him ia32 C/R as
working and being tested.
If you need his contacts - ping me, I'll search and find it.

Other than that - if no one needs ia32 C/R, let's do go ahead and drop
testing (and maybe some complicated code) of it.

Thanks,
          Dmitry
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Cyrill Gorcunov 3 years, 10 months ago
On Tue, Feb 08, 2022 at 09:54:14PM +0000, Dmitry Safonov wrote:
> [un-Cc'ed a lot of people, as the question is highly off-topic, so I
> don't feel like the answer is of big interest to them, keeping x86
> maintainer in]
> 
> On 2/8/22 17:02, Cyrill Gorcunov wrote:
> >>> Probably a kind of latch in the task_struct which would trigger off once
> >>> returt to a different address happened, thus we would be able to jump inside
> >>> paratite code. Of course such trigger should be available under proper
> >>> capability only.
> >>
> >> I'm not fully in touch with how parasite, etc works.  Are we talking about save or restore?
> > 
> > We use parasite code in question during checkpoint phase as far as I remember.
> > push addr/lret trick is used to run "injected" code (code injection itself is
> > done via ptrace) in compat mode at least. Dima, Andrei, I didn't look into this code
> > for years already, do we still need to support compat mode at all?
> 
> Cyrill, I haven't been working on/with Virtuozzo people last 5 years, so
> I don't know. As you're more connected to Vz, your question seems to
> imply that ia32 C/R is no longer needed by Vz customers. If it's not
> needed anymore - I'm all for stopping testing of it in CRIU.

Nope. I didn't see any sign that Vz is intended to drop ia32 suport. But
Vz's criu instance is following vanilla's one, that is why I asked you
and Andrew about ia32 support. This ia32 code snippet with stack
manipulation simply popped out in my mind immediately when Andy asked
how we deal with stack.

Also we adjust stack in restorer code but I need some time to recall
all thses details since as I said I didn't work with criu code for
years already.

	Cyrill
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Edgecombe, Rick P 3 years, 10 months ago
On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
> On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote:
> > > > But such a knob will immediately reduce the security value of
> > > > the entire
> > > > thing, and I don't have good ideas how to deal with it :(
> > > 
> > > Probably a kind of latch in the task_struct which would trigger
> > > off once
> > > returt to a different address happened, thus we would be able to
> > > jump inside
> > > paratite code. Of course such trigger should be available under
> > > proper
> > > capability only.
> > 
> > I'm not fully in touch with how parasite, etc works.  Are we
> > talking about save or restore?
> 
> We use parasite code in question during checkpoint phase as far as I
> remember.
> push addr/lret trick is used to run "injected" code (code injection
> itself is
> done via ptrace) in compat mode at least. Dima, Andrei, I didn't look
> into this code
> for years already, do we still need to support compat mode at all?
> 
> > If it's restore, what exactly does CRIU need to do?  Is it just
> > that CRIU needs to return
> > out from its resume code into the to-be-resumed program without
> > tripping CET?  Would it
> > be acceptable for CRIU to require that at least one shstk slot be
> > free at save time?
> > Or do we need a mechanism to atomically switch to a completely full
> > shadow stack at resume?
> > 
> > Off the top of my head, a sigreturn (or sigreturn-like mechanism)
> > that is intended for
> > use for altshadowstack could safely verify a token on the
> > altshdowstack, possibly
> > compare to something in ucontext (or not -- this isn't clearly
> > necessary) and switch
> > back to the previous stack.  CRIU could use that too.  Obviously
> > CRIU will need a way
> > to populate the relevant stacks, but WRUSS can be used for that,
> > and I think this
> > is a fundamental requirement for CRIU -- CRIU restore absolutely
> > needs a way to write
> > the saved shadow stack data into the shadow stack.

Still wrapping my head around the CRIU save and restore steps, but
another general approach might be to give ptrace the ability to
temporarily pause/resume/set CET enablement and SSP for a stopped
thread. Then injected code doesn't need to jump through any hoops or
possibly run into road blocks. I'm not sure how much this opens things
up if the thread has to be stopped...

Cyrill, could it fit into the CRIU pause and resume flow? What action
causes the final resuming of execution of the restored process for
checkpointing and for restore? Wondering if we could somehow make CET
re-enable exactly then.

And I guess this also needs a way to create shadow stack allocations at
a specific address to match where they were in the dumped process. That
is missing in this series.


> > 
> > So I think the only special capability that CRIU really needs is
> > WRUSS, and
> > we need to wire that up anyway.
> 
> Thanks for these notes, Andy! I can't provide any sane answer here
> since didn't
> read tech spec for this feature yet :-)




Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Cyrill Gorcunov 3 years, 10 months ago
On Wed, Feb 09, 2022 at 02:18:42AM +0000, Edgecombe, Rick P wrote:
...
> 
> Still wrapping my head around the CRIU save and restore steps, but
> another general approach might be to give ptrace the ability to
> temporarily pause/resume/set CET enablement and SSP for a stopped
> thread. Then injected code doesn't need to jump through any hoops or
> possibly run into road blocks. I'm not sure how much this opens things
> up if the thread has to be stopped...
> 
> Cyrill, could it fit into the CRIU pause and resume flow? What action
> causes the final resuming of execution of the restored process for
> checkpointing and for restore? Wondering if we could somehow make CET
> re-enable exactly then.
> 
> And I guess this also needs a way to create shadow stack allocations at
> a specific address to match where they were in the dumped process. That
> is missing in this series.

Thanks Rick! This sounds like an option. I need a couple of days to refresh
my memory about criu internals. Let me CC a few current active criu developers
(CC list is already big enough though:), maybe this will speedup the procedure.
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Mike Rapoport 3 years, 10 months ago
Hi Rick,

On Wed, Feb 09, 2022 at 02:18:42AM +0000, Edgecombe, Rick P wrote:
> On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
> > On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote:
> > > > > But such a knob will immediately reduce the security value of
> > > > > the entire
> > > > > thing, and I don't have good ideas how to deal with it :(
> > > > 
> > > > Probably a kind of latch in the task_struct which would trigger
> > > > off once
> > > > returt to a different address happened, thus we would be able to
> > > > jump inside
> > > > paratite code. Of course such trigger should be available under
> > > > proper
> > > > capability only.
> > > 
> > > I'm not fully in touch with how parasite, etc works.  Are we
> > > talking about save or restore?
> > 
> > We use parasite code in question during checkpoint phase as far as I
> > remember.
> > push addr/lret trick is used to run "injected" code (code injection
> > itself is
> > done via ptrace) in compat mode at least. Dima, Andrei, I didn't look
> > into this code
> > for years already, do we still need to support compat mode at all?
> > 
> > > If it's restore, what exactly does CRIU need to do?  Is it just
> > > that CRIU needs to return
> > > out from its resume code into the to-be-resumed program without
> > > tripping CET?  Would it
> > > be acceptable for CRIU to require that at least one shstk slot be
> > > free at save time?
> > > Or do we need a mechanism to atomically switch to a completely full
> > > shadow stack at resume?
> > > 
> > > Off the top of my head, a sigreturn (or sigreturn-like mechanism)
> > > that is intended for
> > > use for altshadowstack could safely verify a token on the
> > > altshdowstack, possibly
> > > compare to something in ucontext (or not -- this isn't clearly
> > > necessary) and switch
> > > back to the previous stack.  CRIU could use that too.  Obviously
> > > CRIU will need a way
> > > to populate the relevant stacks, but WRUSS can be used for that,
> > > and I think this
> > > is a fundamental requirement for CRIU -- CRIU restore absolutely
> > > needs a way to write
> > > the saved shadow stack data into the shadow stack.
> 
> Still wrapping my head around the CRIU save and restore steps, but
> another general approach might be to give ptrace the ability to
> temporarily pause/resume/set CET enablement and SSP for a stopped
> thread. Then injected code doesn't need to jump through any hoops or
> possibly run into road blocks. I'm not sure how much this opens things
> up if the thread has to be stopped...
 
IIRC, criu dump does something like this:
* Stop the process being dumped (victim) with ptrace
* Inject parasite code and data into the victim, again with ptrace.
  Among other things the parasite data contains a sigreturn frame with
  saved victim state.
* Resume the victim process, which will run parasite code now.
* When parasite finishes it uses that frame to sigreturn to normal victim
  execution

So, my feeling is that for dump side WRUSS should be enough.

> Cyrill, could it fit into the CRIU pause and resume flow? What action
> causes the final resuming of execution of the restored process for
> checkpointing and for restore? Wondering if we could somehow make CET
> re-enable exactly then.
> 
> And I guess this also needs a way to create shadow stack allocations at
> a specific address to match where they were in the dumped process. That
> is missing in this series.

Yes, criu restore will need to recreate shadow stack mappings. Currently,
we recreate the restored process (target) address space based on
/proc/pid/maps and /proc/pid/smaps. CRIU preserves the virtual addresses
and VMA flags. The relevant steps of restore process can be summarised as:
* Clone() the target process tree
* Recreate VMAs with the needed size and flags, but not necessarily at the
  correct place yet
* Partially populate memory data from the saved images
* Move VMAs to their exact addresses
* Complete restoring the data
* Create a frame for sigreturn and jump to the target.

Here, the stack used after sigreturn contains the data that was captured
during dump and it entirely different from what shadow stack will contain.

There are several points when the target threads are stopped, so
pausing/resuming CET may help.
 
> > > So I think the only special capability that CRIU really needs is
> > > WRUSS, and
> > > we need to wire that up anyway.
> > 
> > Thanks for these notes, Andy! I can't provide any sane answer here
> > since didn't
> > read tech spec for this feature yet :-)
> 
> 
> 
> 

-- 
Sincerely yours,
Mike.
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Andy Lutomirski 3 years, 10 months ago
On 2/8/22 18:18, Edgecombe, Rick P wrote:
> On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
>> On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote:
>>>>> But such a knob will immediately reduce the security value of
>>>>> the entire
>>>>> thing, and I don't have good ideas how to deal with it :(
>>>>
>>>> Probably a kind of latch in the task_struct which would trigger
>>>> off once
>>>> returt to a different address happened, thus we would be able to
>>>> jump inside
>>>> paratite code. Of course such trigger should be available under
>>>> proper
>>>> capability only.
>>>
>>> I'm not fully in touch with how parasite, etc works.  Are we
>>> talking about save or restore?
>>
>> We use parasite code in question during checkpoint phase as far as I
>> remember.
>> push addr/lret trick is used to run "injected" code (code injection
>> itself is
>> done via ptrace) in compat mode at least. Dima, Andrei, I didn't look
>> into this code
>> for years already, do we still need to support compat mode at all?
>>
>>> If it's restore, what exactly does CRIU need to do?  Is it just
>>> that CRIU needs to return
>>> out from its resume code into the to-be-resumed program without
>>> tripping CET?  Would it
>>> be acceptable for CRIU to require that at least one shstk slot be
>>> free at save time?
>>> Or do we need a mechanism to atomically switch to a completely full
>>> shadow stack at resume?
>>>
>>> Off the top of my head, a sigreturn (or sigreturn-like mechanism)
>>> that is intended for
>>> use for altshadowstack could safely verify a token on the
>>> altshdowstack, possibly
>>> compare to something in ucontext (or not -- this isn't clearly
>>> necessary) and switch
>>> back to the previous stack.  CRIU could use that too.  Obviously
>>> CRIU will need a way
>>> to populate the relevant stacks, but WRUSS can be used for that,
>>> and I think this
>>> is a fundamental requirement for CRIU -- CRIU restore absolutely
>>> needs a way to write
>>> the saved shadow stack data into the shadow stack.
> 
> Still wrapping my head around the CRIU save and restore steps, but
> another general approach might be to give ptrace the ability to
> temporarily pause/resume/set CET enablement and SSP for a stopped
> thread. Then injected code doesn't need to jump through any hoops or
> possibly run into road blocks. I'm not sure how much this opens things
> up if the thread has to be stopped...

Hmm, that's maybe not insane.

An alternative would be to add a bona fide ptrace call-a-function 
mechanism.  I can think of two potentially usable variants:

1. Straight call.  PTRACE_CALL_FUNCTION(addr) just emulates CALL addr, 
shadow stack push and all.

2. Signal-style.  PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal 
frame just like a real signal is being delivered with the specified 
handler.  There could be a variant to opt-in to also using a specified 
altstack and altshadowstack.

2 would be more expensive but would avoid the need for much in the way 
of asm magic.  The injected code could be plain C (or Rust or Zig or 
whatever).

All of this only really handles save, not restore.  I don't understand 
restore enough to fully understand the issue.

--Andy
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by H.J. Lu 3 years, 10 months ago
On Wed, Feb 9, 2022 at 6:37 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On 2/8/22 18:18, Edgecombe, Rick P wrote:
> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
> >> On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote:
> >>>>> But such a knob will immediately reduce the security value of
> >>>>> the entire
> >>>>> thing, and I don't have good ideas how to deal with it :(
> >>>>
> >>>> Probably a kind of latch in the task_struct which would trigger
> >>>> off once
> >>>> returt to a different address happened, thus we would be able to
> >>>> jump inside
> >>>> paratite code. Of course such trigger should be available under
> >>>> proper
> >>>> capability only.
> >>>
> >>> I'm not fully in touch with how parasite, etc works.  Are we
> >>> talking about save or restore?
> >>
> >> We use parasite code in question during checkpoint phase as far as I
> >> remember.
> >> push addr/lret trick is used to run "injected" code (code injection
> >> itself is
> >> done via ptrace) in compat mode at least. Dima, Andrei, I didn't look
> >> into this code
> >> for years already, do we still need to support compat mode at all?
> >>
> >>> If it's restore, what exactly does CRIU need to do?  Is it just
> >>> that CRIU needs to return
> >>> out from its resume code into the to-be-resumed program without
> >>> tripping CET?  Would it
> >>> be acceptable for CRIU to require that at least one shstk slot be
> >>> free at save time?
> >>> Or do we need a mechanism to atomically switch to a completely full
> >>> shadow stack at resume?
> >>>
> >>> Off the top of my head, a sigreturn (or sigreturn-like mechanism)
> >>> that is intended for
> >>> use for altshadowstack could safely verify a token on the
> >>> altshdowstack, possibly
> >>> compare to something in ucontext (or not -- this isn't clearly
> >>> necessary) and switch
> >>> back to the previous stack.  CRIU could use that too.  Obviously
> >>> CRIU will need a way
> >>> to populate the relevant stacks, but WRUSS can be used for that,
> >>> and I think this
> >>> is a fundamental requirement for CRIU -- CRIU restore absolutely
> >>> needs a way to write
> >>> the saved shadow stack data into the shadow stack.
> >
> > Still wrapping my head around the CRIU save and restore steps, but
> > another general approach might be to give ptrace the ability to
> > temporarily pause/resume/set CET enablement and SSP for a stopped
> > thread. Then injected code doesn't need to jump through any hoops or
> > possibly run into road blocks. I'm not sure how much this opens things
> > up if the thread has to be stopped...
>
> Hmm, that's maybe not insane.
>
> An alternative would be to add a bona fide ptrace call-a-function
> mechanism.  I can think of two potentially usable variants:
>
> 1. Straight call.  PTRACE_CALL_FUNCTION(addr) just emulates CALL addr,
> shadow stack push and all.
>
> 2. Signal-style.  PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal
> frame just like a real signal is being delivered with the specified
> handler.  There could be a variant to opt-in to also using a specified
> altstack and altshadowstack.
>
> 2 would be more expensive but would avoid the need for much in the way
> of asm magic.  The injected code could be plain C (or Rust or Zig or
> whatever).
>
> All of this only really handles save, not restore.  I don't understand
> restore enough to fully understand the issue.

FWIW, CET enabled GDB can call a function in a CET enabled process.
Adding Felix who may know more about it.


-- 
H.J.
RE: [PATCH 00/35] Shadow stacks for userspace
Posted by Willgerodt, Felix 3 years, 10 months ago
> -----Original Message-----
> From: H.J. Lu <hjl.tools@gmail.com>
> Sent: Donnerstag, 10. Februar 2022 03:54
> To: Lutomirski, Andy <luto@kernel.org>; Willgerodt, Felix
> <felix.willgerodt@intel.com>
> Cc: Edgecombe, Rick P <rick.p.edgecombe@intel.com>; gorcunov@gmail.com;
> bsingharora@gmail.com; hpa@zytor.com; Syromiatnikov, Eugene
> <esyr@redhat.com>; peterz@infradead.org; rdunlap@infradead.org;
> keescook@chromium.org; 0x7f454c46@gmail.com;
> dave.hansen@linux.intel.com; kirill.shutemov@linux.intel.com; Eranian,
> Stephane <eranian@google.com>; linux-mm@kvack.org; adrian@lisas.de;
> fweimer@redhat.com; nadav.amit@gmail.com; jannh@google.com;
> avagin@gmail.com; linux-arch@vger.kernel.org; kcc@google.com;
> bp@alien8.de; oleg@redhat.com; pavel@ucw.cz; linux-doc@vger.kernel.org;
> arnd@arndb.de; Moreira, Joao <joao.moreira@intel.com>; tglx@linutronix.de;
> mike.kravetz@oracle.com; x86@kernel.org; Yang, Weijiang
> <weijiang.yang@intel.com>; rppt@kernel.org; Dave.Martin@arm.com;
> john.allen@amd.com; mingo@redhat.com; Hansen, Dave
> <dave.hansen@intel.com>; corbet@lwn.net; linux-kernel@vger.kernel.org;
> linux-api@vger.kernel.org; Shankar, Ravi V <ravi.v.shankar@intel.com>
> Subject: Re: [PATCH 00/35] Shadow stacks for userspace
> 
> On Wed, Feb 9, 2022 at 6:37 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On 2/8/22 18:18, Edgecombe, Rick P wrote:
> > > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
> > >> On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote:
> > >>>>> But such a knob will immediately reduce the security value of
> > >>>>> the entire
> > >>>>> thing, and I don't have good ideas how to deal with it :(
> > >>>>
> > >>>> Probably a kind of latch in the task_struct which would trigger
> > >>>> off once
> > >>>> returt to a different address happened, thus we would be able to
> > >>>> jump inside
> > >>>> paratite code. Of course such trigger should be available under
> > >>>> proper
> > >>>> capability only.
> > >>>
> > >>> I'm not fully in touch with how parasite, etc works.  Are we
> > >>> talking about save or restore?
> > >>
> > >> We use parasite code in question during checkpoint phase as far as I
> > >> remember.
> > >> push addr/lret trick is used to run "injected" code (code injection
> > >> itself is
> > >> done via ptrace) in compat mode at least. Dima, Andrei, I didn't look
> > >> into this code
> > >> for years already, do we still need to support compat mode at all?
> > >>
> > >>> If it's restore, what exactly does CRIU need to do?  Is it just
> > >>> that CRIU needs to return
> > >>> out from its resume code into the to-be-resumed program without
> > >>> tripping CET?  Would it
> > >>> be acceptable for CRIU to require that at least one shstk slot be
> > >>> free at save time?
> > >>> Or do we need a mechanism to atomically switch to a completely full
> > >>> shadow stack at resume?
> > >>>
> > >>> Off the top of my head, a sigreturn (or sigreturn-like mechanism)
> > >>> that is intended for
> > >>> use for altshadowstack could safely verify a token on the
> > >>> altshdowstack, possibly
> > >>> compare to something in ucontext (or not -- this isn't clearly
> > >>> necessary) and switch
> > >>> back to the previous stack.  CRIU could use that too.  Obviously
> > >>> CRIU will need a way
> > >>> to populate the relevant stacks, but WRUSS can be used for that,
> > >>> and I think this
> > >>> is a fundamental requirement for CRIU -- CRIU restore absolutely
> > >>> needs a way to write
> > >>> the saved shadow stack data into the shadow stack.
> > >
> > > Still wrapping my head around the CRIU save and restore steps, but
> > > another general approach might be to give ptrace the ability to
> > > temporarily pause/resume/set CET enablement and SSP for a stopped
> > > thread. Then injected code doesn't need to jump through any hoops or
> > > possibly run into road blocks. I'm not sure how much this opens things
> > > up if the thread has to be stopped...
> >
> > Hmm, that's maybe not insane.
> >
> > An alternative would be to add a bona fide ptrace call-a-function
> > mechanism.  I can think of two potentially usable variants:
> >
> > 1. Straight call.  PTRACE_CALL_FUNCTION(addr) just emulates CALL addr,
> > shadow stack push and all.
> >
> > 2. Signal-style.  PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal
> > frame just like a real signal is being delivered with the specified
> > handler.  There could be a variant to opt-in to also using a specified
> > altstack and altshadowstack.
> >
> > 2 would be more expensive but would avoid the need for much in the way
> > of asm magic.  The injected code could be plain C (or Rust or Zig or
> > whatever).
> >
> > All of this only really handles save, not restore.  I don't understand
> > restore enough to fully understand the issue.
> 
> FWIW, CET enabled GDB can call a function in a CET enabled process.
> Adding Felix who may know more about it.
> 
> 
> --
> H.J.

I don't know much about CRIU or kernel code, so I will stick to explaining
what our GDB patches for CET (not upstream yet) currently do.

GDB does inferior calls by setting the PC to the function it wants to call
and by manipulating the return address. It basically creates a dummy
frame and runs that on top of where it currently is.

To enable this for CET, our GDB CET patches push onto the shstk of the
inferior by writing to the inferiors memory, if it isn't out of range,
and by incrementing the SSP (using NT_X86_CET), both via ptrace.

(GDB also has a command called 'return', which basically returns early from
a function. Here GDB just decrements the SSP via ptrace.)

This was based on earlier versions of the kernel patches.
If the interface needs to change or if new interfaces would be available to
do this better, that is fine from our pov. We didn't upstream this yet.

Also, if you have any concerns with this approach please let me know.

Regards,
Felix

Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de <http://www.intel.de>
Managing Directors: Christin Eisenschmid, Sharon Heck, Tiffany Doon Silva  
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by avagin@gmail.com 3 years, 10 months ago
On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote:
> On 2/8/22 18:18, Edgecombe, Rick P wrote:
> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
> > > On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote:
> > > > > > But such a knob will immediately reduce the security value of
> > > > > > the entire
> > > > > > thing, and I don't have good ideas how to deal with it :(
> > > > > 
> > > > > Probably a kind of latch in the task_struct which would trigger
> > > > > off once
> > > > > returt to a different address happened, thus we would be able to
> > > > > jump inside
> > > > > paratite code. Of course such trigger should be available under
> > > > > proper
> > > > > capability only.
> > > > 
> > > > I'm not fully in touch with how parasite, etc works.  Are we
> > > > talking about save or restore?
> > > 
> > > We use parasite code in question during checkpoint phase as far as I
> > > remember.
> > > push addr/lret trick is used to run "injected" code (code injection
> > > itself is
> > > done via ptrace) in compat mode at least. Dima, Andrei, I didn't look
> > > into this code
> > > for years already, do we still need to support compat mode at all?
> > > 
> > > > If it's restore, what exactly does CRIU need to do?  Is it just
> > > > that CRIU needs to return
> > > > out from its resume code into the to-be-resumed program without
> > > > tripping CET?  Would it
> > > > be acceptable for CRIU to require that at least one shstk slot be
> > > > free at save time?
> > > > Or do we need a mechanism to atomically switch to a completely full
> > > > shadow stack at resume?
> > > > 
> > > > Off the top of my head, a sigreturn (or sigreturn-like mechanism)
> > > > that is intended for
> > > > use for altshadowstack could safely verify a token on the
> > > > altshdowstack, possibly
> > > > compare to something in ucontext (or not -- this isn't clearly
> > > > necessary) and switch
> > > > back to the previous stack.  CRIU could use that too.  Obviously
> > > > CRIU will need a way
> > > > to populate the relevant stacks, but WRUSS can be used for that,
> > > > and I think this
> > > > is a fundamental requirement for CRIU -- CRIU restore absolutely
> > > > needs a way to write
> > > > the saved shadow stack data into the shadow stack.
> > 
> > Still wrapping my head around the CRIU save and restore steps, but
> > another general approach might be to give ptrace the ability to
> > temporarily pause/resume/set CET enablement and SSP for a stopped
> > thread. Then injected code doesn't need to jump through any hoops or
> > possibly run into road blocks. I'm not sure how much this opens things
> > up if the thread has to be stopped...
> 
> Hmm, that's maybe not insane.
> 
> An alternative would be to add a bona fide ptrace call-a-function mechanism.
> I can think of two potentially usable variants:
> 
> 1. Straight call.  PTRACE_CALL_FUNCTION(addr) just emulates CALL addr,
> shadow stack push and all.
> 
> 2. Signal-style.  PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal
> frame just like a real signal is being delivered with the specified handler.
> There could be a variant to opt-in to also using a specified altstack and
> altshadowstack.

I think this would be ideal. In CRIU, the parasite code is executed in
the "daemon" mode and returns back via sigreturn.  Right now, CRIU needs
to generate a signal frame. If I understand your idea right, the signal
frame will be generated by the kernel.

> 
> 2 would be more expensive but would avoid the need for much in the way of
> asm magic.  The injected code could be plain C (or Rust or Zig or whatever).
> 
> All of this only really handles save, not restore.  I don't understand
> restore enough to fully understand the issue.

In a few words, it works like this: CRIU restores all required resources
and prepares a signal frame with a target process state, then it
switches to a small PIE blob, where it restores vma-s and calls
rt_sigreturn.

> 
> --Andy
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Mike Rapoport 3 years, 10 months ago
On Thu, Feb 10, 2022 at 11:41:16PM -0800, avagin@gmail.com wrote:
> On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote:
> >
> > An alternative would be to add a bona fide ptrace call-a-function mechanism.
> > I can think of two potentially usable variants:
> > 
> > 1. Straight call.  PTRACE_CALL_FUNCTION(addr) just emulates CALL addr,
> > shadow stack push and all.
> > 
> > 2. Signal-style.  PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal
> > frame just like a real signal is being delivered with the specified handler.
> > There could be a variant to opt-in to also using a specified altstack and
> > altshadowstack.
> 
> I think this would be ideal. In CRIU, the parasite code is executed in
> the "daemon" mode and returns back via sigreturn.  Right now, CRIU needs
> to generate a signal frame. If I understand your idea right, the signal
> frame will be generated by the kernel.
> 
> > 
> > 2 would be more expensive but would avoid the need for much in the way of
> > asm magic.  The injected code could be plain C (or Rust or Zig or whatever).
> > 
> > All of this only really handles save, not restore.  I don't understand
> > restore enough to fully understand the issue.
> 
> In a few words, it works like this: CRIU restores all required resources
> and prepares a signal frame with a target process state, then it
> switches to a small PIE blob, where it restores vma-s and calls
> rt_sigreturn.

I think it's also important to note that the stack is restored as a part of
the process memory, i.e. its contents is read from the images.
 
> > 
> > --Andy

-- 
Sincerely yours,
Mike.
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Mike Rapoport 3 years, 9 months ago
On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote:
> On 2/8/22 18:18, Edgecombe, Rick P wrote:
> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
> > 
> > Still wrapping my head around the CRIU save and restore steps, but
> > another general approach might be to give ptrace the ability to
> > temporarily pause/resume/set CET enablement and SSP for a stopped
> > thread. Then injected code doesn't need to jump through any hoops or
> > possibly run into road blocks. I'm not sure how much this opens things
> > up if the thread has to be stopped...
> 
> Hmm, that's maybe not insane.
> 
> An alternative would be to add a bona fide ptrace call-a-function mechanism.
> I can think of two potentially usable variants:
> 
> 1. Straight call.  PTRACE_CALL_FUNCTION(addr) just emulates CALL addr,
> shadow stack push and all.
> 
> 2. Signal-style.  PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal
> frame just like a real signal is being delivered with the specified handler.
> There could be a variant to opt-in to also using a specified altstack and
> altshadowstack.

Using ptrace() will not solve CRIU's issue with sigreturn because sigreturn
is called from the victim context rather than from the criu process that
controls the dump and uses ptrace().

Even with the current shadow stack interface Rick proposed, CRIU can restore
the victim using ptrace without any additional knobs, but we loose an
important ability to "self-cure" the victim from the parasite in case
anything goes wrong with criu control process.

Moreover, the issue with backward compatibility is not with ptrace but with
sigreturn and it seems that criu is not its only user.

So I think we need a way to allow direct calls to sigreturn that will
bypass check and restore of the shadow stack.

I only know that there are sigreturn users except criu that show up in
Debian codesearch, and I don't know how do they use it, but for full
backward compatibility we'd need to have no-CET sigreturn as default and
add a new, say UC_CHECK_SHSTK flag to rt_sigframe->uc.uc_flags or even a
new syscall for libc signal handling.
 
> 2 would be more expensive but would avoid the need for much in the way of
> asm magic.  The injected code could be plain C (or Rust or Zig or whatever).
> 
> All of this only really handles save, not restore.  I don't understand
> restore enough to fully understand the issue.

Restore is more complex, will get to it later.
 
> --Andy

-- 
Sincerely yours,
Mike.
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Andy Lutomirski 3 years, 9 months ago

On Mon, Feb 28, 2022, at 12:27 PM, Mike Rapoport wrote:
> On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote:
>> On 2/8/22 18:18, Edgecombe, Rick P wrote:
>> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
>> > 
>> > Still wrapping my head around the CRIU save and restore steps, but
>> > another general approach might be to give ptrace the ability to
>> > temporarily pause/resume/set CET enablement and SSP for a stopped
>> > thread. Then injected code doesn't need to jump through any hoops or
>> > possibly run into road blocks. I'm not sure how much this opens things
>> > up if the thread has to be stopped...
>> 
>> Hmm, that's maybe not insane.
>> 
>> An alternative would be to add a bona fide ptrace call-a-function mechanism.
>> I can think of two potentially usable variants:
>> 
>> 1. Straight call.  PTRACE_CALL_FUNCTION(addr) just emulates CALL addr,
>> shadow stack push and all.
>> 
>> 2. Signal-style.  PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal
>> frame just like a real signal is being delivered with the specified handler.
>> There could be a variant to opt-in to also using a specified altstack and
>> altshadowstack.
>
> Using ptrace() will not solve CRIU's issue with sigreturn because sigreturn
> is called from the victim context rather than from the criu process that
> controls the dump and uses ptrace().

I'm not sure I follow.

>
> Even with the current shadow stack interface Rick proposed, CRIU can restore
> the victim using ptrace without any additional knobs, but we loose an
> important ability to "self-cure" the victim from the parasite in case
> anything goes wrong with criu control process.
>
> Moreover, the issue with backward compatibility is not with ptrace but with
> sigreturn and it seems that criu is not its only user.

So we need an ability for a tracer to cause the tracee to call a function and to return successfully.  Apparently a gdb branch can already do this with shstk, and my PTRACE_CALL_FUNCTION_SIGFRAME should also do the trick.  I don't see why we need a sigretur-but-dont-verify -- we just need this mechanism to create a frame such that sigreturn actually works.

--Andy
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Mike Rapoport 3 years, 9 months ago
On Mon, Feb 28, 2022 at 12:30:41PM -0800, Andy Lutomirski wrote:
> 
> 
> On Mon, Feb 28, 2022, at 12:27 PM, Mike Rapoport wrote:
> > On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote:
> >> On 2/8/22 18:18, Edgecombe, Rick P wrote:
> >> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
> >> > 
> >
> > Even with the current shadow stack interface Rick proposed, CRIU can restore
> > the victim using ptrace without any additional knobs, but we loose an
> > important ability to "self-cure" the victim from the parasite in case
> > anything goes wrong with criu control process.
> >
> > Moreover, the issue with backward compatibility is not with ptrace but with
> > sigreturn and it seems that criu is not its only user.
> 
> So we need an ability for a tracer to cause the tracee to call a function
> and to return successfully.  Apparently a gdb branch can already do this
> with shstk, and my PTRACE_CALL_FUNCTION_SIGFRAME should also do the
> trick.  I don't see why we need a sigretur-but-dont-verify -- we just
> need this mechanism to create a frame such that sigreturn actually works.

If I understand correctly, PTRACE_CALL_FUNCTION_SIGFRAME() injects a frame
into the tracee and makes the tracee call sigreturn.
I.e. the tracee is stopped and this is used pretty much as PTRACE_CONT or
PTRACE_SYSCALL.

In such case this defeats the purpose of sigreturn in CRIU because it is
called asynchronously by the tracee when the tracer is about to detach or
even already detached.

For synchronous use-case PTRACE_SETREGSET will be enough, the rest of the
sigframe can be restored by other means.

And with 'criu restore' there may be even no tracer by the time sigreturn
is called.

> --Andy

-- 
Sincerely yours,
Mike.
Re: [PATCH 00/35] Shadow stacks for userspace
Posted by Andy Lutomirski 3 years, 9 months ago

On Mon, Feb 28, 2022, at 1:30 PM, Mike Rapoport wrote:
> On Mon, Feb 28, 2022 at 12:30:41PM -0800, Andy Lutomirski wrote:
>> 
>> 
>> On Mon, Feb 28, 2022, at 12:27 PM, Mike Rapoport wrote:
>> > On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote:
>> >> On 2/8/22 18:18, Edgecombe, Rick P wrote:
>> >> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote:
>> >> > 
>> >
>> > Even with the current shadow stack interface Rick proposed, CRIU can restore
>> > the victim using ptrace without any additional knobs, but we loose an
>> > important ability to "self-cure" the victim from the parasite in case
>> > anything goes wrong with criu control process.
>> >
>> > Moreover, the issue with backward compatibility is not with ptrace but with
>> > sigreturn and it seems that criu is not its only user.
>> 
>> So we need an ability for a tracer to cause the tracee to call a function
>> and to return successfully.  Apparently a gdb branch can already do this
>> with shstk, and my PTRACE_CALL_FUNCTION_SIGFRAME should also do the
>> trick.  I don't see why we need a sigretur-but-dont-verify -- we just
>> need this mechanism to create a frame such that sigreturn actually works.
>
> If I understand correctly, PTRACE_CALL_FUNCTION_SIGFRAME() injects a frame
> into the tracee and makes the tracee call sigreturn.
> I.e. the tracee is stopped and this is used pretty much as PTRACE_CONT or
> PTRACE_SYSCALL.
>
> In such case this defeats the purpose of sigreturn in CRIU because it is
> called asynchronously by the tracee when the tracer is about to detach or
> even already detached.

The intent of PTRACE_CALL_FUNCTION_SIGFRAME is push a signal frame onto the stack and call a function.  That function should then be able to call sigreturn just like any normal signal handler.  There should be no requirement that the tracer still be attached when this happens, although the code calling sigreturn still needs to be mapped.

(Specifically, on modern arches, the user runtime is expected to provide a "restorer" that calls sigreturn.  A hypotheticall PTRACE_CALL_FUNCTION_SIGFRAME would either need to call sigreturn directly or provide a restorer.)