LUKS is the standard for Linux disk encryption, widely adopted by users,
and in some cases, such as Confidential VMs, it is a requirement. With
kdump enabled, when the first kernel crashes, the system can boot into
the kdump/crash kernel to dump the memory image (i.e., /proc/vmcore)
to a specified target. However, there are two challenges when dumping
vmcore to a LUKS-encrypted device:
- The kdump kernel may not be able to decrypt the LUKS partition. On some
machines, a system administrator may not have a chance to enter the
password to decrypt the device in the kdump initramfs after the 1st kernel
crashes. For cloud confidential VMs, depending on the policy, the
kdump kernel may not be able to unseal the keys with the TPM, and the
console virtual keyboard is untrusted.
- LUKS2 by default uses the memory-hard Argon2 key derivation function,
which is quite memory-consuming compared to the limited memory reserved
for kdump. Take Fedora as an example: by default, only 256M is reserved for
systems having between 4G and 64G of memory. With LUKS enabled, ~1300M needs
to be reserved for kdump. Note that the memory reserved for kdump can't
be used by the 1st kernel, i.e. a user sees ~1300M of memory missing in the
1st kernel.
Besides, users (at least on Fedora) usually expect kdump to work out of
the box, i.e. no manual password input or custom crashkernel value is
needed. And it doesn't make sense to derive the keys again in the kdump
kernel, which would be redundant work.
This patch set addresses the above issues by making the LUKS volume keys
persistent for the kdump kernel with the help of cryptsetup's new APIs
(--link-vk-to-keyring/--volume-key-keyring). Here is the life cycle of
the kdump copies of the LUKS volume keys:
1. After the 1st kernel loads the initramfs during boot, systemd
uses a user-input passphrase or a TPM-sealed key to decrypt the LUKS
volume keys and then saves the volume keys to the specified keyring
(using the --link-vk-to-keyring API). The keys will expire within the
specified time.
2. A user space tool (a kdump initramfs loader like kdump-utils) creates
key items inside /sys/kernel/config/crash_dm_crypt_keys to inform
the 1st kernel which keys are needed.
3. When the kdump initramfs is loaded by the kexec_file_load
syscall, the 1st kernel iterates over the created key items and saves the
keys to kdump reserved memory.
4. When the 1st kernel crashes and the kdump initramfs is booted, the
kdump initramfs asks the kdump kernel to create a user key using the
key stored in kdump reserved memory by writing yes to
/sys/kernel/crash_dm_crypt_keys/restore. Then the LUKS-encrypted
device is unlocked with libcryptsetup's --volume-key-keyring API.
5. The system reboots into the 1st kernel after the vmcore has been
dumped to the LUKS-encrypted device.
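Steps 2 and 4 above boil down to a few file operations, so they can be
sketched as a small shell helper. This is only an illustration under stated
assumptions: the function names and the CONFIGFS_ROOT variable are made up
here, the description cryptsetup:mykey is just an example value, and the
path of the restore file is passed in by the caller rather than hard-coded.

```shell
#!/bin/bash
# Illustrative sketch only; the names below are assumptions, not part of
# the patch set. CONFIGFS_ROOT can be overridden for experiments.
CONFIGFS_ROOT="${CONFIGFS_ROOT:-/sys/kernel/config/crash_dm_crypt_keys}"

# Step 2 (1st kernel): create a key item and record which key it refers to.
register_dm_crypt_key() {
    local name="$1" description="$2"
    mkdir -p "$CONFIGFS_ROOT/$name" || return 1
    printf '%s\n' "$description" > "$CONFIGFS_ROOT/$name/description"
}

# Step 4 (kdump kernel): ask the kernel to restore the saved keys by
# writing "yes" to the restore file given as the first argument.
restore_dm_crypt_keys() {
    local restore_path="$1"
    printf 'yes\n' > "$restore_path"
}
```

In the 1st kernel, register_dm_crypt_key would run before kexec_file_load
is invoked; restore_dm_crypt_keys belongs in the kdump initramfs, before the
encrypted device is opened.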
After libcryptsetup saves the LUKS volume keys to the specified keyring,
whoever takes them is responsible for the safety of these copies
of the keys. The keys are saved in the memory area exclusively reserved
for kdump, which even the 1st kernel cannot access directly. Furthermore,
two additional protections are added:
- save the copy at a random location in kdump reserved memory, as suggested by Jan
- clear the _PAGE_PRESENT flag of the page that stores the copy, as
suggested by Pingfan
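The first protection can be pictured as picking a random start address
inside the reserved region before placing the buffer. The sketch below shows
only that arithmetic. random_start is a hypothetical helper, bash's 15-bit
$RANDOM stands in for the kernel's random source, and the region bounds are
made-up example values. It is not the code from the patch.

```shell
#!/bin/bash
# Hypothetical helper: print a pseudo-random address in [start, end).
# The range is split into 32768 equal steps because $RANDOM yields 0..32767.
random_start() {
    local start="$1" end="$2"
    local step=$(( (end - start) / 32768 ))
    echo $(( start + step * RANDOM ))
}

# Example with made-up bounds for a 256M reserved region.
region_start=$(( 0x30000000 ))
region_end=$(( region_start + 256 * 1024 * 1024 ))
printf 'candidate start: 0x%x\n' "$(random_start "$region_start" "$region_end")"
```

A buffer placed this way still has to fit before the end of the region, so a
real implementation would also account for the buffer size and alignment.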
This patch set only supports x86. There will be patches to support other
architectures once this patch set gets merged.
v8
- improve documentation [Randy]
- rebase onto 6.14.0-rc1
v7
- Baoquan
- differentiate between failing to get dm crypt keys and no dm crypt keys
- add code comments, change function names, etc. to improve code readability
- add documentation for configfs API [Dave]
- fix building error found by kernel test robot
v6
- Baoquan
- support AMD SEV
- drop unnecessary keys_header_size
- improve commit message of [PATCH 4/7]
- Greg
- switch to configfs
- move ifdef from .c to .h files and rework kexec_random_start
- use tab instead of space for appended code comment
- Process key description in a more flexible way to address problems
found by Ondrej
- improve cover letter
- fix a compilation error found by kernel test robot
v5
- Baoquan
- limit the feature of placing kexec_buf randomly to kdump (CONFIG_CRASH_DUMP)
- add documentation for added sysfs API
- allow to re-send init command to support the case of user switching to
a different LUKS-encrypted target
- make CONFIG_CRASH_DM_CRYPT depend on CONFIG_DM_CRYPT
- check if the number of keys exceeds KEY_NUM_MAX
- rename (struct keys_header).key_count as (struct keys_header).total_keys
to improve code readability
- improve commit message
- fix the failure of calling crash_exclude_mem_range (there is a split
of mem_range)
- use ret instead of r as return code
- Greg
- add documentation for added sysfs API
- avoid spamming kernel logs
- fix a buffer overflow issue
- keep the state enums synced up with the string values
- use sysfs_emit instead of sprintf
- explain KEY_NUM_MAX and KEY_SIZE_MAX
- s/EXPORT_SYMBOL_GPL/EXPORT_SYMBOL/g
- improve code readability
- Rebase onto latest Linus tree
v4
- rebase onto latest Linus tree so Baoquan can apply the patches for
code review
- fix kernel test robot warnings
v3
- Support CPU/memory hot-plugging [Baoquan]
- Don't save the keys temporarily to simplify the implementation [Baoquan]
- Support multiple LUKS encrypted volumes
- Read logon key instead of user key to improve security [Ondrej]
- A kernel config option CRASH_DM_CRYPT for this feature (disabled by default)
- Fix warnings found by kernel test robot
- Rebase the code onto 6.9.0-rc5+
v2
- work together with libcryptsetup's --link-vk-to-keyring/--volume-key-keyring APIs [Milan and Ondrej]
- add the case where console virtual keyboard is untrusted for confidential VM
- use dm_crypt_key instead of LUKS volume key [Milan and Eric]
- fix some code format issues
- don't move "struct kexec_segment" declaration
- Rebase the code onto latest Linus tree (6.7.0)
v1
- "Put the luks key handling related to crash_dump out into a separate
file kernel/crash_dump_luks.c" [Baoquan]
- Put the generic luks handling code before the x86 specific code to
make it easier for other arches to follow suit [Baoquan]
- Use phys_to_virt instead of "pfn -> page -> vaddr" [Dave Hansen]
- Drop the RFC prefix [Dave Young]
- Rebase the code onto latest Linus tree (6.4.0-rc4)
RFC v2
- libcryptsetup interacts with the kernel via sysfs instead of "hacking"
dm-crypt
- to save a kdump copy of the LUKS volume key in 1st kernel
- to add a logon key using the copy for libcryptsetup in kdump kernel [Milan]
- to avoid the incorrect usage of LUKS master key in dm-crypt [Milan]
- save the kdump copy of LUKS volume key randomly [Jan]
- mark the kdump copy inaccessible [Pingfan]
- Miscellaneous
- explain when operations related to the LUKS volume key happen [Jan]
- s/master key/volume key/g
- use crash_ instead of kexec_ as function prefix
- fix commit subject prefixes e.g. "x86, kdump" to x86/crash
Coiby Xu (7):
kexec_file: allow to place kexec_buf randomly
crash_dump: make dm crypt keys persist for the kdump kernel
crash_dump: store dm crypt keys in kdump reserved memory
crash_dump: reuse saved dm crypt keys for CPU/memory hot-plugging
crash_dump: retrieve dm crypt keys in kdump kernel
x86/crash: pass dm crypt keys to kdump kernel
x86/crash: make the page that stores the dm crypt keys inaccessible
Documentation/admin-guide/kdump/kdump.rst | 32 ++
arch/x86/kernel/crash.c | 26 +-
arch/x86/kernel/kexec-bzimage64.c | 11 +
arch/x86/kernel/machine_kexec_64.c | 22 ++
include/linux/crash_core.h | 7 +-
include/linux/crash_dump.h | 2 +
include/linux/kexec.h | 34 ++
kernel/Kconfig.kexec | 10 +
kernel/Makefile | 1 +
kernel/crash_dump_dm_crypt.c | 459 ++++++++++++++++++++++
kernel/kexec_file.c | 3 +
11 files changed, 604 insertions(+), 3 deletions(-)
create mode 100644 kernel/crash_dump_dm_crypt.c
base-commit: bb066fe812d6fb3a9d01c073d9f1e2fd5a63403b
--
2.48.1
On Fri, Feb 07, 2025 at 04:08:08PM +0800, Coiby Xu wrote:
>[...]
>This patch set only supports x86. There will be patches to support other
>architectures once this patch set gets merged.
>

I'm not sure what's the problem here but I can reliably trigger a kernel
panic on a qemu VM + custom kernel (compiled from
bb066fe812d6fb3a9d01c073d9f1e2fd5a63403b + your patches).

When I configure the crash configfs and call kexec in a systemd service
using ExecStart=, the panic occurs when I start the service:

~ # cat /etc/systemd/system/my-kexec.service
[Unit]
Description=kexec loading for the crash capture kernel

[Service]
Type=oneshot
ExecStart=/usr/bin/mkdir /sys/kernel/config/crash_dm_crypt_keys/mykey
ExecStart=/usr/bin/echo cryptsetup:mykey >/sys/kernel/config/crash_dm_crypt_keys/mykey/description
ExecStart=/usr/host/bin/kexec --debug --load-panic /linux-hv --initrd /crash-initrd
KeyringMode=shared

[Install]
WantedBy=default.target

Starting the service:

~ # systemctl start my-kexec.service
kexec_file: kernel: 00000000ace85dcc kernel_size: 0x16e3000
crash_core: Crash PT_LOAD ELF header. phdr=00000000d08940fa vaddr=0xffff888000100000, paddr=0x100000, sz=0x700000 e_phnum=11 p_offset=0x100000
crash_core: Crash PT_LOAD ELF header. phdr=00000000304ef570 vaddr=0xffff888000808000, paddr=0x808000, sz=0x3000 e_phnum=12 p_offset=0x808000
crash_core: Crash PT_LOAD ELF header. phdr=000000000275e248 vaddr=0xffff88800080c000, paddr=0x80c000, sz=0x5000 e_phnum=13 p_offset=0x80c000
crash_core: Crash PT_LOAD ELF header. phdr=000000004e47ca09 vaddr=0xffff888000900000, paddr=0x900000, sz=0xa5700000 e_phnum=14 p_offset=0x900000
crash_core: Crash PT_LOAD ELF header. phdr=00000000e56c8350 vaddr=0xffff8880b6000000, paddr=0xb6000000, sz=0x7d51018 e_phnum=15 p_offset=0xb6000000
crash_core: Crash PT_LOAD ELF header. phdr=0000000099d67ff3 vaddr=0xffff8880bdd51018, paddr=0xbdd51018, sz=0x27440 e_phnum=16 p_offset=0xbdd51018
crash_core: Crash PT_LOAD ELF header. phdr=00000000461a2f21 vaddr=0xffff8880bdd78458, paddr=0xbdd78458, sz=0xbc0 e_phnum=17 p_offset=0xbdd78458
crash_core: Crash PT_LOAD ELF header. phdr=0000000058149b54 vaddr=0xffff8880bdd79018, paddr=0xbdd79018, sz=0x9a40 e_phnum=18 p_offset=0xbdd79018
crash_core: Crash PT_LOAD ELF header. phdr=000000001e30ff2c vaddr=0xffff8880bdd82a58, paddr=0xbdd82a58, sz=0xdbc5a8 e_phnum=19 p_offset=0xbdd82a58
crash_core: Crash PT_LOAD ELF header. phdr=00000000e67a9768 vaddr=0xffff8880bec00000, paddr=0xbec00000, sz=0xaed000 e_phnum=20 p_offset=0xbec00000
crash_core: Crash PT_LOAD ELF header. phdr=000000005909c4c6 vaddr=0xffff8880bf9ff000, paddr=0xbf9ff000, sz=0x453000 e_phnum=21 p_offset=0xbf9ff000
crash_core: Crash PT_LOAD ELF header. phdr=00000000473d74ef vaddr=0xffff8880bfe58000, paddr=0xbfe58000, sz=0x64000 e_phnum=22 p_offset=0xbfe58000
crash_core: Crash PT_LOAD ELF header. phdr=00000000abde8123 vaddr=0xffff888100000000, paddr=0x100000000, sz=0x23f000000 e_phnum=23 p_offset=0x100000000
crash_core: Crash PT_LOAD ELF header. phdr=00000000bda3e0bf vaddr=0xffff88843f000000, paddr=0x43f000000, sz=0x1000000 e_phnum=24 p_offset=0x43f000000
kexec: Loaded ELF headers at 0x33f000000 bufsz=0x1000 memsz=0xe1000
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 5 UID: 0 PID: 3812 Comm: kexec Not tainted 6.14.0-rc1+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 2025.02-6 04/08/2025
RIP: 0010:sized_strscpy+0x71/0x150
Code: b9 80 80 80 80 80 80 80 80 48 c1 e8 03 48 8d 1c c5 08 00 00 00 31 c0 eb 11 48 89 34 07 48 83 c0 08 48 39 d8 0f 84 83 00 00 00 <49> 8b 34 00 4a 8d 14 1e 49 89 f2 49 f7 d2 4c 21 d2 4c 8d 14 07 4c
RSP: 0018:ffffc9000420fc68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000080
RDX: 0000000000000080 RSI: 0000000000000000 RDI: ffff8881030ec808
RBP: ffff888109724000 R08: 0000000000000000 R09: 8080808080808080
R10: ffffc9000420fc78 R11: fefefefefefefeff R12: ffffc90004219000
R13: ffff888104a80000 R14: 0000000000000008 R15: 0000000000000000
FS: 00007f09ea73f740(0000) GS:ffff88843fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000120760002 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __die+0x23/0x60
 ? page_fault_oops+0x177/0x510
 ? _prb_read_valid+0x2e7/0x370
 ? exc_page_fault+0x6f/0x130
 ? asm_exc_page_fault+0x26/0x30
 ? sized_strscpy+0x71/0x150
 crash_load_dm_crypt_keys+0x1bc/0x370
 bzImage64_load+0x41b/0xa30
 __do_sys_kexec_file_load+0x2af/0x8a0
 do_syscall_64+0x4b/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f09ea848d6d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 70 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007fff8cf979e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000140
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f09ea848d6d
RDX: 0000000000000001 RSI: 0000000000000004 RDI: 0000000000000003
RBP: 0000000000000003 R08: 000000000000000a R09: 00007fff8cf97a10
R10: 000055de70eee9a0 R11: 0000000000000206 R12: 0000000000000003
R13: 00007fff8cf97d08 R14: 000055de4c336448 R15: 0000000000000004
 </TASK>
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:sized_strscpy+0x71/0x150
Code: b9 80 80 80 80 80 80 80 80 48 c1 e8 03 48 8d 1c c5 08 00 00 00 31 c0 eb 11 48 89 34 07 48 83 c0 08 48 39 d8 0f 84 83 00 00 00 <49> 8b 34 00 4a 8d 14 1e 49 89 f2 49 f7 d2 4c 21 d2 4c 8d 14 07 4c
RSP: 0018:ffffc9000420fc68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000080
RDX: 0000000000000080 RSI: 0000000000000000 RDI: ffff8881030ec808
RBP: ffff888109724000 R08: 0000000000000000 R09: 8080808080808080
R10: ffffc9000420fc78 R11: fefefefefefefeff R12: ffffc90004219000
R13: ffff888104a80000 R14: 0000000000000008 R15: 0000000000000000
FS: 00007f09ea73f740(0000) GS:ffff88843fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000120760002 CR4: 0000000000772ef0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled

Calling a script that does the same thing works fine and loads the keys
correctly:

[Service]
ExecStart=/root/kexec.sh

~ # cat /root/kexec.sh
#!/bin/bash

mkdir /sys/kernel/config/crash_dm_crypt_keys/mykey
echo cryptsetup:mykey > /sys/kernel/config/crash_dm_crypt_keys/mykey/description
/usr/host/bin/kexec --debug --load-panic /linux-hv --initrd /crash-initrd

If that's any help, my crypttab:

~ # cat /etc/crypttab
root UUID=8001fca4-2e54-48e9-9235-031c19fc6e36 none luks,link-volume-key=@u::%logon:cryptsetup:mykey

If you can't reproduce, I can help track this. Just let me know if you need
any help.
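
A possible factor in the difference between the two setups, offered as an
assumption rather than a confirmed diagnosis: systemd does not run
ExecStart= commands through a shell, so the ">" in the unit file is not
treated as a redirection and the description attribute may never have been
written. The sketch below reproduces only that shell-versus-no-shell
difference in a temporary directory. The file names are hypothetical and
nothing under /sys is touched.

```shell
#!/bin/bash
# Hypothetical demonstration in a scratch directory.
demo_dir="$(mktemp -d)"
direct="$demo_dir/direct"
via_shell="$demo_dir/via_shell"

# Direct exec without a shell: ">$direct" is passed to echo as an ordinary
# argument, so echo prints it and no file is created.
direct_stdout="$(env echo cryptsetup:mykey ">$direct")"

# Through a shell: ">" is interpreted as a redirection and the file is written.
sh -c 'echo cryptsetup:mykey > "$1"' sh "$via_shell"

echo "direct exec printed: $direct_stdout"
echo "shell wrote: $(cat "$via_shell")"
```

If that is what happened here, wrapping the command in a shell inside the
unit file (for example with /bin/sh -c) would be expected to behave like the
script variant.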
On Thu, Apr 24, 2025 at 02:08:55AM +0200, Arnaud Lefebvre wrote:
>On Fri, Feb 07, 2025 at 04:08:08PM +0800, Coiby Xu wrote:
>>[...]
>>This patch set only supports x86. There will be patches to support other
>>architectures once this patch set gets merged.
>>
>
>I'm not sure what's the problem here but I can reliably trigger a kernel
>panic on a qemu VM + custom kernel (compiled from
>bb066fe812d6fb3a9d01c073d9f1e2fd5a63403b + your patches).

Hi Arnaud,

Thanks for testing the patches, finding this issue and also sharing the
details to reproduce it!

>
>When I configure the crash configfs and call kexec in a systemd service
>using ExecStart=, the panic occurs when I start the service:
>
>~ # cat /etc/systemd/system/my-kexec.service
>[Unit]
>Description=kexec loading for the crash capture kernel
>
>[Service]
>Type=oneshot
>ExecStart=/usr/bin/mkdir /sys/kernel/config/crash_dm_crypt_keys/mykey
>ExecStart=/usr/bin/echo cryptsetup:mykey >/sys/kernel/config/crash_dm_crypt_keys/mykey/description
>ExecStart=/usr/host/bin/kexec --debug --load-panic /linux-hv --initrd /crash-initrd
>KeyringMode=shared

Can you try putting the above commands into a script e.g.
/usr/local/bin/my-kexec.sh and then using
ExecStart=/usr/local/bin/my-kexec.sh
so I can be more sure that I've reproduced your issue?

>
>[Install]
>WantedBy=default.target
>
>
>Starting the service:
>
>~ # systemctl start my-kexec.service
>[...]
>BUG: kernel NULL pointer dereference, address: 0000000000000000
>[...]
>Kernel panic - not syncing: Fatal exception
>Kernel Offset: disabled
>
>
>Calling a script that does the same thing works fine and loads the keys
>correctly:
>
>[Service]
>ExecStart=/root/kexec.sh
>
>~ # cat /root/kexec.sh
>#!/bin/bash
>
>mkdir /sys/kernel/config/crash_dm_crypt_keys/mykey
>echo cryptsetup:mykey > /sys/kernel/config/crash_dm_crypt_keys/mykey/description
>/usr/host/bin/kexec --debug --load-panic /linux-hv --initrd /crash-initrd
>
>If that's any help, my crypttab:
>
>~ # cat /etc/crypttab
>root UUID=8001fca4-2e54-48e9-9235-031c19fc6e36 none luks,link-volume-key=@u::%logon:cryptsetup:mykey
>
>If you can't reproduce, I can help track this. Just let me know if you need
>any help.
>

--
Best regards,
Coiby
On Mon, Apr 28, 2025 at 05:02:23PM +0800, Coiby Xu wrote: >On Thu, Apr 24, 2025 at 02:08:55AM +0200, Arnaud Lefebvre wrote: >>On Fri, Feb 07, 2025 at 04:08:08PM +0800, Coiby Xu wrote: >>>LUKS is the standard for Linux disk encryption, widely adopted by users, >>>and in some cases, such as Confidential VMs, it is a requirement. With >>>kdump enabled, when the first kernel crashes, the system can boot into >>>the kdump/crash kernel to dump the memory image (i.e., /proc/vmcore) >>>to a specified target. However, there are two challenges when dumping >>>vmcore to a LUKS-encrypted device: >>> >>>- Kdump kernel may not be able to decrypt the LUKS partition. For some >>> machines, a system administrator may not have a chance to enter the >>> password to decrypt the device in kdump initramfs after the 1st kernel >>> crashes; For cloud confidential VMs, depending on the policy the >>> kdump kernel may not be able to unseal the keys with TPM and the >>> console virtual keyboard is untrusted. >>> >>>- LUKS2 by default use the memory-hard Argon2 key derivation function >>> which is quite memory-consuming compared to the limited memory reserved >>> for kdump. Take Fedora example, by default, only 256M is reserved for >>> systems having memory between 4G-64G. With LUKS enabled, ~1300M needs >>> to be reserved for kdump. Note if the memory reserved for kdump can't >>> be used by 1st kernel i.e. an user sees ~1300M memory missing in the >>> 1st kernel. >>> >>>Besides users (at least for Fedora) usually expect kdump to work out of >>>the box i.e. no manual password input or custom crashkernel value is >>>needed. And it doesn't make sense to derivate the keys again in kdump >>>kernel which seems to be redundant work. >>> >>>This patch set addresses the above issues by making the LUKS volume keys >>>persistent for kdump kernel with the help of cryptsetup's new APIs >>>(--link-vk-to-keyring/--volume-key-keyring). 
Here is the life cycle of >>>the kdump copies of LUKS volume keys, >>> >>>1. After the 1st kernel loads the initramfs during boot, systemd >>> use an user-input passphrase to de-crypt the LUKS volume keys >>> or TPM-sealed key and then save the volume keys to specified keyring >>> (using the --link-vk-to-keyring API) and the key will expire within >>> specified time. >>> >>>2. A user space tool (kdump initramfs loader like kdump-utils) create >>> key items inside /sys/kernel/config/crash_dm_crypt_keys to inform >>> the 1st kernel which keys are needed. >>> >>>3. When the kdump initramfs is loaded by the kexec_file_load >>> syscall, the 1st kernel will iterate created key items, save the >>> keys to kdump reserved memory. >>> >>>4. When the 1st kernel crashes and the kdump initramfs is booted, the >>> kdump initramfs asks the kdump kernel to create a user key using the >>> key stored in kdump reserved memory by writing yes to >>> /sys/kernel/crash_dm_crypt_keys/restore. Then the LUKS encrypted >>> device is unlocked with libcryptsetup's --volume-key-keyring API. >>> >>>5. The system gets rebooted to the 1st kernel after dumping vmcore to >>> the LUKS encrypted device is finished >>> >>>After libcryptsetup saving the LUKS volume keys to specified keyring, >>>whoever takes this should be responsible for the safety of these copies >>>of keys. The keys will be saved in the memory area exclusively reserved >>>for kdump where even the 1st kernel has no direct access. And further >>>more, two additional protections are added, >>>- save the copy randomly in kdump reserved memory as suggested by Jan >>>- clear the _PAGE_PRESENT flag of the page that stores the copy as >>> suggested by Pingfan >>> >>>This patch set only supports x86. There will be patches to support other >>>architectures once this patch set gets merged. 
>>>
>>
>>I'm not sure what's the problem here but I can reliably trigger a kernel
>>panic on a qemu VM + custom kernel (compiled from
>>bb066fe812d6fb3a9d01c073d9f1e2fd5a63403b + your patches).
>
>Hi Arnaud,
>
>Thanks for testing the patches, finding this issue and also sharing the
>details to reproduce it!
>

Hello,

You're welcome, thanks to you for this patch series!

>>
>>When I configure the crash configfs and call kexec in a systemd service
>>using ExecStart=, the panic occurs when I start the service:
>>
>>~ # cat /etc/systemd/system/my-kexec.service
>>[Unit]
>>Description=kexec loading for the crash capture kernel
>>
>>[Service]
>>Type=oneshot
>>ExecStart=/usr/bin/mkdir /sys/kernel/config/crash_dm_crypt_keys/mykey
>>ExecStart=/usr/bin/echo cryptsetup:mykey >/sys/kernel/config/crash_dm_crypt_keys/mykey/description
>>ExecStart=/usr/host/bin/kexec --debug --load-panic /linux-hv --initrd /crash-initrd
>>KeyringMode=shared
>
>Can you try putting the above commands into a script e.g.
>/usr/local/bin/my-kexec.sh and then using
>ExecStart=/usr/local/bin/my-kexec.sh
>so I can be more sure that I've reproduced your issue?
>

I believe that's what I wrote at the end of my previous message (see
below the panic trace). It works fine using a script like that.

Did you miss it or is there a difference with what you're asking?

>>
>>[Install]
>>WantedBy=default.target
>>
>>
>>Starting the service:
>>
>>~ # systemctl start my-kexec.service
>>kexec_file: kernel: 00000000ace85dcc kernel_size: 0x16e3000
>>crash_core: Crash PT_LOAD ELF header. phdr=00000000d08940fa
>>vaddr=0xffff888000100000, paddr=0x100000, sz=0x700000 e_phnum=11
>>p_offset=0x100000
>>crash_core: Crash PT_LOAD ELF header. phdr=00000000304ef570
>>vaddr=0xffff888000808000, paddr=0x808000, sz=0x3000 e_phnum=12
>>p_offset=0x808000
>>crash_core: Crash PT_LOAD ELF header. phdr=000000000275e248
>>vaddr=0xffff88800080c000, paddr=0x80c000, sz=0x5000 e_phnum=13
>>p_offset=0x80c000
>>crash_core: Crash PT_LOAD ELF header. phdr=000000004e47ca09
>>vaddr=0xffff888000900000, paddr=0x900000, sz=0xa5700000 e_phnum=14
>>p_offset=0x900000
>>crash_core: Crash PT_LOAD ELF header. phdr=00000000e56c8350
>>vaddr=0xffff8880b6000000, paddr=0xb6000000, sz=0x7d51018 e_phnum=15
>>p_offset=0xb6000000
>>crash_core: Crash PT_LOAD ELF header. phdr=0000000099d67ff3
>>vaddr=0xffff8880bdd51018, paddr=0xbdd51018, sz=0x27440 e_phnum=16
>>p_offset=0xbdd51018
>>crash_core: Crash PT_LOAD ELF header. phdr=00000000461a2f21
>>vaddr=0xffff8880bdd78458, paddr=0xbdd78458, sz=0xbc0 e_phnum=17
>>p_offset=0xbdd78458
>>crash_core: Crash PT_LOAD ELF header. phdr=0000000058149b54
>>vaddr=0xffff8880bdd79018, paddr=0xbdd79018, sz=0x9a40 e_phnum=18
>>p_offset=0xbdd79018
>>crash_core: Crash PT_LOAD ELF header. phdr=000000001e30ff2c
>>vaddr=0xffff8880bdd82a58, paddr=0xbdd82a58, sz=0xdbc5a8 e_phnum=19
>>p_offset=0xbdd82a58
>>crash_core: Crash PT_LOAD ELF header. phdr=00000000e67a9768
>>vaddr=0xffff8880bec00000, paddr=0xbec00000, sz=0xaed000 e_phnum=20
>>p_offset=0xbec00000
>>crash_core: Crash PT_LOAD ELF header. phdr=000000005909c4c6
>>vaddr=0xffff8880bf9ff000, paddr=0xbf9ff000, sz=0x453000 e_phnum=21
>>p_offset=0xbf9ff000
>>crash_core: Crash PT_LOAD ELF header. phdr=00000000473d74ef
>>vaddr=0xffff8880bfe58000, paddr=0xbfe58000, sz=0x64000 e_phnum=22
>>p_offset=0xbfe58000
>>crash_core: Crash PT_LOAD ELF header. phdr=00000000abde8123
>>vaddr=0xffff888100000000, paddr=0x100000000, sz=0x23f000000 e_phnum=23
>>p_offset=0x100000000
>>crash_core: Crash PT_LOAD ELF header. phdr=00000000bda3e0bf
>>vaddr=0xffff88843f000000, paddr=0x43f000000, sz=0x1000000 e_phnum=24
>>p_offset=0x43f000000
>>kexec: Loaded ELF headers at 0x33f000000 bufsz=0x1000 memsz=0xe1000
>>BUG: kernel NULL pointer dereference, address: 0000000000000000
>>#PF: supervisor read access in kernel mode
>>#PF: error_code(0x0000) - not-present page
>>PGD 0 P4D 0
>>Oops: Oops: 0000 [#1] SMP NOPTI
>>CPU: 5 UID: 0 PID: 3812 Comm: kexec Not tainted 6.14.0-rc1+ #20
>>Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 2025.02-6
>>04/08/2025
>>RIP: 0010:sized_strscpy+0x71/0x150
>>Code: b9 80 80 80 80 80 80 80 80 48 c1 e8 03 48 8d 1c c5 08 00 00 00 31 c0 eb
>>11
>>48 89 34 07 48 83 c0 08 48 39 d8 0f 84 83 00 00 00 <49> 8b 34 00 4a 8d 14 1e 49
>>89 f2 49 f7 d2 4c 21 d2 4c 8d 14 07 4c
>>RSP: 0018:ffffc9000420fc68 EFLAGS: 00010246
>>RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000080
>>RDX: 0000000000000080 RSI: 0000000000000000 RDI: ffff8881030ec808
>>RBP: ffff888109724000 R08: 0000000000000000 R09: 8080808080808080
>>R10: ffffc9000420fc78 R11: fefefefefefefeff R12: ffffc90004219000
>>R13: ffff888104a80000 R14: 0000000000000008 R15: 0000000000000000
>>FS: 00007f09ea73f740(0000) GS:ffff88843fc80000(0000) knlGS:0000000000000000
>>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>CR2: 0000000000000000 CR3: 0000000120760002 CR4: 0000000000772ef0
>>PKRU: 55555554
>>Call Trace:
>><TASK>
>>? __die+0x23/0x60
>>? page_fault_oops+0x177/0x510
>>? _prb_read_valid+0x2e7/0x370
>>? exc_page_fault+0x6f/0x130
>>? asm_exc_page_fault+0x26/0x30
>>? sized_strscpy+0x71/0x150
>>crash_load_dm_crypt_keys+0x1bc/0x370
>>bzImage64_load+0x41b/0xa30
>>__do_sys_kexec_file_load+0x2af/0x8a0
>>do_syscall_64+0x4b/0x110
>>entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>RIP: 0033:0x7f09ea848d6d
>>Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48
>>89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01
>>c3 48 8b 0d 6b 70 0d 00 f7 d8 64 89 01 48
>>RSP: 002b:00007fff8cf979e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000140
>>RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f09ea848d6d
>>RDX: 0000000000000001 RSI: 0000000000000004 RDI: 0000000000000003
>>RBP: 0000000000000003 R08: 000000000000000a R09: 00007fff8cf97a10
>>R10: 000055de70eee9a0 R11: 0000000000000206 R12: 0000000000000003
>>R13: 00007fff8cf97d08 R14: 000055de4c336448 R15: 0000000000000004
>></TASK>
>>CR2: 0000000000000000
>>---[ end trace 0000000000000000 ]---
>>RIP: 0010:sized_strscpy+0x71/0x150
>>Code: b9 80 80 80 80 80 80 80 80 48 c1 e8 03 48 8d 1c c5 08 00 00 00 31 c0 eb
>>11 48 89 34 07 48 83 c0 08 48 39 d8 0f 84 83 00 00 00 <49> 8b 34 00 4a 8d 14 1e
>>49 89 f2 49 f7 d2 4c 21 d2 4c 8d 14 07 4c
>>RSP: 0018:ffffc9000420fc68 EFLAGS: 00010246
>>RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000080
>>RDX: 0000000000000080 RSI: 0000000000000000 RDI: ffff8881030ec808
>>RBP: ffff888109724000 R08: 0000000000000000 R09: 8080808080808080
>>R10: ffffc9000420fc78 R11: fefefefefefefeff R12: ffffc90004219000
>>R13: ffff888104a80000 R14: 0000000000000008 R15: 0000000000000000
>>FS: 00007f09ea73f740(0000) GS:ffff88843fc80000(0000) knlGS:0000000000000000
>>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>CR2: 0000000000000000 CR3: 0000000120760002 CR4: 0000000000772ef0
>>PKRU: 55555554
>>Kernel panic - not syncing: Fatal exception
>>Kernel Offset: disabled
>>
>>
>>Calling a script that does the same thing works fine and loads the keys
>>correctly:
>>
>>[Service]
>>ExecStart=/root/kexec.sh
>>
>>~ # cat /root/kexec.sh
>>#!/bin/bash
>>
>>mkdir /sys/kernel/config/crash_dm_crypt_keys/mykey
>>echo cryptsetup:mykey > /sys/kernel/config/crash_dm_crypt_keys/mykey/description
>>/usr/host/bin/kexec --debug --load-panic /linux-hv --initrd /crash-initrd
>>
>>If that's any help, my crypttab:
>>
>>~ # cat /etc/crypttab
>>root UUID=8001fca4-2e54-48e9-9235-031c19fc6e36 none luks,link-volume-key=@u::%logon:cryptsetup:mykey
>>
>>If you can't reproduce, I can help track this. Just let me know if you need
>>any help.
>>
>
>--
>Best regards,
>Coiby
>
On Mon, Apr 28, 2025 at 08:40:44PM +0200, Arnaud Lefebvre wrote:
>On Mon, Apr 28, 2025 at 05:02:23PM +0800, Coiby Xu wrote:
>>On Thu, Apr 24, 2025 at 02:08:55AM +0200, Arnaud Lefebvre wrote:
>>>On Fri, Feb 07, 2025 at 04:08:08PM +0800, Coiby Xu wrote:
>>>>LUKS is the standard for Linux disk encryption, widely adopted by users,
>>>>and in some cases, such as Confidential VMs, it is a requirement.
>>>>[...]
>>>>This patch set only supports x86. There will be patches to support other
>>>>architectures once this patch set gets merged.
>>>>
>>>
>>>I'm not sure what's the problem here but I can reliably trigger a kernel
>>>panic on a qemu VM + custom kernel (compiled from
>>>bb066fe812d6fb3a9d01c073d9f1e2fd5a63403b + your patches).
>>
>>Hi Arnaud,
>>
>>Thanks for testing the patches, finding this issue and also sharing the
>>details to reproduce it!
>>
>
>Hello,
>
>You're welcome, thanks to you for this patch series!
>
>>>
>>>When I configure the crash configfs and call kexec in a systemd service
>>>using ExecStart=, the panic occurs when I start the service:
>>>
>>>~ # cat /etc/systemd/system/my-kexec.service
>>>[Unit]
>>>Description=kexec loading for the crash capture kernel
>>>
>>>[Service]
>>>Type=oneshot
>>>ExecStart=/usr/bin/mkdir /sys/kernel/config/crash_dm_crypt_keys/mykey
>>>ExecStart=/usr/bin/echo cryptsetup:mykey >/sys/kernel/config/crash_dm_crypt_keys/mykey/description
>>>ExecStart=/usr/host/bin/kexec --debug --load-panic /linux-hv --initrd /crash-initrd
>>>KeyringMode=shared
>>
>>Can you try putting the above commands into a script e.g.
>>/usr/local/bin/my-kexec.sh and then using
>>ExecStart=/usr/local/bin/my-kexec.sh
>>so I can be more sure that I've reproduced your issue?
>>
>
>I believe that's what I wrote at the end of my previous message
>(see below the panic trace). It works fine using a script like that.
>
>Did you miss it or is there a difference with what you're asking?

Oh, I missed it, thanks for the reminder! Now I'm sure I have reproduced
the issue and understand how the problem happens.

systemd doesn't invoke bash (or any shell) to run ExecStart= commands, so
the shell redirection ">" is not interpreted. You will see a systemd log
like the following:

    warning: ignoring excess arguments, starting with ‘>/sys/kernel/config/...

So a key configfs item is created but its key description is never set.
Unfortunately, the kernel doesn't check whether the key description is
NULL and crashes when trying to copy it. I'll send a new version of the
patches to resolve this issue, thanks!

>
> [...]
>>>Kernel panic - not syncing: Fatal exception
>>>Kernel Offset: disabled
>>>
>>>
>>>Calling a script that does the same thing works fine and loads the keys
>>>correctly:
>>>
>>>[Service]
>>>ExecStart=/root/kexec.sh
>>>
>>>~ # cat /root/kexec.sh
>>>#!/bin/bash
>>>
>>>mkdir /sys/kernel/config/crash_dm_crypt_keys/mykey
>>>echo cryptsetup:mykey > /sys/kernel/config/crash_dm_crypt_keys/mykey/description
>>>/usr/host/bin/kexec --debug --load-panic /linux-hv --initrd /crash-initrd
>>>
>>>If that's any help, my crypttab:
>>>
>>>~ # cat /etc/crypttab
>>>root UUID=8001fca4-2e54-48e9-9235-031c19fc6e36 none luks,link-volume-key=@u::%logon:cryptsetup:mykey
>>>
>>>If you can't reproduce, I can help track this. Just let me know if you need
>>>any help.
>>>
>>
>>--
>>Best regards,
>>Coiby
>>
>

-- 
Best regards,
Coiby
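
The failure mode described in the message above can be illustrated without
systemd. What follows is a minimal shell sketch, not the reporter's setup:
`workdir` and `target` are hypothetical stand-ins for the configfs
`description` attribute, and quoting the `>` argument imitates a program
being executed directly from an argument vector, with no shell in between
to parse the redirection.

```shell
#!/bin/sh
# Hypothetical scratch file standing in for .../mykey/description,
# so the sketch can run on any machine.
workdir=$(mktemp -d)
target="$workdir/description"

# 1. Direct execution, as a service manager without a shell would do it:
#    the quoted ">..." word reaches echo as a plain argument and is printed,
#    so nothing is written to the target file.
direct_output=$(env echo cryptsetup:mykey ">$target")
if [ -e "$target" ]; then
    direct_created=yes
else
    direct_created=no
fi

# 2. Execution through a shell (what the wrapper script amounts to):
#    ">" is parsed as a redirection and the description is written.
sh -c 'echo cryptsetup:mykey > "$1"' sh "$target"
shell_content=$(cat "$target")

echo "direct exec: output='$direct_output', file created: $direct_created"
echo "via shell:   file content='$shell_content'"

rm -rf "$workdir"
```

In a unit file, the usual ways to get a working redirection are to call a
wrapper script, as the working /root/kexec.sh quoted above does, or to run
the command through a shell explicitly with something like
ExecStart=/bin/sh -c '...'.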
Hi X86 maintainers, Andrew,

On 02/07/25 at 04:08pm, Coiby Xu wrote:
......snip...
> This patch set only supports x86. There will be patches to support other
> architectures once this patch set gets merged.

Who can help pick up this patchset? It has been through many rounds of
review, and it is now ready for merging from the kdump reviewers' side.
Or are there any comments or concerns that need further work?

Thanks
Baoquan

>
> v8
> - improve documentation [Randy]
> - rebase onto 6.14.0-rc1
>
> v7
> - Baoquan
>   - differentiate between failing to get dm crypt keys and no dm crypt keys
>   - add code comments, change function name and etc. to improve code readability
> - add documentation for configfs API [Dave]
> - fix building error found by kernel test robot
>
> v6
> - Baoquan
>   - support AMD SEV
>   - drop uncessary keys_header_size
>   - improve commit message of [PATCH 4/7]
>
> - Greg
>   - switch to configfs
>   - move ifdef from .c to .h files and rework kexec_random_start
>   - use tab instead of space for appended code comment
>
> - Process key description in a more flexible way to address problems
>   found by Ondrej
> - improve cover letter
> - fix an compilation error as found by kernel test robot
>
> v5
> - Baoquan
>   - limit the feature of placing kexec_buf randomly to kdump (CONFIG_CRASH_DUMP)
>   - add documentation for added sysfs API
>   - allow to re-send init command to support the case of user switching to
>     a different LUKS-encrypted target
>   - make CONFIG_CRASH_DM_CRYPT depends on CONFIG_DM_CRYPT
>   - check if the number of keys exceed KEY_NUM_MAX
>   - rename (struct keys_header).key_count as (struct keys_header).total_keys
>     to improve code readability
>   - improve commit message
>   - fix the failure of calling crash_exclude_mem_range (there is a split
>     of mem_range)
>   - use ret instead of r as return code
>
> - Greg
>   - add documentation for added sysfs API
>   - avoid spamming kernel logs
>   - fix a buffer overflow issue
>   - keep the state enums synced up with the string values
>   - use sysfs_emit other than sprintf
>   - explain KEY_NUM_MAX and KEY_SIZE_MAX
>   - s/EXPORT_SYMBOL_GPL/EXPORT_SYMBOL/g
>   - improve code readability
>
> - Rebase onto latest Linus tree
>
>
> v4
> - rebase onto latest Linus tree so Baoquan can apply the patches for
>   code review
> - fix kernel test robot warnings
>
> v3
> - Support CPU/memory hot-plugging [Baoquan]
> - Don't save the keys temporarily to simplify the implementation [Baoquan]
> - Support multiple LUKS encrypted volumes
> - Read logon key instead of user key to improve security [Ondrej]
> - A kernel config option CRASH_DM_CRYPT for this feature (disabled by default)
> - Fix warnings found by kernel test robot
> - Rebase the code onto 6.9.0-rc5+
>
> v2
> - work together with libscryptsetup's --link-vk-to-keyring/--volume-key-keyring APIs [Milan and Ondrej]
> - add the case where console virtual keyboard is untrusted for confidential VM
> - use dm_crypt_key instead of LUKS volume key [Milan and Eric]
> - fix some code format issues
> - don't move "struct kexec_segment" declaration
> - Rebase the code onto latest Linus tree (6.7.0)
>
> v1
> - "Put the luks key handling related to crash_dump out into a separate
>   file kernel/crash_dump_luks.c" [Baoquan]
> - Put the generic luks handling code before the x86 specific code to
>   make it easier for other arches to follow suit [Baoquan]
> - Use phys_to_virt instead of "pfn -> page -> vaddr" [Dave Hansen]
> - Drop the RFC prefix [Dave Young]
> - Rebase the code onto latest Linus tree (6.4.0-rc4)
>
> RFC v2
> - libcryptsetup interacts with the kernel via sysfs instead of "hacking"
>   dm-crypt
>   - to save a kdump copy of the LUKS volume key in 1st kernel
>   - to add a logon key using the copy for libcryptsetup in kdump kernel [Milan]
>   - to avoid the incorrect usage of LUKS master key in dm-crypt [Milan]
> - save the kdump copy of LUKS volume key randomly [Jan]
> - mark the kdump copy inaccessible [Pingfan]
> - Miscellaneous
>   - explain when operations related to the LUKS volume key happen [Jan]
>   - s/master key/volume key/g
>   - use crash_ instead of kexec_ as function prefix
>   - fix commit subject prefixes e.g. "x86, kdump" to x86/crash
>
>
> Coiby Xu (7):
>   kexec_file: allow to place kexec_buf randomly
>   crash_dump: make dm crypt keys persist for the kdump kernel
>   crash_dump: store dm crypt keys in kdump reserved memory
>   crash_dump: reuse saved dm crypt keys for CPU/memory hot-plugging
>   crash_dump: retrieve dm crypt keys in kdump kernel
>   x86/crash: pass dm crypt keys to kdump kernel
>   x86/crash: make the page that stores the dm crypt keys inaccessible
>
>  Documentation/admin-guide/kdump/kdump.rst |  32 ++
>  arch/x86/kernel/crash.c                   |  26 +-
>  arch/x86/kernel/kexec-bzimage64.c         |  11 +
>  arch/x86/kernel/machine_kexec_64.c        |  22 ++
>  include/linux/crash_core.h                |   7 +-
>  include/linux/crash_dump.h                |   2 +
>  include/linux/kexec.h                     |  34 ++
>  kernel/Kconfig.kexec                      |  10 +
>  kernel/Makefile                           |   1 +
>  kernel/crash_dump_dm_crypt.c              | 459 ++++++++++++++++++++++
>  kernel/kexec_file.c                       |   3 +
>  11 files changed, 604 insertions(+), 3 deletions(-)
>  create mode 100644 kernel/crash_dump_dm_crypt.c
>
>
> base-commit: bb066fe812d6fb3a9d01c073d9f1e2fd5a63403b
> --
> 2.48.1
>
Hi Andrew,

On 02/07/25 at 04:08pm, Coiby Xu wrote:
......snip...

Ping again.

This patchset adds generic infrastructure for LUKS support in crash
dumping, and enables that support on x86. Since the x86-related changes
are confined to kdump-only files, they won't impact other x86 code.

Could you consider picking this into your tree?

By the way, this is a kdump-only fix and is not related to KHO (Kexec
HandOver), which David earlier suggested adapting to. I'm explaining it
here to avoid any misunderstanding.

Thanks
Baoquan

>  Documentation/admin-guide/kdump/kdump.rst |  32 ++
>  arch/x86/kernel/crash.c                   |  26 +-
>  arch/x86/kernel/kexec-bzimage64.c         |  11 +
>  arch/x86/kernel/machine_kexec_64.c        |  22 ++
>  include/linux/crash_core.h                |   7 +-
>  include/linux/crash_dump.h                |   2 +
>  include/linux/kexec.h                     |  34 ++
>  kernel/Kconfig.kexec                      |  10 +
>  kernel/Makefile                           |   1 +
>  kernel/crash_dump_dm_crypt.c              | 459 ++++++++++++++++++++++
>  kernel/kexec_file.c                       |   3 +
>  11 files changed, 604 insertions(+), 3 deletions(-)
>  create mode 100644 kernel/crash_dump_dm_crypt.c
>
>
> base-commit: bb066fe812d6fb3a9d01c073d9f1e2fd5a63403b
> --
> 2.48.1
>
On 02/07/25 at 04:08pm, Coiby Xu wrote:
> LUKS is the standard for Linux disk encryption, widely adopted by users,
> and in some cases, such as Confidential VMs, it is a requirement. With
> kdump enabled, when the first kernel crashes, the system can boot into
> the kdump/crash kernel to dump the memory image (i.e., /proc/vmcore)
> to a specified target. However, there are two challenges when dumping
> vmcore to a LUKS-encrypted device:
>
> - Kdump kernel may not be able to decrypt the LUKS partition. For some
>   machines, a system administrator may not have a chance to enter the
>   password to decrypt the device in kdump initramfs after the 1st kernel
>   crashes; For cloud confidential VMs, depending on the policy the
>   kdump kernel may not be able to unseal the keys with TPM and the
>   console virtual keyboard is untrusted.
>
> - LUKS2 by default use the memory-hard Argon2 key derivation function
>   which is quite memory-consuming compared to the limited memory reserved
>   for kdump. Take Fedora example, by default, only 256M is reserved for
>   systems having memory between 4G-64G. With LUKS enabled, ~1300M needs
>   to be reserved for kdump. Note if the memory reserved for kdump can't
>   be used by 1st kernel i.e. an user sees ~1300M memory missing in the
>   1st kernel.
>
> Besides users (at least for Fedora) usually expect kdump to work out of
> the box i.e. no manual password input or custom crashkernel value is
> needed. And it doesn't make sense to derivate the keys again in kdump
> kernel which seems to be redundant work.
>
> This patch set addresses the above issues by making the LUKS volume keys
> persistent for kdump kernel with the help of cryptsetup's new APIs
> (--link-vk-to-keyring/--volume-key-keyring). Here is the life cycle of
> the kdump copies of LUKS volume keys,
>
> 1. After the 1st kernel loads the initramfs during boot, systemd
>    use an user-input passphrase to de-crypt the LUKS volume keys
>    or TPM-sealed key and then save the volume keys to specified keyring
>    (using the --link-vk-to-keyring API) and the key will expire within
>    specified time.
>
> 2. A user space tool (kdump initramfs loader like kdump-utils) create
>    key items inside /sys/kernel/config/crash_dm_crypt_keys to inform
>    the 1st kernel which keys are needed.
>
> 3. When the kdump initramfs is loaded by the kexec_file_load
>    syscall, the 1st kernel will iterate created key items, save the
>    keys to kdump reserved memory.
>
> 4. When the 1st kernel crashes and the kdump initramfs is booted, the
>    kdump initramfs asks the kdump kernel to create a user key using the
>    key stored in kdump reserved memory by writing yes to
>    /sys/kernel/crash_dm_crypt_keys/restore. Then the LUKS encrypted
>    device is unlocked with libcryptsetup's --volume-key-keyring API.
>
> 5. The system gets rebooted to the 1st kernel after dumping vmcore to
>    the LUKS encrypted device is finished
>
> After libcryptsetup saving the LUKS volume keys to specified keyring,
> whoever takes this should be responsible for the safety of these copies
> of keys. The keys will be saved in the memory area exclusively reserved
> for kdump where even the 1st kernel has no direct access. And further
> more, two additional protections are added,
> - save the copy randomly in kdump reserved memory as suggested by Jan
> - clear the _PAGE_PRESENT flag of the page that stores the copy as
>   suggested by Pingfan
>
> This patch set only supports x86. There will be patches to support other
> architectures once this patch set gets merged.

This v8 looks good to me, thanks for the great effort, Coiby.

Acked-by: Baoquan He <bhe@redhat.com>
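
Steps 2 and 4 of the life cycle quoted above can be sketched from user
space as follows. This is a dry-run sketch under stated assumptions, not
the kdump-utils implementation: `CONFIGFS_ROOT`, `SYSFS_ROOT`,
`register_key` and `restore_keys` are hypothetical names, and by default
everything is written to a scratch directory rather than to the real
/sys/kernel paths. The directory and file names are the ones given in the
cover letter; the key name and description are borrowed from the
reproducer posted in this thread.

```shell
#!/bin/sh
# Hypothetical overrides: on a kernel with this patch set the roots would be
# /sys/kernel/config and /sys/kernel, as named in the cover letter. Here they
# default to a scratch directory so the sketch is a harmless dry run.
scratch=$(mktemp -d)
CONFIGFS_ROOT=${CONFIGFS_ROOT:-$scratch/config}
SYSFS_ROOT=${SYSFS_ROOT:-$scratch}
mkdir -p "$CONFIGFS_ROOT/crash_dm_crypt_keys" "$SYSFS_ROOT/crash_dm_crypt_keys"

# Step 2 (1st kernel): create a key item and record which keyring entry
# the kernel should save for kdump.
register_key() {
    name=$1
    description=$2
    mkdir "$CONFIGFS_ROOT/crash_dm_crypt_keys/$name" || return 1
    printf '%s\n' "$description" \
        > "$CONFIGFS_ROOT/crash_dm_crypt_keys/$name/description"
}

# Step 3 happens inside the kernel when kexec_file_load is called
# (for example via the kexec tool), so it is not shown here.

# Step 4 (kdump kernel): ask the kernel to restore the saved keys by
# writing "yes" to the restore file named in the cover letter.
restore_keys() {
    echo yes > "$SYSFS_ROOT/crash_dm_crypt_keys/restore"
}

register_key mykey cryptsetup:mykey
restore_keys
echo "dry run written under $scratch"
```

The scratch directory is left in place so the resulting files can be
inspected. Because the description is written from inside a shell function,
the redirection is always interpreted, which avoids an empty key
description.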
Hi Andrew,

On 02/11/25 at 06:25pm, Baoquan He wrote:
> On 02/07/25 at 04:08pm, Coiby Xu wrote:
> > LUKS is the standard for Linux disk encryption, widely adopted by users,
> > and in some cases, such as Confidential VMs, it is a requirement.
> > [...]
> > This patch set only supports x86. There will be patches to support other
> > architectures once this patch set gets merged.

Could you pick this patchset into your tree, since there are no concerns
from other reviewers?

Thanks
Baoquan

>
> This v8 looks good to me, thanks for the great effort, Coiby.
>
> Acked-by: Baoquan He <bhe@redhat.com>
>
On Mon, Feb 24, 2025 at 09:36:48AM +0800, Baoquan He wrote:
>Hi Andrew,
>
>On 02/11/25 at 06:25pm, Baoquan He wrote:
>> On 02/07/25 at 04:08pm, Coiby Xu wrote:
>> > LUKS is the standard for Linux disk encryption, widely adopted by users,
>> > and in some cases, such as Confidential VMs, it is a requirement. With
>> > kdump enabled, when the first kernel crashes, the system can boot into
>> > the kdump/crash kernel to dump the memory image (i.e., /proc/vmcore)
>> > to a specified target. However, there are two challenges when dumping
>> > vmcore to a LUKS-encrypted device:
>> > [...]
>> >
>> > This patch set only supports x86. There will be patches to support other
>> > architectures once this patch set gets merged.
>
>Could you pick this patchset into your tree, since there are no concerns
>from other reviewers?

Thanks to Baoquan for endorsing the patch set!

Hi Andrew and Dave,

If there is anything further I need to do, any suggestions or feedback
would be appreciated. Or, if it's more appropriate for Dave to take the
patch set through the x86 tree, that would be great too.

>
>Thanks
>Baoquan
>
>>
>> This v8 looks good to me, thanks for the great effort, Coiby.
>>
>> Acked-by: Baoquan He <bhe@redhat.com>
>>
>

-- 
Best regards,
Coiby
On Tue, Feb 11, 2025 at 06:25:18PM +0800, Baoquan He wrote:
>On 02/07/25 at 04:08pm, Coiby Xu wrote:
>> LUKS is the standard for Linux disk encryption, widely adopted by users,
>> and in some cases, such as Confidential VMs, it is a requirement.
>> [...]
>> This patch set only supports x86. There will be patches to support other
>> architectures once this patch set gets merged.
>
>This v8 looks good to me, thanks for the great effort, Coiby.
>
>Acked-by: Baoquan He <bhe@redhat.com>

Great, thanks for reviewing and acknowledging the patch set!

-- 
Best regards,
Coiby