[PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened
Posted by David Hildenbrand 1 year, 3 months ago
Let's protect all vmcore modifications with the vmcore_mutex and
disallow vmcore modifications once the vmcore has been opened:
modifications would no longer be safe. Properly synchronize against
concurrent opening of the vmcore.

As a nice side effect, we now properly protect against concurrent
vmcore modifications.

No need to grab the mutex during mmap()/read(): after we opened the
vmcore, modifications are impossible.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 fs/proc/vmcore.c | 42 +++++++++++++++++++-----------------------
 1 file changed, 19 insertions(+), 23 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index b91c304463c9..6371dbaa21be 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -243,33 +243,27 @@ static int vmcoredd_copy_dumps(struct iov_iter *iter, u64 start, size_t size)
 {
 	struct vmcoredd_node *dump;
 	u64 offset = 0;
-	int ret = 0;
 	size_t tsz;
 	char *buf;
 
-	mutex_lock(&vmcore_mutex);
 	list_for_each_entry(dump, &vmcoredd_list, list) {
 		if (start < offset + dump->size) {
 			tsz = min(offset + (u64)dump->size - start, (u64)size);
 			buf = dump->buf + start - offset;
-			if (copy_to_iter(buf, tsz, iter) < tsz) {
-				ret = -EFAULT;
-				goto out_unlock;
-			}
+			if (copy_to_iter(buf, tsz, iter) < tsz)
+				return -EFAULT;
 
 			size -= tsz;
 			start += tsz;
 
 			/* Leave now if buffer filled already */
 			if (!size)
-				goto out_unlock;
+				return 0;
 		}
 		offset += dump->size;
 	}
 
-out_unlock:
-	mutex_unlock(&vmcore_mutex);
-	return ret;
+	return 0;
 }
 
 #ifdef CONFIG_MMU
@@ -278,20 +272,16 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
 {
 	struct vmcoredd_node *dump;
 	u64 offset = 0;
-	int ret = 0;
 	size_t tsz;
 	char *buf;
 
-	mutex_lock(&vmcore_mutex);
 	list_for_each_entry(dump, &vmcoredd_list, list) {
 		if (start < offset + dump->size) {
 			tsz = min(offset + (u64)dump->size - start, (u64)size);
 			buf = dump->buf + start - offset;
 			if (remap_vmalloc_range_partial(vma, dst, buf, 0,
-							tsz)) {
-				ret = -EFAULT;
-				goto out_unlock;
-			}
+							tsz))
+				return -EFAULT;
 
 			size -= tsz;
 			start += tsz;
@@ -299,14 +289,12 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
 
 			/* Leave now if buffer filled already */
 			if (!size)
-				goto out_unlock;
+				return 0;
 		}
 		offset += dump->size;
 	}
 
-out_unlock:
-	mutex_unlock(&vmcore_mutex);
-	return ret;
+	return 0;
 }
 #endif /* CONFIG_MMU */
 #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
@@ -1482,6 +1470,10 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
 		return -EINVAL;
 	}
 
+	/* We'll recheck under lock later. */
+	if (data_race(vmcore_opened))
+		return -EBUSY;
+
 	if (!data || !strlen(data->dump_name) ||
 	    !data->vmcoredd_callback || !data->size)
 		return -EINVAL;
@@ -1515,12 +1507,16 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
 	dump->buf = buf;
 	dump->size = data_size;
 
-	/* Add the dump to driver sysfs list */
+	/* Add the dump to driver sysfs list and update the elfcore hdr */
 	mutex_lock(&vmcore_mutex);
-	list_add_tail(&dump->list, &vmcoredd_list);
-	mutex_unlock(&vmcore_mutex);
+	if (vmcore_opened) {
+		ret = -EBUSY;
+		goto out_err;
+	}
 
+	list_add_tail(&dump->list, &vmcoredd_list);
 	vmcoredd_update_size(data_size);
+	mutex_unlock(&vmcore_mutex);
 	return 0;
 
 out_err:
-- 
2.46.1
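
For readers less familiar with the check/recheck scheme in the last hunk,
here is a minimal user-space model of it (illustrative only: a pthread
mutex stands in for vmcore_mutex, a relaxed atomic load stands in for
data_race(), and all names are made up):

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool opened;	/* set once the "file" is opened      */
static int nr_dumps;		/* the state we must stop modifying   */

static void open_resource(void)
{
	pthread_mutex_lock(&lock);
	atomic_store(&opened, true);	/* from now on, no modifications */
	pthread_mutex_unlock(&lock);
}

static int add_modification(void)
{
	/* Cheap, unlocked early-out; a racy read is fine here ...      */
	if (atomic_load_explicit(&opened, memory_order_relaxed))
		return -1;

	pthread_mutex_lock(&lock);
	/* ... but the decision that counts is made under the lock.     */
	if (atomic_load(&opened)) {
		pthread_mutex_unlock(&lock);
		return -1;
	}
	nr_dumps++;			/* safe: nobody has opened yet   */
	pthread_mutex_unlock(&lock);
	return 0;
}

static int read_resource(void)
{
	/* No lock needed: once opened, nr_dumps can no longer change.  */
	return nr_dumps;
}

int main(void)
{
	add_modification();		/* succeeds: not opened yet      */
	open_resource();
	printf("add after open: %d, dumps: %d\n",
	       add_modification(), read_resource());
	return 0;
}
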
Re: [PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened
Posted by Baoquan He 1 year, 2 months ago
On 10/25/24 at 05:11pm, David Hildenbrand wrote:
......snip...
> @@ -1482,6 +1470,10 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
>  		return -EINVAL;
>  	}
>  
> +	/* We'll recheck under lock later. */
> +	if (data_race(vmcore_opened))
> +		return -EBUSY;

As I commented on patch 7, if the vmcore is opened for checking and
then closed again, do we need to give up any chance of adding a device
dump, as below?

fd = open(/proc/vmcore);
...do checking;
close(fd);

quit any device dump adding;

run makedumpfile on s390;
  ->fd = open(/proc/vmcore);
    -> try to dump;
  ->close(fd);
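
(For concreteness, such a probe could be as small as the sketch below;
it is only an illustration, not the actual s390 tooling.)

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
	/* Open /proc/vmcore only to inspect it, then close it again. */
	int fd = open("/proc/vmcore", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* ... read the ELF headers, decide how to dump ... */
	close(fd);
	/*
	 * With this patch, any vmcore_add_device_dump() issued after this
	 * point fails with -EBUSY, even though nobody holds the file open.
	 */
	return 0;
}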

> +
>  	if (!data || !strlen(data->dump_name) ||
>  	    !data->vmcoredd_callback || !data->size)
>  		return -EINVAL;
> @@ -1515,12 +1507,16 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
>  	dump->buf = buf;
>  	dump->size = data_size;
>  
> -	/* Add the dump to driver sysfs list */
> +	/* Add the dump to driver sysfs list and update the elfcore hdr */
>  	mutex_lock(&vmcore_mutex);
> -	list_add_tail(&dump->list, &vmcoredd_list);
> -	mutex_unlock(&vmcore_mutex);
> +	if (vmcore_opened) {
> +		ret = -EBUSY;
> +		goto out_err;
> +	}
>  
> +	list_add_tail(&dump->list, &vmcoredd_list);
>  	vmcoredd_update_size(data_size);
> +	mutex_unlock(&vmcore_mutex);
>  	return 0;
>  
>  out_err:
> -- 
> 2.46.1
>
Re: [PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened
Posted by David Hildenbrand 1 year, 2 months ago
On 22.11.24 10:16, Baoquan He wrote:
> On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> ......snip...
>> @@ -1482,6 +1470,10 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
>>   		return -EINVAL;
>>   	}
>>   
>> +	/* We'll recheck under lock later. */
>> +	if (data_race(vmcore_opened))
>> +		return -EBUSY;
> 

Hi,

> As I commented to patch 7, if vmcore is opened and closed after
> checking, do we need to give up any chance to add device dumping
> as below?
> 
> fd = open(/proc/vmcore);
> ...do checking;
> close(fd);
> 
> quit any device dump adding;
> 
> run makedumpfile on s390;
>    ->fd = open(/proc/vmcore);
>      -> try to dump;
>    ->close(fd);

The only reasonable case where this could happen (with virtio_mem) would 
be when you hotplug a virtio-mem device into a VM that is currently in 
the kdump kernel. However, in this case, the device would not provide 
any memory we want to dump:

(1) The memory was not available to the 1st (crashed) kernel, because
     the device got hotplugged later.
(2) Hotplugged virtio-mem devices show up with "no plugged memory",
     meaning there wouldn't even be anything to dump.

Drivers will be loaded (as part of the kernel or as part of the initrd) 
before any kdump action happens. Similarly, just imagine your NIC 
driver not being loaded when you start dumping to a network share ...

This should similarly apply to vmcoredd providers.

There is another concern I had at some point: changing the effective 
/proc/vmcore size after someone has already opened it and might assume 
that the size will stay unmodified (IOW, the file was completely static 
before vmcoredd showed up).
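
As a hypothetical sketch of that assumption (not makedumpfile itself,
just an illustration): a consumer that samples the size once at open()
and copies exactly that many bytes would silently miss anything added
to the vmcore afterwards.

#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>

int main(void)
{
	char buf[65536];
	struct stat st;
	uint64_t remaining;
	int fd = open("/proc/vmcore", O_RDONLY);

	if (fd < 0 || fstat(fd, &st) < 0)
		return 1;

	/* The size is sampled exactly once, right after open(). */
	remaining = st.st_size;
	while (remaining > 0) {
		ssize_t n = read(fd, buf,
				 remaining < sizeof(buf) ? remaining : sizeof(buf));

		if (n <= 0)
			break;
		remaining -= n;
		/* ... hand the n bytes to the dump target ... */
	}
	close(fd);
	return 0;
}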

So unless there is a real use case that requires tracking whether the 
file is no longer open, to support modifying the vmcore afterwards, we 
should keep it simple.

I am not aware of any such cases, and my experiments with virtio_mem 
showed that the driver gets loaded extremely early from the initrd, 
compared to when we actually start messing with /proc/vmcore from user 
space.

Thanks!

-- 
Cheers,

David / dhildenb
Re: [PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened
Posted by Baoquan He 1 year, 2 months ago
On 11/22/24 at 10:30am, David Hildenbrand wrote:
> On 22.11.24 10:16, Baoquan He wrote:
> > On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> > ......snip...
> > > @@ -1482,6 +1470,10 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
> > >   		return -EINVAL;
> > >   	}
> > > +	/* We'll recheck under lock later. */
> > > +	if (data_race(vmcore_opened))
> > > +		return -EBUSY;
> > 
> 
> Hi,
> 
> > As I commented to patch 7, if vmcore is opened and closed after
> > checking, do we need to give up any chance to add device dumping
> > as below?
> > 
> > fd = open(/proc/vmcore);
> > ...do checking;
> > close(fd);
> > 
> > quit any device dump adding;
> > 
> > run makedumpfile on s390;
> >    ->fd = open(/proc/vmcore);
> >      -> try to dump;
> >    ->close(fd);
> 
> The only reasonable case where this could happen (with virtio_mem) would be
> when you hotplug a virtio-mem device into a VM that is currently in the
> kdump kernel. However, in this case, the device would not provide any memory
> we want to dump:
> 
> (1) The memory was not available to the 1st (crashed) kernel, because
>     the device got hotplugged later.
> (2) Hotplugged virtio-mem devices show up with "no plugged memory",
>     meaning there wouldn't be even something to dump.
> 
> Drivers will be loaded (as part of the kernel or as part of the initrd)
> before any kdump action is happening. Similarly, just imagine your NIC
> driver not being loaded when you start dumping to a network share ...
> 
> This should similarly apply to vmcoredd providers.
> 
> There is another concern I had at some point with changing the effective
> /proc/vmcore size after someone already opened it, and might assume the size
> will stay unmodified (IOW, the file was completely static before vmcoredd
> showed up).
> 
> So unless there is a real use case that requires tracking whether the file
> is no longer open, to support modifying the vmcore afterwards, we should
> keep it simple.
> 
> I am not aware of any such cases, and my experiments with virtio_mem showed
> that the driver get loaded extremely early from the initrd, compared to when
> we actually start messing with /proc/vmcore from user space.

Hmm, thanks for sharing your thoughts. I personally think that, if we
can, we had better make the code as robust as possible. Here, since we
have already consolidated the locking into the one vmcore_mutex, it does
no harm whether the vmcoredd is added before or after /proc/vmcore is
opened and closed. The benefit is that it works well with the currently
known case: virtio-mem probing, then makedumpfile running. And it also
works well with other possible cases, e.g. if you suspect the virtio-mem
dumping and want to debug it, you can set rd.break or use any other way
to drop into the emergency shell of the kdump kernel, poke the
virtio-mem module again, run makedumpfile manually, reverse the order,
or do any other testing. The kernel implementation should then not
presume that user space has to run makedumpfile after the much earlier
virtio-mem probing. If we implement code according to a preset
userspace scenario, that limits future adaptation, IMHO.
Re: [PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened
Posted by David Hildenbrand 1 year, 2 months ago
On 25.11.24 15:41, Baoquan He wrote:
> On 11/22/24 at 10:30am, David Hildenbrand wrote:
>> On 22.11.24 10:16, Baoquan He wrote:
>>> On 10/25/24 at 05:11pm, David Hildenbrand wrote:
>>> ......snip...
>>>> @@ -1482,6 +1470,10 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
>>>>    		return -EINVAL;
>>>>    	}
>>>> +	/* We'll recheck under lock later. */
>>>> +	if (data_race(vmcore_opened))
>>>> +		return -EBUSY;
>>>
>>
>> Hi,
>>
>>> As I commented to patch 7, if vmcore is opened and closed after
>>> checking, do we need to give up any chance to add device dumping
>>> as below?
>>>
>>> fd = open(/proc/vmcore);
>>> ...do checking;
>>> close(fd);
>>>
>>> quit any device dump adding;
>>>
>>> run makedumpfile on s390;
>>>     ->fd = open(/proc/vmcore);
>>>       -> try to dump;
>>>     ->close(fd);
>>
>> The only reasonable case where this could happen (with virtio_mem) would be
>> when you hotplug a virtio-mem device into a VM that is currently in the
>> kdump kernel. However, in this case, the device would not provide any memory
>> we want to dump:
>>
>> (1) The memory was not available to the 1st (crashed) kernel, because
>>      the device got hotplugged later.
>> (2) Hotplugged virtio-mem devices show up with "no plugged memory",
>>      meaning there wouldn't be even something to dump.
>>
>> Drivers will be loaded (as part of the kernel or as part of the initrd)
>> before any kdump action is happening. Similarly, just imagine your NIC
>> driver not being loaded when you start dumping to a network share ...
>>
>> This should similarly apply to vmcoredd providers.
>>
>> There is another concern I had at some point with changing the effective
>> /proc/vmcore size after someone already opened it, and might assume the size
>> will stay unmodified (IOW, the file was completely static before vmcoredd
>> showed up).
>>
>> So unless there is a real use case that requires tracking whether the file
>> is no longer open, to support modifying the vmcore afterwards, we should
>> keep it simple.
>>
>> I am not aware of any such cases, and my experiments with virtio_mem showed
>> that the driver get loaded extremely early from the initrd, compared to when
>> we actually start messing with /proc/vmcore from user space.
> 
> Hmm, thanks for sharing your thoughts.

To assist the discussion, here is the kdump dmesg output on s390x with the
virtio-mem driver as part of the initrd:


[Fri Nov 29 04:55:28 2024] Run /init as init process
[Fri Nov 29 04:55:28 2024]   with arguments:
[Fri Nov 29 04:55:28 2024]     /init
[Fri Nov 29 04:55:28 2024]   with environment:
[Fri Nov 29 04:55:28 2024]     HOME=/
[Fri Nov 29 04:55:28 2024]     TERM=linux
[Fri Nov 29 04:55:28 2024]     numa=off
[Fri Nov 29 04:55:28 2024]     cma=0
[Fri Nov 29 04:55:28 2024] loop: module loaded
[Fri Nov 29 04:55:28 2024] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[Fri Nov 29 04:55:28 2024] loop0: detected capacity change from 0 to 45280
[Fri Nov 29 04:55:28 2024] overlayfs: upper fs does not support RENAME_WHITEOUT.
[Fri Nov 29 04:55:28 2024] overlayfs: failed to set xattr on upper
[Fri Nov 29 04:55:28 2024] overlayfs: ...falling back to uuid=null.
[Fri Nov 29 04:55:28 2024] systemd[1]: Successfully made /usr/ read-only.
[Fri Nov 29 04:55:28 2024] systemd[1]: systemd 256.8-1.fc41 running in system mode [...]
[Fri Nov 29 04:55:28 2024] systemd[1]: Detected virtualization kvm.
[Fri Nov 29 04:55:28 2024] systemd[1]: Detected architecture s390x.
[Fri Nov 29 04:55:28 2024] systemd[1]: Running in initrd.
[Fri Nov 29 04:55:28 2024] systemd[1]: No hostname configured, using default hostname.
[Fri Nov 29 04:55:28 2024] systemd[1]: Hostname set to <localhost>.
[Fri Nov 29 04:55:28 2024] systemd[1]: Queued start job for default target initrd.target.
[Fri Nov 29 04:55:28 2024] systemd[1]: Started systemd-ask-password-console.path - Dispatch Password Requests to Console Directory Watch.
[Fri Nov 29 04:55:28 2024] systemd[1]: Expecting device dev-mapper-sysvg\x2droot.device - /dev/mapper/sysvg-root...
[Fri Nov 29 04:55:28 2024] systemd[1]: Reached target initrd-usr-fs.target - Initrd /usr File System.
[Fri Nov 29 04:55:28 2024] systemd[1]: Reached target paths.target - Path Units.
[Fri Nov 29 04:55:28 2024] systemd[1]: Reached target slices.target - Slice Units.
[Fri Nov 29 04:55:28 2024] systemd[1]: Reached target swap.target - Swaps.
[Fri Nov 29 04:55:28 2024] systemd[1]: Reached target timers.target - Timer Units.
[Fri Nov 29 04:55:28 2024] systemd[1]: Listening on systemd-journald-dev-log.socket - Journal Socket (/dev/log).
[Fri Nov 29 04:55:28 2024] systemd[1]: Listening on systemd-journald.socket - Journal Sockets.
[Fri Nov 29 04:55:28 2024] systemd[1]: Listening on systemd-udevd-control.socket - udev Control Socket.
[Fri Nov 29 04:55:28 2024] systemd[1]: Listening on systemd-udevd-kernel.socket - udev Kernel Socket.
[Fri Nov 29 04:55:28 2024] systemd[1]: Reached target sockets.target - Socket Units.
[Fri Nov 29 04:55:28 2024] systemd[1]: Starting kmod-static-nodes.service - Create List of Static Device Nodes...
[Fri Nov 29 04:55:28 2024] systemd[1]: memstrack.service - Memstrack Anylazing Service was skipped because no trigger condition checks were met.
[Fri Nov 29 04:55:28 2024] systemd[1]: Starting systemd-journald.service - Journal Service...
[Fri Nov 29 04:55:28 2024] systemd[1]: Starting systemd-modules-load.service - Load Kernel Modules...
[Fri Nov 29 04:55:28 2024] systemd[1]: Starting systemd-vconsole-setup.service - Virtual Console Setup...
[Fri Nov 29 04:55:28 2024] systemd[1]: Finished kmod-static-nodes.service - Create List of Static Device Nodes.
[Fri Nov 29 04:55:28 2024] systemd[1]: Starting systemd-tmpfiles-setup-dev-early.service - Create Static Device Nodes in /dev gracefully...
[Fri Nov 29 04:55:28 2024] systemd[1]: Finished systemd-modules-load.service - Load Kernel Modules.
[Fri Nov 29 04:55:28 2024] systemd[1]: Starting systemd-sysctl.service - Apply Kernel Variables...
[Fri Nov 29 04:55:28 2024] systemd[1]: Finished systemd-sysctl.service - Apply Kernel Variables.
[Fri Nov 29 04:55:28 2024] systemd[1]: Finished systemd-tmpfiles-setup-dev-early.service - Create Static Device Nodes in /dev gracefully.
[Fri Nov 29 04:55:28 2024] systemd[1]: Starting systemd-sysusers.service - Create System Users...
[Fri Nov 29 04:55:28 2024] systemd-journald[205]: Collecting audit messages is disabled.
[Fri Nov 29 04:55:28 2024] systemd[1]: Started systemd-journald.service - Journal Service.
[Fri Nov 29 04:55:28 2024] evm: overlay not supported
[Fri Nov 29 04:55:29 2024] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[Fri Nov 29 04:55:29 2024] device-mapper: uevent: version 1.0.3
[Fri Nov 29 04:55:29 2024] device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@lists.linux.dev
[Fri Nov 29 04:55:29 2024] VFIO - User Level meta-driver version: 0.3
[Fri Nov 29 04:55:29 2024] virtio_blk virtio1: 1/0/0 default/read/poll queues
[Fri Nov 29 04:55:29 2024] virtio_blk virtio1: [vda] 35651584 512-byte logical blocks (18.3 GB/17.0 GiB)
[Fri Nov 29 04:55:29 2024] virtio_mem virtio2: start address: 0x100000000
[Fri Nov 29 04:55:29 2024] virtio_mem virtio2: region size: 0x400000000
[Fri Nov 29 04:55:29 2024] virtio_mem virtio2: device block size: 0x100000
[Fri Nov 29 04:55:29 2024] virtio_mem virtio2: nid: 0
[Fri Nov 29 04:55:29 2024] virtio_mem virtio2: memory hot(un)plug disabled in kdump kernel
[Fri Nov 29 04:55:29 2024] GPT:Primary header thinks Alt. header is not at the end of the disk.
[Fri Nov 29 04:55:29 2024] GPT:14680063 != 35651583
[Fri Nov 29 04:55:29 2024] GPT:Alternate GPT header not at the end of the disk.
[Fri Nov 29 04:55:29 2024] GPT:14680063 != 35651583
[Fri Nov 29 04:55:29 2024] GPT: Use GNU Parted to correct GPT errors.
[Fri Nov 29 04:55:29 2024]  vda: vda1 vda2
[Fri Nov 29 04:55:29 2024] SGI XFS with ACLs, security attributes, scrub, quota, no debug enabled
[Fri Nov 29 04:55:29 2024] XFS: attr2 mount option is deprecated.
[Fri Nov 29 04:55:29 2024] XFS (dm-0): Mounting V5 Filesystem 4171e972-6865-4c47-ba7f-20bd52628650
[Fri Nov 29 04:55:29 2024] XFS (dm-0): Starting recovery (logdev: internal)
[Fri Nov 29 04:55:29 2024] XFS (dm-0): Ending recovery (logdev: internal)
Nov 29 04:55:28 localhost systemd-journald[205]: Journal started
Nov 29 04:55:28 localhost systemd-journald[205]: Runtime Journal (/run/log/journal/1377b7cdca054edd8904cb6f80695a23) is 1.4M, max 11.2M, 9.8M free.
Nov 29 04:55:28 localhost systemd-vconsole-setup[207]: /usr/bin/setfont failed with a "system error" (EX_OSERR), ignoring.
Nov 29 04:55:28 localhost systemd-modules-load[206]: Inserted module 'pkey'
Nov 29 04:55:28 localhost systemd-modules-load[206]: Module 'scsi_dh_alua' is built in
Nov 29 04:55:28 localhost systemd-modules-load[206]: Module 'scsi_dh_emc' is built in
Nov 29 04:55:28 localhost systemd-vconsole-setup[210]: setfont: ERROR kdfontop.c:183 put_font_kdfontop: Unable to load such font with such kernel version
Nov 29 04:55:28 localhost systemd-modules-load[206]: Module 'scsi_dh_rdac' is built in
Nov 29 04:55:28 localhost systemd-sysusers[216]: Creating group 'systemd-journal' with GID 190.
Nov 29 04:55:28 localhost systemd[1]: Finished systemd-sysusers.service - Create System Users.
Nov 29 04:55:28 localhost systemd[1]: Starting systemd-tmpfiles-setup-dev.service - Create Static Device Nodes in /dev...
Nov 29 04:55:28 localhost systemd[1]: Finished systemd-tmpfiles-setup-dev.service - Create Static Device Nodes in /dev.
Nov 29 04:55:28 localhost systemd[1]: Reached target local-fs-pre.target - Preparation for Local File Systems.
Nov 29 04:55:28 localhost systemd[1]: Reached target local-fs.target - Local File Systems.
Nov 29 04:55:28 localhost systemd[1]: Starting systemd-tmpfiles-setup.service - Create System Files and Directories...
Nov 29 04:55:28 localhost systemd-tmpfiles[227]: /usr/lib/tmpfiles.d/var.conf:14: Duplicate line for path "/var/log", ignoring.
Nov 29 04:55:28 localhost systemd[1]: Finished systemd-tmpfiles-setup.service - Create System Files and Directories.
Nov 29 04:55:28 localhost systemd-vconsole-setup[207]: Setting source virtual console failed, ignoring remaining ones.
Nov 29 04:55:28 localhost systemd[1]: Finished systemd-vconsole-setup.service - Virtual Console Setup.
Nov 29 04:55:28 localhost systemd[1]: Starting dracut-cmdline-ask.service - dracut ask for additional cmdline parameters...
Nov 29 04:55:28 localhost systemd[1]: Finished dracut-cmdline-ask.service - dracut ask for additional cmdline parameters.
Nov 29 04:55:28 localhost systemd[1]: Starting dracut-cmdline.service - dracut cmdline hook...
Nov 29 04:55:28 localhost dracut-cmdline[244]: dracut-102-3.fc41
Nov 29 04:55:28 localhost dracut-cmdline[244]: Using kernel command line parameters:  rd.zfcp.conf=0 rd.lvm.lv=sysvg/root   console=tty1 console=ttyS0,115200n8 memhp_default_state=online_movable nr_cpus=1 cgroup_disable=memory numa=off udev.children-max=2 panic=10 transparent_hugepage=never novmcoredd vmcp_cma=0 cma=0 hugetlb_cma=0
Nov 29 04:55:29 localhost dracut-cmdline[244]: cio_ignored disabled on commandline
Nov 29 04:55:29 localhost dracut-cmdline[338]: rm: cannot remove '/etc/zfcp.conf': No such file or directory
Nov 29 04:55:29 localhost systemd[1]: Finished dracut-cmdline.service - dracut cmdline hook.
Nov 29 04:55:29 localhost systemd[1]: Starting dracut-pre-udev.service - dracut pre-udev hook...
Nov 29 04:55:29 localhost systemd[1]: Finished dracut-pre-udev.service - dracut pre-udev hook.
Nov 29 04:55:29 localhost systemd[1]: Starting systemd-udevd.service - Rule-based Manager for Device Events and Files...
Nov 29 04:55:29 localhost systemd-udevd[394]: Using default interface naming scheme 'v255'.
Nov 29 04:55:29 localhost systemd[1]: Started systemd-udevd.service - Rule-based Manager for Device Events and Files.
Nov 29 04:55:29 localhost systemd[1]: dracut-pre-trigger.service - dracut pre-trigger hook was skipped because no trigger condition checks were met.
Nov 29 04:55:29 localhost systemd[1]: Starting systemd-udev-trigger.service - Coldplug All udev Devices...
Nov 29 04:55:29 localhost systemd[1]: Finished systemd-udev-trigger.service - Coldplug All udev Devices.
Nov 29 04:55:29 localhost systemd[1]: Created slice system-modprobe.slice - Slice /system/modprobe.
Nov 29 04:55:29 localhost systemd[1]: Starting dracut-initqueue.service - dracut initqueue hook...
Nov 29 04:55:29 localhost systemd[1]: Starting modprobe@configfs.service - Load Kernel Module configfs...
Nov 29 04:55:29 localhost systemd[1]: modprobe@configfs.service: Deactivated successfully.
Nov 29 04:55:29 localhost systemd[1]: Finished modprobe@configfs.service - Load Kernel Module configfs.
Nov 29 04:55:29 localhost systemd[1]: systemd-vconsole-setup.service: Deactivated successfully.
Nov 29 04:55:29 localhost systemd[1]: Stopped systemd-vconsole-setup.service - Virtual Console Setup.
Nov 29 04:55:29 localhost systemd[1]: Stopping systemd-vconsole-setup.service - Virtual Console Setup...
Nov 29 04:55:29 localhost systemd[1]: Starting systemd-vconsole-setup.service - Virtual Console Setup...
Nov 29 04:55:29 localhost systemd[1]: Finished systemd-vconsole-setup.service - Virtual Console Setup.
Nov 29 04:55:29 localhost dracut-initqueue[485]: Scanning devices vda2  for LVM logical volumes sysvg/root
Nov 29 04:55:29 localhost dracut-initqueue[485]:   sysvg/root linear
Nov 29 04:55:29 localhost systemd[1]: Found device dev-mapper-sysvg\x2droot.device - /dev/mapper/sysvg-root.
Nov 29 04:55:29 localhost systemd[1]: Reached target initrd-root-device.target - Initrd Root Device.
Nov 29 04:55:29 localhost systemd[1]: Finished dracut-initqueue.service - dracut initqueue hook.
Nov 29 04:55:29 localhost systemd[1]: Reached target remote-fs-pre.target - Preparation for Remote File Systems.
Nov 29 04:55:29 localhost systemd[1]: Reached target remote-fs.target - Remote File Systems.
Nov 29 04:55:29 localhost systemd[1]: Starting dracut-pre-mount.service - dracut pre-mount hook...
Nov 29 04:55:29 localhost systemd[1]: Finished dracut-pre-mount.service - dracut pre-mount hook.
Nov 29 04:55:29 localhost systemd[1]: Starting systemd-fsck-root.service - File System Check on /dev/mapper/sysvg-root...
Nov 29 04:55:29 localhost systemd-fsck[531]: /usr/sbin/fsck.xfs: XFS file system.
Nov 29 04:55:29 localhost systemd[1]: Finished systemd-fsck-root.service - File System Check on /dev/mapper/sysvg-root.
Nov 29 04:55:29 localhost systemd[1]: Mounting sys-kernel-config.mount - Kernel Configuration File System...
Nov 29 04:55:29 localhost systemd[1]: Mounting sysroot.mount - /sysroot...
Nov 29 04:55:29 localhost systemd[1]: Mounted sys-kernel-config.mount - Kernel Configuration File System.
Nov 29 04:55:29 localhost systemd[1]: Reached target sysinit.target - System Initialization.
Nov 29 04:55:29 localhost systemd[1]: Reached target basic.target - Basic System.
Nov 29 04:55:29 localhost systemd[1]: System is tainted: unmerged-bin
Nov 29 04:55:29 localhost systemd[1]: Mounted sysroot.mount - /sysroot.
Nov 29 04:55:29 localhost systemd[1]: Reached target initrd-root-fs.target - Initrd Root File System.
Nov 29 04:55:29 localhost systemd[1]: initrd-parse-etc.service - Mountpoints Configured in the Real Root was skipped because of an unmet condition check (ConditionPathExists=!/proc/vmcore).
Nov 29 04:55:29 localhost systemd[1]: Reached target initrd-fs.target - Initrd File Systems.
Nov 29 04:55:29 localhost systemd[1]: Reached target initrd.target - Initrd Default Target.
Nov 29 04:55:29 localhost systemd[1]: dracut-mount.service - dracut mount hook was skipped because no trigger condition checks were met.
Nov 29 04:55:29 localhost systemd[1]: Starting dracut-pre-pivot.service - dracut pre-pivot and cleanup hook...
Nov 29 04:55:29 localhost systemd[1]: Finished dracut-pre-pivot.service - dracut pre-pivot and cleanup hook.
Nov 29 04:55:29 localhost systemd[1]: Starting kdump-capture.service - Kdump Vmcore Save Service...
Nov 29 04:55:29 localhost kdump[574]: Kdump is using the default log level(3).
Nov 29 04:55:30 localhost kdump[618]: saving to /sysroot/var/crash/127.0.0.1-2024-11-29-04:55:29/
Nov 29 04:55:30 localhost kdump[623]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2024-11-29-04:55:29/
Nov 29 04:55:30 localhost kdump[629]: saving vmcore-dmesg.txt complete
Nov 29 04:55:30 localhost kdump[631]: saving vmcore


Observe how the virtio_mem driver is loaded along with the other drivers that are essential for
any kdump activity.

> I personally think if we could,
> we had better make code as robust as possible.

And that's where we disagree. We will make it less robust if we don't warn the user that something
very unexpected is happening, and instead treat it as if it were normal.

It isn't normal.

> Here, since we have
> already integrated the lock with one vmcore_mutex, whether the vmcoredd
> is added before or after /proc/vmcore opening/closing, it won't harm.
> The benefit is it works well with the current known case, virtio-mem 
> probing, makedumpfile running. And it also works well with other
> possible cases, e.g if you doubt virtio-mem dumping and want to debug,
> you set rd.break or take any way to drop into emergency shell of kdump
> kernel, you can play it to poke virtio-mem module again and run makedumpfile
> manually or reversing the order or taking any testing. 

Intercepting kdump to trigger it manually will just work as it used to.

Having developed and debugged the virtio_mem driver in kdump mode, I was not once even
tempted to do any of that, and looking back it would not have been helpful ;)

Crashing in a VM is easy+cheap, rebooting in a VM is easy+cheap, recompiling virtio_mem +
rebuilding the kdump initrd is easy and cheap.

> Then kernel
> implementation should not preset that user space have to run the
> makedumpfile after the much earlier virtio-mem probing. If we implement
> codes according to preset userspace scenario, that limit the future
> adaptation, IMHO.

It limits the userspace scenario to something that is reasonable and future-proof:
start working on the vmcore once the system is sufficiently initialized
(basic drivers loaded).

Then have it be immutable, just like it always was before we started allowing it to be patched.


The only "alterantive" I see is allow modifying the vmcore after it was closed again,
but still issue a warning if that happens.

But again,  I don't really see any reasonable use case for that.
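
A rough sketch of what that alternative could look like (the field and
helper names below are made up, and this is not what the patch does):
refuse modifications only while the vmcore is currently open, and warn
once if a modification happens after it was opened and closed again.

	static unsigned int vmcore_open_count;	/* current openers of /proc/vmcore */
	static bool vmcore_was_opened;		/* ever opened since boot          */

	/* Would be called by modifiers with vmcore_mutex held. */
	static int vmcore_allow_modification(void)
	{
		if (vmcore_open_count)
			return -EBUSY;		/* never modify an open vmcore */
		if (vmcore_was_opened)
			pr_warn_once("vmcore: modified after it was opened and closed\n");
		return 0;
	}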

-- 
Cheers,

David / dhildenb
Re: [PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened
Posted by Baoquan He 1 year, 2 months ago
On 11/29/24 at 11:38am, David Hildenbrand wrote:
> On 25.11.24 15:41, Baoquan He wrote:
> > On 11/22/24 at 10:30am, David Hildenbrand wrote:
> > > On 22.11.24 10:16, Baoquan He wrote:
> > > > On 10/25/24 at 05:11pm, David Hildenbrand wrote:
> > > > ......snip...
> > > > > @@ -1482,6 +1470,10 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
> > > > >    		return -EINVAL;
> > > > >    	}
> > > > > +	/* We'll recheck under lock later. */
> > > > > +	if (data_race(vmcore_opened))
> > > > > +		return -EBUSY;
> > > > 
> > > 
> > > Hi,
> > > 
> > > > As I commented to patch 7, if vmcore is opened and closed after
> > > > checking, do we need to give up any chance to add device dumping
> > > > as below?
> > > > 
> > > > fd = open(/proc/vmcore);
> > > > ...do checking;
> > > > close(fd);
> > > > 
> > > > quit any device dump adding;
> > > > 
> > > > run makedumpfile on s390;
> > > >     ->fd = open(/proc/vmcore);
> > > >       -> try to dump;
> > > >     ->close(fd);
> > > 
> > > The only reasonable case where this could happen (with virtio_mem) would be
> > > when you hotplug a virtio-mem device into a VM that is currently in the
> > > kdump kernel. However, in this case, the device would not provide any memory
> > > we want to dump:
> > > 
> > > (1) The memory was not available to the 1st (crashed) kernel, because
> > >      the device got hotplugged later.
> > > (2) Hotplugged virtio-mem devices show up with "no plugged memory",
> > >      meaning there wouldn't be even something to dump.
> > > 
> > > Drivers will be loaded (as part of the kernel or as part of the initrd)
> > > before any kdump action is happening. Similarly, just imagine your NIC
> > > driver not being loaded when you start dumping to a network share ...
> > > 
> > > This should similarly apply to vmcoredd providers.
> > > 
> > > There is another concern I had at some point with changing the effective
> > > /proc/vmcore size after someone already opened it, and might assume the size
> > > will stay unmodified (IOW, the file was completely static before vmcoredd
> > > showed up).
> > > 
> > > So unless there is a real use case that requires tracking whether the file
> > > is no longer open, to support modifying the vmcore afterwards, we should
> > > keep it simple.
> > > 
> > > I am not aware of any such cases, and my experiments with virtio_mem showed
> > > that the driver get loaded extremely early from the initrd, compared to when
> > > we actually start messing with /proc/vmcore from user space.

It's OK, David, I don't have a strong opinion about the current
implementation. I raised this concern because

1) I saw that the original vmcoredd only warned at registration time if
vmcore_opened was true;

2) in patch 1, it says vmcore_mutex is introduced to protect vmcore
modifications from concurrent opening. I wonder whether the old
vmcoredd_mutex could already guarantee that; I could be wrong here.

Anyway, it's just a tiny concern, and I believe it won't cause an issue
at present. So it's up to you.

Thanks
Re: [PATCH v1 03/11] fs/proc/vmcore: disallow vmcore modifications after the vmcore was opened
Posted by David Hildenbrand 1 year, 2 months ago
On 03.12.24 11:42, Baoquan He wrote:
> On 11/29/24 at 11:38am, David Hildenbrand wrote:
>> On 25.11.24 15:41, Baoquan He wrote:
>>> On 11/22/24 at 10:30am, David Hildenbrand wrote:
>>>> On 22.11.24 10:16, Baoquan He wrote:
>>>>> On 10/25/24 at 05:11pm, David Hildenbrand wrote:
>>>>> ......snip...
>>>>>> @@ -1482,6 +1470,10 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
>>>>>>     		return -EINVAL;
>>>>>>     	}
>>>>>> +	/* We'll recheck under lock later. */
>>>>>> +	if (data_race(vmcore_opened))
>>>>>> +		return -EBUSY;
>>>>>
>>>>
>>>> Hi,
>>>>
>>>>> As I commented to patch 7, if vmcore is opened and closed after
>>>>> checking, do we need to give up any chance to add device dumping
>>>>> as below?
>>>>>
>>>>> fd = open(/proc/vmcore);
>>>>> ...do checking;
>>>>> close(fd);
>>>>>
>>>>> quit any device dump adding;
>>>>>
>>>>> run makedumpfile on s390;
>>>>>      ->fd = open(/proc/vmcore);
>>>>>        -> try to dump;
>>>>>      ->close(fd);
>>>>
>>>> The only reasonable case where this could happen (with virtio_mem) would be
>>>> when you hotplug a virtio-mem device into a VM that is currently in the
>>>> kdump kernel. However, in this case, the device would not provide any memory
>>>> we want to dump:
>>>>
>>>> (1) The memory was not available to the 1st (crashed) kernel, because
>>>>       the device got hotplugged later.
>>>> (2) Hotplugged virtio-mem devices show up with "no plugged memory",
>>>>       meaning there wouldn't be even something to dump.
>>>>
>>>> Drivers will be loaded (as part of the kernel or as part of the initrd)
>>>> before any kdump action is happening. Similarly, just imagine your NIC
>>>> driver not being loaded when you start dumping to a network share ...
>>>>
>>>> This should similarly apply to vmcoredd providers.
>>>>
>>>> There is another concern I had at some point with changing the effective
>>>> /proc/vmcore size after someone already opened it, and might assume the size
>>>> will stay unmodified (IOW, the file was completely static before vmcoredd
>>>> showed up).
>>>>
>>>> So unless there is a real use case that requires tracking whether the file
>>>> is no longer open, to support modifying the vmcore afterwards, we should
>>>> keep it simple.
>>>>
>>>> I am not aware of any such cases, and my experiments with virtio_mem showed
>>>> that the driver get loaded extremely early from the initrd, compared to when
>>>> we actually start messing with /proc/vmcore from user space.
> 
> It's OK, David, I don't have strong opinion about the current
> implementation. I raised this concern because
> 
> 1) I saw the original vmcoredd only warn when doing register if
> vmcore_opened is true;
> 
> 2) in patch 1, it says vmcore_mutex is introduced to protect vmcore
> modifications from concurrent opening. If we are confident, the old
> vmcoredd_mutex can guarantee it, I could be wrong here.

I don't see how. We're protecting the list, but not the 
vmcoredd_update_size(), which modifies sizes, offsets and all that.
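
Roughly, comparing the old and new flow from the hunk in the patch above
(simplified for illustration):

Before:
	mutex_lock(&vmcore_mutex);
	list_add_tail(&dump->list, &vmcoredd_list);
	mutex_unlock(&vmcore_mutex);

	vmcoredd_update_size(data_size);	/* size/offset update ran unlocked */

After:
	mutex_lock(&vmcore_mutex);
	if (vmcore_opened) {
		ret = -EBUSY;
		goto out_err;
	}
	list_add_tail(&dump->list, &vmcoredd_list);
	vmcoredd_update_size(data_size);	/* now serialized as well */
	mutex_unlock(&vmcore_mutex);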

> 
> Anyway, it's just a tiny concern, I believe it won't cause issue at
> present. So it's up to you.

I can keep allowing stuff to be added after the vmcore was opened and
closed again (but not while it is open). It doesn't make any sense IMHO,
but it seems easy enough to add.

Thanks!

-- 
Cheers,

David / dhildenb