[PATCH] PCI: vmd: Create domain symlink before pci_bus_add_devices

Jiwei Sun posted 1 patch 1 year, 8 months ago
Posted by Jiwei Sun 1 year, 8 months ago
From: Jiwei Sun <sunjw10@lenovo.com>

While booting into the kernel, the following error messages appear:

  (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: Unable to get real path for '/sys/bus/pci/drivers/vmd/0000:c7:00.5/domain/device''
  (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: /dev/nvme1n1 is not attached to Intel(R) RAID controller.'
  (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: No OROM/EFI properties for /dev/nvme1n1'
  (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: no RAID superblock on /dev/nvme1n1.'
  (udev-worker)[2149]: nvme1n1: Process '/sbin/mdadm -I /dev/nvme1n1' failed with exit code 1.

This symptom prevents the OS from booting successfully.

After an NVMe disk is probed/added by the nvme driver, udevd executes
rule scripts that invoke the mdadm command to detect whether an mdraid
array is associated with this NVMe disk. mdadm determines whether an
NVMe device is connected to a particular VMD domain by checking the
domain symlink. Here is the root cause:

Thread A                   Thread B             Thread mdadm
vmd_enable_domain
  pci_bus_add_devices
    __driver_probe_device
     ...
     work_on_cpu
       schedule_work_on
       : wakeup Thread B
                           nvme_probe
                           : wakeup scan_work
                             to scan nvme disk
                             and add nvme disk
                             then wakeup udevd
                                                : udevd executes
                                                  mdadm command
       flush_work                               main
       : wait for nvme_probe done                ...
    __driver_probe_device                        find_driver_devices
    : probe next nvme device                     : 1) Detect the domain
    ...                                            symlink; 2) Find the
    ...                                            domain symlink from
    ...                                            vmd sysfs; 3) The
    ...                                            domain symlink is not
    ...                                            created yet, failed
  sysfs_create_link
  : create domain symlink

sysfs_create_link() is invoked at the end of vmd_enable_domain(), after
pci_bus_add_devices(). This ordering introduces a timing issue: mdadm
may fail to resolve the VMD domain symlink path because the symlink has
not been created yet.

Fix the issue by creating the VMD domain symlink before invoking
pci_bus_add_devices().

Signed-off-by: Jiwei Sun <sunjw10@lenovo.com>
Suggested-by: Adrian Huang <ahuang12@lenovo.com>
---
 drivers/pci/controller/vmd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 87b7856f375a..3f208c5f9ec9 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -961,12 +961,12 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 	list_for_each_entry(child, &vmd->bus->children, node)
 		pcie_bus_configure_settings(child);
 
+	WARN(sysfs_create_link(&vmd->dev->dev.kobj, &vmd->bus->dev.kobj,
+			       "domain"), "Can't create symlink to domain\n");
+
 	pci_bus_add_devices(vmd->bus);
 
 	vmd_acpi_end();
-
-	WARN(sysfs_create_link(&vmd->dev->dev.kobj, &vmd->bus->dev.kobj,
-			       "domain"), "Can't create symlink to domain\n");
 	return 0;
 }
 
-- 
2.27.0
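
The ordering problem described in the commit log can be modeled in plain
userspace C. This is only a sketch under stated assumptions: struct fake_vmd
and every function below are hypothetical stand-ins, not kernel APIs, and
the model is deterministic, so the old ordering always fails here, whereas
on real hardware the failure depends on timing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the VMD device's sysfs state. */
struct fake_vmd {
	bool domain_link;   /* true once the "domain" symlink exists */
	int lookups_ok;     /* consumer lookups that found the link */
	int lookups_failed; /* consumer lookups that ran too early */
};

/* Models mdadm being run by udev as soon as a disk shows up. */
static void on_disk_added(struct fake_vmd *vmd)
{
	if (vmd->domain_link)
		vmd->lookups_ok++;
	else
		vmd->lookups_failed++;
}

/* Models pci_bus_add_devices(): every added disk triggers a lookup. */
static void add_devices(struct fake_vmd *vmd, int ndisks)
{
	for (int i = 0; i < ndisks; i++)
		on_disk_added(vmd);
}

/* Models sysfs_create_link() for the "domain" link. */
static void create_domain_link(struct fake_vmd *vmd)
{
	vmd->domain_link = true;
}

/* Ordering before the patch: devices first, link last. */
static struct fake_vmd enable_domain_old(int ndisks)
{
	struct fake_vmd vmd = { 0 };

	add_devices(&vmd, ndisks);
	create_domain_link(&vmd);
	return vmd;
}

/* Ordering after the patch: link first, then devices. */
static struct fake_vmd enable_domain_new(int ndisks)
{
	struct fake_vmd vmd = { 0 };

	create_domain_link(&vmd);
	add_devices(&vmd, ndisks);
	return vmd;
}
```

In this model, enable_domain_old(2) ends with two failed lookups, while
enable_domain_new(2) ends with two successful ones, because the link is
published before any consumer can be notified.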
Re: [PATCH] PCI: vmd: Create domain symlink before pci_bus_add_devices
Posted by Bjorn Helgaas 1 year, 8 months ago
On Mon, Jun 03, 2024 at 10:03:29PM +0800, Jiwei Sun wrote:
> From: Jiwei Sun <sunjw10@lenovo.com>
> 
> During booting into the kernel, the following error message appears:
> 
>   (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: Unable to get real path for '/sys/bus/pci/drivers/vmd/0000:c7:00.5/domain/device''
>   (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: /dev/nvme1n1 is not attached to Intel(R) RAID controller.'
>   (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: No OROM/EFI properties for /dev/nvme1n1'
>   (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: no RAID superblock on /dev/nvme1n1.'
>   (udev-worker)[2149]: nvme1n1: Process '/sbin/mdadm -I /dev/nvme1n1' failed with exit code 1.
> 
> This symptom prevents the OS from booting successfully.
> 
> After a NVMe disk is probed/added by the nvme driver, the udevd executes
> some rule scripts by invoking mdadm command to detect if there is a
> mdraid associated with this NVMe disk. The mdadm determines if one
> NVMe devce is connected to a particular VMD domain by checking the
       device

> domain symlink. Here is the root cause:
> 
> Thread A                   Thread B             Thread mdadm
> vmd_enable_domain
>   pci_bus_add_devices
>     __driver_probe_device
>      ...
>      work_on_cpu
>        schedule_work_on
>        : wakeup Thread B
>                            nvme_probe
>                            : wakeup scan_work
>                              to scan nvme disk
>                              and add nvme disk
>                              then wakeup udevd
>                                                 : udevd executes
>                                                   mdadm command
>        flush_work                               main
>        : wait for nvme_probe done                ...
>     __driver_probe_device                        find_driver_devices
>     : probe next nvme device                     : 1) Detect the domain
>     ...                                            symlink; 2) Find the
>     ...                                            domain symlink from
>     ...                                            vmd sysfs; 3) The
>     ...                                            domain symlink is not
>     ...                                            created yet, failed
>   sysfs_create_link
>   : create domain symlink
> 
> sysfs_create_link is invoked at the end of vmd_enable_domain. However,
> this implementation introduces a timing issue, where mdadm might fail
> to retrieve the vmd symlink path because the symlink has not been
> created yet.
> 
> Fix the issue by creating VMD domain symlinks before invoking
> pci_bus_add_devices.

Add "()" after function names in subject and commit log.

> Signed-off-by: Jiwei Sun <sunjw10@lenovo.com>
> Suggested-by: Adrian Huang <ahuang12@lenovo.com>
> ---
>  drivers/pci/controller/vmd.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 87b7856f375a..3f208c5f9ec9 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -961,12 +961,12 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  	list_for_each_entry(child, &vmd->bus->children, node)
>  		pcie_bus_configure_settings(child);
>  
> +	WARN(sysfs_create_link(&vmd->dev->dev.kobj, &vmd->bus->dev.kobj,
> +			       "domain"), "Can't create symlink to domain\n");

Seems OK to me.  IIUC, this "domain" link is created in the directory
of the VMD pci_dev, and it points to the directory of the new "root
bus" behind the VMD.

Since it's unrelated to the *devices* on that new root bus, I would
probably move this even earlier, so it's with the other code that sets
up the new root bus, e.g., somewhere around vmd_attach_resources().

>  	pci_bus_add_devices(vmd->bus);
>  
>  	vmd_acpi_end();
> -
> -	WARN(sysfs_create_link(&vmd->dev->dev.kobj, &vmd->bus->dev.kobj,
> -			       "domain"), "Can't create symlink to domain\n");
>  	return 0;
>  }
>  
> -- 
> 2.27.0
>
Re: [PATCH] PCI: vmd: Create domain symlink before pci_bus_add_devices
Posted by Paul M Stillwell Jr 1 year, 8 months ago
On 6/3/2024 7:03 AM, Jiwei Sun wrote:
> From: Jiwei Sun <sunjw10@lenovo.com>
> 
> During booting into the kernel, the following error message appears:
> 
>    (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: Unable to get real path for '/sys/bus/pci/drivers/vmd/0000:c7:00.5/domain/device''
>    (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: /dev/nvme1n1 is not attached to Intel(R) RAID controller.'
>    (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: No OROM/EFI properties for /dev/nvme1n1'
>    (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: no RAID superblock on /dev/nvme1n1.'
>    (udev-worker)[2149]: nvme1n1: Process '/sbin/mdadm -I /dev/nvme1n1' failed with exit code 1.
> 
> This symptom prevents the OS from booting successfully.
> 

I'm just curious: has this been doing this forever or has this just 
started recently?

Paul

> After a NVMe disk is probed/added by the nvme driver, the udevd executes
> some rule scripts by invoking mdadm command to detect if there is a
> mdraid associated with this NVMe disk. The mdadm determines if one
> NVMe devce is connected to a particular VMD domain by checking the
> domain symlink. Here is the root cause:
> 
> Thread A                   Thread B             Thread mdadm
> vmd_enable_domain
>    pci_bus_add_devices
>      __driver_probe_device
>       ...
>       work_on_cpu
>         schedule_work_on
>         : wakeup Thread B
>                             nvme_probe
>                             : wakeup scan_work
>                               to scan nvme disk
>                               and add nvme disk
>                               then wakeup udevd
>                                                  : udevd executes
>                                                    mdadm command
>         flush_work                               main
>         : wait for nvme_probe done                ...
>      __driver_probe_device                        find_driver_devices
>      : probe next nvme device                     : 1) Detect the domain
>      ...                                            symlink; 2) Find the
>      ...                                            domain symlink from
>      ...                                            vmd sysfs; 3) The
>      ...                                            domain symlink is not
>      ...                                            created yet, failed
>    sysfs_create_link
>    : create domain symlink
> 
> sysfs_create_link is invoked at the end of vmd_enable_domain. However,
> this implementation introduces a timing issue, where mdadm might fail
> to retrieve the vmd symlink path because the symlink has not been
> created yet.
> 
> Fix the issue by creating VMD domain symlinks before invoking
> pci_bus_add_devices.
> 
> Signed-off-by: Jiwei Sun <sunjw10@lenovo.com>
> Suggested-by: Adrian Huang <ahuang12@lenovo.com>
> ---
>   drivers/pci/controller/vmd.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 87b7856f375a..3f208c5f9ec9 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -961,12 +961,12 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>   	list_for_each_entry(child, &vmd->bus->children, node)
>   		pcie_bus_configure_settings(child);
>   
> +	WARN(sysfs_create_link(&vmd->dev->dev.kobj, &vmd->bus->dev.kobj,
> +			       "domain"), "Can't create symlink to domain\n");
> +
>   	pci_bus_add_devices(vmd->bus);
>   
>   	vmd_acpi_end();
> -
> -	WARN(sysfs_create_link(&vmd->dev->dev.kobj, &vmd->bus->dev.kobj,
> -			       "domain"), "Can't create symlink to domain\n");
>   	return 0;
>   }
>
Re: [PATCH] PCI: vmd: Create domain symlink before pci_bus_add_devices
Posted by Jiwei Sun 1 year, 8 months ago

On 6/3/24 23:47, Paul M Stillwell Jr wrote:
> On 6/3/2024 7:03 AM, Jiwei Sun wrote:
>> From: Jiwei Sun <sunjw10@lenovo.com>
>>
>> During booting into the kernel, the following error message appears:
>>
>>    (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: Unable to get real path for '/sys/bus/pci/drivers/vmd/0000:c7:00.5/domain/device''
>>    (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: /dev/nvme1n1 is not attached to Intel(R) RAID controller.'
>>    (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: No OROM/EFI properties for /dev/nvme1n1'
>>    (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: no RAID superblock on /dev/nvme1n1.'
>>    (udev-worker)[2149]: nvme1n1: Process '/sbin/mdadm -I /dev/nvme1n1' failed with exit code 1.
>>
>> This symptom prevents the OS from booting successfully.
>>
> 
> I'm just curious: has this been doing this forever or has this just started recently?

Thanks for your reply. 

The issue was reproduced only on certain servers (a VROC configuration
with RAID1 across two NVMe drives, using 7mm NVMe 2-bay rear RAID
enablement kits), with SLES15.6 (kernel 6.4) installed on the VROC RAID1
disk. According to our tests, the issue has been easily reproducible on
this server configuration since kernel 6.2.
According to the journalctl log, systemd-udevd starts running before the
NVMe devices are added, which exposes this timing issue.
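
For illustration, the failing lookup can be approximated in userspace C.
This is a minimal sketch, not mdadm's actual code: it assumes the tool
resolves "<vmd dir>/domain/device" with realpath(), as the "Unable to get
real path" message suggests. The helper names and the temporary directory
layout are hypothetical.

```c
#define _XOPEN_SOURCE 700 /* for realpath(), mkdtemp(), symlink() */
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Resolve "<vmd_dir>/domain/device" to a canonical path.
 * Returns 0 on success or a negative errno value on failure.
 * "resolved" must point to a buffer of at least PATH_MAX bytes.
 */
static int resolve_domain_device(const char *vmd_dir, char *resolved)
{
	char path[PATH_MAX];
	int n = snprintf(path, sizeof(path), "%s/domain/device", vmd_dir);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;
	if (!realpath(path, resolved))
		return -errno;
	return 0;
}

/*
 * Build a throwaway tree that mimics the sysfs layout (names hypothetical):
 *   <tmp>/bus/device   stands in for the root bus behind the VMD
 *   <tmp>/vmd          stands in for the VMD pci_dev directory
 * Then try the lookup before and after "<tmp>/vmd/domain" -> "../bus"
 * exists. Returns 0 if the early lookup fails with ENOENT and the late
 * lookup succeeds.
 */
static int demo_lookup_before_and_after_link(void)
{
	char tmpl[] = "/tmp/vmd-demo-XXXXXX";
	char bus[PATH_MAX], dev[PATH_MAX], vmd[PATH_MAX], link[PATH_MAX];
	char resolved[PATH_MAX];
	int early, late;

	if (!mkdtemp(tmpl))
		return -errno;
	snprintf(bus, sizeof(bus), "%s/bus", tmpl);
	snprintf(dev, sizeof(dev), "%s/bus/device", tmpl);
	snprintf(vmd, sizeof(vmd), "%s/vmd", tmpl);
	snprintf(link, sizeof(link), "%s/vmd/domain", tmpl);
	if (mkdir(bus, 0700) || mkdir(dev, 0700) || mkdir(vmd, 0700))
		return -errno;

	early = resolve_domain_device(vmd, resolved); /* link missing */
	if (symlink("../bus", link))
		return -errno;
	late = resolve_domain_device(vmd, resolved);  /* link present */

	/* Remove the temporary tree again. */
	unlink(link);
	rmdir(dev);
	rmdir(bus);
	rmdir(vmd);
	rmdir(tmpl);

	return (early == -ENOENT && late == 0) ? 0 : -1;
}
```

A caller that runs before the link exists sees the same class of failure as
in the log above; once the link is in place, the same call succeeds.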

Thanks,
Regards,
Jiwei

> 
> Paul
> 
>> After a NVMe disk is probed/added by the nvme driver, the udevd executes
>> some rule scripts by invoking mdadm command to detect if there is a
>> mdraid associated with this NVMe disk. The mdadm determines if one
>> NVMe devce is connected to a particular VMD domain by checking the
>> domain symlink. Here is the root cause:
>>
>> Thread A                   Thread B             Thread mdadm
>> vmd_enable_domain
>>    pci_bus_add_devices
>>      __driver_probe_device
>>       ...
>>       work_on_cpu
>>         schedule_work_on
>>         : wakeup Thread B
>>                             nvme_probe
>>                             : wakeup scan_work
>>                               to scan nvme disk
>>                               and add nvme disk
>>                               then wakeup udevd
>>                                                  : udevd executes
>>                                                    mdadm command
>>         flush_work                               main
>>         : wait for nvme_probe done                ...
>>      __driver_probe_device                        find_driver_devices
>>      : probe next nvme device                     : 1) Detect the domain
>>      ...                                            symlink; 2) Find the
>>      ...                                            domain symlink from
>>      ...                                            vmd sysfs; 3) The
>>      ...                                            domain symlink is not
>>      ...                                            created yet, failed
>>    sysfs_create_link
>>    : create domain symlink
>>
>> sysfs_create_link is invoked at the end of vmd_enable_domain. However,
>> this implementation introduces a timing issue, where mdadm might fail
>> to retrieve the vmd symlink path because the symlink has not been
>> created yet.
>>
>> Fix the issue by creating VMD domain symlinks before invoking
>> pci_bus_add_devices.
>>
>> Signed-off-by: Jiwei Sun <sunjw10@lenovo.com>
>> Suggested-by: Adrian Huang <ahuang12@lenovo.com>
>> ---
>>   drivers/pci/controller/vmd.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
>> index 87b7856f375a..3f208c5f9ec9 100644
>> --- a/drivers/pci/controller/vmd.c
>> +++ b/drivers/pci/controller/vmd.c
>> @@ -961,12 +961,12 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>>       list_for_each_entry(child, &vmd->bus->children, node)
>>           pcie_bus_configure_settings(child);
>>   
>> +    WARN(sysfs_create_link(&vmd->dev->dev.kobj, &vmd->bus->dev.kobj,
>> +                   "domain"), "Can't create symlink to domain\n");
>> +
>>       pci_bus_add_devices(vmd->bus);
>>   
>>       vmd_acpi_end();
>> -
>> -    WARN(sysfs_create_link(&vmd->dev->dev.kobj, &vmd->bus->dev.kobj,
>> -                   "domain"), "Can't create symlink to domain\n");
>>       return 0;
>>   }
>>