[PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI

Farhan Ali posted 6 patches 1 month, 3 weeks ago
There is a newer version of this series
[PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
Posted by Farhan Ali 1 month, 3 weeks ago
For zPCI devices we should drive a platform specific function reset
as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
in error state.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 arch/s390/pci/pci.c              |  1 +
 drivers/vfio/pci/vfio_pci_core.c |  4 ++++
 drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
 drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
 4 files changed, 49 insertions(+)

diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index f795e05b5001..860a64993b58 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -788,6 +788,7 @@ int zpci_hot_reset_device(struct zpci_dev *zdev)
 
 	return rc;
 }
+EXPORT_SYMBOL_GPL(zpci_hot_reset_device);
 
 /**
  * zpci_create_device() - Create a new zpci_dev and add it to the zbus
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 7dcf5439dedc..7220a22135a9 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1227,6 +1227,10 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
 	 */
 	vfio_pci_set_power_state(vdev, PCI_D0);
 
+	ret = vfio_pci_zdev_reset(vdev);
+	if (ret && ret != -ENODEV)
+		return ret;
+
 	ret = pci_try_reset_function(vdev->pdev);
 	up_write(&vdev->memory_lock);
 
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index a9972eacb293..5288577b3170 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -86,6 +86,7 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
 				struct vfio_info_cap *caps);
 int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev);
 void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev);
+int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev);
 #else
 static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
 					      struct vfio_info_cap *caps)
@@ -100,6 +101,10 @@ static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
 
 static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
 {}
+int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
+{
+	return -ENODEV;
+}
 #endif
 
 static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index 818235b28caa..dd1919ccb3be 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -212,6 +212,45 @@ static int vfio_pci_zdev_setup_err_region(struct vfio_pci_core_device *vdev)
 	return ret;
 }
 
+int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
+{
+	struct zpci_dev *zdev = to_zpci(vdev->pdev);
+	int rc = -EIO;
+
+	if (!zdev)
+		return -ENODEV;
+
+	/*
+	 * If we can't get the zdev->state_lock the device state is
+	 * currently undergoing a transition and we bail out - just
+	 * the same as if the device's state is not configured at all.
+	 */
+	if (!mutex_trylock(&zdev->state_lock))
+		return rc;
+
+	/* We can reset only if the function is configured */
+	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
+		goto out;
+
+	rc = zpci_hot_reset_device(zdev);
+	if (rc != 0)
+		goto out;
+
+	if (!vdev->pci_saved_state) {
+		pci_err(vdev->pdev, "No saved available for the device");
+		rc = -EIO;
+		goto out;
+	}
+
+	pci_dev_lock(vdev->pdev);
+	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
+	pci_restore_state(vdev->pdev);
+	pci_dev_unlock(vdev->pdev);
+out:
+	mutex_unlock(&zdev->state_lock);
+	return rc;
+}
+
 int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
 {
 	struct zpci_dev *zdev = to_zpci(vdev->pdev);
-- 
2.43.0
Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
Posted by kernel test robot 1 month, 3 weeks ago
Hi Farhan,

kernel test robot noticed the following build errors:

[auto build test ERROR on awilliam-vfio/next]
[also build test ERROR on s390/features linus/master v6.17-rc1 next-20250814]
[cannot apply to kvms390/next awilliam-vfio/for-linus]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Farhan-Ali/s390-pci-Restore-airq-unconditionally-for-the-zPCI-device/20250814-012243
base:   https://github.com/awilliam/linux-vfio.git next
patch link:    https://lore.kernel.org/r/20250813170821.1115-6-alifm%40linux.ibm.com
patch subject: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
config: csky-randconfig-002-20250814 (https://download.01.org/0day-ci/archive/20250814/202508141518.Z82dHhVu-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250814/202508141518.Z82dHhVu-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508141518.Z82dHhVu-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from drivers/vfio/pci/vfio_pci_core.c:36:
>> drivers/vfio/pci/vfio_pci_priv.h:104:5: warning: no previous prototype for 'vfio_pci_zdev_reset' [-Wmissing-prototypes]
     104 | int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
         |     ^~~~~~~~~~~~~~~~~~~
--
   csky-linux-ld: drivers/vfio/pci/vfio_pci_intrs.o: in function `vfio_pci_zdev_reset':
>> drivers/vfio/pci/vfio_pci_priv.h:105: multiple definition of `vfio_pci_zdev_reset'; drivers/vfio/pci/vfio_pci_core.o:(.text+0x1a6c): first defined here
   csky-linux-ld: drivers/vfio/pci/vfio_pci_rdwr.o: in function `vfio_pci_zdev_reset':
>> drivers/vfio/pci/vfio_pci_priv.h:105: multiple definition of `vfio_pci_zdev_reset'; drivers/vfio/pci/vfio_pci_core.o:(.text+0x1a6c): first defined here
   csky-linux-ld: drivers/vfio/pci/vfio_pci_config.o: in function `vfio_pci_zdev_reset':
   (.text+0x1964): multiple definition of `vfio_pci_zdev_reset'; drivers/vfio/pci/vfio_pci_core.o:(.text+0x1a6c): first defined here


vim +105 drivers/vfio/pci/vfio_pci_priv.h

   101	
   102	static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
   103	{}
 > 104	int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
 > 105	{
   106		return -ENODEV;
   107	}
   108	#endif
   109	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
Posted by kernel test robot 1 month, 3 weeks ago
Hi Farhan,

kernel test robot noticed the following build warnings:

[auto build test WARNING on awilliam-vfio/next]
[also build test WARNING on s390/features linus/master v6.17-rc1 next-20250813]
[cannot apply to kvms390/next awilliam-vfio/for-linus]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Farhan-Ali/s390-pci-Restore-airq-unconditionally-for-the-zPCI-device/20250814-012243
base:   https://github.com/awilliam/linux-vfio.git next
patch link:    https://lore.kernel.org/r/20250813170821.1115-6-alifm%40linux.ibm.com
patch subject: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
config: i386-randconfig-005-20250814 (https://download.01.org/0day-ci/archive/20250814/202508141314.OUnmuib7-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250814/202508141314.OUnmuib7-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508141314.OUnmuib7-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from drivers/vfio/pci/vfio_pci_intrs.c:23:
>> drivers/vfio/pci/vfio_pci_priv.h:104:5: warning: no previous prototype for function 'vfio_pci_zdev_reset' [-Wmissing-prototypes]
     104 | int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
         |     ^
   drivers/vfio/pci/vfio_pci_priv.h:104:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     104 | int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
         | ^
         | static 
   1 warning generated.


vim +/vfio_pci_zdev_reset +104 drivers/vfio/pci/vfio_pci_priv.h

   101	
   102	static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
   103	{}
 > 104	int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
   105	{
   106		return -ENODEV;
   107	}
   108	#endif
   109	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
Posted by Alex Williamson 1 month, 3 weeks ago
On Wed, 13 Aug 2025 10:08:19 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:

> For zPCI devices we should drive a platform specific function reset
> as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
> in error state.
> 
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>  arch/s390/pci/pci.c              |  1 +
>  drivers/vfio/pci/vfio_pci_core.c |  4 ++++
>  drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
>  drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
>  4 files changed, 49 insertions(+)
> 
> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> index f795e05b5001..860a64993b58 100644
> --- a/arch/s390/pci/pci.c
> +++ b/arch/s390/pci/pci.c
> @@ -788,6 +788,7 @@ int zpci_hot_reset_device(struct zpci_dev *zdev)
>  
>  	return rc;
>  }
> +EXPORT_SYMBOL_GPL(zpci_hot_reset_device);
>  
>  /**
>   * zpci_create_device() - Create a new zpci_dev and add it to the zbus
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 7dcf5439dedc..7220a22135a9 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1227,6 +1227,10 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
>  	 */
>  	vfio_pci_set_power_state(vdev, PCI_D0);
>  
> +	ret = vfio_pci_zdev_reset(vdev);
> +	if (ret && ret != -ENODEV)
> +		return ret;
> +
>  	ret = pci_try_reset_function(vdev->pdev);
>  	up_write(&vdev->memory_lock);

You're going to be very unhappy if this lock isn't released.

>  
> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
> index a9972eacb293..5288577b3170 100644
> --- a/drivers/vfio/pci/vfio_pci_priv.h
> +++ b/drivers/vfio/pci/vfio_pci_priv.h
> @@ -86,6 +86,7 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
>  				struct vfio_info_cap *caps);
>  int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev);
>  void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev);
> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev);
>  #else
>  static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
>  					      struct vfio_info_cap *caps)
> @@ -100,6 +101,10 @@ static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
>  
>  static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
>  {}
> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
> +{
> +	return -ENODEV;
> +}
>  #endif
>  
>  static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
> diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
> index 818235b28caa..dd1919ccb3be 100644
> --- a/drivers/vfio/pci/vfio_pci_zdev.c
> +++ b/drivers/vfio/pci/vfio_pci_zdev.c
> @@ -212,6 +212,45 @@ static int vfio_pci_zdev_setup_err_region(struct vfio_pci_core_device *vdev)
>  	return ret;
>  }
>  
> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
> +{
> +	struct zpci_dev *zdev = to_zpci(vdev->pdev);
> +	int rc = -EIO;
> +
> +	if (!zdev)
> +		return -ENODEV;
> +
> +	/*
> +	 * If we can't get the zdev->state_lock the device state is
> +	 * currently undergoing a transition and we bail out - just
> +	 * the same as if the device's state is not configured at all.
> +	 */
> +	if (!mutex_trylock(&zdev->state_lock))
> +		return rc;
> +
> +	/* We can reset only if the function is configured */
> +	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
> +		goto out;
> +
> +	rc = zpci_hot_reset_device(zdev);
> +	if (rc != 0)
> +		goto out;
> +
> +	if (!vdev->pci_saved_state) {
> +		pci_err(vdev->pdev, "No saved available for the device");
> +		rc = -EIO;
> +		goto out;
> +	}
> +
> +	pci_dev_lock(vdev->pdev);
> +	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
> +	pci_restore_state(vdev->pdev);
> +	pci_dev_unlock(vdev->pdev);
> +out:
> +	mutex_unlock(&zdev->state_lock);
> +	return rc;
> +}

This looks like it should be a device or arch specific reset
implemented in drivers/pci, not vfio.  Thanks,

Alex

> +
>  int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
>  {
>  	struct zpci_dev *zdev = to_zpci(vdev->pdev);
Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
Posted by Farhan Ali 1 month, 3 weeks ago
On 8/13/2025 1:30 PM, Alex Williamson wrote:
> On Wed, 13 Aug 2025 10:08:19 -0700
> Farhan Ali <alifm@linux.ibm.com> wrote:
>
>> For zPCI devices we should drive a platform specific function reset
>> as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
>> in error state.
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>>   arch/s390/pci/pci.c              |  1 +
>>   drivers/vfio/pci/vfio_pci_core.c |  4 ++++
>>   drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
>>   drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
>>   4 files changed, 49 insertions(+)
>>
>> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
>> index f795e05b5001..860a64993b58 100644
>> --- a/arch/s390/pci/pci.c
>> +++ b/arch/s390/pci/pci.c
>> @@ -788,6 +788,7 @@ int zpci_hot_reset_device(struct zpci_dev *zdev)
>>   
>>   	return rc;
>>   }
>> +EXPORT_SYMBOL_GPL(zpci_hot_reset_device);
>>   
>>   /**
>>    * zpci_create_device() - Create a new zpci_dev and add it to the zbus
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 7dcf5439dedc..7220a22135a9 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -1227,6 +1227,10 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
>>   	 */
>>   	vfio_pci_set_power_state(vdev, PCI_D0);
>>   
>> +	ret = vfio_pci_zdev_reset(vdev);
>> +	if (ret && ret != -ENODEV)
>> +		return ret;
>> +
>>   	ret = pci_try_reset_function(vdev->pdev);
>>   	up_write(&vdev->memory_lock);
> You're going to be very unhappy if this lock isn't released.
>
Ah yes, thanks for catching that. Will fix this.

>
>>   
>> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
>> index a9972eacb293..5288577b3170 100644
>> --- a/drivers/vfio/pci/vfio_pci_priv.h
>> +++ b/drivers/vfio/pci/vfio_pci_priv.h
>> @@ -86,6 +86,7 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
>>   				struct vfio_info_cap *caps);
>>   int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev);
>>   void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev);
>> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev);
>>   #else
>>   static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
>>   					      struct vfio_info_cap *caps)
>> @@ -100,6 +101,10 @@ static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
>>   
>>   static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
>>   {}
>> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
>> +{
>> +	return -ENODEV;
>> +}
>>   #endif
>>   
>>   static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
>> diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
>> index 818235b28caa..dd1919ccb3be 100644
>> --- a/drivers/vfio/pci/vfio_pci_zdev.c
>> +++ b/drivers/vfio/pci/vfio_pci_zdev.c
>> @@ -212,6 +212,45 @@ static int vfio_pci_zdev_setup_err_region(struct vfio_pci_core_device *vdev)
>>   	return ret;
>>   }
>>   
>> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
>> +{
>> +	struct zpci_dev *zdev = to_zpci(vdev->pdev);
>> +	int rc = -EIO;
>> +
>> +	if (!zdev)
>> +		return -ENODEV;
>> +
>> +	/*
>> +	 * If we can't get the zdev->state_lock the device state is
>> +	 * currently undergoing a transition and we bail out - just
>> +	 * the same as if the device's state is not configured at all.
>> +	 */
>> +	if (!mutex_trylock(&zdev->state_lock))
>> +		return rc;
>> +
>> +	/* We can reset only if the function is configured */
>> +	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
>> +		goto out;
>> +
>> +	rc = zpci_hot_reset_device(zdev);
>> +	if (rc != 0)
>> +		goto out;
>> +
>> +	if (!vdev->pci_saved_state) {
>> +		pci_err(vdev->pdev, "No saved available for the device");
>> +		rc = -EIO;
>> +		goto out;
>> +	}
>> +
>> +	pci_dev_lock(vdev->pdev);
>> +	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
>> +	pci_restore_state(vdev->pdev);
>> +	pci_dev_unlock(vdev->pdev);
>> +out:
>> +	mutex_unlock(&zdev->state_lock);
>> +	return rc;
>> +}
> This looks like it should be a device or arch specific reset
> implemented in drivers/pci, not vfio.  Thanks,
>
> Alex

Are you suggesting to move this to an arch specific function? One thing 
we need to do after the zpci_hot_reset_device, is to correctly restore 
the config space of the device. And for vfio-pci bound devices we want 
to restore the state of the device to when it was initially opened.

Thanks
Farhan


>
>> +
>>   int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
>>   {
>>   	struct zpci_dev *zdev = to_zpci(vdev->pdev);
Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
Posted by Alex Williamson 1 month, 3 weeks ago
On Wed, 13 Aug 2025 14:52:24 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 8/13/2025 1:30 PM, Alex Williamson wrote:
> > On Wed, 13 Aug 2025 10:08:19 -0700
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> >  
> >> For zPCI devices we should drive a platform specific function reset
> >> as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
> >> in error state.
> >>
> >> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >> ---
> >>   arch/s390/pci/pci.c              |  1 +
> >>   drivers/vfio/pci/vfio_pci_core.c |  4 ++++
> >>   drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
> >>   drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
> >>   4 files changed, 49 insertions(+)
> >>
> >> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> >> index f795e05b5001..860a64993b58 100644
> >> --- a/arch/s390/pci/pci.c
> >> +++ b/arch/s390/pci/pci.c
> >> @@ -788,6 +788,7 @@ int zpci_hot_reset_device(struct zpci_dev *zdev)
> >>   
> >>   	return rc;
> >>   }
> >> +EXPORT_SYMBOL_GPL(zpci_hot_reset_device);
> >>   
> >>   /**
> >>    * zpci_create_device() - Create a new zpci_dev and add it to the zbus
> >> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> >> index 7dcf5439dedc..7220a22135a9 100644
> >> --- a/drivers/vfio/pci/vfio_pci_core.c
> >> +++ b/drivers/vfio/pci/vfio_pci_core.c
> >> @@ -1227,6 +1227,10 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
> >>   	 */
> >>   	vfio_pci_set_power_state(vdev, PCI_D0);
> >>   
> >> +	ret = vfio_pci_zdev_reset(vdev);
> >> +	if (ret && ret != -ENODEV)
> >> +		return ret;
> >> +
> >>   	ret = pci_try_reset_function(vdev->pdev);
> >>   	up_write(&vdev->memory_lock);  
> > You're going to be very unhappy if this lock isn't released.
> >  
> Ah yes, thanks for catching that. Will fix this.
> 
> >  
> >>   
> >> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
> >> index a9972eacb293..5288577b3170 100644
> >> --- a/drivers/vfio/pci/vfio_pci_priv.h
> >> +++ b/drivers/vfio/pci/vfio_pci_priv.h
> >> @@ -86,6 +86,7 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
> >>   				struct vfio_info_cap *caps);
> >>   int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev);
> >>   void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev);
> >> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev);
> >>   #else
> >>   static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
> >>   					      struct vfio_info_cap *caps)
> >> @@ -100,6 +101,10 @@ static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
> >>   
> >>   static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
> >>   {}
> >> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
> >> +{
> >> +	return -ENODEV;
> >> +}
> >>   #endif
> >>   
> >>   static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
> >> diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
> >> index 818235b28caa..dd1919ccb3be 100644
> >> --- a/drivers/vfio/pci/vfio_pci_zdev.c
> >> +++ b/drivers/vfio/pci/vfio_pci_zdev.c
> >> @@ -212,6 +212,45 @@ static int vfio_pci_zdev_setup_err_region(struct vfio_pci_core_device *vdev)
> >>   	return ret;
> >>   }
> >>   
> >> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
> >> +{
> >> +	struct zpci_dev *zdev = to_zpci(vdev->pdev);
> >> +	int rc = -EIO;
> >> +
> >> +	if (!zdev)
> >> +		return -ENODEV;
> >> +
> >> +	/*
> >> +	 * If we can't get the zdev->state_lock the device state is
> >> +	 * currently undergoing a transition and we bail out - just
> >> +	 * the same as if the device's state is not configured at all.
> >> +	 */
> >> +	if (!mutex_trylock(&zdev->state_lock))
> >> +		return rc;
> >> +
> >> +	/* We can reset only if the function is configured */
> >> +	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
> >> +		goto out;
> >> +
> >> +	rc = zpci_hot_reset_device(zdev);
> >> +	if (rc != 0)
> >> +		goto out;
> >> +
> >> +	if (!vdev->pci_saved_state) {
> >> +		pci_err(vdev->pdev, "No saved available for the device");
> >> +		rc = -EIO;
> >> +		goto out;
> >> +	}
> >> +
> >> +	pci_dev_lock(vdev->pdev);
> >> +	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
> >> +	pci_restore_state(vdev->pdev);
> >> +	pci_dev_unlock(vdev->pdev);
> >> +out:
> >> +	mutex_unlock(&zdev->state_lock);
> >> +	return rc;
> >> +}  
> > This looks like it should be a device or arch specific reset
> > implemented in drivers/pci, not vfio.  Thanks,
> >
> > Alex  
> 
> Are you suggesting to move this to an arch specific function? One thing 
> we need to do after the zpci_hot_reset_device, is to correctly restore 
> the config space of the device. And for vfio-pci bound devices we want 
> to restore the state of the device to when it was initially opened.

We generally rely on the abstraction of pci_reset_function() to select
the correct type of reset for a function scope reset.  We've gone to
quite a bit of effort to implement all device specific resets and
quirks in the PCI core to be re-used across the kernel.

Calling zpci_hot_reset_device() directly seems contradictory to those
efforts.  Should pci_reset_function() call this universally on s390x
rather than providing access to FLR/PM/SBR reset?  Why is it
universally correct here given the ioctl previously made use of
standard reset mechanisms?

The DEVICE_RESET ioctl is simply an in-place reset of the device,
without restoring the original device state.  So we're also subtly
changing that behavior here, presumably because we're targeting the
specific error recovery case.  Have you considered how this might
break non-error-recovery use cases?

I wonder if we want a different reset mechanism for this use case
rather than these subtle semantic changes.  Thanks,

Alex
Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
Posted by Niklas Schnelle 1 month, 3 weeks ago
On Wed, 2025-08-13 at 16:56 -0600, Alex Williamson wrote:
> On Wed, 13 Aug 2025 14:52:24 -0700
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
> > On 8/13/2025 1:30 PM, Alex Williamson wrote:
> > > On Wed, 13 Aug 2025 10:08:19 -0700
> > > Farhan Ali <alifm@linux.ibm.com> wrote:
> > >  
> > > > For zPCI devices we should drive a platform specific function reset
> > > > as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
> > > > in error state.
> > > > 
> > > > Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> > > > ---
> > > >   arch/s390/pci/pci.c              |  1 +
> > > >   drivers/vfio/pci/vfio_pci_core.c |  4 ++++
> > > >   drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
> > > >   drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
> > > >   4 files changed, 49 insertions(+)
--- snip ---
> > > >   
> > > > +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
> > > > +{
> > > > +	struct zpci_dev *zdev = to_zpci(vdev->pdev);
> > > > +	int rc = -EIO;
> > > > +
> > > > +	if (!zdev)
> > > > +		return -ENODEV;
> > > > +
> > > > +	/*
> > > > +	 * If we can't get the zdev->state_lock the device state is
> > > > +	 * currently undergoing a transition and we bail out - just
> > > > +	 * the same as if the device's state is not configured at all.
> > > > +	 */
> > > > +	if (!mutex_trylock(&zdev->state_lock))
> > > > +		return rc;
> > > > +
> > > > +	/* We can reset only if the function is configured */
> > > > +	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
> > > > +		goto out;
> > > > +
> > > > +	rc = zpci_hot_reset_device(zdev);
> > > > +	if (rc != 0)
> > > > +		goto out;
> > > > +
> > > > +	if (!vdev->pci_saved_state) {
> > > > +		pci_err(vdev->pdev, "No saved available for the device");
> > > > +		rc = -EIO;
> > > > +		goto out;
> > > > +	}
> > > > +
> > > > +	pci_dev_lock(vdev->pdev);
> > > > +	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
> > > > +	pci_restore_state(vdev->pdev);
> > > > +	pci_dev_unlock(vdev->pdev);
> > > > +out:
> > > > +	mutex_unlock(&zdev->state_lock);
> > > > +	return rc;
> > > > +}  
> > > This looks like it should be a device or arch specific reset
> > > implemented in drivers/pci, not vfio.  Thanks,
> > > 
> > > Alex  
> > 
> > Are you suggesting to move this to an arch specific function? One thing 
> > we need to do after the zpci_hot_reset_device, is to correctly restore 
> > the config space of the device. And for vfio-pci bound devices we want 
> > to restore the state of the device to when it was initially opened.
> 
> We generally rely on the abstraction of pci_reset_function() to select
> the correct type of reset for a function scope reset.  We've gone to
> quite a bit of effort to implement all device specific resets and
> quirks in the PCI core to be re-used across the kernel.
> 
> Calling zpci_hot_reset_device() directly seems contradictory to those
> efforts.  Should pci_reset_function() call this universally on s390x
> rather than providing access to FLR/PM/SBR reset? 
> 

I agree with you Alex. Still trying to figure out what's needed for
this. We already do zpci_hot_reset_device() in reset_slot() from the
s390_pci_hpc.c hotplug slot driver and that does get called via
pci_reset_hotplug_slot() and pci_reset_function(). There are a few
problems though that meant it didn't work for Farhan but I agree maybe
we can fix them for the general case. For one pci_reset_function()
via DEVICE_RESET first tries FLR but that won't work with the device in
the error state and MMIO blocked. Sadly __pci_reset_function_locked()
then concludes that other resets also won't work. So that's something
we might want to improve in general, for example maybe we need
something more like pci_dev_acpi_reset() with higher priority than FLR.

Now for pci_reset_hotplug_slot() via VFIO_DEVICE_PCI_HOT_RESET I'm not
sure why that won't work as is. @Farhan do you know?

>  Why is it
> universally correct here given the ioctl previously made use of
> standard reset mechanisms?
> 
> The DEVICE_RESET ioctl is simply an in-place reset of the device,
> without restoring the original device state.  So we're also subtly
> changing that behavior here, presumably because we're targeting the
> specific error recovery case.  Have you considered how this might
> break non-error-recovery use cases?
> 
> I wonder if we want a different reset mechanism for this use case
> rather than these subtle semantic changes. 

I think an alternative to that, which Farhan actually had in the
previous internal version, is to implement
pci_error_handlers::reset_done() and do the pci_load_saved_state()
there. That would only affect the error recovery case leaving other
cases alone.

Thanks,
Niklas
Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
Posted by Farhan Ali 1 month, 3 weeks ago
On 8/14/2025 6:12 AM, Niklas Schnelle wrote:
> On Wed, 2025-08-13 at 16:56 -0600, Alex Williamson wrote:
>> On Wed, 13 Aug 2025 14:52:24 -0700
>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>
>>> On 8/13/2025 1:30 PM, Alex Williamson wrote:
>>>> On Wed, 13 Aug 2025 10:08:19 -0700
>>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>>   
>>>>> For zPCI devices we should drive a platform specific function reset
>>>>> as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
>>>>> in error state.
>>>>>
>>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>>>> ---
>>>>>    arch/s390/pci/pci.c              |  1 +
>>>>>    drivers/vfio/pci/vfio_pci_core.c |  4 ++++
>>>>>    drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
>>>>>    drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
>>>>>    4 files changed, 49 insertions(+)
> --- snip ---
>>>>>    
>>>>> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
>>>>> +{
>>>>> +	struct zpci_dev *zdev = to_zpci(vdev->pdev);
>>>>> +	int rc = -EIO;
>>>>> +
>>>>> +	if (!zdev)
>>>>> +		return -ENODEV;
>>>>> +
>>>>> +	/*
>>>>> +	 * If we can't get the zdev->state_lock the device state is
>>>>> +	 * currently undergoing a transition and we bail out - just
>>>>> +	 * the same as if the device's state is not configured at all.
>>>>> +	 */
>>>>> +	if (!mutex_trylock(&zdev->state_lock))
>>>>> +		return rc;
>>>>> +
>>>>> +	/* We can reset only if the function is configured */
>>>>> +	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
>>>>> +		goto out;
>>>>> +
>>>>> +	rc = zpci_hot_reset_device(zdev);
>>>>> +	if (rc != 0)
>>>>> +		goto out;
>>>>> +
>>>>> +	if (!vdev->pci_saved_state) {
>>>>> +		pci_err(vdev->pdev, "No saved available for the device");
>>>>> +		rc = -EIO;
>>>>> +		goto out;
>>>>> +	}
>>>>> +
>>>>> +	pci_dev_lock(vdev->pdev);
>>>>> +	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
>>>>> +	pci_restore_state(vdev->pdev);
>>>>> +	pci_dev_unlock(vdev->pdev);
>>>>> +out:
>>>>> +	mutex_unlock(&zdev->state_lock);
>>>>> +	return rc;
>>>>> +}
>>>> This looks like it should be a device or arch specific reset
>>>> implemented in drivers/pci, not vfio.  Thanks,
>>>>
>>>> Alex
>>> Are you suggesting to move this to an arch specific function? One thing
>>> we need to do after the zpci_hot_reset_device, is to correctly restore
>>> the config space of the device. And for vfio-pci bound devices we want
>>> to restore the state of the device to when it was initially opened.
>> We generally rely on the abstraction of pci_reset_function() to select
>> the correct type of reset for a function scope reset.  We've gone to
>> quite a bit of effort to implement all device specific resets and
>> quirks in the PCI core to be re-used across the kernel.
>>
>> Calling zpci_hot_reset_device() directly seems contradictory to those
>> efforts.  Should pci_reset_function() call this universally on s390x
>> rather than providing access to FLR/PM/SBR reset?
>>
> I agree with you Alex. Still trying to figure out what's needed for
> this. We already do zpci_hot_reset_device() in reset_slot() from the
> s390_pci_hpc.c hotplug slot driver and that does get called via
> pci_reset_hotplug_slot() and pci_reset_function(). There are a few
> problems though that meant it didn't work for Farhan but I agree maybe
> we can fix them for the general case. For one pci_reset_function()
> via DEVICE_RESET first tries FLR but that won't work with the device in
> the error state and MMIO blocked. Sadly __pci_reset_function_locked()
> then concludes that other resets also won't work. So that's something
> we might want to improve in general, for example maybe we need
> something more like pci_dev_acpi_reset() with higher priority than FLR.

Yeah I did think of adding something like s390x CLP reset as part of the 
reset methods. AFAIU the s390x CLP reset is similar to ACPI _RST. But 
that would introduce s390x specific code in pci core common code.

>
> Now for pci_reset_hotplug_slot() via VFIO_DEVICE_PCI_HOT_RESET I'm not
> sure why that won't work as is. @Farhan do you know?

VFIO_DEVICE_PCI_HOT_RESET would have been sufficient interface for 
majority of PCI devices on s390x as that would drive a bus reset. It was 
sufficient as most devices were single bus devices. But in the latest 
generation of machines (z17) we expose true SR-IOV and an OS can have 
access to both PF and VFs and so these are on the same bus and can have 
different ownership based on what is bound to vfio-pci.

My thinking for extending VFIO_DEVICE_RESET is because AFAIU its a per 
function reset mechanism, which maps well with what our architecture 
provides. On s390x we can drive a per function reset (via firmware) 
through the CLP instruction driven by the zpci_hot_reset_device(). And 
doing it as vfio zpci specific function would confine the s390x logic.

>>   Why is it
>> universally correct here given the ioctl previously made use of
>> standard reset mechanisms?
>>
>> The DEVICE_RESET ioctl is simply an in-place reset of the device,
>> without restoring the original device state.  So we're also subtly
>> changing that behavior here, presumably because we're targeting the
>> specific error recovery case.  Have you considered how this might
>> break non-error-recovery use cases?
>>
>> I wonder if we want a different reset mechanism for this use case
>> rather than these subtle semantic changes.
> I think an alternative to that, which Farhan actually had in the
> previous internal version, is to implement
> pci_error_handlers::reset_done() and do the pci_load_saved_state()
> there. That would only affect the error recovery case leaving other
> cases alone.
>
>
> Thanks,
> Niklas

The reason I abandoned reset_done() callback idea is because its not 
sufficient to recover the device correctly. Today before driving a reset 
we save the state of the device. When a device is in error state, any 
pci load/store (on s390x they are actual instructions :)) to config 
space would return an error value (0xffffffff). We don't have any checks 
in pci_save_state to prevent storing error values. And after a reset 
when we try to restore the config space (pci_dev_restore) we try to 
write the error value and this can be problematic. By the time the 
reset_done() callback is invoked, its already too late.

@Alex,
I am open to ideas/suggestions on this. Do we think we need a separate 
VFIO ioctl to drive this or a new reset mechanism as Niklas suggested?

Thanks
Farhan
Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
Posted by Alex Williamson 1 month, 2 weeks ago
On Thu, 14 Aug 2025 09:33:47 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 8/14/2025 6:12 AM, Niklas Schnelle wrote:
> > On Wed, 2025-08-13 at 16:56 -0600, Alex Williamson wrote:  
> >> On Wed, 13 Aug 2025 14:52:24 -0700
> >> Farhan Ali <alifm@linux.ibm.com> wrote:
> >>  
> >>> On 8/13/2025 1:30 PM, Alex Williamson wrote:  
> >>>> On Wed, 13 Aug 2025 10:08:19 -0700
> >>>> Farhan Ali <alifm@linux.ibm.com> wrote:
> >>>>     
> >>>>> For zPCI devices we should drive a platform specific function reset
> >>>>> as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
> >>>>> in error state.
> >>>>>
> >>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >>>>> ---
> >>>>>    arch/s390/pci/pci.c              |  1 +
> >>>>>    drivers/vfio/pci/vfio_pci_core.c |  4 ++++
> >>>>>    drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
> >>>>>    drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
> >>>>>    4 files changed, 49 insertions(+)  
> > --- snip ---  
> >>>>>    
> >>>>> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
> >>>>> +{
> >>>>> +	struct zpci_dev *zdev = to_zpci(vdev->pdev);
> >>>>> +	int rc = -EIO;
> >>>>> +
> >>>>> +	if (!zdev)
> >>>>> +		return -ENODEV;
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * If we can't get the zdev->state_lock the device state is
> >>>>> +	 * currently undergoing a transition and we bail out - just
> >>>>> +	 * the same as if the device's state is not configured at all.
> >>>>> +	 */
> >>>>> +	if (!mutex_trylock(&zdev->state_lock))
> >>>>> +		return rc;
> >>>>> +
> >>>>> +	/* We can reset only if the function is configured */
> >>>>> +	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
> >>>>> +		goto out;
> >>>>> +
> >>>>> +	rc = zpci_hot_reset_device(zdev);
> >>>>> +	if (rc != 0)
> >>>>> +		goto out;
> >>>>> +
> >>>>> +	if (!vdev->pci_saved_state) {
> >>>>> +		pci_err(vdev->pdev, "No saved available for the device");
> >>>>> +		rc = -EIO;
> >>>>> +		goto out;
> >>>>> +	}
> >>>>> +
> >>>>> +	pci_dev_lock(vdev->pdev);
> >>>>> +	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
> >>>>> +	pci_restore_state(vdev->pdev);
> >>>>> +	pci_dev_unlock(vdev->pdev);
> >>>>> +out:
> >>>>> +	mutex_unlock(&zdev->state_lock);
> >>>>> +	return rc;
> >>>>> +}  
> >>>> This looks like it should be a device or arch specific reset
> >>>> implemented in drivers/pci, not vfio.  Thanks,
> >>>>
> >>>> Alex  
> >>> Are you suggesting to move this to an arch specific function? One thing
> >>> we need to do after the zpci_hot_reset_device, is to correctly restore
> >>> the config space of the device. And for vfio-pci bound devices we want
> >>> to restore the state of the device to when it was initially opened.  
> >> We generally rely on the abstraction of pci_reset_function() to select
> >> the correct type of reset for a function scope reset.  We've gone to
> >> quite a bit of effort to implement all device specific resets and
> >> quirks in the PCI core to be re-used across the kernel.
> >>
> >> Calling zpci_hot_reset_device() directly seems contradictory to those
> >> efforts.  Should pci_reset_function() call this universally on s390x
> >> rather than providing access to FLR/PM/SBR reset?
> >>  
> > I agree with you Alex. Still trying to figure out what's needed for
> > this. We already do zpci_hot_reset_device() in reset_slot() from the
> > s390_pci_hpc.c hotplug slot driver and that does get called via
> > pci_reset_hotplug_slot() and pci_reset_function(). There are a few
> > problems though that meant it didn't work for Farhan but I agree maybe
> > we can fix them for the general case. For one pci_reset_function()
> > via DEVICE_RESET first tries FLR but that won't work with the device in
> > the error state and MMIO blocked. Sadly __pci_reset_function_locked()
> > then concludes that other resets also won't work. So that's something
> > we might want to improve in general, for example maybe we need
> > something more like pci_dev_acpi_reset() with higher priority than FLR.  
> 
> Yeah I did think of adding something like s390x CLP reset as part of the 
> reset methods. AFAIU the s390x CLP reset is similar to ACPI _RST. But 
> that would introduce s390x specific code in pci core common code.
> 
> >
> > Now for pci_reset_hotplug_slot() via VFIO_DEVICE_PCI_HOT_RESET I'm not
> > sure why that won't work as is. @Farhan do you know?  
> 
> VFIO_DEVICE_PCI_HOT_RESET would have been sufficient interface for 
> majority of PCI devices on s390x as that would drive a bus reset. It was 
> sufficient as most devices were single bus devices. But in the latest 
> generation of machines (z17) we expose true SR-IOV and an OS can have 
> access to both PF and VFs and so these are on the same bus and can have 
> different ownership based on what is bound to vfio-pci.
> 
> My thinking for extending VFIO_DEVICE_RESET is because AFAIU its a per 
> function reset mechanism, which maps well with what our architecture 
> provides. On s390x we can drive a per function reset (via firmware) 
> through the CLP instruction driven by the zpci_hot_reset_device(). And 
> doing it as vfio zpci specific function would confine the s390x logic.
> 
> >>   Why is it
> >> universally correct here given the ioctl previously made use of
> >> standard reset mechanisms?
> >>
> >> The DEVICE_RESET ioctl is simply an in-place reset of the device,
> >> without restoring the original device state.  So we're also subtly
> >> changing that behavior here, presumably because we're targeting the
> >> specific error recovery case.  Have you considered how this might
> >> break non-error-recovery use cases?
> >>
> >> I wonder if we want a different reset mechanism for this use case
> >> rather than these subtle semantic changes.  
> > I think an alternative to that, which Farhan actually had in the
> > previous internal version, is to implement
> > pci_error_handlers::reset_done() and do the pci_load_saved_state()
> > there. That would only affect the error recovery case leaving other
> > cases alone.
> >
> >
> > Thanks,
> > Niklas  
> 
> The reason I abandoned reset_done() callback idea is because its not 
> sufficient to recover the device correctly. Today before driving a reset 
> we save the state of the device. When a device is in error state, any 
> pci load/store (on s390x they are actual instructions :)) to config 
> space would return an error value (0xffffffff). We don't have any checks 
> in pci_save_state to prevent storing error values. And after a reset 
> when we try to restore the config space (pci_dev_restore) we try to 
> write the error value and this can be problematic. By the time the 
> reset_done() callback is invoked, its already too late.

It's too late because we've re-written the error value back to config
space and as a result the device is broken?  What if
pci_restore_state() were a little smarter to detect that it has bad
read data from pci_save_state() and only restores state based on kernel
data?  Would that leave the device in a functional state that
reset_done() could restore the original saved state and push it out to
the device?

> @Alex,
> I am open to ideas/suggestions on this. Do we think we need a separate 
> VFIO ioctl to drive this or a new reset mechanism as Niklas suggested?

Unfortunately I was short sighted on VFIO_DEVICE_RESET and it's the one
ioctl that doesn't have any flags, so it's not very extensible.

Can we do more of the above, ie. enlighten the FLR/PM reset callbacks to
return -ENOTTY if the device is in an error state and config space is
returning -1 such that we fall through to a slot reset that doesn't
care how broken the device is and you auto-magically get the zpci
function you want?  Follow-up with pushing the original state in
reset_done()?  Thanks,

Alex
Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
Posted by Farhan Ali 1 month, 2 weeks ago
On 8/14/2025 1:57 PM, Alex Williamson wrote:
> On Thu, 14 Aug 2025 09:33:47 -0700
> Farhan Ali <alifm@linux.ibm.com> wrote:
>
>> On 8/14/2025 6:12 AM, Niklas Schnelle wrote:
>>> On Wed, 2025-08-13 at 16:56 -0600, Alex Williamson wrote:
>>>> On Wed, 13 Aug 2025 14:52:24 -0700
>>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>>   
>>>>> On 8/13/2025 1:30 PM, Alex Williamson wrote:
>>>>>> On Wed, 13 Aug 2025 10:08:19 -0700
>>>>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>>>>      
>>>>>>> For zPCI devices we should drive a platform specific function reset
>>>>>>> as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
>>>>>>> in error state.
>>>>>>>
>>>>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>>>>>> ---
>>>>>>>     arch/s390/pci/pci.c              |  1 +
>>>>>>>     drivers/vfio/pci/vfio_pci_core.c |  4 ++++
>>>>>>>     drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
>>>>>>>     drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
>>>>>>>     4 files changed, 49 insertions(+)
>>> --- snip ---
>>>>>>>     
>>>>>>> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
>>>>>>> +{
>>>>>>> +	struct zpci_dev *zdev = to_zpci(vdev->pdev);
>>>>>>> +	int rc = -EIO;
>>>>>>> +
>>>>>>> +	if (!zdev)
>>>>>>> +		return -ENODEV;
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * If we can't get the zdev->state_lock the device state is
>>>>>>> +	 * currently undergoing a transition and we bail out - just
>>>>>>> +	 * the same as if the device's state is not configured at all.
>>>>>>> +	 */
>>>>>>> +	if (!mutex_trylock(&zdev->state_lock))
>>>>>>> +		return rc;
>>>>>>> +
>>>>>>> +	/* We can reset only if the function is configured */
>>>>>>> +	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
>>>>>>> +		goto out;
>>>>>>> +
>>>>>>> +	rc = zpci_hot_reset_device(zdev);
>>>>>>> +	if (rc != 0)
>>>>>>> +		goto out;
>>>>>>> +
>>>>>>> +	if (!vdev->pci_saved_state) {
>>>>>>> +		pci_err(vdev->pdev, "No saved available for the device");
>>>>>>> +		rc = -EIO;
>>>>>>> +		goto out;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	pci_dev_lock(vdev->pdev);
>>>>>>> +	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
>>>>>>> +	pci_restore_state(vdev->pdev);
>>>>>>> +	pci_dev_unlock(vdev->pdev);
>>>>>>> +out:
>>>>>>> +	mutex_unlock(&zdev->state_lock);
>>>>>>> +	return rc;
>>>>>>> +}
>>>>>> This looks like it should be a device or arch specific reset
>>>>>> implemented in drivers/pci, not vfio.  Thanks,
>>>>>>
>>>>>> Alex
>>>>> Are you suggesting to move this to an arch specific function? One thing
>>>>> we need to do after the zpci_hot_reset_device, is to correctly restore
>>>>> the config space of the device. And for vfio-pci bound devices we want
>>>>> to restore the state of the device to when it was initially opened.
>>>> We generally rely on the abstraction of pci_reset_function() to select
>>>> the correct type of reset for a function scope reset.  We've gone to
>>>> quite a bit of effort to implement all device specific resets and
>>>> quirks in the PCI core to be re-used across the kernel.
>>>>
>>>> Calling zpci_hot_reset_device() directly seems contradictory to those
>>>> efforts.  Should pci_reset_function() call this universally on s390x
>>>> rather than providing access to FLR/PM/SBR reset?
>>>>   
>>> I agree with you Alex. Still trying to figure out what's needed for
>>> this. We already do zpci_hot_reset_device() in reset_slot() from the
>>> s390_pci_hpc.c hotplug slot driver and that does get called via
>>> pci_reset_hotplug_slot() and pci_reset_function(). There are a few
>>> problems though that meant it didn't work for Farhan but I agree maybe
>>> we can fix them for the general case. For one pci_reset_function()
>>> via DEVICE_RESET first tries FLR but that won't work with the device in
>>> the error state and MMIO blocked. Sadly __pci_reset_function_locked()
>>> then concludes that other resets also won't work. So that's something
>>> we might want to improve in general, for example maybe we need
>>> something more like pci_dev_acpi_reset() with higher priority than FLR.
>> Yeah I did think of adding something like s390x CLP reset as part of the
>> reset methods. AFAIU the s390x CLP reset is similar to ACPI _RST. But
>> that would introduce s390x specific code in pci core common code.
>>
>>> Now for pci_reset_hotplug_slot() via VFIO_DEVICE_PCI_HOT_RESET I'm not
>>> sure why that won't work as is. @Farhan do you know?
>> VFIO_DEVICE_PCI_HOT_RESET would have been sufficient interface for
>> majority of PCI devices on s390x as that would drive a bus reset. It was
>> sufficient as most devices were single bus devices. But in the latest
>> generation of machines (z17) we expose true SR-IOV and an OS can have
>> access to both PF and VFs and so these are on the same bus and can have
>> different ownership based on what is bound to vfio-pci.
>>
>> My thinking for extending VFIO_DEVICE_RESET is because AFAIU its a per
>> function reset mechanism, which maps well with what our architecture
>> provides. On s390x we can drive a per function reset (via firmware)
>> through the CLP instruction driven by the zpci_hot_reset_device(). And
>> doing it as vfio zpci specific function would confine the s390x logic.
>>
>>>>    Why is it
>>>> universally correct here given the ioctl previously made use of
>>>> standard reset mechanisms?
>>>>
>>>> The DEVICE_RESET ioctl is simply an in-place reset of the device,
>>>> without restoring the original device state.  So we're also subtly
>>>> changing that behavior here, presumably because we're targeting the
>>>> specific error recovery case.  Have you considered how this might
>>>> break non-error-recovery use cases?
>>>>
>>>> I wonder if we want a different reset mechanism for this use case
>>>> rather than these subtle semantic changes.
>>> I think an alternative to that, which Farhan actually had in the
>>> previous internal version, is to implement
>>> pci_error_handlers::reset_done() and do the pci_load_saved_state()
>>> there. That would only affect the error recovery case leaving other
>>> cases alone.
>>>
>>>
>>> Thanks,
>>> Niklas
>> The reason I abandoned reset_done() callback idea is because its not
>> sufficient to recover the device correctly. Today before driving a reset
>> we save the state of the device. When a device is in error state, any
>> pci load/store (on s390x they are actual instructions :)) to config
>> space would return an error value (0xffffffff). We don't have any checks
>> in pci_save_state to prevent storing error values. And after a reset
>> when we try to restore the config space (pci_dev_restore) we try to
>> write the error value and this can be problematic. By the time the
>> reset_done() callback is invoked, its already too late.
> It's too late because we've re-written the error value back to config
> space and as a result the device is broken?
>
>
Yes, exactly.

>   What if
> pci_restore_state() were a little smarter to detect that it has bad
> read data from pci_save_state() and only restores state based on kernel
> data?  Would that leave the device in a functional state that
> reset_done() could restore the original saved state and push it out to
> the device?

Yeah I think this could work. I can try something like this.

>> @Alex,
>> I am open to ideas/suggestions on this. Do we think we need a separate
>> VFIO ioctl to drive this or a new reset mechanism as Niklas suggested?
> Unfortunately I was short sighted on VFIO_DEVICE_RESET and it's the one
> ioctl that doesn't have any flags, so it's not very extensible.
>
> Can we do more of the above, ie. enlighten the FLR/PM reset callbacks to
> return -ENOTTY if the device is in an error state and config space is
> returning -1 such that we fall through to a slot reset that doesn't
> care how broken the device is and you auto-magically get the zpci
> function you want?  Follow-up with pushing the original state in
> reset_done()?  Thanks,
>
> Alex
>
Yeah I can do that. I think adding some validation checks to the FLR/PM 
callbacks wouldn't be a bad idea if its acceptable for PCI maintainers.

If you are okay with a reset_done() callback, I will try to incorporate 
your suggestions and spin a new series.

Thanks
Farhan
Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
Posted by Niklas Schnelle 1 month, 2 weeks ago
On Thu, 2025-08-14 at 09:33 -0700, Farhan Ali wrote:
> On 8/14/2025 6:12 AM, Niklas Schnelle wrote:
> > On Wed, 2025-08-13 at 16:56 -0600, Alex Williamson wrote:
> > > On Wed, 13 Aug 2025 14:52:24 -0700
> > > Farhan Ali <alifm@linux.ibm.com> wrote:
> > > 
> > > > On 8/13/2025 1:30 PM, Alex Williamson wrote:
> > > > > On Wed, 13 Aug 2025 10:08:19 -0700
> > > > > Farhan Ali <alifm@linux.ibm.com> wrote:
> > > > >   
> > > > > > For zPCI devices we should drive a platform specific function reset
> > > > > > as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
> > > > > > in error state.
> > > > > > 
> > > > > > Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> > > > > > ---
> > > > > >    arch/s390/pci/pci.c              |  1 +
> > > > > >    drivers/vfio/pci/vfio_pci_core.c |  4 ++++
> > > > > >    drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
> > > > > >    drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
> > > > > >    4 files changed, 49 insertions(+)
> 
--- snip ---
> > Now for pci_reset_hotplug_slot() via VFIO_DEVICE_PCI_HOT_RESET I'm not
> > sure why that won't work as is. @Farhan do you know?
> 
> VFIO_DEVICE_PCI_HOT_RESET would have been sufficient interface for 
> majority of PCI devices on s390x as that would drive a bus reset. It was 
> sufficient as most devices were single bus devices. But in the latest 
> generation of machines (z17) we expose true SR-IOV and an OS can have 
> access to both PF and VFs and so these are on the same bus and can have 
> different ownership based on what is bound to vfio-pci.
> 

Talked to Farhan a bit off list. I think the problem boils down to
this. The s390 PCI support, due to there always being a hypervisor,
does hot and cold plug on a per PCI function basis. And so the hotplug
driver's reset_slot() is just a wrapper around zpci_hot_reset_device()
and also does per PCI function reset. 

Now when doing a VFIO_PCI_HOT_RESET this still doesn't give us a usable
per function reset because vfio_pci_ioctl_get_pci_hot_reset_info()'s
grouping assumes, quite naturally, that as a slot reset this will reset
an entire slot and so the ownership checks fail when we use this reset
on e.g. an SR-IOV capable device with PFs + VFs where s390 exposes
their PCI topology even while retaining per function hotplug. 

As some more background, we've had the virtual PCI topology for SR-IOV
capable devices since commit 44510d6fa0c0 ("s390/pci: Handling
multifunctions") added in v6.9 but only with z17 will they be generally
available.

Thanks,
Niklas