Update the documentation to describe the sysfs node that lets
user-space users configure the isolation strategy, and the sysfs
node that reports the device isolated state.
Signed-off-by: Kai Ye <yekai13@huawei.com>
---
Documentation/ABI/testing/sysfs-driver-uacce | 27 ++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
index 08f2591138af..50737c897ba3 100644
--- a/Documentation/ABI/testing/sysfs-driver-uacce
+++ b/Documentation/ABI/testing/sysfs-driver-uacce
@@ -19,6 +19,33 @@ Contact: linux-accelerators@lists.ozlabs.org
Description: Available instances left of the device
Return -ENODEV if uacce_ops get_available_instances is not provided
+What: /sys/class/uacce/<dev_name>/isolate_strategy
+Date: Oct 2022
+KernelVersion: 6.1
+Contact: linux-accelerators@lists.ozlabs.org
+Description: (RW) Configure the frequency size for the hardware error
+ isolation strategy. The unit is the number of times, that is,
+ the number of occurrences in a period, which serves as the
+ threshold. If the number of device PCI AER errors exceeds the
+ threshold in a time window, the device is isolated. The value
+ is a configured integer. The default is 0. The maximum is 65535.
+
+ In the HiSilicon accelerator engine, every slot AER error is
+ first time-stamped. The AER error log is then checked each time
+ a device AER error occurs. If the device slot AER error count
+ exceeds the preset number of times within one hour, the
+ isolated state is set to true and the device is isolated.
+ AER error log entries older than one hour are
+ cleared.
+
+What: /sys/class/uacce/<dev_name>/isolate
+Date: Oct 2022
+KernelVersion: 6.1
+Contact: linux-accelerators@lists.ozlabs.org
+Description: (R) A sysfs node that reads the device isolated state. The value 1
+ means the device is unavailable. The value 0 means the device is
+ available.
+
What: /sys/class/uacce/<dev_name>/algorithms
Date: Feb 2020
KernelVersion: 5.7
--
2.17.1
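
The HiSilicon policy described in the patch (time-stamp each AER error, drop
log entries older than one hour, isolate once the count exceeds the threshold)
can be illustrated with a small user-space sketch. This is not the driver
code: `struct isolate_ctx`, `isolate_init()`, `isolate_free()`,
`prune_old()` and `record_error()` are hypothetical names invented for this
illustration, timestamps are plain seconds supplied by the caller, and the
comparison is applied literally, so a threshold of 0 isolates on the first
error. The patch does not state how a threshold of 0 is treated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define WINDOW_SECONDS 3600u	/* one hour, per the HiSilicon description */

/* Hypothetical per-device state; not taken from the driver. */
struct isolate_ctx {
	uint32_t threshold;	/* value written to isolate_strategy (0..65535) */
	uint64_t *stamps;	/* timestamps, in seconds, of logged errors */
	size_t count;		/* number of valid entries in stamps[] */
	bool isolated;		/* state reported by the isolate node */
};

/*
 * Allocate room for threshold + 1 entries, the most that can be logged
 * before isolation trips. Returns 0 on success, -1 on failure.
 */
static int isolate_init(struct isolate_ctx *ctx, uint32_t threshold)
{
	if (threshold > 65535)
		return -1;
	ctx->stamps = calloc((size_t)threshold + 1, sizeof(*ctx->stamps));
	if (!ctx->stamps)
		return -1;
	ctx->threshold = threshold;
	ctx->count = 0;
	ctx->isolated = false;
	return 0;
}

static void isolate_free(struct isolate_ctx *ctx)
{
	free(ctx->stamps);
	ctx->stamps = NULL;
	ctx->count = 0;
}

/* Drop log entries older than the window; assumes now >= every stamp. */
static void prune_old(struct isolate_ctx *ctx, uint64_t now)
{
	size_t kept = 0;

	for (size_t i = 0; i < ctx->count; i++) {
		if (now - ctx->stamps[i] <= WINDOW_SECONDS)
			ctx->stamps[kept++] = ctx->stamps[i];
	}
	ctx->count = kept;
}

/* Log one error at time `now` and return the resulting isolated state. */
static bool record_error(struct isolate_ctx *ctx, uint64_t now)
{
	if (ctx->isolated)
		return true;

	prune_old(ctx, now);
	ctx->stamps[ctx->count++] = now;	/* fits: count <= threshold here */

	if (ctx->count > ctx->threshold)
		ctx->isolated = true;
	return ctx->isolated;
}
```

With a threshold of 2, three errors inside one hour set the isolated state,
while two errors followed by a third more than an hour later do not, because
the first two entries are pruned before the count is compared.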
On Tue, Oct 25, 2022 at 12:39:30PM +0000, Kai Ye wrote:
> Update documentation describing sysfs node that could help to
> configure isolation strategy for users in the user space. And
> describing sysfs node that could read the device isolated state.
>
> Signed-off-by: Kai Ye <yekai13@huawei.com>
> ---
> Documentation/ABI/testing/sysfs-driver-uacce | 27 ++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
> index 08f2591138af..50737c897ba3 100644
> --- a/Documentation/ABI/testing/sysfs-driver-uacce
> +++ b/Documentation/ABI/testing/sysfs-driver-uacce
> @@ -19,6 +19,33 @@ Contact: linux-accelerators@lists.ozlabs.org
> Description: Available instances left of the device
> Return -ENODEV if uacce_ops get_available_instances is not provided
>
> +What: /sys/class/uacce/<dev_name>/isolate_strategy
> +Date: Oct 2022
> +KernelVersion: 6.1
> +Contact: linux-accelerators@lists.ozlabs.org
> +Description: (RW) Configure the frequency size for the hardware error
> + isolation strategy. This unit is the number of times. Number

Number of times what?

> + of occurrences in a period, also means threshold. If the number
> + of device pci AER error exceeds the threshold in a time window,

What is the time window?

> + the device is isolated. This size is a configured integer value.
> + The default is 0. The maximum value is 65535.
> +
> + In the hisilicon accelerator engine, first we will
> + time-stamp every slot AER error. Then check the AER error log
> + when the device AER error occurred. if the device slot AER error
> + count exceeds the preset the number of times in one hour, the
> + isolated state will be set to true. So the device will be
> + isolated. And the AER error log that exceed one hour will be
> + cleared.

This seems like a very hardware-specific implementation here. And this
is supposed to be a generic class?

I feel this is getting really messy :(

thanks,

greg k-h