iommu_hwpt_pgfaults represent fault messages that the userspace can
retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf group,
with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only for the last
iommu_hwpt_pgfault.
An iommu_hwpt_page_response is a response message that the userspace
should send to the kernel after it finishes handling a group of fault
messages. The @dev_id, @pasid, and @grpid fields in the message
identify an outstanding iopf group for a device. The @cookie field,
which matches the cookie field of the last fault in the group, is
used by the kernel to look up the pending message.
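
As a rough illustration, a userspace fault handler could drain and
acknowledge one group as sketched below. This assumes the fault queue is
exposed through a file descriptor that supports read() for fault messages
and write() for responses, as proposed elsewhere in this series; the fd
setup and the actual fault resolution are elided:

#include <unistd.h>
#include <linux/iommufd.h>

static void handle_fault_group(int fault_fd)
{
	struct iommu_hwpt_pgfault fault;
	struct iommu_hwpt_page_response resp = {};

	/* Read until the fault that carries IOMMU_PGFAULT_FLAGS_LAST_PAGE. */
	do {
		if (read(fault_fd, &fault, sizeof(fault)) != sizeof(fault))
			return;
		/*
		 * Resolve fault.addr for fault.dev_id here. fault.pasid is
		 * only meaningful if IOMMU_PGFAULT_FLAGS_PASID_VALID is set.
		 */
	} while (!(fault.flags & IOMMU_PGFAULT_FLAGS_LAST_PAGE));

	/* The response echoes the cookie of the last fault in the group. */
	resp.size = sizeof(resp);
	resp.dev_id = fault.dev_id;
	resp.pasid = fault.pasid;
	resp.grpid = fault.grpid;
	resp.code = IOMMUFD_PAGE_RESP_SUCCESS;
	resp.cookie = fault.cookie;
	write(fault_fd, &resp, sizeof(resp));
}
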
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
include/uapi/linux/iommufd.h | 96 ++++++++++++++++++++++++++++++++++++
1 file changed, 96 insertions(+)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 1dfeaa2e649e..83b45dce94a4 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -692,4 +692,100 @@ struct iommu_hwpt_invalidate {
__u32 __reserved;
};
#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
+
+/**
+ * enum iommu_hwpt_pgfault_flags - flags for struct iommu_hwpt_pgfault
+ * @IOMMU_PGFAULT_FLAGS_PASID_VALID: The pasid field of the fault data is
+ * valid.
+ * @IOMMU_PGFAULT_FLAGS_LAST_PAGE: It's the last fault of a fault group.
+ */
+enum iommu_hwpt_pgfault_flags {
+ IOMMU_PGFAULT_FLAGS_PASID_VALID = (1 << 0),
+ IOMMU_PGFAULT_FLAGS_LAST_PAGE = (1 << 1),
+};
+
+/**
+ * enum iommu_hwpt_pgfault_perm - perm bits for struct iommu_hwpt_pgfault
+ * @IOMMU_PGFAULT_PERM_READ: request for read permission
+ * @IOMMU_PGFAULT_PERM_WRITE: request for write permission
+ * @IOMMU_PGFAULT_PERM_EXEC: (PCIE 10.4.1) request with a PASID that has the
+ * Execute Requested bit set in PASID TLP Prefix.
+ * @IOMMU_PGFAULT_PERM_PRIV: (PCIE 10.4.1) request with a PASID that has the
+ * Privileged Mode Requested bit set in PASID TLP
+ * Prefix.
+ */
+enum iommu_hwpt_pgfault_perm {
+ IOMMU_PGFAULT_PERM_READ = (1 << 0),
+ IOMMU_PGFAULT_PERM_WRITE = (1 << 1),
+ IOMMU_PGFAULT_PERM_EXEC = (1 << 2),
+ IOMMU_PGFAULT_PERM_PRIV = (1 << 3),
+};
+
+/**
+ * struct iommu_hwpt_pgfault - iommu page fault data
+ * @size: sizeof(struct iommu_hwpt_pgfault)
+ * @flags: Combination of enum iommu_hwpt_pgfault_flags
+ * @dev_id: id of the originated device
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @perm: Combination of enum iommu_hwpt_pgfault_perm
+ * @addr: Page address
+ * @length: a hint of how much data the requestor is expecting to fetch. For
+ * example, if the PRI initiator knows it is going to do a 10MB
+ * transfer, it could fill in 10MB and the OS could pre-fault in
+ * 10MB of IOVA. It defaults to 0 if there's no such hint.
+ * @cookie: kernel-managed cookie identifying a group of fault messages. The
+ * cookie number encoded in the last page fault of the group should
+ * be echoed back in the response message.
+ */
+struct iommu_hwpt_pgfault {
+ __u32 size;
+ __u32 flags;
+ __u32 dev_id;
+ __u32 pasid;
+ __u32 grpid;
+ __u32 perm;
+ __u64 addr;
+ __u32 length;
+ __u32 cookie;
+};
+
+/**
+ * enum iommufd_page_response_code - Return status of fault handlers
+ * @IOMMUFD_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
+ * populated, retry the access. This is the
+ * "Success" defined in PCI 10.4.2.1.
+ * @IOMMUFD_PAGE_RESP_INVALID: General error. Drop all subsequent faults
+ * from this device if possible. This is the
+ * "Response Failure" in PCI 10.4.2.1.
+ * @IOMMUFD_PAGE_RESP_FAILURE: Could not handle this fault, don't retry the
+ * access. This is the "Invalid Request" in PCI
+ * 10.4.2.1.
+ */
+enum iommufd_page_response_code {
+ IOMMUFD_PAGE_RESP_SUCCESS = 0,
+ IOMMUFD_PAGE_RESP_INVALID,
+ IOMMUFD_PAGE_RESP_FAILURE,
+};
+
+/**
+ * struct iommu_hwpt_page_response - IOMMU page fault response
+ * @size: sizeof(struct iommu_hwpt_page_response)
+ * @flags: Must be set to 0
+ * @dev_id: device ID of target device for the response
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @code: One of the response codes in enum iommufd_page_response_code.
+ * @cookie: The kernel-managed cookie reported in the fault message.
+ */
+struct iommu_hwpt_page_response {
+ __u32 size;
+ __u32 flags;
+ __u32 dev_id;
+ __u32 pasid;
+ __u32 grpid;
+ __u32 code;
+ __u32 cookie;
+ __u32 reserved;
+};
#endif
--
2.34.1
> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Tuesday, April 30, 2024 10:57 PM
>
> iommu_hwpt_pgfaults represent fault messages that the userspace can
> retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf group,
> with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only for the last
> iommu_hwpt_pgfault.
Do you envision extending the same structure to report unrecoverable
faults in the future?
If yes, this could be named more neutrally, e.g. iommu_hwpt_faults with
flags to indicate it's a recoverable PRI request.
If it's only for PRI, probably iommu_hwpt_pgreqs is clearer.
> +
> +/**
> + * struct iommu_hwpt_pgfault - iommu page fault data
> + * @size: sizeof(struct iommu_hwpt_pgfault)
> + * @flags: Combination of enum iommu_hwpt_pgfault_flags
> + * @dev_id: id of the originated device
> + * @pasid: Process Address Space ID
> + * @grpid: Page Request Group Index
> + * @perm: Combination of enum iommu_hwpt_pgfault_perm
> + * @addr: Page address
'Fault address'
> + * @length: a hint of how much data the requestor is expecting to fetch. For
> + * example, if the PRI initiator knows it is going to do a 10MB
> + * transfer, it could fill in 10MB and the OS could pre-fault in
> + * 10MB of IOVA. It defaults to 0 if there's no such hint.
This is not clear to me and I don't remember PCIe spec defines such
mechanism.
> +/**
> + * enum iommufd_page_response_code - Return status of fault handlers
> + * @IOMMUFD_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
> + * populated, retry the access. This is the
> + * "Success" defined in PCI 10.4.2.1.
> + * @IOMMUFD_PAGE_RESP_INVALID: General error. Drop all subsequent faults
> + * from this device if possible. This is the
> + * "Response Failure" in PCI 10.4.2.1.
> + * @IOMMUFD_PAGE_RESP_FAILURE: Could not handle this fault, don't retry the
> + * access. This is the "Invalid Request" in PCI
> + * 10.4.2.1.
the comments for 'INVALID' and 'FAILURE' are misplaced. Also I'd rather
use the spec words to be accurate.
> + */
> +enum iommufd_page_response_code {
> + IOMMUFD_PAGE_RESP_SUCCESS = 0,
> + IOMMUFD_PAGE_RESP_INVALID,
> + IOMMUFD_PAGE_RESP_FAILURE,
> +};
> +
On 2024/5/15 15:43, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Tuesday, April 30, 2024 10:57 PM
>>
>> iommu_hwpt_pgfaults represent fault messages that the userspace can
>> retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf group,
>> with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only for the last
>> iommu_hwpt_pgfault.
>
> Do you envision extending the same structure to report unrecoverable
> faults in the future?
I am not envisioning extending this to report unrecoverable faults in
the future. The unrecoverable faults are not always related to a hwpt,
and therefore it's more suitable to route them through a viommu object
which is under discussion in Nicolin's series.
>
> If yes, this could be named more neutrally, e.g. iommu_hwpt_faults with
> flags to indicate it's a recoverable PRI request.
>
> If it's only for PRI, probably iommu_hwpt_pgreqs is clearer.
>
>> +
>> +/**
>> + * struct iommu_hwpt_pgfault - iommu page fault data
>> + * @size: sizeof(struct iommu_hwpt_pgfault)
>> + * @flags: Combination of enum iommu_hwpt_pgfault_flags
>> + * @dev_id: id of the originated device
>> + * @pasid: Process Address Space ID
>> + * @grpid: Page Request Group Index
>> + * @perm: Combination of enum iommu_hwpt_pgfault_perm
>> + * @addr: Page address
>
> 'Fault address'
Okay.
>
>> + * @length: a hint of how much data the requestor is expecting to fetch. For
>> + * example, if the PRI initiator knows it is going to do a 10MB
>> + * transfer, it could fill in 10MB and the OS could pre-fault in
>> + * 10MB of IOVA. It defaults to 0 if there's no such hint.
>
> This is not clear to me and I don't remember PCIe spec defines such
> mechanism.
This came up in a previous discussion. While it's not currently part of
the PCI specification and may not be in the future, we'd like to add
this mechanism for potential future advanced device features as it
offers significant optimization benefits.
>
>> +/**
>> + * enum iommufd_page_response_code - Return status of fault handlers
>> + * @IOMMUFD_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
>> + * populated, retry the access. This is the
>> + * "Success" defined in PCI 10.4.2.1.
>> + * @IOMMUFD_PAGE_RESP_INVALID: General error. Drop all subsequent faults
>> + * from this device if possible. This is the
>> + * "Response Failure" in PCI 10.4.2.1.
>> + * @IOMMUFD_PAGE_RESP_FAILURE: Could not handle this fault, don't retry the
>> + * access. This is the "Invalid Request" in PCI
>> + * 10.4.2.1.
>
> the comments for 'INVALID' and 'FAILURE' are misplaced. Also I'd rather
> use the spec words to be accurate.
Yes. Fixed.
>
>> + */
>> +enum iommufd_page_response_code {
>> + IOMMUFD_PAGE_RESP_SUCCESS = 0,
>> + IOMMUFD_PAGE_RESP_INVALID,
>> + IOMMUFD_PAGE_RESP_FAILURE,
>> +};
>> +
Best regards,
baolu
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Sunday, May 19, 2024 10:38 PM
>
> On 2024/5/15 15:43, Tian, Kevin wrote:
> >> From: Lu Baolu <baolu.lu@linux.intel.com>
> >> Sent: Tuesday, April 30, 2024 10:57 PM
> >>
> >> iommu_hwpt_pgfaults represent fault messages that the userspace can
> >> retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf group,
> >> with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only for the last
> >> iommu_hwpt_pgfault.
> >
> > Do you envision extending the same structure to report unrecoverable
> > faults in the future?
>
> I am not envisioning extending this to report unrecoverable faults in
> the future. The unrecoverable faults are not always related to a hwpt,
> and therefore it's more suitable to route them through a viommu object
> which is under discussion in Nicolin's series.

OK, I'll take a look at that series when reaching it in my TODO list. 😊

> >> + * @length: a hint of how much data the requestor is expecting to fetch. For
> >> + * example, if the PRI initiator knows it is going to do a 10MB
> >> + * transfer, it could fill in 10MB and the OS could pre-fault in
> >> + * 10MB of IOVA. It defaults to 0 if there's no such hint.
> >
> > This is not clear to me and I don't remember PCIe spec defines such
> > mechanism.
>
> This came up in a previous discussion. While it's not currently part of

Can you provide a link to that discussion?

> the PCI specification and may not be in the future, we'd like to add
> this mechanism for potential future advanced device features as it
> offers significant optimization benefits.
>

We design uAPI for real usages. It's a bit weird to introduce a format
for unknown future features w/o an actual user to demonstrate its
correctness.
On 5/20/24 11:24 AM, Tian, Kevin wrote:
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Sunday, May 19, 2024 10:38 PM
>>
>> On 2024/5/15 15:43, Tian, Kevin wrote:
>>>> From: Lu Baolu <baolu.lu@linux.intel.com>
>>>> Sent: Tuesday, April 30, 2024 10:57 PM
>>>>
>>>> iommu_hwpt_pgfaults represent fault messages that the userspace can
>>>> retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf group,
>>>> with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only for the last
>>>> iommu_hwpt_pgfault.
>>>
>>> Do you envision extending the same structure to report unrecoverable
>>> faults in the future?
>>
>> I am not envisioning extending this to report unrecoverable faults in
>> the future. The unrecoverable faults are not always related to a hwpt,
>> and therefore it's more suitable to route them through a viommu object
>> which is under discussion in Nicolin's series.
>
> OK, I'll take a look at that series when reaching it in my TODO list. 😊
>
>>>> + * @length: a hint of how much data the requestor is expecting to fetch. For
>>>> + * example, if the PRI initiator knows it is going to do a 10MB
>>>> + * transfer, it could fill in 10MB and the OS could pre-fault in
>>>> + * 10MB of IOVA. It defaults to 0 if there's no such hint.
>>>
>>> This is not clear to me and I don't remember PCIe spec defines such
>>> mechanism.
>>
>> This came up in a previous discussion. While it's not currently part of
>
> Can you provide a link to that discussion?

https://lore.kernel.org/linux-iommu/20240322170410.GH66976@ziepe.ca/

>
>> the PCI specification and may not be in the future, we'd like to add
>> this mechanism for potential future advanced device features as it
>> offers significant optimization benefits.
>>
>
> We design uAPI for real usages. It's a bit weird to introduce a format
> for unknown future features w/o an actual user to demonstrate its
> correctness.

Best regards,
baolu
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Monday, May 20, 2024 11:33 AM
>
> On 5/20/24 11:24 AM, Tian, Kevin wrote:
> >> From: Baolu Lu <baolu.lu@linux.intel.com>
> >> Sent: Sunday, May 19, 2024 10:38 PM
> >>
> >> On 2024/5/15 15:43, Tian, Kevin wrote:
> >>>> From: Lu Baolu <baolu.lu@linux.intel.com>
> >>>> Sent: Tuesday, April 30, 2024 10:57 PM
> >>>>
> >>>> + * @length: a hint of how much data the requestor is expecting to fetch. For
> >>>> + * example, if the PRI initiator knows it is going to do a 10MB
> >>>> + * transfer, it could fill in 10MB and the OS could pre-fault in
> >>>> + * 10MB of IOVA. It defaults to 0 if there's no such hint.
> >>>
> >>> This is not clear to me and I don't remember PCIe spec defines such
> >>> mechanism.
> >>
> >> This came up in a previous discussion. While it's not currently part of
> >
> > Can you provide a link to that discussion?
>
> https://lore.kernel.org/linux-iommu/20240322170410.GH66976@ziepe.ca/
>

We can always extend uAPI for new usages, e.g. having a new flag
bit to indicate an additional field for carrying the number of pages.
But requiring the user to handle non-zero length now (though trivial)
is an unnecessary burden.

Do we want the response message to also carry a length field, i.e.
allowing the user to partially fix the fault?
On Mon, May 20, 2024 at 04:59:18AM +0000, Tian, Kevin wrote:
> > From: Baolu Lu <baolu.lu@linux.intel.com>
> > Sent: Monday, May 20, 2024 11:33 AM
> >
> > On 5/20/24 11:24 AM, Tian, Kevin wrote:
> > >> From: Baolu Lu <baolu.lu@linux.intel.com>
> > >> Sent: Sunday, May 19, 2024 10:38 PM
> > >>
> > >> On 2024/5/15 15:43, Tian, Kevin wrote:
> > >>>> From: Lu Baolu <baolu.lu@linux.intel.com>
> > >>>> Sent: Tuesday, April 30, 2024 10:57 PM
> > >>>>
> > >>>> + * @length: a hint of how much data the requestor is expecting to fetch. For
> > >>>> + * example, if the PRI initiator knows it is going to do a 10MB
> > >>>> + * transfer, it could fill in 10MB and the OS could pre-fault in
> > >>>> + * 10MB of IOVA. It defaults to 0 if there's no such hint.
> > >>>
> > >>> This is not clear to me and I don't remember PCIe spec defines such
> > >>> mechanism.
> > >>
> > >> This came up in a previous discussion. While it's not currently part of
> > >
> > > Can you provide a link to that discussion?
> >
> > https://lore.kernel.org/linux-iommu/20240322170410.GH66976@ziepe.ca/
> >
>
> We can always extend uAPI for new usages, e.g. having a new flag
> bit to indicate an additional field for carrying the number of pages.
> But requiring the user to handle non-zero length now (though trivial)
> is an unnecessary burden.

It is tricky to extend this stuff since it comes out in read(). We'd
have to have userspace negotiate a new format most likely.

> Do we want the response message to also carry a length field, i.e.
> allowing the user to partially fix the fault?

No, the device will discover this when it gets another fault :)

Jason
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Friday, May 24, 2024 10:15 PM
>
> On Mon, May 20, 2024 at 04:59:18AM +0000, Tian, Kevin wrote:
> > > From: Baolu Lu <baolu.lu@linux.intel.com>
> > > Sent: Monday, May 20, 2024 11:33 AM
> > >
> > > On 5/20/24 11:24 AM, Tian, Kevin wrote:
> > > >> From: Baolu Lu <baolu.lu@linux.intel.com>
> > > >> Sent: Sunday, May 19, 2024 10:38 PM
> > > >>
> > > >> On 2024/5/15 15:43, Tian, Kevin wrote:
> > > >>>> From: Lu Baolu <baolu.lu@linux.intel.com>
> > > >>>> Sent: Tuesday, April 30, 2024 10:57 PM
> > > >>>>
> > > >>>> + * @length: a hint of how much data the requestor is expecting to fetch. For
> > > >>>> + * example, if the PRI initiator knows it is going to do a 10MB
> > > >>>> + * transfer, it could fill in 10MB and the OS could pre-fault in
> > > >>>> + * 10MB of IOVA. It defaults to 0 if there's no such hint.
> > > >>>
> > > >>> This is not clear to me and I don't remember PCIe spec defines such
> > > >>> mechanism.
> > > >>
> > > >> This came up in a previous discussion. While it's not currently part of
> > > >
> > > > Can you provide a link to that discussion?
> > >
> > > https://lore.kernel.org/linux-iommu/20240322170410.GH66976@ziepe.ca/
> > >
> >
> > We can always extend uAPI for new usages, e.g. having a new flag
> > bit to indicate an additional field for carrying the number of pages.
> > But requiring the user to handle non-zero length now (though trivial)
> > is an unnecessary burden.
>
> It is tricky to extend this stuff since it comes out in read(). We'd
> have to have userspace negotiate a new format most likely.
>
> > Do we want the response message to also carry a length field, i.e.
> > allowing the user to partially fix the fault?
>
> No, the device will discover this when it gets another fault :)
>

My worry was that we cannot predict how the spec will define this
extension for multi-page requests/responses. It could ask for additional
things to be provided in the response message (though length may not be
a good example), and then we would have to extend the uAPI anyway.

So I'm fine to keep this room if we can find a usage other than
relying on a future spec change. Otherwise IMHO it tries to guess the
semantics of a future spec change for no good reason at this point.
Then conservatively I'd vote for not doing it now. 😊