[PATCH v6 05/10] iommufd: Add fault and response message definitions

Lu Baolu posted 10 patches 1 year, 8 months ago
There is a newer version of this series
[PATCH v6 05/10] iommufd: Add fault and response message definitions
Posted by Lu Baolu 1 year, 8 months ago
iommu_hwpt_pgfaults represent fault messages that the userspace can
retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf group,
with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only for the last
iommu_hwpt_pgfault.

An iommu_hwpt_page_response is a response message that the userspace
should send to the kernel after finishing handling a group of fault
messages. The @dev_id, @pasid, and @grpid fields in the message
identify an outstanding iopf group for a device. The @cookie field,
which matches the cookie field of the last fault in the group, will
be used by the kernel to look up the pending message.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/uapi/linux/iommufd.h | 96 ++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)

diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 1dfeaa2e649e..2f34d66436fb 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -692,4 +692,100 @@ struct iommu_hwpt_invalidate {
 	__u32 __reserved;
 };
 #define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
+
+/**
+ * enum iommu_hwpt_pgfault_flags - flags for struct iommu_hwpt_pgfault
+ * @IOMMU_PGFAULT_FLAGS_PASID_VALID: The pasid field of the fault data is
+ *                                   valid.
+ * @IOMMU_PGFAULT_FLAGS_LAST_PAGE: It's the last fault of a fault group.
+ */
+enum iommu_hwpt_pgfault_flags {
+	IOMMU_PGFAULT_FLAGS_PASID_VALID		= (1 << 0),
+	IOMMU_PGFAULT_FLAGS_LAST_PAGE		= (1 << 1),
+};
+
+/**
+ * enum iommu_hwpt_pgfault_perm - perm bits for struct iommu_hwpt_pgfault
+ * @IOMMU_PGFAULT_PERM_READ: request for read permission
+ * @IOMMU_PGFAULT_PERM_WRITE: request for write permission
+ * @IOMMU_PGFAULT_PERM_EXEC: (PCIE 10.4.1) request with a PASID that has the
+ *                           Execute Requested bit set in PASID TLP Prefix.
+ * @IOMMU_PGFAULT_PERM_PRIV: (PCIE 10.4.1) request with a PASID that has the
+ *                           Privileged Mode Requested bit set in PASID TLP
+ *                           Prefix.
+ */
+enum iommu_hwpt_pgfault_perm {
+	IOMMU_PGFAULT_PERM_READ			= (1 << 0),
+	IOMMU_PGFAULT_PERM_WRITE		= (1 << 1),
+	IOMMU_PGFAULT_PERM_EXEC			= (1 << 2),
+	IOMMU_PGFAULT_PERM_PRIV			= (1 << 3),
+};
+
+/**
+ * struct iommu_hwpt_pgfault - iommu page fault data
+ * @size: sizeof(struct iommu_hwpt_pgfault)
+ * @flags: Combination of enum iommu_hwpt_pgfault_flags
+ * @dev_id: id of the originated device
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @perm: Combination of enum iommu_hwpt_pgfault_perm
+ * @addr: Fault address
+ * @length: a hint of how much data the requestor is expecting to fetch. For
+ *          example, if the PRI initiator knows it is going to do a 10MB
+ *          transfer, it could fill in 10MB and the OS could pre-fault in
+ *          10MB of IOVA. It's default to 0 if there's no such hint.
+ * @cookie: kernel-managed cookie identifying a group of fault messages. The
+ *          cookie number encoded in the last page fault of the group should
+ *          be echoed back in the response message.
+ */
+struct iommu_hwpt_pgfault {
+	__u32 size;
+	__u32 flags;
+	__u32 dev_id;
+	__u32 pasid;
+	__u32 grpid;
+	__u32 perm;
+	__u64 addr;
+	__u32 length;
+	__u32 cookie;
+};
+
+/**
+ * enum iommufd_page_response_code - Return status of fault handlers
+ * @IOMMUFD_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
+ *                             populated, retry the access. This is the
+ *                             "Success" defined in PCI 10.4.2.1.
+ * @IOMMUFD_PAGE_RESP_INVALID: Could not handle this fault, don't retry the
+ *                             access. This is the "Invalid Request" in PCI
+ *                             10.4.2.1.
+ * @IOMMUFD_PAGE_RESP_FAILURE: General error. Drop all subsequent faults from
+ *                             this device if possible. This is the "Response
+ *                             Failure" in PCI 10.4.2.1.
+ */
+enum iommufd_page_response_code {
+	IOMMUFD_PAGE_RESP_SUCCESS = 0,
+	IOMMUFD_PAGE_RESP_INVALID,
+	IOMMUFD_PAGE_RESP_FAILURE,
+};
+
+/**
+ * struct iommu_hwpt_page_response - IOMMU page fault response
+ * @size: sizeof(struct iommu_hwpt_page_response)
+ * @flags: Must be set to 0
+ * @dev_id: device ID of target device for the response
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @code: One of response code in enum iommufd_page_response_code.
+ * @cookie: The kernel-managed cookie reported in the fault message.
+ */
+struct iommu_hwpt_page_response {
+	__u32 size;
+	__u32 flags;
+	__u32 dev_id;
+	__u32 pasid;
+	__u32 grpid;
+	__u32 code;
+	__u32 cookie;
+	__u32 reserved;
+};
 #endif
-- 
2.34.1
Re: [PATCH v6 05/10] iommufd: Add fault and response message definitions
Posted by Jason Gunthorpe 1 year, 8 months ago
On Mon, May 27, 2024 at 12:05:12PM +0800, Lu Baolu wrote:
> +/**
> + * struct iommu_hwpt_pgfault - iommu page fault data
> + * @size: sizeof(struct iommu_hwpt_pgfault)
> + * @flags: Combination of enum iommu_hwpt_pgfault_flags
> + * @dev_id: id of the originated device
> + * @pasid: Process Address Space ID
> + * @grpid: Page Request Group Index
> + * @perm: Combination of enum iommu_hwpt_pgfault_perm
> + * @addr: Fault address
> + * @length: a hint of how much data the requestor is expecting to fetch. For
> + *          example, if the PRI initiator knows it is going to do a 10MB
> + *          transfer, it could fill in 10MB and the OS could pre-fault in
> + *          10MB of IOVA. It's default to 0 if there's no such hint.
> + * @cookie: kernel-managed cookie identifying a group of fault messages. The
> + *          cookie number encoded in the last page fault of the group should
> + *          be echoed back in the response message.
> + */
> +struct iommu_hwpt_pgfault {
> +	__u32 size;

Given we fail the system call if size is not exactly the right value
we should probably drop it here.

The ioctl to get the FD can someday specify the format of the fault
messages if we need to upgrade. If we want to change it down the road
then the old FD will be exactly as it is now, and the user will
request a new format FD that only works in whatever the new way is.

Jason
Re: [PATCH v6 05/10] iommufd: Add fault and response message definitions
Posted by Baolu Lu 1 year, 8 months ago
On 6/12/24 9:52 PM, Jason Gunthorpe wrote:
> On Mon, May 27, 2024 at 12:05:12PM +0800, Lu Baolu wrote:
>> +/**
>> + * struct iommu_hwpt_pgfault - iommu page fault data
>> + * @size: sizeof(struct iommu_hwpt_pgfault)
>> + * @flags: Combination of enum iommu_hwpt_pgfault_flags
>> + * @dev_id: id of the originated device
>> + * @pasid: Process Address Space ID
>> + * @grpid: Page Request Group Index
>> + * @perm: Combination of enum iommu_hwpt_pgfault_perm
>> + * @addr: Fault address
>> + * @length: a hint of how much data the requestor is expecting to fetch. For
>> + *          example, if the PRI initiator knows it is going to do a 10MB
>> + *          transfer, it could fill in 10MB and the OS could pre-fault in
>> + *          10MB of IOVA. It's default to 0 if there's no such hint.
>> + * @cookie: kernel-managed cookie identifying a group of fault messages. The
>> + *          cookie number encoded in the last page fault of the group should
>> + *          be echoed back in the response message.
>> + */
>> +struct iommu_hwpt_pgfault {
>> +	__u32 size;
> Given we fail the system call if size is not exactly the right value
> we should probably drop it here.
> 
> The ioctl to get the FD can someday specify the format of the fault
> messages if we need to upgrade. If we want to change it down the road
> then the old FD will be exactly as it is now, and the user will
> request a new format FD that only works in whatever the new way is.

Okay, sure!

Best regards,
baolu
RE: [PATCH v6 05/10] iommufd: Add fault and response message definitions
Posted by Tian, Kevin 1 year, 8 months ago
> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Monday, May 27, 2024 12:05 PM
> 
> +
> +/**
> + * struct iommu_hwpt_page_response - IOMMU page fault response
> + * @size: sizeof(struct iommu_hwpt_page_response)
> + * @flags: Must be set to 0
> + * @dev_id: device ID of target device for the response
> + * @pasid: Process Address Space ID
> + * @grpid: Page Request Group Index
> + * @code: One of response code in enum iommufd_page_response_code.
> + * @cookie: The kernel-managed cookie reported in the fault message.
> + */
> +struct iommu_hwpt_page_response {
> +	__u32 size;
> +	__u32 flags;
> +	__u32 dev_id;
> +	__u32 pasid;
> +	__u32 grpid;
> +	__u32 code;
> +	__u32 cookie;
> +	__u32 reserved;
> +};

with the response queue per fault object we don't need all fields here,
e.g. dev_id, pasid, etc. Cookie is sufficient.
Re: [PATCH v6 05/10] iommufd: Add fault and response message definitions
Posted by Baolu Lu 1 year, 8 months ago
On 6/5/24 4:28 PM, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Monday, May 27, 2024 12:05 PM
>>
>> +
>> +/**
>> + * struct iommu_hwpt_page_response - IOMMU page fault response
>> + * @size: sizeof(struct iommu_hwpt_page_response)
>> + * @flags: Must be set to 0
>> + * @dev_id: device ID of target device for the response
>> + * @pasid: Process Address Space ID
>> + * @grpid: Page Request Group Index
>> + * @code: One of response code in enum iommufd_page_response_code.
>> + * @cookie: The kernel-managed cookie reported in the fault message.
>> + */
>> +struct iommu_hwpt_page_response {
>> +	__u32 size;
>> +	__u32 flags;
>> +	__u32 dev_id;
>> +	__u32 pasid;
>> +	__u32 grpid;
>> +	__u32 code;
>> +	__u32 cookie;
>> +	__u32 reserved;
>> +};
> 
> with the response queue per fault object we don't need all fields here,
> e.g. dev_id, pasid, etc. Cookie is sufficient.

I prefer not to mess the definition of user API data and the kernel
driver implementation. The kernel driver may change in the future, but
the user API will remain stable for a long time.

I am neutral about whether we could remove above fields, but I need all
maintainers to agree on this, given that this has undergone five rounds
of review. :-)

Best regards,
baolu
RE: [PATCH v6 05/10] iommufd: Add fault and response message definitions
Posted by Tian, Kevin 1 year, 8 months ago
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Thursday, June 6, 2024 2:28 PM
> 
> On 6/5/24 4:28 PM, Tian, Kevin wrote:
> >> From: Lu Baolu <baolu.lu@linux.intel.com>
> >> Sent: Monday, May 27, 2024 12:05 PM
> >>
> >> +
> >> +/**
> >> + * struct iommu_hwpt_page_response - IOMMU page fault response
> >> + * @size: sizeof(struct iommu_hwpt_page_response)
> >> + * @flags: Must be set to 0
> >> + * @dev_id: device ID of target device for the response
> >> + * @pasid: Process Address Space ID
> >> + * @grpid: Page Request Group Index
> >> + * @code: One of response code in enum
> iommufd_page_response_code.
> >> + * @cookie: The kernel-managed cookie reported in the fault message.
> >> + */
> >> +struct iommu_hwpt_page_response {
> >> +	__u32 size;
> >> +	__u32 flags;
> >> +	__u32 dev_id;
> >> +	__u32 pasid;
> >> +	__u32 grpid;
> >> +	__u32 code;
> >> +	__u32 cookie;
> >> +	__u32 reserved;
> >> +};
> >
> > with the response queue per fault object we don't need all fields here,
> > e.g. dev_id, pasid, etc. Cookie is sufficient.
> 
> I prefer not to mess the definition of user API data and the kernel
> driver implementation. The kernel driver may change in the future, but
> the user API will remain stable for a long time.

sure it remains stable for reasonable reason. Here we defined some
fields but they are even not used and checked in the kernel. IMHO it
suggests redundant definition. If there is value to keep them, do we
need to at least verify them same as the completion record?

> 
> I am neutral about whether we could remove above fields, but I need all
> maintainers to agree on this, given that this has undergone five rounds
> of review. :-)
> 

sure let's hear their opinions.
Re: [PATCH v6 05/10] iommufd: Add fault and response message definitions
Posted by Jason Gunthorpe 1 year, 8 months ago
On Fri, Jun 07, 2024 at 09:38:38AM +0000, Tian, Kevin wrote:
> > From: Baolu Lu <baolu.lu@linux.intel.com>
> > Sent: Thursday, June 6, 2024 2:28 PM
> > 
> > On 6/5/24 4:28 PM, Tian, Kevin wrote:
> > >> From: Lu Baolu <baolu.lu@linux.intel.com>
> > >> Sent: Monday, May 27, 2024 12:05 PM
> > >>
> > >> +
> > >> +/**
> > >> + * struct iommu_hwpt_page_response - IOMMU page fault response
> > >> + * @size: sizeof(struct iommu_hwpt_page_response)
> > >> + * @flags: Must be set to 0
> > >> + * @dev_id: device ID of target device for the response
> > >> + * @pasid: Process Address Space ID
> > >> + * @grpid: Page Request Group Index
> > >> + * @code: One of response code in enum
> > iommufd_page_response_code.
> > >> + * @cookie: The kernel-managed cookie reported in the fault message.
> > >> + */
> > >> +struct iommu_hwpt_page_response {
> > >> +	__u32 size;
> > >> +	__u32 flags;
> > >> +	__u32 dev_id;
> > >> +	__u32 pasid;
> > >> +	__u32 grpid;
> > >> +	__u32 code;
> > >> +	__u32 cookie;
> > >> +	__u32 reserved;
> > >> +};
> > >
> > > with the response queue per fault object we don't need all fields here,
> > > e.g. dev_id, pasid, etc. Cookie is sufficient.

Wait, why did we make it per object? The fault FD is supposed to be
sharable across HWPTs.

> > I prefer not to mess the definition of user API data and the kernel
> > driver implementation. The kernel driver may change in the future, but
> > the user API will remain stable for a long time.
> 
> sure it remains stable for reasonable reason. Here we defined some
> fields but they are even not used and checked in the kernel. IMHO it
> suggests redundant definition. If there is value to keep them, do we
> need to at least verify them same as the completion record?

They are not here for the kernel, they are here for userspace.

A single HWPT and a single fault queue can be attached to multiple
devices we need to return the dev_id so that userspace can know which
device initiated the PRI. Same with PASID.

The only way we could remove them is if we are sure that no vIOMMU
requires RID or PASID in the virtual fault queue PRI fault message.. I
don't think that is true?

Cookie is not a replacement, cookie is an opaque value for the kernel
to use to match a response to a request.

Jason
Re: [PATCH v6 05/10] iommufd: Add fault and response message definitions
Posted by Baolu Lu 1 year, 8 months ago
On 6/12/24 9:19 PM, Jason Gunthorpe wrote:
> On Fri, Jun 07, 2024 at 09:38:38AM +0000, Tian, Kevin wrote:
>>> From: Baolu Lu<baolu.lu@linux.intel.com>
>>> Sent: Thursday, June 6, 2024 2:28 PM
>>>
>>> On 6/5/24 4:28 PM, Tian, Kevin wrote:
>>>>> From: Lu Baolu<baolu.lu@linux.intel.com>
>>>>> Sent: Monday, May 27, 2024 12:05 PM
>>>>>
>>>>> +
>>>>> +/**
>>>>> + * struct iommu_hwpt_page_response - IOMMU page fault response
>>>>> + * @size: sizeof(struct iommu_hwpt_page_response)
>>>>> + * @flags: Must be set to 0
>>>>> + * @dev_id: device ID of target device for the response
>>>>> + * @pasid: Process Address Space ID
>>>>> + * @grpid: Page Request Group Index
>>>>> + * @code: One of response code in enum
>>> iommufd_page_response_code.
>>>>> + * @cookie: The kernel-managed cookie reported in the fault message.
>>>>> + */
>>>>> +struct iommu_hwpt_page_response {
>>>>> +	__u32 size;
>>>>> +	__u32 flags;
>>>>> +	__u32 dev_id;
>>>>> +	__u32 pasid;
>>>>> +	__u32 grpid;
>>>>> +	__u32 code;
>>>>> +	__u32 cookie;
>>>>> +	__u32 reserved;
>>>>> +};
>>>> with the response queue per fault object we don't need all fields here,
>>>> e.g. dev_id, pasid, etc. Cookie is sufficient.
> Wait, why did we make it per object? The fault FD is supposed to be
> sharable across HWPTs.

The fault FD is shareable across HWPTs. Kevin was suggesting that the
response queue (for all outstanding IOPFs awaiting responses) could be
put in the fault structure.

Best regards,
baolu
Re: [PATCH v6 05/10] iommufd: Add fault and response message definitions
Posted by Jason Gunthorpe 1 year, 8 months ago
On Wed, Jun 12, 2024 at 10:19:46AM -0300, Jason Gunthorpe wrote:
> > > I prefer not to mess the definition of user API data and the kernel
> > > driver implementation. The kernel driver may change in the future, but
> > > the user API will remain stable for a long time.
> > 
> > sure it remains stable for reasonable reason. Here we defined some
> > fields but they are even not used and checked in the kernel. IMHO it
> > suggests redundant definition. If there is value to keep them, do we
> > need to at least verify them same as the completion record?
> 
> They are not here for the kernel, they are here for userspace.
> 
> A single HWPT and a single fault queue can be attached to multiple
> devices we need to return the dev_id so that userspace can know which
> device initiated the PRI. Same with PASID.
> 
> The only way we could remove them is if we are sure that no vIOMMU
> requires RID or PASID in the virtual fault queue PRI fault message.. I
> don't think that is true?
> 
> Cookie is not a replacement, cookie is an opaque value for the kernel
> to use to match a response to a request.

Oh I got this wrong, the above is the response, yeah we can ditch
everything but the cookie, and code right?

struct iommu_hwpt_page_response {
       __u32 cookie;
       __u32 code;
};

What is the workflow here? We expect the VMM will take the vIOMMU
response and match it to a request cookie for the kernel to complete?

Jason
Re: [PATCH v6 05/10] iommufd: Add fault and response message definitions
Posted by Baolu Lu 1 year, 8 months ago
On 6/12/24 9:50 PM, Jason Gunthorpe wrote:
> On Wed, Jun 12, 2024 at 10:19:46AM -0300, Jason Gunthorpe wrote:
>>>> I prefer not to mess the definition of user API data and the kernel
>>>> driver implementation. The kernel driver may change in the future, but
>>>> the user API will remain stable for a long time.
>>> sure it remains stable for reasonable reason. Here we defined some
>>> fields but they are even not used and checked in the kernel. IMHO it
>>> suggests redundant definition. If there is value to keep them, do we
>>> need to at least verify them same as the completion record?
>> They are not here for the kernel, they are here for userspace.
>>
>> A single HWPT and a single fault queue can be attached to multiple
>> devices we need to return the dev_id so that userspace can know which
>> device initiated the PRI. Same with PASID.
>>
>> The only way we could remove them is if we are sure that no vIOMMU
>> requires RID or PASID in the virtual fault queue PRI fault message.. I
>> don't think that is true?
>>
>> Cookie is not a replacement, cookie is an opaque value for the kernel
>> to use to match a response to a request.
> Oh I got this wrong, the above is the response, yeah we can ditch
> everything but the cookie, and code right?
> 
> struct iommu_hwpt_page_response {
>         __u32 cookie;
>         __u32 code;
> };
> 
> What is the workflow here? We expect the VMM will take the vIOMMU
> response and match it to a request cookie for the kernel to complete?

The workflow in the host kernel involves looking up the iopf_group using
the cookie value in the response queue and then responding to the
iopf_group with the code. Therefore, from the host kernel's perspective,
it is acceptable if the user only passes the cookie and code in the
response message.

Since you both agreed that the response message could be simplified, I
will implement the above structure in the new version.

Best regards,
baolu