[XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

Jiqian Chen posted 5 patches 5 months, 1 week ago
There is a newer version of this series
[XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
Posted by Jiqian Chen 5 months, 1 week ago
The gsi of a passthrough device must be configured for it to be
able to be mapped into a hvm domU.
But When dom0 is PVH, the gsis don't get registered, it causes
the info of apic, pin and irq not be added into irq_2_pin list,
and the handler of irq_desc is not set, then when passthrough a
device, setting ioapic affinity and vector will fail.

To fix above problem, on Linux kernel side, a new code will
need to call PHYSDEVOP_setup_gsi for passthrough devices to
register gsi when dom0 is PVH.

So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
purpose.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
The code link that will call this hypercall on linux kernel side is as follows:
https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 03ada3c880bd..cfe82d0f96ed 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -ENOSYS;
         break;
 
+    case PHYSDEVOP_setup_gsi:
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove:
-- 
2.34.1
Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
Posted by Jan Beulich 5 months, 1 week ago
On 17.06.2024 11:00, Jiqian Chen wrote:
> The gsi of a passthrough device must be configured for it to be
> able to be mapped into a hvm domU.
> But When dom0 is PVH, the gsis don't get registered, it causes
> the info of apic, pin and irq not be added into irq_2_pin list,
> and the handler of irq_desc is not set, then when passthrough a
> device, setting ioapic affinity and vector will fail.
> 
> To fix above problem, on Linux kernel side, a new code will
> need to call PHYSDEVOP_setup_gsi for passthrough devices to
> register gsi when dom0 is PVH.
> 
> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
> purpose.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> The code link that will call this hypercall on linux kernel side is as follows:
> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/

One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
one here:
"As to GSIs not being registered: If that's not a problem for Dom0's own
 operation, I think it'll also want/need explaining why what is sufficient for
 Dom0 alone isn't sufficient when pass-through comes into play."

Jan
Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
Posted by Chen, Jiqian 5 months, 1 week ago
On 2024/6/17 22:52, Jan Beulich wrote:
> On 17.06.2024 11:00, Jiqian Chen wrote:
>> The gsi of a passthrough device must be configured for it to be
>> able to be mapped into a hvm domU.
>> But When dom0 is PVH, the gsis don't get registered, it causes
>> the info of apic, pin and irq not be added into irq_2_pin list,
>> and the handler of irq_desc is not set, then when passthrough a
>> device, setting ioapic affinity and vector will fail.
>>
>> To fix above problem, on Linux kernel side, a new code will
>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>> register gsi when dom0 is PVH.
>>
>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>> purpose.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> The code link that will call this hypercall on linux kernel side is as follows:
>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
> 
> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
> one here:
> "As to GSIs not being registered: If that's not a problem for Dom0's own
>  operation, I think it'll also want/need explaining why what is sufficient for
>  Dom0 alone isn't sufficient when pass-through comes into play."
I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
" it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
What description do you want me to add?

> 
> Jan
> 

-- 
Best regards,
Jiqian Chen.
Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
Posted by Jan Beulich 5 months ago
On 18.06.2024 08:57, Chen, Jiqian wrote:
> On 2024/6/17 22:52, Jan Beulich wrote:
>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>> The gsi of a passthrough device must be configured for it to be
>>> able to be mapped into a hvm domU.
>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>> and the handler of irq_desc is not set, then when passthrough a
>>> device, setting ioapic affinity and vector will fail.
>>>
>>> To fix above problem, on Linux kernel side, a new code will
>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>> register gsi when dom0 is PVH.
>>>
>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>> purpose.
>>>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> ---
>>> The code link that will call this hypercall on linux kernel side is as follows:
>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>
>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>> one here:
>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>  operation, I think it'll also want/need explaining why what is sufficient for
>>  Dom0 alone isn't sufficient when pass-through comes into play."
> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
> What description do you want me to add?

What I'd first like to have clarification on (i.e. before putting it in
the description one way or another): How come Dom0 alone gets away fine
without making the call, yet for passthrough-to-DomU it's needed? Is it
perhaps that it just so happened that for Dom0 things have been working
on systems where it was tested, but the call should in principle have been
there in this case, too [1]? That (to me at least) would make quite a
difference for both this patch's description and us accepting it.

Jan

[1] Alternative e.g. being that because of other actions PVH Dom0 takes,
like the IO-APIC RTE programming it does for IRQs it wants to use for
itself, the necessary information is already suitably conveyed to Xen in
that case. In such a case imo it's relevant to mention in the description.
Not the least because iirc the pciback driver sets up a fake IRQ handler
in such cases, which ought to lead to similar IO-APIC RTE programming, at
which point the question would again arise why the hypercall needs
exposing.
Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
Posted by Chen, Jiqian 5 months ago
On 2024/6/18 16:55, Jan Beulich wrote:
> On 18.06.2024 08:57, Chen, Jiqian wrote:
>> On 2024/6/17 22:52, Jan Beulich wrote:
>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>> The gsi of a passthrough device must be configured for it to be
>>>> able to be mapped into a hvm domU.
>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>> and the handler of irq_desc is not set, then when passthrough a
>>>> device, setting ioapic affinity and vector will fail.
>>>>
>>>> To fix above problem, on Linux kernel side, a new code will
>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>> register gsi when dom0 is PVH.
>>>>
>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>> purpose.
>>>>
>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>> ---
>>>> The code link that will call this hypercall on linux kernel side is as follows:
>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>>
>>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>>> one here:
>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>  operation, I think it'll also want/need explaining why what is sufficient for
>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
>> What description do you want me to add?
> 
> What I'd first like to have clarification on (i.e. before putting it in
> the description one way or another): How come Dom0 alone gets away fine
> without making the call, yet for passthrough-to-DomU it's needed? Is it
> perhaps that it just so happened that for Dom0 things have been working
> on systems where it was tested, but the call should in principle have been
> there in this case, too [1]? That (to me at least) would make quite a
> difference for both this patch's description and us accepting it.
Oh, I think I know what's your concern now. Thanks.
First question, why gsi of device can work on PVH dom0:
Because when probe a driver to a normal device, it will call linux kernel side:pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
Second question, why gsi of passthrough can't work on PVH dom0:
Because when assign a device to be passthrough, it uses pciback to probe the device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is unmasked, so that the gsi can't work for passthrough device.

> 
> Jan
> 
> [1] Alternative e.g. being that because of other actions PVH Dom0 takes,
> like the IO-APIC RTE programming it does for IRQs it wants to use for
> itself, the necessary information is already suitably conveyed to Xen in
> that case. In such a case imo it's relevant to mention in the description.
> Not the least because iirc the pciback driver sets up a fake IRQ handler
> in such cases, which ought to lead to similar IO-APIC RTE programming, at
> which point the question would again arise why the hypercall needs
> exposing.

-- 
Best regards,
Jiqian Chen.
Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
Posted by Jan Beulich 5 months ago
On 19.06.2024 09:53, Chen, Jiqian wrote:
> On 2024/6/18 16:55, Jan Beulich wrote:
>> On 18.06.2024 08:57, Chen, Jiqian wrote:
>>> On 2024/6/17 22:52, Jan Beulich wrote:
>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>> The gsi of a passthrough device must be configured for it to be
>>>>> able to be mapped into a hvm domU.
>>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>>> and the handler of irq_desc is not set, then when passthrough a
>>>>> device, setting ioapic affinity and vector will fail.
>>>>>
>>>>> To fix above problem, on Linux kernel side, a new code will
>>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>>> register gsi when dom0 is PVH.
>>>>>
>>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>>> purpose.
>>>>>
>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>> ---
>>>>> The code link that will call this hypercall on linux kernel side is as follows:
>>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>>>
>>>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>>>> one here:
>>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>>  operation, I think it'll also want/need explaining why what is sufficient for
>>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>>> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
>>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
>>> What description do you want me to add?
>>
>> What I'd first like to have clarification on (i.e. before putting it in
>> the description one way or another): How come Dom0 alone gets away fine
>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>> perhaps that it just so happened that for Dom0 things have been working
>> on systems where it was tested, but the call should in principle have been
>> there in this case, too [1]? That (to me at least) would make quite a
>> difference for both this patch's description and us accepting it.
> Oh, I think I know what's your concern now. Thanks.
> First question, why gsi of device can work on PVH dom0:
> Because when probe a driver to a normal device, it will call linux kernel side:pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
> Second question, why gsi of passthrough can't work on PVH dom0:
> Because when assign a device to be passthrough, it uses pciback to probe the device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is unmasked, so that the gsi can't work for passthrough device.

And why exactly would the fake IRQ handler not be set up by pciback? Its
setting up ought to lead to those same IO-APIC RTE writes that Xen
intercepts.

In any event, imo a summary of the above wants to be part of the patch
description.

Jan
Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
Posted by Chen, Jiqian 5 months ago
On 2024/6/19 16:06, Jan Beulich wrote:
> On 19.06.2024 09:53, Chen, Jiqian wrote:
>> On 2024/6/18 16:55, Jan Beulich wrote:
>>> On 18.06.2024 08:57, Chen, Jiqian wrote:
>>>> On 2024/6/17 22:52, Jan Beulich wrote:
>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>> The gsi of a passthrough device must be configured for it to be
>>>>>> able to be mapped into a hvm domU.
>>>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>>>> and the handler of irq_desc is not set, then when passthrough a
>>>>>> device, setting ioapic affinity and vector will fail.
>>>>>>
>>>>>> To fix above problem, on Linux kernel side, a new code will
>>>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>>>> register gsi when dom0 is PVH.
>>>>>>
>>>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>>>> purpose.
>>>>>>
>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>> ---
>>>>>> The code link that will call this hypercall on linux kernel side is as follows:
>>>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>>>>
>>>>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>>>>> one here:
>>>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>>>  operation, I think it'll also want/need explaining why what is sufficient for
>>>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>>>> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
>>>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
>>>> What description do you want me to add?
>>>
>>> What I'd first like to have clarification on (i.e. before putting it in
>>> the description one way or another): How come Dom0 alone gets away fine
>>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>>> perhaps that it just so happened that for Dom0 things have been working
>>> on systems where it was tested, but the call should in principle have been
>>> there in this case, too [1]? That (to me at least) would make quite a
>>> difference for both this patch's description and us accepting it.
>> Oh, I think I know what's your concern now. Thanks.
>> First question, why gsi of device can work on PVH dom0:
>> Because when probe a driver to a normal device, it will call linux kernel side:pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
>> Second question, why gsi of passthrough can't work on PVH dom0:
>> Because when assign a device to be passthrough, it uses pciback to probe the device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is unmasked, so that the gsi can't work for passthrough device.
> 
> And why exactly would the fake IRQ handler not be set up by pciback? Its
> setting up ought to lead to those same IO-APIC RTE writes that Xen
> intercepts.
Because isr_on is not set, when xen_pcibk_control_isr is called, it will return due to " !dev_data->isr_on". So that fake IRQ handler aren't installed.
And it seems isr_on is set through driver sysfs " irq_handler_state" for a level device that is to be shared with guest and the IRQ is shared with the initial domain.

> 
> In any event, imo a summary of the above wants to be part of the patch
> description.
OK, will add into the commit message in next version.

> 
> Jan

-- 
Best regards,
Jiqian Chen.
Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
Posted by Jan Beulich 5 months ago
On 19.06.2024 10:51, Chen, Jiqian wrote:
> On 2024/6/19 16:06, Jan Beulich wrote:
>> On 19.06.2024 09:53, Chen, Jiqian wrote:
>>> On 2024/6/18 16:55, Jan Beulich wrote:
>>>> On 18.06.2024 08:57, Chen, Jiqian wrote:
>>>>> On 2024/6/17 22:52, Jan Beulich wrote:
>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>> The gsi of a passthrough device must be configured for it to be
>>>>>>> able to be mapped into a hvm domU.
>>>>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>>>>> and the handler of irq_desc is not set, then when passthrough a
>>>>>>> device, setting ioapic affinity and vector will fail.
>>>>>>>
>>>>>>> To fix above problem, on Linux kernel side, a new code will
>>>>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>>>>> register gsi when dom0 is PVH.
>>>>>>>
>>>>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>>>>> purpose.
>>>>>>>
>>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>>> ---
>>>>>>> The code link that will call this hypercall on linux kernel side is as follows:
>>>>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>>>>>
>>>>>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>>>>>> one here:
>>>>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>>>>  operation, I think it'll also want/need explaining why what is sufficient for
>>>>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>>>>> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
>>>>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
>>>>> What description do you want me to add?
>>>>
>>>> What I'd first like to have clarification on (i.e. before putting it in
>>>> the description one way or another): How come Dom0 alone gets away fine
>>>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>>>> perhaps that it just so happened that for Dom0 things have been working
>>>> on systems where it was tested, but the call should in principle have been
>>>> there in this case, too [1]? That (to me at least) would make quite a
>>>> difference for both this patch's description and us accepting it.
>>> Oh, I think I know what's your concern now. Thanks.
>>> First question, why gsi of device can work on PVH dom0:
>>> Because when probe a driver to a normal device, it will call linux kernel side:pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
>>> Second question, why gsi of passthrough can't work on PVH dom0:
>>> Because when assign a device to be passthrough, it uses pciback to probe the device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is unmasked, so that the gsi can't work for passthrough device.
>>
>> And why exactly would the fake IRQ handler not be set up by pciback? Its
>> setting up ought to lead to those same IO-APIC RTE writes that Xen
>> intercepts.
> Because isr_on is not set, when xen_pcibk_control_isr is called, it will return due to " !dev_data->isr_on". So that fake IRQ handler aren't installed.

I'm afraid I don't follow you here. Quoting from the function:

	enable =  dev_data->enable_intx;

	/* Asked to disable, but ISR isn't runnig */
	if (!enable && !dev_data->isr_on)
		return;

I.e. we bail if the request was to _disable_ and there is no ISR.

I can't exclude though that command_write()'s logic to set ->enable_intx
is insufficient. But in the common case one would surely expect at least
one of PCI_COMMAND_MEMORY and PCI_COMMAND_IO to be set first by a guest.
IOW at some point I'd expect xen_pcibk_control_isr() to be called with
the second argument 0 and with ->enable_intx set.

> And it seems isr_on is set through driver sysfs " irq_handler_state" for a level device that is to be shared with guest and the IRQ is shared with the initial domain.

The sysfs interface is, according to my reading of the description
of the commit introducing it, merely for debugging / recovery purposes.
(It also looks to me as if this was partly broken: If one would use it,
thus clearing ->isr_on, a subsequent disable request would take exactly
that early bailing path quoted above, with nothing removing the IRQ
handler.)

That description also talks about both an edge vs level distinction in
behavior and one for shared vs non-shared, but neither in that commit
nor in present code I can spot any respective checks. Otherwise I could
understand that there are cases where the necessary information isn't
propagated to Xen.

Jan
Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
Posted by Chen, Jiqian 5 months ago
On 2024/6/19 17:49, Jan Beulich wrote:
> On 19.06.2024 10:51, Chen, Jiqian wrote:
>> On 2024/6/19 16:06, Jan Beulich wrote:
>>> On 19.06.2024 09:53, Chen, Jiqian wrote:
>>>> On 2024/6/18 16:55, Jan Beulich wrote:
>>>>> On 18.06.2024 08:57, Chen, Jiqian wrote:
>>>>>> On 2024/6/17 22:52, Jan Beulich wrote:
>>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>>> The gsi of a passthrough device must be configured for it to be
>>>>>>>> able to be mapped into a hvm domU.
>>>>>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>>>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>>>>>> and the handler of irq_desc is not set, then when passthrough a
>>>>>>>> device, setting ioapic affinity and vector will fail.
>>>>>>>>
>>>>>>>> To fix above problem, on Linux kernel side, a new code will
>>>>>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>>>>>> register gsi when dom0 is PVH.
>>>>>>>>
>>>>>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>>>>>> purpose.
>>>>>>>>
>>>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>>>> ---
>>>>>>>> The code link that will call this hypercall on linux kernel side is as follows:
>>>>>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>>>>>>
>>>>>>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>>>>>>> one here:
>>>>>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>>>>>  operation, I think it'll also want/need explaining why what is sufficient for
>>>>>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>>>>>> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
>>>>>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
>>>>>> What description do you want me to add?
>>>>>
>>>>> What I'd first like to have clarification on (i.e. before putting it in
>>>>> the description one way or another): How come Dom0 alone gets away fine
>>>>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>>>>> perhaps that it just so happened that for Dom0 things have been working
>>>>> on systems where it was tested, but the call should in principle have been
>>>>> there in this case, too [1]? That (to me at least) would make quite a
>>>>> difference for both this patch's description and us accepting it.
>>>> Oh, I think I know what's your concern now. Thanks.
>>>> First question, why gsi of device can work on PVH dom0:
>>>> Because when probe a driver to a normal device, it will call linux kernel side:pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
>>>> Second question, why gsi of passthrough can't work on PVH dom0:
>>>> Because when assign a device to be passthrough, it uses pciback to probe the device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is unmasked, so that the gsi can't work for passthrough device.
>>>
>>> And why exactly would the fake IRQ handler not be set up by pciback? Its
>>> setting up ought to lead to those same IO-APIC RTE writes that Xen
>>> intercepts.
>> Because isr_on is not set, when xen_pcibk_control_isr is called, it will return due to " !dev_data->isr_on". So that fake IRQ handler aren't installed.
> 
> I'm afraid I don't follow you here. Quoting from the function:
> 
> 	enable =  dev_data->enable_intx;
> 
> 	/* Asked to disable, but ISR isn't runnig */
> 	if (!enable && !dev_data->isr_on)
> 		return;
> 
> I.e. we bail if the request was to _disable_ and there is no ISR.
I mean, after debugging the pcistub_probe callstack:
pcistub_seize-> pcistub_init_device-> xen_pcibk_reset_device-> xen_pcibk_control_isr(dev, 1 /*reset device*/)
and in xen_pcibk_control_isr code:
	if (reset) {
		dev_data->enable_intx = 0;
		dev_data->ack_intr = 0;
	}
	enable =  dev_data->enable_intx;

	/* Asked to disable, but ISR isn't runnig */
	if (!enable && !dev_data->isr_on)
		return;
"reset" is 1, so "dev_date->enable_intx" is set 0, then "enable" is 0, and then arrive the second "if" check, "enable" is 0 and "dev_date->isr_on" is also 0, so it return here.
It can't reach following code to install irq handle.

> 
> I can't exclude though that command_write()'s logic to set ->enable_intx
> is insufficient. But in the common case one would surely expect at least
> one of PCI_COMMAND_MEMORY and PCI_COMMAND_IO to be set first by a guest.
> IOW at some point I'd expect xen_pcibk_control_isr() to be called with
> the second argument 0 and with ->enable_intx set.
I only see xen_pcibk_control_isr(dev, 0) is called in xen_pcibk_do_one_op, but xen_pcibk_do_one_op is not called during assigning a device to be passthrough.

> 
>> And it seems isr_on is set through driver sysfs " irq_handler_state" for a level device that is to be shared with guest and the IRQ is shared with the initial domain.
> 
> The sysfs interface is, according to my reading of the description
> of the commit introducing it, merely for debugging / recovery purposes.
> (It also looks to me as if this was partly broken: If one would use it,
> thus clearing ->isr_on, a subsequent disable request would take exactly
> that early bailing path quoted above, with nothing removing the IRQ
> handler.)
> 
> That description also talks about both an edge vs level distinction in
> behavior and one for shared vs non-shared, but neither in that commit
> nor in present code I can spot any respective checks. Otherwise I could
> understand that there are cases where the necessary information isn't
> propagated to Xen.
> 
> Jan

-- 
Best regards,
Jiqian Chen.