Hi all,
I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0
test with the brand new gitlab-ci runner offered by Qubes. It is an AMD
Zen3 system and we already have a few successful tests with it, see
automation/gitlab-ci/test.yaml.
We managed to narrow down the issue to a console problem. We are
currently using console=com1 com1=115200,8n1,pci,msi as Xen command line
options, it works with PV Dom0 and it is using a PCI UART card.
In the case of Dom0 PVH:
- it works without console=com1
- it works with console=com1 and with the patch appended below
- it doesn't work otherwise and crashes with this error:
https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK
What is the right way to fix it?
Keep in mind that I don't have access to the system except via gitlab-ci
pipelines. Marek (CCed) might have more info on the system and the PCI
UART he is using in case it's needed.
Many thanks for any help you can provide.
Cheers,
Stefano
diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
index 212a9c49ae..57623bc091 100644
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -429,17 +428,6 @@ static void __init cf_check ns16550_init_postirq(struct serial_port *port)
#ifdef NS16550_PCI
if ( uart->bar || uart->ps_bdf_enable )
{
- if ( uart->param && uart->param->mmio &&
- rangeset_add_range(mmio_ro_ranges, PFN_DOWN(uart->io_base),
- PFN_UP(uart->io_base + uart->io_size) - 1) )
- printk(XENLOG_INFO "Error while adding MMIO range of device to mmio_ro_ranges\n");
-
- if ( pci_ro_device(0, uart->ps_bdf[0],
- PCI_DEVFN(uart->ps_bdf[1], uart->ps_bdf[2])) )
- printk(XENLOG_INFO "Could not mark config space of %02x:%02x.%u read-only.\n",
- uart->ps_bdf[0], uart->ps_bdf[1],
- uart->ps_bdf[2]);
-
if ( uart->msi )
{
struct msi_info msi = {
On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote: > Hi all, > > I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0 > test with the brand new gitlab-ci runner offered by Qubes. It is an AMD > Zen3 system and we already have a few successful tests with it, see > automation/gitlab-ci/test.yaml. > > We managed to narrow down the issue to a console problem. We are > currently using console=com1 com1=115200,8n1,pci,msi as Xen command line > options, it works with PV Dom0 and it is using a PCI UART card. > > In the case of Dom0 PVH: > - it works without console=com1 > - it works with console=com1 and with the patch appended below > - it doesn't work otherwise and crashes with this error: > https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK Jan also noticed this, and we have a ticket for it in gitlab: https://gitlab.com/xen-project/xen/-/issues/85 > What is the right way to fix it? I think the right fix is to simply avoid hidden devices from being handled by vPCI, in any case such devices won't work propewrly with vPCI because they are in use by Xen, and so any cached information by vPCI is likely to become stable as Xen can modify the device without vPCI noticing. I think the chunk below should help. It's not clear to me however how hidden devices should be handled, is the intention to completely hide such devices from dom0? Roger. --- diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c index 652807a4a454..0baef3a8d3a1 100644 --- a/xen/drivers/vpci/vpci.c +++ b/xen/drivers/vpci/vpci.c @@ -72,7 +72,12 @@ int vpci_add_handlers(struct pci_dev *pdev) unsigned int i; int rc = 0; - if ( !has_vpci(pdev->domain) ) + if ( !has_vpci(pdev->domain) || + /* + * Ignore RO and hidden devices, those are in use by Xen and vPCI + * won't work on them. + */ + pci_get_pdev(dom_xen, pdev->sbdf) ) return 0; /* We should not get here twice for the same device. */
On 18.05.2023 12:34, Roger Pau Monné wrote: > On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote: >> I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0 >> test with the brand new gitlab-ci runner offered by Qubes. It is an AMD >> Zen3 system and we already have a few successful tests with it, see >> automation/gitlab-ci/test.yaml. >> >> We managed to narrow down the issue to a console problem. We are >> currently using console=com1 com1=115200,8n1,pci,msi as Xen command line >> options, it works with PV Dom0 and it is using a PCI UART card. >> >> In the case of Dom0 PVH: >> - it works without console=com1 >> - it works with console=com1 and with the patch appended below >> - it doesn't work otherwise and crashes with this error: >> https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK > > Jan also noticed this, and we have a ticket for it in gitlab: > > https://gitlab.com/xen-project/xen/-/issues/85 > >> What is the right way to fix it? > > I think the right fix is to simply avoid hidden devices from being > handled by vPCI, in any case such devices won't work propewrly with > vPCI because they are in use by Xen, and so any cached information by > vPCI is likely to become stable as Xen can modify the device without > vPCI noticing. > > I think the chunk below should help. It's not clear to me however how > hidden devices should be handled, is the intention to completely hide > such devices from dom0? No, Dom0 should still be able to see them in a (mostly) r/o fashion. Hence my earlier RFC patch making vPCI actually deal with them. Jan
On Fri, 19 May 2023, Jan Beulich wrote: > On 18.05.2023 12:34, Roger Pau Monné wrote: > > On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote: > >> I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0 > >> test with the brand new gitlab-ci runner offered by Qubes. It is an AMD > >> Zen3 system and we already have a few successful tests with it, see > >> automation/gitlab-ci/test.yaml. > >> > >> We managed to narrow down the issue to a console problem. We are > >> currently using console=com1 com1=115200,8n1,pci,msi as Xen command line > >> options, it works with PV Dom0 and it is using a PCI UART card. > >> > >> In the case of Dom0 PVH: > >> - it works without console=com1 > >> - it works with console=com1 and with the patch appended below > >> - it doesn't work otherwise and crashes with this error: > >> https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK > > > > Jan also noticed this, and we have a ticket for it in gitlab: > > > > https://gitlab.com/xen-project/xen/-/issues/85 > > > >> What is the right way to fix it? > > > > I think the right fix is to simply avoid hidden devices from being > > handled by vPCI, in any case such devices won't work propewrly with > > vPCI because they are in use by Xen, and so any cached information by > > vPCI is likely to become stable as Xen can modify the device without > > vPCI noticing. > > > > I think the chunk below should help. It's not clear to me however how > > hidden devices should be handled, is the intention to completely hide > > such devices from dom0? > > No, Dom0 should still be able to see them in a (mostly) r/o fashion. But why? If something is in-use by Xen (e.g. IOMMU, a serial PCI device, etc.) ideally Dom0 shouldn't even know of its existence because the device is not exposed to Dom0. Dom0 is not meant to use it. Why let Dom0 know it exists if Dom0 should not use it? In Xen on ARM, initially we didn't expose devices used by Xen to Dom0 at all. However to hide them completely we had to make complex device tree manipulations. Now instead we leave the device nodes in device tree as is, but we change the "status" property to "disabled". The idea is still that we completely hide Xen devices from Dom0, but because of implementation complexity, instead of completing taking away the corresponding nodes from device tree, we change them to disabled, which still leads to the same result: the guest OS will skip them. I am saying this without being familiar with the x86 PVH implementation, so pardon my ignorance here, but it seems to me that as we are moving toward an better architecture with PVH, once that allows any OS to be Dom0, not just Linux, we would want to also completely hide devices owned by Xen from Dom0. That way we don't need any workaround in the guest OS for it not to use them.
On 20.05.2023 00:44, Stefano Stabellini wrote: > On Fri, 19 May 2023, Jan Beulich wrote: >> On 18.05.2023 12:34, Roger Pau Monné wrote: >>> On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote: >>>> I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0 >>>> test with the brand new gitlab-ci runner offered by Qubes. It is an AMD >>>> Zen3 system and we already have a few successful tests with it, see >>>> automation/gitlab-ci/test.yaml. >>>> >>>> We managed to narrow down the issue to a console problem. We are >>>> currently using console=com1 com1=115200,8n1,pci,msi as Xen command line >>>> options, it works with PV Dom0 and it is using a PCI UART card. >>>> >>>> In the case of Dom0 PVH: >>>> - it works without console=com1 >>>> - it works with console=com1 and with the patch appended below >>>> - it doesn't work otherwise and crashes with this error: >>>> https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK >>> >>> Jan also noticed this, and we have a ticket for it in gitlab: >>> >>> https://gitlab.com/xen-project/xen/-/issues/85 >>> >>>> What is the right way to fix it? >>> >>> I think the right fix is to simply avoid hidden devices from being >>> handled by vPCI, in any case such devices won't work propewrly with >>> vPCI because they are in use by Xen, and so any cached information by >>> vPCI is likely to become stable as Xen can modify the device without >>> vPCI noticing. >>> >>> I think the chunk below should help. It's not clear to me however how >>> hidden devices should be handled, is the intention to completely hide >>> such devices from dom0? >> >> No, Dom0 should still be able to see them in a (mostly) r/o fashion. > > But why? If something is in-use by Xen (e.g. IOMMU, a serial PCI device, > etc.) ideally Dom0 shouldn't even know of its existence because the > device is not exposed to Dom0. Dom0 is not meant to use it. Why let Dom0 > know it exists if Dom0 should not use it? Because we want to disturb the topology Dom0 sees as little as possible. For example, imagine the device is func 0 of a multi-function device. How would Dom0 know to even look at the other functions, when it can't even read the relevant bit in func 0's config space? Jan > In Xen on ARM, initially we didn't expose devices used by Xen to Dom0 > at all. However to hide them completely we had to make complex device > tree manipulations. Now instead we leave the device nodes in device tree > as is, but we change the "status" property to "disabled". > > The idea is still that we completely hide Xen devices from Dom0, but > because of implementation complexity, instead of completing taking away > the corresponding nodes from device tree, we change them to disabled, > which still leads to the same result: the guest OS will skip them. > > I am saying this without being familiar with the x86 PVH implementation, > so pardon my ignorance here, but it seems to me that as we are moving > toward an better architecture with PVH, once that allows any OS to be > Dom0, not just Linux, we would want to also completely hide devices > owned by Xen from Dom0. > > That way we don't need any workaround in the guest OS for it not to use > them.
On Fri, May 19, 2023 at 09:22:58AM +0200, Jan Beulich wrote: > On 18.05.2023 12:34, Roger Pau Monné wrote: > > On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote: > >> I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0 > >> test with the brand new gitlab-ci runner offered by Qubes. It is an AMD > >> Zen3 system and we already have a few successful tests with it, see > >> automation/gitlab-ci/test.yaml. > >> > >> We managed to narrow down the issue to a console problem. We are > >> currently using console=com1 com1=115200,8n1,pci,msi as Xen command line > >> options, it works with PV Dom0 and it is using a PCI UART card. > >> > >> In the case of Dom0 PVH: > >> - it works without console=com1 > >> - it works with console=com1 and with the patch appended below > >> - it doesn't work otherwise and crashes with this error: > >> https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK > > > > Jan also noticed this, and we have a ticket for it in gitlab: > > > > https://gitlab.com/xen-project/xen/-/issues/85 > > > >> What is the right way to fix it? > > > > I think the right fix is to simply avoid hidden devices from being > > handled by vPCI, in any case such devices won't work propewrly with > > vPCI because they are in use by Xen, and so any cached information by > > vPCI is likely to become stable as Xen can modify the device without > > vPCI noticing. > > > > I think the chunk below should help. It's not clear to me however how > > hidden devices should be handled, is the intention to completely hide > > such devices from dom0? > > No, Dom0 should still be able to see them in a (mostly) r/o fashion. > Hence my earlier RFC patch making vPCI actually deal with them. What's the difference between a hidden device and one that's marked RO then? Thanks, Roger.
On 19.05.2023 09:38, Roger Pau Monné wrote: > On Fri, May 19, 2023 at 09:22:58AM +0200, Jan Beulich wrote: >> On 18.05.2023 12:34, Roger Pau Monné wrote: >>> On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote: >>>> I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0 >>>> test with the brand new gitlab-ci runner offered by Qubes. It is an AMD >>>> Zen3 system and we already have a few successful tests with it, see >>>> automation/gitlab-ci/test.yaml. >>>> >>>> We managed to narrow down the issue to a console problem. We are >>>> currently using console=com1 com1=115200,8n1,pci,msi as Xen command line >>>> options, it works with PV Dom0 and it is using a PCI UART card. >>>> >>>> In the case of Dom0 PVH: >>>> - it works without console=com1 >>>> - it works with console=com1 and with the patch appended below >>>> - it doesn't work otherwise and crashes with this error: >>>> https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK >>> >>> Jan also noticed this, and we have a ticket for it in gitlab: >>> >>> https://gitlab.com/xen-project/xen/-/issues/85 >>> >>>> What is the right way to fix it? >>> >>> I think the right fix is to simply avoid hidden devices from being >>> handled by vPCI, in any case such devices won't work propewrly with >>> vPCI because they are in use by Xen, and so any cached information by >>> vPCI is likely to become stable as Xen can modify the device without >>> vPCI noticing. >>> >>> I think the chunk below should help. It's not clear to me however how >>> hidden devices should be handled, is the intention to completely hide >>> such devices from dom0? >> >> No, Dom0 should still be able to see them in a (mostly) r/o fashion. >> Hence my earlier RFC patch making vPCI actually deal with them. > > What's the difference between a hidden device and one that's marked RO > then? pci_hide_device() simply makes the device unavailable for assignment (by having it owned by DomXEN). pci_ro_device() additionally adds the device to the segment's ro_map, thus protecting its config space from Dom0 writes. And just to clarify - a PCI dev containing a UART isn't "hidden" in the above sense, but "r/o" (by virtue of calling pci_ro_device() on it). But the issue reported long ago (and now re-discovered by Stefano) is common to "r/o" and "hidden" devices (it's the "hidden" aspect that counts here, which is common for both). Jan
On Fri, May 19, 2023 at 10:20:24AM +0200, Jan Beulich wrote: > On 19.05.2023 09:38, Roger Pau Monné wrote: > > On Fri, May 19, 2023 at 09:22:58AM +0200, Jan Beulich wrote: > >> On 18.05.2023 12:34, Roger Pau Monné wrote: > >>> On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote: > >>>> I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0 > >>>> test with the brand new gitlab-ci runner offered by Qubes. It is an AMD > >>>> Zen3 system and we already have a few successful tests with it, see > >>>> automation/gitlab-ci/test.yaml. > >>>> > >>>> We managed to narrow down the issue to a console problem. We are > >>>> currently using console=com1 com1=115200,8n1,pci,msi as Xen command line > >>>> options, it works with PV Dom0 and it is using a PCI UART card. > >>>> > >>>> In the case of Dom0 PVH: > >>>> - it works without console=com1 > >>>> - it works with console=com1 and with the patch appended below > >>>> - it doesn't work otherwise and crashes with this error: > >>>> https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK > >>> > >>> Jan also noticed this, and we have a ticket for it in gitlab: > >>> > >>> https://gitlab.com/xen-project/xen/-/issues/85 > >>> > >>>> What is the right way to fix it? > >>> > >>> I think the right fix is to simply avoid hidden devices from being > >>> handled by vPCI, in any case such devices won't work propewrly with > >>> vPCI because they are in use by Xen, and so any cached information by > >>> vPCI is likely to become stable as Xen can modify the device without > >>> vPCI noticing. > >>> > >>> I think the chunk below should help. It's not clear to me however how > >>> hidden devices should be handled, is the intention to completely hide > >>> such devices from dom0? > >> > >> No, Dom0 should still be able to see them in a (mostly) r/o fashion. > >> Hence my earlier RFC patch making vPCI actually deal with them. > > > > What's the difference between a hidden device and one that's marked RO > > then? > > pci_hide_device() simply makes the device unavailable for assignment > (by having it owned by DomXEN). pci_ro_device() additionally adds the > device to the segment's ro_map, thus protecting its config space from > Dom0 writes. But above you mention that hidden devices should be visible to dom0 "in a (mostly) r/o fashion.". I understand that for RO devices the whole config space of the device is RO, in which case we should simply avoid using vPCI for them. We however likely want to have the BARs of such devices permanently mapped into the dom0 physmap (if memory decoding is enabled). But for hidden devices it's not clear to me what needs to be RO, do we also need to protect the config space from dom0 accesses? It might be complicated for vPCI to deal with devices that have MSI-X interrupts in use by Xen for example. So I would suggest that at leeat for the time being we don't handle hidden devices with vPCI. We might want to do similar to RO devices and prevent write access to the config space for those also. > And just to clarify - a PCI dev containing a UART isn't "hidden" in the > above sense, but "r/o" (by virtue of calling pci_ro_device() on it). > But the issue reported long ago (and now re-discovered by Stefano) is > common to "r/o" and "hidden" devices (it's the "hidden" aspect that > counts here, which is common for both). Indeed, the issue is with any device assigned to dom_xen (or not assigned to the hardware domain, but accessible by the hardware domain from vPCI).
On 19.05.2023 10:55, Roger Pau Monné wrote: > On Fri, May 19, 2023 at 10:20:24AM +0200, Jan Beulich wrote: >> On 19.05.2023 09:38, Roger Pau Monné wrote: >>> On Fri, May 19, 2023 at 09:22:58AM +0200, Jan Beulich wrote: >>>> On 18.05.2023 12:34, Roger Pau Monné wrote: >>>>> On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote: >>>>>> I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0 >>>>>> test with the brand new gitlab-ci runner offered by Qubes. It is an AMD >>>>>> Zen3 system and we already have a few successful tests with it, see >>>>>> automation/gitlab-ci/test.yaml. >>>>>> >>>>>> We managed to narrow down the issue to a console problem. We are >>>>>> currently using console=com1 com1=115200,8n1,pci,msi as Xen command line >>>>>> options, it works with PV Dom0 and it is using a PCI UART card. >>>>>> >>>>>> In the case of Dom0 PVH: >>>>>> - it works without console=com1 >>>>>> - it works with console=com1 and with the patch appended below >>>>>> - it doesn't work otherwise and crashes with this error: >>>>>> https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK >>>>> >>>>> Jan also noticed this, and we have a ticket for it in gitlab: >>>>> >>>>> https://gitlab.com/xen-project/xen/-/issues/85 >>>>> >>>>>> What is the right way to fix it? >>>>> >>>>> I think the right fix is to simply avoid hidden devices from being >>>>> handled by vPCI, in any case such devices won't work propewrly with >>>>> vPCI because they are in use by Xen, and so any cached information by >>>>> vPCI is likely to become stable as Xen can modify the device without >>>>> vPCI noticing. >>>>> >>>>> I think the chunk below should help. It's not clear to me however how >>>>> hidden devices should be handled, is the intention to completely hide >>>>> such devices from dom0? >>>> >>>> No, Dom0 should still be able to see them in a (mostly) r/o fashion. >>>> Hence my earlier RFC patch making vPCI actually deal with them. >>> >>> What's the difference between a hidden device and one that's marked RO >>> then? >> >> pci_hide_device() simply makes the device unavailable for assignment >> (by having it owned by DomXEN). pci_ro_device() additionally adds the >> device to the segment's ro_map, thus protecting its config space from >> Dom0 writes. > > But above you mention that hidden devices should be visible to dom0 > "in a (mostly) r/o fashion.". I'm sorry for the confusion. My reply was in the context of the UART question here, which is a "r/o device" case. I didn't realize you were asking a question not directly related to such UART devices. > I understand that for RO devices the whole config space of the device > is RO, in which case we should simply avoid using vPCI for them. We > however likely want to have the BARs of such devices permanently > mapped into the dom0 physmap (if memory decoding is enabled). > > But for hidden devices it's not clear to me what needs to be RO, do we > also need to protect the config space from dom0 accesses? No, then they would be identical to r/o devices. Dom0 should be allowed to deal with hidden devices normally, I think, with the sole exception of them not being possible to assign to another domain (not even DomIO, if I'm not mistaken). > It might be complicated for vPCI to deal with devices that have MSI-X > interrupts in use by Xen for example. So I would suggest that at > leeat for the time being we don't handle hidden devices with vPCI. If any interrupts are in use on a device, I think it needs to be made "r/o", not just "hidden". Jan > We might want to do similar to RO devices and prevent write access to > the config space for those also. > >> And just to clarify - a PCI dev containing a UART isn't "hidden" in the >> above sense, but "r/o" (by virtue of calling pci_ro_device() on it). >> But the issue reported long ago (and now re-discovered by Stefano) is >> common to "r/o" and "hidden" devices (it's the "hidden" aspect that >> counts here, which is common for both). > > Indeed, the issue is with any device assigned to dom_xen (or not > assigned to the hardware domain, but accessible by the hardware domain > from vPCI).
On Thu, 18 May 2023, Roger Pau Monné wrote: > On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote: > > Hi all, > > > > I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0 > > test with the brand new gitlab-ci runner offered by Qubes. It is an AMD > > Zen3 system and we already have a few successful tests with it, see > > automation/gitlab-ci/test.yaml. > > > > We managed to narrow down the issue to a console problem. We are > > currently using console=com1 com1=115200,8n1,pci,msi as Xen command line > > options, it works with PV Dom0 and it is using a PCI UART card. > > > > In the case of Dom0 PVH: > > - it works without console=com1 > > - it works with console=com1 and with the patch appended below > > - it doesn't work otherwise and crashes with this error: > > https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK > > Jan also noticed this, and we have a ticket for it in gitlab: > > https://gitlab.com/xen-project/xen/-/issues/85 > > > What is the right way to fix it? > > I think the right fix is to simply avoid hidden devices from being > handled by vPCI, in any case such devices won't work propewrly with > vPCI because they are in use by Xen, and so any cached information by > vPCI is likely to become stable as Xen can modify the device without > vPCI noticing. > > I think the chunk below should help. It's not clear to me however how > hidden devices should be handled, is the intention to completely hide > such devices from dom0? I like the idea but the patch below still failed: (XEN) Xen call trace: (XEN) [<ffff82d0402682b0>] R drivers/vpci/header.c#modify_bars+0x2b3/0x44d (XEN) [<ffff82d040268714>] F drivers/vpci/header.c#init_bars+0x2ca/0x372 (XEN) [<ffff82d0402673b3>] F vpci_add_handlers+0xd5/0x10a (XEN) [<ffff82d0404408b9>] F drivers/passthrough/pci.c#setup_one_hwdom_device+0x73/0x97 (XEN) [<ffff82d0404409b0>] F drivers/passthrough/pci.c#_setup_hwdom_pci_devices+0x63/0x15b (XEN) [<ffff82d04027df08>] F drivers/passthrough/pci.c#pci_segments_iterate+0x43/0x69 (XEN) [<ffff82d040440e29>] F setup_hwdom_pci_devices+0x25/0x2c (XEN) [<ffff82d04043cb1a>] F drivers/passthrough/amd/pci_amd_iommu.c#amd_iommu_hwdom_init+0xd4/0xdd (XEN) [<ffff82d0404403c9>] F iommu_hwdom_init+0x49/0x53 (XEN) [<ffff82d04045175e>] F dom0_construct_pvh+0x160/0x138d (XEN) [<ffff82d040468914>] F construct_dom0+0x5c/0xb7 (XEN) [<ffff82d0404619c1>] F __start_xen+0x2423/0x272d (XEN) [<ffff82d040203344>] F __high_start+0x94/0xa0 I haven't managed to figure out why yet. > --- > diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c > index 652807a4a454..0baef3a8d3a1 100644 > --- a/xen/drivers/vpci/vpci.c > +++ b/xen/drivers/vpci/vpci.c > @@ -72,7 +72,12 @@ int vpci_add_handlers(struct pci_dev *pdev) > unsigned int i; > int rc = 0; > > - if ( !has_vpci(pdev->domain) ) > + if ( !has_vpci(pdev->domain) || > + /* > + * Ignore RO and hidden devices, those are in use by Xen and vPCI > + * won't work on them. > + */ > + pci_get_pdev(dom_xen, pdev->sbdf) ) > return 0; > > /* We should not get here twice for the same device. */ i
On Thu, May 18, 2023 at 06:46:52PM -0700, Stefano Stabellini wrote: > On Thu, 18 May 2023, Roger Pau Monné wrote: > > On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote: > > > Hi all, > > > > > > I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0 > > > test with the brand new gitlab-ci runner offered by Qubes. It is an AMD > > > Zen3 system and we already have a few successful tests with it, see > > > automation/gitlab-ci/test.yaml. > > > > > > We managed to narrow down the issue to a console problem. We are > > > currently using console=com1 com1=115200,8n1,pci,msi as Xen command line > > > options, it works with PV Dom0 and it is using a PCI UART card. > > > > > > In the case of Dom0 PVH: > > > - it works without console=com1 > > > - it works with console=com1 and with the patch appended below > > > - it doesn't work otherwise and crashes with this error: > > > https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK > > > > Jan also noticed this, and we have a ticket for it in gitlab: > > > > https://gitlab.com/xen-project/xen/-/issues/85 > > > > > What is the right way to fix it? > > > > I think the right fix is to simply avoid hidden devices from being > > handled by vPCI, in any case such devices won't work propewrly with > > vPCI because they are in use by Xen, and so any cached information by > > vPCI is likely to become stable as Xen can modify the device without > > vPCI noticing. > > > > I think the chunk below should help. It's not clear to me however how > > hidden devices should be handled, is the intention to completely hide > > such devices from dom0? > > I like the idea but the patch below still failed: > > (XEN) Xen call trace: > (XEN) [<ffff82d0402682b0>] R drivers/vpci/header.c#modify_bars+0x2b3/0x44d > (XEN) [<ffff82d040268714>] F drivers/vpci/header.c#init_bars+0x2ca/0x372 > (XEN) [<ffff82d0402673b3>] F vpci_add_handlers+0xd5/0x10a > (XEN) [<ffff82d0404408b9>] F drivers/passthrough/pci.c#setup_one_hwdom_device+0x73/0x97 > (XEN) [<ffff82d0404409b0>] F drivers/passthrough/pci.c#_setup_hwdom_pci_devices+0x63/0x15b > (XEN) [<ffff82d04027df08>] F drivers/passthrough/pci.c#pci_segments_iterate+0x43/0x69 > (XEN) [<ffff82d040440e29>] F setup_hwdom_pci_devices+0x25/0x2c > (XEN) [<ffff82d04043cb1a>] F drivers/passthrough/amd/pci_amd_iommu.c#amd_iommu_hwdom_init+0xd4/0xdd > (XEN) [<ffff82d0404403c9>] F iommu_hwdom_init+0x49/0x53 > (XEN) [<ffff82d04045175e>] F dom0_construct_pvh+0x160/0x138d > (XEN) [<ffff82d040468914>] F construct_dom0+0x5c/0xb7 > (XEN) [<ffff82d0404619c1>] F __start_xen+0x2423/0x272d > (XEN) [<ffff82d040203344>] F __high_start+0x94/0xa0 > > I haven't managed to figure out why yet. Do you have some other patches applied? I've tested this by manually hiding a device on my system and can confirm that without the fix I hit the ASSERT, but with the patch applied I no longer hit it. I have no idea how can you get into init_bars if the device is hidden and thus belongs to dom_xen. FWIW, I've used the following chunk to make a device RO and enable memory decoding: diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 07d1986d330a..e4de372af7c9 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -1111,6 +1111,16 @@ static int __hwdom_init cf_check _setup_hwdom_pci_devices( { struct setup_hwdom *ctxt = arg; int bus, devfn; + pci_sbdf_t hide = { + .seg = 0, + .bus = 0, + .dev = 8, + .fn = 0, + }; + uint16_t cmd = pci_conf_read16(hide, PCI_COMMAND); + + pci_conf_write16(hide, PCI_COMMAND, cmd | PCI_COMMAND_MEMORY); + printk("hide dev: %d\n", pci_ro_device(0, 0, PCI_DEVFN(8, 0))); for ( bus = 0; bus < 256; bus++ ) {
On Fri, 19 May 2023, Roger Pau Monné wrote: > On Thu, May 18, 2023 at 06:46:52PM -0700, Stefano Stabellini wrote: > > On Thu, 18 May 2023, Roger Pau Monné wrote: > > > On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote: > > > > Hi all, > > > > > > > > I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0 > > > > test with the brand new gitlab-ci runner offered by Qubes. It is an AMD > > > > Zen3 system and we already have a few successful tests with it, see > > > > automation/gitlab-ci/test.yaml. > > > > > > > > We managed to narrow down the issue to a console problem. We are > > > > currently using console=com1 com1=115200,8n1,pci,msi as Xen command line > > > > options, it works with PV Dom0 and it is using a PCI UART card. > > > > > > > > In the case of Dom0 PVH: > > > > - it works without console=com1 > > > > - it works with console=com1 and with the patch appended below > > > > - it doesn't work otherwise and crashes with this error: > > > > https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK > > > > > > Jan also noticed this, and we have a ticket for it in gitlab: > > > > > > https://gitlab.com/xen-project/xen/-/issues/85 > > > > > > > What is the right way to fix it? > > > > > > I think the right fix is to simply avoid hidden devices from being > > > handled by vPCI, in any case such devices won't work propewrly with > > > vPCI because they are in use by Xen, and so any cached information by > > > vPCI is likely to become stable as Xen can modify the device without > > > vPCI noticing. > > > > > > I think the chunk below should help. It's not clear to me however how > > > hidden devices should be handled, is the intention to completely hide > > > such devices from dom0? > > > > I like the idea but the patch below still failed: > > > > (XEN) Xen call trace: > > (XEN) [<ffff82d0402682b0>] R drivers/vpci/header.c#modify_bars+0x2b3/0x44d > > (XEN) [<ffff82d040268714>] F drivers/vpci/header.c#init_bars+0x2ca/0x372 > > (XEN) [<ffff82d0402673b3>] F vpci_add_handlers+0xd5/0x10a > > (XEN) [<ffff82d0404408b9>] F drivers/passthrough/pci.c#setup_one_hwdom_device+0x73/0x97 > > (XEN) [<ffff82d0404409b0>] F drivers/passthrough/pci.c#_setup_hwdom_pci_devices+0x63/0x15b > > (XEN) [<ffff82d04027df08>] F drivers/passthrough/pci.c#pci_segments_iterate+0x43/0x69 > > (XEN) [<ffff82d040440e29>] F setup_hwdom_pci_devices+0x25/0x2c > > (XEN) [<ffff82d04043cb1a>] F drivers/passthrough/amd/pci_amd_iommu.c#amd_iommu_hwdom_init+0xd4/0xdd > > (XEN) [<ffff82d0404403c9>] F iommu_hwdom_init+0x49/0x53 > > (XEN) [<ffff82d04045175e>] F dom0_construct_pvh+0x160/0x138d > > (XEN) [<ffff82d040468914>] F construct_dom0+0x5c/0xb7 > > (XEN) [<ffff82d0404619c1>] F __start_xen+0x2423/0x272d > > (XEN) [<ffff82d040203344>] F __high_start+0x94/0xa0 > > > > I haven't managed to figure out why yet. > > Do you have some other patches applied? > > I've tested this by manually hiding a device on my system and can > confirm that without the fix I hit the ASSERT, but with the patch > applied I no longer hit it. I have no idea how can you get into > init_bars if the device is hidden and thus belongs to dom_xen. Unfortunately it doesn't work. Here are the full logs with interesting DEBUG messages (search for "DEBUG"): https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/4318489116 https://gitlab.com/xen-project/people/sstabellini/xen/-/commit/31c400caa7b86d4c14f9553138e02af18d3b3284 [...] (XEN) DEBUG ns16550_init_postirq 432 03:00.0 [...] (XEN) DEBUG vpci_add_handlers 75 0000:00:00.0 0^M (XEN) DEBUG vpci_add_handlers 75 0000:00:00.2 1^M (XEN) DEBUG vpci_add_handlers 78 0000:00:00.2^M (XEN) DEBUG vpci_add_handlers 75 0000:00:01.0 0^M (XEN) DEBUG vpci_add_handlers 75 0000:00:02.0 0^M (XEN) DEBUG vpci_add_handlers 75 0000:00:02.1 0^M Then crash on drivers/vpci/header.c#modify_bars vpci_add_handlers hasn't even been called yet for the interesing device, which is 03:00.0 (not 00:02.1). At that pointed I doubted myself on the previous test so I went back and re-run it again. I do confirm that the below patch instead (instead, not in addition) works: diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c index 212a9c49ae..24abfaae30 100644 --- a/xen/drivers/char/ns16550.c +++ b/xen/drivers/char/ns16550.c @@ -429,17 +429,6 @@ static void __init cf_check ns16550_init_postirq(struct serial_port *port) #ifdef NS16550_PCI if ( uart->bar || uart->ps_bdf_enable ) { - if ( uart->param && uart->param->mmio && - rangeset_add_range(mmio_ro_ranges, PFN_DOWN(uart->io_base), - PFN_UP(uart->io_base + uart->io_size) - 1) ) - printk(XENLOG_INFO "Error while adding MMIO range of device to mmio_ro_ranges\n"); - - if ( pci_ro_device(0, uart->ps_bdf[0], - PCI_DEVFN(uart->ps_bdf[1], uart->ps_bdf[2])) ) - printk(XENLOG_INFO "Could not mark config space of %02x:%02x.%u read-only.\n", - uart->ps_bdf[0], uart->ps_bdf[1], - uart->ps_bdf[2]); - if ( uart->msi ) { struct msi_info msi = {
On Fri, May 19, 2023 at 05:02:21PM -0700, Stefano Stabellini wrote: > On Fri, 19 May 2023, Roger Pau Monné wrote: > > On Thu, May 18, 2023 at 06:46:52PM -0700, Stefano Stabellini wrote: > > > On Thu, 18 May 2023, Roger Pau Monné wrote: > > > > On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote: > > > > > Hi all, > > > > > > > > > > I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0 > > > > > test with the brand new gitlab-ci runner offered by Qubes. It is an AMD > > > > > Zen3 system and we already have a few successful tests with it, see > > > > > automation/gitlab-ci/test.yaml. > > > > > > > > > > We managed to narrow down the issue to a console problem. We are > > > > > currently using console=com1 com1=115200,8n1,pci,msi as Xen command line > > > > > options, it works with PV Dom0 and it is using a PCI UART card. > > > > > > > > > > In the case of Dom0 PVH: > > > > > - it works without console=com1 > > > > > - it works with console=com1 and with the patch appended below > > > > > - it doesn't work otherwise and crashes with this error: > > > > > https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK > > > > > > > > Jan also noticed this, and we have a ticket for it in gitlab: > > > > > > > > https://gitlab.com/xen-project/xen/-/issues/85 > > > > > > > > > What is the right way to fix it? > > > > > > > > I think the right fix is to simply avoid hidden devices from being > > > > handled by vPCI, in any case such devices won't work propewrly with > > > > vPCI because they are in use by Xen, and so any cached information by > > > > vPCI is likely to become stable as Xen can modify the device without > > > > vPCI noticing. > > > > > > > > I think the chunk below should help. It's not clear to me however how > > > > hidden devices should be handled, is the intention to completely hide > > > > such devices from dom0? > > > > > > I like the idea but the patch below still failed: > > > > > > (XEN) Xen call trace: > > > (XEN) [<ffff82d0402682b0>] R drivers/vpci/header.c#modify_bars+0x2b3/0x44d > > > (XEN) [<ffff82d040268714>] F drivers/vpci/header.c#init_bars+0x2ca/0x372 > > > (XEN) [<ffff82d0402673b3>] F vpci_add_handlers+0xd5/0x10a > > > (XEN) [<ffff82d0404408b9>] F drivers/passthrough/pci.c#setup_one_hwdom_device+0x73/0x97 > > > (XEN) [<ffff82d0404409b0>] F drivers/passthrough/pci.c#_setup_hwdom_pci_devices+0x63/0x15b > > > (XEN) [<ffff82d04027df08>] F drivers/passthrough/pci.c#pci_segments_iterate+0x43/0x69 > > > (XEN) [<ffff82d040440e29>] F setup_hwdom_pci_devices+0x25/0x2c > > > (XEN) [<ffff82d04043cb1a>] F drivers/passthrough/amd/pci_amd_iommu.c#amd_iommu_hwdom_init+0xd4/0xdd > > > (XEN) [<ffff82d0404403c9>] F iommu_hwdom_init+0x49/0x53 > > > (XEN) [<ffff82d04045175e>] F dom0_construct_pvh+0x160/0x138d > > > (XEN) [<ffff82d040468914>] F construct_dom0+0x5c/0xb7 > > > (XEN) [<ffff82d0404619c1>] F __start_xen+0x2423/0x272d > > > (XEN) [<ffff82d040203344>] F __high_start+0x94/0xa0 > > > > > > I haven't managed to figure out why yet. > > > > Do you have some other patches applied? > > > > I've tested this by manually hiding a device on my system and can > > confirm that without the fix I hit the ASSERT, but with the patch > > applied I no longer hit it. I have no idea how can you get into > > init_bars if the device is hidden and thus belongs to dom_xen. > > Unfortunately it doesn't work. Here are the full logs with interesting > DEBUG messages (search for "DEBUG"): > https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/4318489116 > https://gitlab.com/xen-project/people/sstabellini/xen/-/commit/31c400caa7b86d4c14f9553138e02af18d3b3284 > > [...] > (XEN) DEBUG ns16550_init_postirq 432 03:00.0 > [...] > (XEN) DEBUG vpci_add_handlers 75 0000:00:00.0 0^M > (XEN) DEBUG vpci_add_handlers 75 0000:00:00.2 1^M > (XEN) DEBUG vpci_add_handlers 78 0000:00:00.2^M This device is not handled by vPCI either, and is not the console device. > (XEN) DEBUG vpci_add_handlers 75 0000:00:01.0 0^M > (XEN) DEBUG vpci_add_handlers 75 0000:00:02.0 0^M > (XEN) DEBUG vpci_add_handlers 75 0000:00:02.1 0^M > > Then crash on drivers/vpci/header.c#modify_bars Interesting. The crash however is a page fault instead of the previous assert: (XEN) ----[ Xen-4.18-unstable x86_64 debug=y Tainted: C ]---- (XEN) CPU: 0 (XEN) RIP: e008:[<ffff82d040268312>] drivers/vpci/header.c#modify_bars+0x2b3/0x44d [...] (XEN) Xen call trace: (XEN) [<ffff82d040268312>] R drivers/vpci/header.c#modify_bars+0x2b3/0x44d (XEN) [<ffff82d040268776>] F drivers/vpci/header.c#init_bars+0x2ca/0x372 (XEN) [<ffff82d040267412>] F vpci_add_handlers+0x134/0x16c (XEN) [<ffff82d0404408e5>] F drivers/passthrough/pci.c#setup_one_hwdom_device+0x73/0x97 (XEN) [<ffff82d0404409dc>] F drivers/passthrough/pci.c#_setup_hwdom_pci_devices+0x63/0x15b (XEN) [<ffff82d04027df6a>] F drivers/passthrough/pci.c#pci_segments_iterate+0x43/0x69 (XEN) [<ffff82d040440e55>] F setup_hwdom_pci_devices+0x25/0x2c (XEN) [<ffff82d04043cb46>] F drivers/passthrough/amd/pci_amd_iommu.c#amd_iommu_hwdom_init+0xd4/0xdd (XEN) [<ffff82d0404403f5>] F iommu_hwdom_init+0x49/0x53 (XEN) [<ffff82d04045177e>] F dom0_construct_pvh+0x160/0x138d (XEN) [<ffff82d040468934>] F construct_dom0+0x5c/0xb7 (XEN) [<ffff82d0404619e1>] F __start_xen+0x2423/0x272d (XEN) [<ffff82d040203344>] F __high_start+0x94/0xa0 (XEN) (XEN) Pagetable walk from 000000000000002c: (XEN) L4[0x000] = 000000039015b063 ffffffffffffffff (XEN) L3[0x000] = 000000039015a063 ffffffffffffffff (XEN) L2[0x000] = 0000000390159063 ffffffffffffffff (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff (XEN) (XEN) **************************************** (XEN) Panic on CPU 0: (XEN) FATAL PAGE FAULT (XEN) [error_code=0000] (XEN) Faulting linear address: 000000000000002c (XEN) **************************************** Looks like a NULL pointer deref. Using addr2line it points at xen/drivers/vpci/header.c:314, which is: for_each_pdev ( pdev->domain, tmp ) { if ( tmp == pdev ) { /* * Need to store the device so it's not constified and defer_map * can modify it in case of error. */ dev = tmp; if ( !rom_only ) /* * If memory decoding is toggled avoid checking against the * same device, or else all regions will be removed from the * memory map in the unmap case. */ continue; } for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ ) { const struct vpci_bar *bar = &tmp->vpci->header.bars[i]; unsigned long start = PFN_DOWN(bar->addr); unsigned long end = PFN_DOWN(bar->addr + bar->size - 1); -> if ( !bar->enabled || !rangeset_overlaps_range(mem, start, end) || So we have a device added to the domain device list that doesn't have vPCI enabled. I'm unsure how we get into that situation in the current scenario, but Xen should be capable of coping with a domain having devices not handled by vPCI. Can you please try the following combined fix, it should also print the offending device. Thanks, Roger. --- diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c index ec2e978a4e6b..0ff8e940fa8d 100644 --- a/xen/drivers/vpci/header.c +++ b/xen/drivers/vpci/header.c @@ -289,6 +289,13 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only) */ for_each_pdev ( pdev->domain, tmp ) { + if ( !tmp->vpci ) + { + printk(XENLOG_G_WARNING "%pp: not handled by vPCI for %pd\n", + &tmp->sbdf, pdev->domain); + continue; + } + if ( tmp == pdev ) { /* diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c index 652807a4a454..0baef3a8d3a1 100644 --- a/xen/drivers/vpci/vpci.c +++ b/xen/drivers/vpci/vpci.c @@ -72,7 +72,12 @@ int vpci_add_handlers(struct pci_dev *pdev) unsigned int i; int rc = 0; - if ( !has_vpci(pdev->domain) ) + if ( !has_vpci(pdev->domain) || + /* + * Ignore RO and hidden devices, those are in use by Xen and vPCI + * won't work on them. + */ + pci_get_pdev(dom_xen, pdev->sbdf) ) return 0; /* We should not get here twice for the same device. */
On Sat, 20 May 2023, Roger Pau Monné wrote: > diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c > index ec2e978a4e6b..0ff8e940fa8d 100644 > --- a/xen/drivers/vpci/header.c > +++ b/xen/drivers/vpci/header.c > @@ -289,6 +289,13 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only) > */ > for_each_pdev ( pdev->domain, tmp ) > { > + if ( !tmp->vpci ) > + { > + printk(XENLOG_G_WARNING "%pp: not handled by vPCI for %pd\n", > + &tmp->sbdf, pdev->domain); > + continue; > + } > + > if ( tmp == pdev ) > { > /* > diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c > index 652807a4a454..0baef3a8d3a1 100644 > --- a/xen/drivers/vpci/vpci.c > +++ b/xen/drivers/vpci/vpci.c > @@ -72,7 +72,12 @@ int vpci_add_handlers(struct pci_dev *pdev) > unsigned int i; > int rc = 0; > > - if ( !has_vpci(pdev->domain) ) > + if ( !has_vpci(pdev->domain) || > + /* > + * Ignore RO and hidden devices, those are in use by Xen and vPCI > + * won't work on them. > + */ > + pci_get_pdev(dom_xen, pdev->sbdf) ) > return 0; > > /* We should not get here twice for the same device. */ Now this patch works! Thank you!! :-) You can check the full logs here https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/4329259080 Is the patch ready to be upstreamed aside from the commit message?
On 23.05.2023 00:20, Stefano Stabellini wrote: > On Sat, 20 May 2023, Roger Pau Monné wrote: >> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c >> index ec2e978a4e6b..0ff8e940fa8d 100644 >> --- a/xen/drivers/vpci/header.c >> +++ b/xen/drivers/vpci/header.c >> @@ -289,6 +289,13 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only) >> */ >> for_each_pdev ( pdev->domain, tmp ) >> { >> + if ( !tmp->vpci ) >> + { >> + printk(XENLOG_G_WARNING "%pp: not handled by vPCI for %pd\n", >> + &tmp->sbdf, pdev->domain); >> + continue; >> + } >> + >> if ( tmp == pdev ) >> { >> /* >> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c >> index 652807a4a454..0baef3a8d3a1 100644 >> --- a/xen/drivers/vpci/vpci.c >> +++ b/xen/drivers/vpci/vpci.c >> @@ -72,7 +72,12 @@ int vpci_add_handlers(struct pci_dev *pdev) >> unsigned int i; >> int rc = 0; >> >> - if ( !has_vpci(pdev->domain) ) >> + if ( !has_vpci(pdev->domain) || >> + /* >> + * Ignore RO and hidden devices, those are in use by Xen and vPCI >> + * won't work on them. >> + */ >> + pci_get_pdev(dom_xen, pdev->sbdf) ) >> return 0; >> >> /* We should not get here twice for the same device. */ > > > Now this patch works! Thank you!! :-) > > You can check the full logs here > https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/4329259080 > > Is the patch ready to be upstreamed aside from the commit message? I don't think so. vPCI ought to work on "r/o" devices. Out of curiosity, have you also tried my (hackish and hence RFC) patch [1]? Jan [1] https://lists.xen.org/archives/html/xen-devel/2021-08/msg01489.html
On Tue, 23 May 2023, Jan Beulich wrote: > On 23.05.2023 00:20, Stefano Stabellini wrote: > > On Sat, 20 May 2023, Roger Pau Monné wrote: > >> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c > >> index ec2e978a4e6b..0ff8e940fa8d 100644 > >> --- a/xen/drivers/vpci/header.c > >> +++ b/xen/drivers/vpci/header.c > >> @@ -289,6 +289,13 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only) > >> */ > >> for_each_pdev ( pdev->domain, tmp ) > >> { > >> + if ( !tmp->vpci ) > >> + { > >> + printk(XENLOG_G_WARNING "%pp: not handled by vPCI for %pd\n", > >> + &tmp->sbdf, pdev->domain); > >> + continue; > >> + } > >> + > >> if ( tmp == pdev ) > >> { > >> /* > >> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c > >> index 652807a4a454..0baef3a8d3a1 100644 > >> --- a/xen/drivers/vpci/vpci.c > >> +++ b/xen/drivers/vpci/vpci.c > >> @@ -72,7 +72,12 @@ int vpci_add_handlers(struct pci_dev *pdev) > >> unsigned int i; > >> int rc = 0; > >> > >> - if ( !has_vpci(pdev->domain) ) > >> + if ( !has_vpci(pdev->domain) || > >> + /* > >> + * Ignore RO and hidden devices, those are in use by Xen and vPCI > >> + * won't work on them. > >> + */ > >> + pci_get_pdev(dom_xen, pdev->sbdf) ) > >> return 0; > >> > >> /* We should not get here twice for the same device. */ > > > > > > Now this patch works! Thank you!! :-) > > > > You can check the full logs here > > https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/4329259080 > > > > Is the patch ready to be upstreamed aside from the commit message? > > I don't think so. vPCI ought to work on "r/o" devices. Out of curiosity, > have you also tried my (hackish and hence RFC) patch [1]? > > [1] https://lists.xen.org/archives/html/xen-devel/2021-08/msg01489.html I don't know the code well enough to discuss what is the best solution. I'll let you and Roger figure it out. I would only kindly request to solve this in few days so that we can enable the real hardware PVH test in gitlab-ci as soon as possible. I think it is critical as it will allow us to catch many real issues going forward. For sure I can test your patch. BTW it is also really easy for you to do it your simply by pushing a branch to a repo on gitlab-ci and watch for the results. If you are interested let me know I can give you a tutorial, you just need to create a repo, and register the gitlab runner and voila'. This is the outcome: https://gitlab.com/xen-project/people/sstabellini/xen/-/pipelines/876808194 (XEN) PCI add device 0000:00:00.0 (XEN) PCI add device 0000:00:00.2 (XEN) PCI add device 0000:00:01.0 (XEN) PCI add device 0000:00:02.0 (XEN) Assertion 'd == dom_xen && system_state < SYS_STATE_active' failed at drivers/vpci/header.c:313 (XEN) ----[ Xen-4.18-unstable x86_64 debug=y Tainted: C ]---- (XEN) CPU: 14 (XEN) RIP: e008:[<ffff82d04026839e>] drivers/vpci/header.c#modify_bars+0x3ba/0x4a3 (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor (d0v0) (XEN) rax: ffff830390164000 rbx: ffff83038bd8a7f0 rcx: 0000000000000010 (XEN) rdx: ffff83038e3b7fff rsi: ffff83038e3c83a0 rdi: ffff83038e3c8398 (XEN) rbp: ffff83038e3b7c08 rsp: ffff83038e3b7b98 r8: 0000000000000001 (XEN) r9: ffff83038dcfa000 r10: 000000000000000e r11: 0000000000000000 (XEN) r12: 0000000000000007 r13: 00000000000dcc03 r14: ffff83039016ad10 (XEN) r15: 00000000000dcc00 cr0: 000000008005003b cr4: 0000000000f506e0 (XEN) cr3: 000000038ddfe000 cr2: ffff88814e3ff000 (XEN) fsb: 0000000000000000 gsb: ffff888149a00000 gss: 0000000000000000 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 (XEN) Xen code around <ffff82d04026839e> (drivers/vpci/header.c#modify_bars+0x3ba/0x4a3): (XEN) 3d c4 d1 37 00 02 76 c0 <0f> 0b 4c 8b 75 b0 0f ae e8 48 83 7d 98 00 74 2b (XEN) Xen stack trace from rsp=ffff83038e3b7b98: (XEN) 0000000000000002 ffff830390150230 ffff830390164000 ffff830390164158 (XEN) ffff830390150230 00ff830300000000 ffff83038dcf8b00 0000000000000003 (XEN) 0000000000000206 ffff83038bd8c010 0000000000000000 0000000000000002 (XEN) 0000000000000002 0000000000000004 ffff83038e3b7c18 ffff82d040268909 (XEN) ffff83038e3b7ca0 ffff82d040267998 000000118e3b7ca0 0000000000117803 (XEN) 0000000400000000 ffff830390150230 0000000000000000 0000000000000000 (XEN) 0000000400000002 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000002 0000000000000000 0000000000000004 0000000000000000 (XEN) ffff82d04041df40 ffff83038e3b7cd0 ffff82d0402cb649 0000001140332c9e (XEN) ffff83038e3b7d88 0000000000000000 ffff83038dd094a0 ffff83038e3b7d30 (XEN) ffff82d0402caa72 ffff83038e3b7ce0 00000cfc00000002 0000000000000000 (XEN) 0000000000000002 0000000000000000 ffff83038dd094a0 ffff83038e3b7d88 (XEN) 0000000000000001 0000000000000000 0000000000000000 ffff83038e3b7d58 (XEN) ffff82d0402cab08 0000000000000002 ffff83038dcfa000 ffff83038e3b7e28 (XEN) ffff83038e3b7dd0 ffff82d0402ba4ee 0000000000000cfc 0000000000000000 (XEN) ffff83038e38f000 0000000000000000 0000000000000cfc 0000000000000000 (XEN) 0000000200000001 0001000000000000 0000000000000002 0000000000000002 (XEN) 0000000000000000 ffff83038e3b7e44 ffff83038dcfa000 ffff83038e3b7e10 (XEN) ffff82d0402ba871 0000000000000000 ffff83038e3b7e44 0000000000000002 (XEN) 0000000000000cfc ffff83038dcf6000 0000000000000000 ffff83038e3b7e30 (XEN) Xen call trace: (XEN) [<ffff82d04026839e>] R drivers/vpci/header.c#modify_bars+0x3ba/0x4a3 (XEN) [<ffff82d040268909>] F drivers/vpci/header.c#cmd_write+0x2c/0x3b (XEN) [<ffff82d040267998>] F vpci_write+0x14c/0x268 (XEN) [<ffff82d0402cb649>] F arch/x86/hvm/io.c#vpci_portio_write+0xa0/0xa7 (XEN) [<ffff82d0402caa72>] F hvm_process_io_intercept+0x203/0x26f (XEN) [<ffff82d0402cab08>] F hvm_io_intercept+0x2a/0x4c (XEN) [<ffff82d0402ba4ee>] F arch/x86/hvm/emulate.c#hvmemul_do_io+0x29a/0x5ee (XEN) [<ffff82d0402ba871>] F arch/x86/hvm/emulate.c#hvmemul_do_io_buffer+0x2f/0x6a (XEN) [<ffff82d0402bbd10>] F hvmemul_do_pio_buffer+0x33/0x35 (XEN) [<ffff82d0402cb41d>] F handle_pio+0x6d/0x1b8 (XEN) [<ffff82d0402a2e4d>] F svm_vmexit_handler+0x10fe/0x18e2 (XEN) [<ffff82d0402034dc>] F svm_stgi_label+0x5/0x15 You can also check how I applied the patch here: https://gitlab.com/xen-project/people/sstabellini/xen/-/commit/851e76bf0a1cf6f040b6e90795d216ebfcc069cc
On 24.05.2023 03:13, Stefano Stabellini wrote: > On Tue, 23 May 2023, Jan Beulich wrote: >> On 23.05.2023 00:20, Stefano Stabellini wrote: >>> On Sat, 20 May 2023, Roger Pau Monné wrote: >>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c >>>> index ec2e978a4e6b..0ff8e940fa8d 100644 >>>> --- a/xen/drivers/vpci/header.c >>>> +++ b/xen/drivers/vpci/header.c >>>> @@ -289,6 +289,13 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only) >>>> */ >>>> for_each_pdev ( pdev->domain, tmp ) >>>> { >>>> + if ( !tmp->vpci ) >>>> + { >>>> + printk(XENLOG_G_WARNING "%pp: not handled by vPCI for %pd\n", >>>> + &tmp->sbdf, pdev->domain); >>>> + continue; >>>> + } >>>> + >>>> if ( tmp == pdev ) >>>> { >>>> /* >>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c >>>> index 652807a4a454..0baef3a8d3a1 100644 >>>> --- a/xen/drivers/vpci/vpci.c >>>> +++ b/xen/drivers/vpci/vpci.c >>>> @@ -72,7 +72,12 @@ int vpci_add_handlers(struct pci_dev *pdev) >>>> unsigned int i; >>>> int rc = 0; >>>> >>>> - if ( !has_vpci(pdev->domain) ) >>>> + if ( !has_vpci(pdev->domain) || >>>> + /* >>>> + * Ignore RO and hidden devices, those are in use by Xen and vPCI >>>> + * won't work on them. >>>> + */ >>>> + pci_get_pdev(dom_xen, pdev->sbdf) ) >>>> return 0; >>>> >>>> /* We should not get here twice for the same device. */ >>> >>> >>> Now this patch works! Thank you!! :-) >>> >>> You can check the full logs here >>> https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/4329259080 >>> >>> Is the patch ready to be upstreamed aside from the commit message? >> >> I don't think so. vPCI ought to work on "r/o" devices. Out of curiosity, >> have you also tried my (hackish and hence RFC) patch [1]? >> >> [1] https://lists.xen.org/archives/html/xen-devel/2021-08/msg01489.html > > I don't know the code well enough to discuss what is the best solution. > I'll let you and Roger figure it out. I would only kindly request to > solve this in few days so that we can enable the real hardware PVH test > in gitlab-ci as soon as possible. I think it is critical as it will > allow us to catch many real issues going forward. > > For sure I can test your patch. BTW it is also really easy for you to do > it your simply by pushing a branch to a repo on gitlab-ci and watch for > the results. If you are interested let me know I can give you a > tutorial, you just need to create a repo, and register the gitlab runner > and voila'. > > This is the outcome: > > https://gitlab.com/xen-project/people/sstabellini/xen/-/pipelines/876808194 > > > (XEN) PCI add device 0000:00:00.0 > (XEN) PCI add device 0000:00:00.2 > (XEN) PCI add device 0000:00:01.0 > (XEN) PCI add device 0000:00:02.0 > (XEN) Assertion 'd == dom_xen && system_state < SYS_STATE_active' failed at drivers/vpci/header.c:313 I've sent an updated RFC patch, integrating a variant of Roger's patch at the same time. Jan
On 24.05.2023 03:13, Stefano Stabellini wrote: > On Tue, 23 May 2023, Jan Beulich wrote: >> On 23.05.2023 00:20, Stefano Stabellini wrote: >>> On Sat, 20 May 2023, Roger Pau Monné wrote: >>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c >>>> index ec2e978a4e6b..0ff8e940fa8d 100644 >>>> --- a/xen/drivers/vpci/header.c >>>> +++ b/xen/drivers/vpci/header.c >>>> @@ -289,6 +289,13 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only) >>>> */ >>>> for_each_pdev ( pdev->domain, tmp ) >>>> { >>>> + if ( !tmp->vpci ) >>>> + { >>>> + printk(XENLOG_G_WARNING "%pp: not handled by vPCI for %pd\n", >>>> + &tmp->sbdf, pdev->domain); >>>> + continue; >>>> + } >>>> + >>>> if ( tmp == pdev ) >>>> { >>>> /* >>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c >>>> index 652807a4a454..0baef3a8d3a1 100644 >>>> --- a/xen/drivers/vpci/vpci.c >>>> +++ b/xen/drivers/vpci/vpci.c >>>> @@ -72,7 +72,12 @@ int vpci_add_handlers(struct pci_dev *pdev) >>>> unsigned int i; >>>> int rc = 0; >>>> >>>> - if ( !has_vpci(pdev->domain) ) >>>> + if ( !has_vpci(pdev->domain) || >>>> + /* >>>> + * Ignore RO and hidden devices, those are in use by Xen and vPCI >>>> + * won't work on them. >>>> + */ >>>> + pci_get_pdev(dom_xen, pdev->sbdf) ) >>>> return 0; >>>> >>>> /* We should not get here twice for the same device. */ >>> >>> >>> Now this patch works! Thank you!! :-) >>> >>> You can check the full logs here >>> https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/4329259080 >>> >>> Is the patch ready to be upstreamed aside from the commit message? >> >> I don't think so. vPCI ought to work on "r/o" devices. Out of curiosity, >> have you also tried my (hackish and hence RFC) patch [1]? >> >> [1] https://lists.xen.org/archives/html/xen-devel/2021-08/msg01489.html > > I don't know the code well enough to discuss what is the best solution. > I'll let you and Roger figure it out. I would only kindly request to > solve this in few days so that we can enable the real hardware PVH test > in gitlab-ci as soon as possible. I think it is critical as it will > allow us to catch many real issues going forward. Funny. The problem has been pending for almost two years, and now you expect it to be addressed within a few days? > For sure I can test your patch. BTW it is also really easy for you to do > it your simply by pushing a branch to a repo on gitlab-ci and watch for > the results. If you are interested let me know I can give you a > tutorial, you just need to create a repo, and register the gitlab runner > and voila'. > > This is the outcome: > > https://gitlab.com/xen-project/people/sstabellini/xen/-/pipelines/876808194 > > > (XEN) PCI add device 0000:00:00.0 > (XEN) PCI add device 0000:00:00.2 > (XEN) PCI add device 0000:00:01.0 > (XEN) PCI add device 0000:00:02.0 > (XEN) Assertion 'd == dom_xen && system_state < SYS_STATE_active' failed at drivers/vpci/header.c:313 So this is an assertion my patch adds. The right side of the && may be too strict, but it's been too long to recall why exactly I thought the case should occur only before Dom0 starts. You may want to retry with that 2nd half of the condition dropped. Meanwhile I'll see to refresh my memory as to the reasons for the assertion in its present shape. Jan
On Tue, May 23, 2023 at 08:44:48AM +0200, Jan Beulich wrote: > On 23.05.2023 00:20, Stefano Stabellini wrote: > > On Sat, 20 May 2023, Roger Pau Monné wrote: > >> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c > >> index ec2e978a4e6b..0ff8e940fa8d 100644 > >> --- a/xen/drivers/vpci/header.c > >> +++ b/xen/drivers/vpci/header.c > >> @@ -289,6 +289,13 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only) > >> */ > >> for_each_pdev ( pdev->domain, tmp ) > >> { > >> + if ( !tmp->vpci ) > >> + { > >> + printk(XENLOG_G_WARNING "%pp: not handled by vPCI for %pd\n", > >> + &tmp->sbdf, pdev->domain); > >> + continue; > >> + } > >> + > >> if ( tmp == pdev ) > >> { > >> /* > >> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c > >> index 652807a4a454..0baef3a8d3a1 100644 > >> --- a/xen/drivers/vpci/vpci.c > >> +++ b/xen/drivers/vpci/vpci.c > >> @@ -72,7 +72,12 @@ int vpci_add_handlers(struct pci_dev *pdev) > >> unsigned int i; > >> int rc = 0; > >> > >> - if ( !has_vpci(pdev->domain) ) > >> + if ( !has_vpci(pdev->domain) || > >> + /* > >> + * Ignore RO and hidden devices, those are in use by Xen and vPCI > >> + * won't work on them. > >> + */ > >> + pci_get_pdev(dom_xen, pdev->sbdf) ) > >> return 0; > >> > >> /* We should not get here twice for the same device. */ > > > > > > Now this patch works! Thank you!! :-) > > > > You can check the full logs here > > https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/4329259080 > > > > Is the patch ready to be upstreamed aside from the commit message? > > I don't think so. vPCI ought to work on "r/o" devices. Out of curiosity, > have you also tried my (hackish and hence RFC) patch [1]? For r/o devices there should be no need of vPCI handlers, reading the config space of such devices can be done directly. There's some work to be done for hidden devices, as for those dom0 has write access to the config space and thus needs vPCI to be setup properly. The change to modify_bars() in order to handle devices without vpci populated is a bugfix, as it's already possible to have devices assigned to a domain that don't have vpci setup, if the call to vpci_add_handlers() in setup_one_hwdom_device() fails. That one could go in separately of the rest of the work in order to enable support for hidden devices. Roger.
On 23.05.2023 12:59, Roger Pau Monné wrote: > On Tue, May 23, 2023 at 08:44:48AM +0200, Jan Beulich wrote: >> On 23.05.2023 00:20, Stefano Stabellini wrote: >>> On Sat, 20 May 2023, Roger Pau Monné wrote: >>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c >>>> index ec2e978a4e6b..0ff8e940fa8d 100644 >>>> --- a/xen/drivers/vpci/header.c >>>> +++ b/xen/drivers/vpci/header.c >>>> @@ -289,6 +289,13 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only) >>>> */ >>>> for_each_pdev ( pdev->domain, tmp ) >>>> { >>>> + if ( !tmp->vpci ) >>>> + { >>>> + printk(XENLOG_G_WARNING "%pp: not handled by vPCI for %pd\n", >>>> + &tmp->sbdf, pdev->domain); >>>> + continue; >>>> + } >>>> + >>>> if ( tmp == pdev ) >>>> { >>>> /* >>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c >>>> index 652807a4a454..0baef3a8d3a1 100644 >>>> --- a/xen/drivers/vpci/vpci.c >>>> +++ b/xen/drivers/vpci/vpci.c >>>> @@ -72,7 +72,12 @@ int vpci_add_handlers(struct pci_dev *pdev) >>>> unsigned int i; >>>> int rc = 0; >>>> >>>> - if ( !has_vpci(pdev->domain) ) >>>> + if ( !has_vpci(pdev->domain) || >>>> + /* >>>> + * Ignore RO and hidden devices, those are in use by Xen and vPCI >>>> + * won't work on them. >>>> + */ >>>> + pci_get_pdev(dom_xen, pdev->sbdf) ) >>>> return 0; >>>> >>>> /* We should not get here twice for the same device. */ >>> >>> >>> Now this patch works! Thank you!! :-) >>> >>> You can check the full logs here >>> https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/4329259080 >>> >>> Is the patch ready to be upstreamed aside from the commit message? >> >> I don't think so. vPCI ought to work on "r/o" devices. Out of curiosity, >> have you also tried my (hackish and hence RFC) patch [1]? > > For r/o devices there should be no need of vPCI handlers, reading the > config space of such devices can be done directly. > > There's some work to be done for hidden devices, as for those dom0 has > write access to the config space and thus needs vPCI to be setup > properly. But then isn't it going to complicate things when dealing with r/o and hidden devices differently? > The change to modify_bars() in order to handle devices without vpci > populated is a bugfix, as it's already possible to have devices > assigned to a domain that don't have vpci setup, if the call to > vpci_add_handlers() in setup_one_hwdom_device() fails. That one could > go in separately of the rest of the work in order to enable support > for hidden devices. You saying "assigned to a domain" makes this sound more problematic than it probably is: If it really was any domain other than Dom0, I think there would be a security concern. Yet even for Dom0 I wonder what good can come out of there not being proper vPCI setup for a device. Jan
On Sat, May 20, 2023 at 12:28:59PM +0200, Roger Pau Monné wrote: > On Fri, May 19, 2023 at 05:02:21PM -0700, Stefano Stabellini wrote: > > On Fri, 19 May 2023, Roger Pau Monné wrote: > > > On Thu, May 18, 2023 at 06:46:52PM -0700, Stefano Stabellini wrote: > > > > On Thu, 18 May 2023, Roger Pau Monné wrote: > > > > > On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote: > > > > > > Hi all, > > > > > > > > > > > > I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0 > > > > > > test with the brand new gitlab-ci runner offered by Qubes. It is an AMD > > > > > > Zen3 system and we already have a few successful tests with it, see > > > > > > automation/gitlab-ci/test.yaml. > > > > > > > > > > > > We managed to narrow down the issue to a console problem. We are > > > > > > currently using console=com1 com1=115200,8n1,pci,msi as Xen command line > > > > > > options, it works with PV Dom0 and it is using a PCI UART card. > > > > > > > > > > > > In the case of Dom0 PVH: > > > > > > - it works without console=com1 > > > > > > - it works with console=com1 and with the patch appended below > > > > > > - it doesn't work otherwise and crashes with this error: > > > > > > https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK > > > > > > > > > > Jan also noticed this, and we have a ticket for it in gitlab: > > > > > > > > > > https://gitlab.com/xen-project/xen/-/issues/85 > > > > > > > > > > > What is the right way to fix it? > > > > > > > > > > I think the right fix is to simply avoid hidden devices from being > > > > > handled by vPCI, in any case such devices won't work propewrly with > > > > > vPCI because they are in use by Xen, and so any cached information by > > > > > vPCI is likely to become stable as Xen can modify the device without > > > > > vPCI noticing. > > > > > > > > > > I think the chunk below should help. It's not clear to me however how > > > > > hidden devices should be handled, is the intention to completely hide > > > > > such devices from dom0? > > > > > > > > I like the idea but the patch below still failed: > > > > > > > > (XEN) Xen call trace: > > > > (XEN) [<ffff82d0402682b0>] R drivers/vpci/header.c#modify_bars+0x2b3/0x44d > > > > (XEN) [<ffff82d040268714>] F drivers/vpci/header.c#init_bars+0x2ca/0x372 > > > > (XEN) [<ffff82d0402673b3>] F vpci_add_handlers+0xd5/0x10a > > > > (XEN) [<ffff82d0404408b9>] F drivers/passthrough/pci.c#setup_one_hwdom_device+0x73/0x97 > > > > (XEN) [<ffff82d0404409b0>] F drivers/passthrough/pci.c#_setup_hwdom_pci_devices+0x63/0x15b > > > > (XEN) [<ffff82d04027df08>] F drivers/passthrough/pci.c#pci_segments_iterate+0x43/0x69 > > > > (XEN) [<ffff82d040440e29>] F setup_hwdom_pci_devices+0x25/0x2c > > > > (XEN) [<ffff82d04043cb1a>] F drivers/passthrough/amd/pci_amd_iommu.c#amd_iommu_hwdom_init+0xd4/0xdd > > > > (XEN) [<ffff82d0404403c9>] F iommu_hwdom_init+0x49/0x53 > > > > (XEN) [<ffff82d04045175e>] F dom0_construct_pvh+0x160/0x138d > > > > (XEN) [<ffff82d040468914>] F construct_dom0+0x5c/0xb7 > > > > (XEN) [<ffff82d0404619c1>] F __start_xen+0x2423/0x272d > > > > (XEN) [<ffff82d040203344>] F __high_start+0x94/0xa0 > > > > > > > > I haven't managed to figure out why yet. > > > > > > Do you have some other patches applied? > > > > > > I've tested this by manually hiding a device on my system and can > > > confirm that without the fix I hit the ASSERT, but with the patch > > > applied I no longer hit it. I have no idea how can you get into > > > init_bars if the device is hidden and thus belongs to dom_xen. > > > > Unfortunately it doesn't work. Here are the full logs with interesting > > DEBUG messages (search for "DEBUG"): > > https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/4318489116 > > https://gitlab.com/xen-project/people/sstabellini/xen/-/commit/31c400caa7b86d4c14f9553138e02af18d3b3284 > > > > [...] > > (XEN) DEBUG ns16550_init_postirq 432 03:00.0 > > [...] > > (XEN) DEBUG vpci_add_handlers 75 0000:00:00.0 0^M > > (XEN) DEBUG vpci_add_handlers 75 0000:00:00.2 1^M > > (XEN) DEBUG vpci_add_handlers 78 0000:00:00.2^M > > This device is not handled by vPCI either, and is not the console > device. That's IOMMU. Full lspci: 00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Root Complex [1022:14b5] (rev 01) 00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h IOMMU [1022:14b6] 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01) 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01) 00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe GPP Bridge [1022:14ba] 00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe GPP Bridge [1022:14ba] 00:02.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe GPP Bridge [1022:14ba] 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01) 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel [1022:14cd] 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01) 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01) 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h Internal PCIe GPP Bridge [1022:14b9] (rev 10) 00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h Internal PCIe GPP Bridge [1022:14b9] (rev 10) 00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h Internal PCIe GPP Bridge [1022:14b9] (rev 10) 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 71) 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51) 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 0 [1022:1679] 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 1 [1022:167a] 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 2 [1022:167b] 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 3 [1022:167c] 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 4 [1022:167d] 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 5 [1022:167e] 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 6 [1022:167f] 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 7 [1022:1680] 01:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 03) 02:00.0 Network controller [0280]: MEDIATEK Corp. MT7921K (RZ608) Wi-Fi 6E 80MHz [14c3:0608] 03:00.0 Serial controller [0700]: Exar Corp. XR17V3521 Dual PCIe UART [13a8:0352] (rev 03) 34:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [Radeon 680M] [1002:1681] (rev 0a) 34:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller [1002:1640] 34:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] VanGogh PSP/CCP [1022:1649] 34:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #3 [1022:161d] 34:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #4 [1022:161e] 34:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 60) 34:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h/19h HD Audio Controller [1022:15e3] 35:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev a1) 36:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #8 [1022:161f] 36:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #5 [1022:15d6] 36:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #6 [1022:15d7] 36:00.5 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4/Thunderbolt NHI controller #1 [1022:162e] -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab
On Fri, May 19, 2023 at 05:02:21PM -0700, Stefano Stabellini wrote: > On Fri, 19 May 2023, Roger Pau Monné wrote: > > On Thu, May 18, 2023 at 06:46:52PM -0700, Stefano Stabellini wrote: > > > On Thu, 18 May 2023, Roger Pau Monné wrote: > > > > On Wed, May 17, 2023 at 05:59:31PM -0700, Stefano Stabellini wrote: > > > > > Hi all, > > > > > > > > > > I have run into another PVH Dom0 issue. I am trying to enable a PVH Dom0 > > > > > test with the brand new gitlab-ci runner offered by Qubes. It is an AMD > > > > > Zen3 system and we already have a few successful tests with it, see > > > > > automation/gitlab-ci/test.yaml. > > > > > > > > > > We managed to narrow down the issue to a console problem. We are > > > > > currently using console=com1 com1=115200,8n1,pci,msi as Xen command line > > > > > options, it works with PV Dom0 and it is using a PCI UART card. > > > > > > > > > > In the case of Dom0 PVH: > > > > > - it works without console=com1 > > > > > - it works with console=com1 and with the patch appended below > > > > > - it doesn't work otherwise and crashes with this error: > > > > > https://matrix-client.matrix.org/_matrix/media/r0/download/invisiblethingslab.com/uzcmldIqHptFZuxqsJtviLZK > > > > > > > > Jan also noticed this, and we have a ticket for it in gitlab: > > > > > > > > https://gitlab.com/xen-project/xen/-/issues/85 > > > > > > > > > What is the right way to fix it? > > > > > > > > I think the right fix is to simply avoid hidden devices from being > > > > handled by vPCI, in any case such devices won't work propewrly with > > > > vPCI because they are in use by Xen, and so any cached information by > > > > vPCI is likely to become stable as Xen can modify the device without > > > > vPCI noticing. > > > > > > > > I think the chunk below should help. It's not clear to me however how > > > > hidden devices should be handled, is the intention to completely hide > > > > such devices from dom0? > > > > > > I like the idea but the patch below still failed: > > > > > > (XEN) Xen call trace: > > > (XEN) [<ffff82d0402682b0>] R drivers/vpci/header.c#modify_bars+0x2b3/0x44d > > > (XEN) [<ffff82d040268714>] F drivers/vpci/header.c#init_bars+0x2ca/0x372 > > > (XEN) [<ffff82d0402673b3>] F vpci_add_handlers+0xd5/0x10a > > > (XEN) [<ffff82d0404408b9>] F drivers/passthrough/pci.c#setup_one_hwdom_device+0x73/0x97 > > > (XEN) [<ffff82d0404409b0>] F drivers/passthrough/pci.c#_setup_hwdom_pci_devices+0x63/0x15b > > > (XEN) [<ffff82d04027df08>] F drivers/passthrough/pci.c#pci_segments_iterate+0x43/0x69 > > > (XEN) [<ffff82d040440e29>] F setup_hwdom_pci_devices+0x25/0x2c > > > (XEN) [<ffff82d04043cb1a>] F drivers/passthrough/amd/pci_amd_iommu.c#amd_iommu_hwdom_init+0xd4/0xdd > > > (XEN) [<ffff82d0404403c9>] F iommu_hwdom_init+0x49/0x53 > > > (XEN) [<ffff82d04045175e>] F dom0_construct_pvh+0x160/0x138d > > > (XEN) [<ffff82d040468914>] F construct_dom0+0x5c/0xb7 > > > (XEN) [<ffff82d0404619c1>] F __start_xen+0x2423/0x272d > > > (XEN) [<ffff82d040203344>] F __high_start+0x94/0xa0 > > > > > > I haven't managed to figure out why yet. > > > > Do you have some other patches applied? > > > > I've tested this by manually hiding a device on my system and can > > confirm that without the fix I hit the ASSERT, but with the patch > > applied I no longer hit it. I have no idea how can you get into > > init_bars if the device is hidden and thus belongs to dom_xen. > > Unfortunately it doesn't work. Here are the full logs with interesting > DEBUG messages (search for "DEBUG"): > https://gitlab.com/xen-project/people/sstabellini/xen/-/jobs/4318489116 > https://gitlab.com/xen-project/people/sstabellini/xen/-/commit/31c400caa7b86d4c14f9553138e02af18d3b3284 > > [...] > (XEN) DEBUG ns16550_init_postirq 432 03:00.0 > [...] > (XEN) DEBUG vpci_add_handlers 75 0000:00:00.0 0^M > (XEN) DEBUG vpci_add_handlers 75 0000:00:00.2 1^M > (XEN) DEBUG vpci_add_handlers 78 0000:00:00.2^M > (XEN) DEBUG vpci_add_handlers 75 0000:00:01.0 0^M > (XEN) DEBUG vpci_add_handlers 75 0000:00:02.0 0^M > (XEN) DEBUG vpci_add_handlers 75 0000:00:02.1 0^M > > Then crash on drivers/vpci/header.c#modify_bars > > vpci_add_handlers hasn't even been called yet for the interesing device, > which is 03:00.0 (not 00:02.1). This device is behind a bridge, could it maybe be related to marking (part of) BAR of such device as R/O? FWIW, the bridge is at 00:02.4. > At that pointed I doubted myself on the previous test so I went back and > re-run it again. I do confirm that the below patch instead (instead, not > in addition) works: > > > diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c > index 212a9c49ae..24abfaae30 100644 > --- a/xen/drivers/char/ns16550.c > +++ b/xen/drivers/char/ns16550.c > @@ -429,17 +429,6 @@ static void __init cf_check ns16550_init_postirq(struct serial_port *port) > #ifdef NS16550_PCI > if ( uart->bar || uart->ps_bdf_enable ) > { > - if ( uart->param && uart->param->mmio && > - rangeset_add_range(mmio_ro_ranges, PFN_DOWN(uart->io_base), > - PFN_UP(uart->io_base + uart->io_size) - 1) ) > - printk(XENLOG_INFO "Error while adding MMIO range of device to mmio_ro_ranges\n"); > - > - if ( pci_ro_device(0, uart->ps_bdf[0], > - PCI_DEVFN(uart->ps_bdf[1], uart->ps_bdf[2])) ) > - printk(XENLOG_INFO "Could not mark config space of %02x:%02x.%u read-only.\n", > - uart->ps_bdf[0], uart->ps_bdf[1], > - uart->ps_bdf[2]); > - > if ( uart->msi ) > { > struct msi_info msi = { -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab
© 2016 - 2024 Red Hat, Inc.