[v1] xen/x86: PVH Dom0 fixes and fallout adjustments

[PATCH 0/9] xen/x86: PVH Dom0 fixes and fallout adjustments

Posted by Jan Beulich 4 years, 5 months ago

In order to try to debug hypervisor side breakage from XSA-378 I found
myself urged to finally give PVH Dom0 a try. Sadly things didn't work
quite as expected. In the course of investigating these issues I actually
spotted one piece of PV Dom0 breakage as well, a fix for which is also
included here.

There are two immediate remaining issues (also mentioned in affected
patches):

1) It is not clear to me how PCI device reporting is to work. PV Dom0
   reports devices as they're discovered, including ones the hypervisor
   may not have been able to discover itself (ones on segments other
   than 0 or hotplugged ones). The respective hypercall, however, is
   inaccessible to PVH Dom0. Depending on the answer to this, either
   the hypervisor will need changing (to permit the call) or patch 2
   here will need further refinement.

2) Dom0, unlike in the PV case, cannot access the screen (to use as a
   console) when in a non-default mode (i.e. not 80x25 text), as the
   necessary information (in particular about VESA-based LFB modes) is
   not communicated. On the hypervisor side this looks like deliberate
   behavior, but it is unclear to me what the intentions were towards
   an alternative model. (X may be able to access the screen depending
   on whether it has a suitable driver besides the presently unusable
   /dev/fb<N> based one.)

1: xen/x86: prevent PVH type from getting clobbered
2: xen/x86: allow PVH Dom0 without XEN_PV=y
3: xen/x86: make "earlyprintk=xen" work better for PVH Dom0
4: xen/x86: allow "earlyprintk=xen" to work for PV Dom0
5: xen/x86: make "earlyprintk=xen" work for HVM/PVH DomU
6: xen/x86: generalize preferred console model from PV to PVH Dom0
7: xen/x86: hook up xen_banner() also for PVH 
8: x86/PVH: adjust function/data placement
9: xen/x86: adjust data placement

Jan

Re: [PATCH 0/9] xen/x86: PVH Dom0 fixes and fallout adjustments

Posted by Roger Pau Monné 4 years, 4 months ago

On Tue, Sep 07, 2021 at 12:04:34PM +0200, Jan Beulich wrote:
> In order to try to debug hypervisor side breakage from XSA-378 I found
> myself urged to finally give PVH Dom0 a try. Sadly things didn't work
> quite as expected. In the course of investigating these issues I actually
> spotted one piece of PV Dom0 breakage as well, a fix for which is also
> included here.
> 
> There are two immediate remaining issues (also mentioned in affected
> patches):
> 
> 1) It is not clear to me how PCI device reporting is to work. PV Dom0
>    reports devices as they're discovered, including ones the hypervisor
>    may not have been able to discover itself (ones on segments other
>    than 0 or hotplugged ones). The respective hypercall, however, is
>    inaccessible to PVH Dom0. Depending on the answer to this, either
>    the hypervisor will need changing (to permit the call) or patch 2
>    here will need further refinement.

I would prefer if we could limit the hypercall usage to only
report hotplugged segments to Xen. Xen would then have to scan the
segment when it is reported and add any devices found.

Such a hypercall must be used before dom0 tries to access any device,
as otherwise the BARs won't be mapped in the second stage translation
and the traps for the MCFG area won't be set up either.
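To make the segment-scan idea concrete, here is a minimal C sketch of how a hypervisor could enumerate one reported segment on its own. It assumes the segment is reachable through a PCIe ECAM (MMCONFIG) window, where each function's 4 KiB config space sits at bus << 20 | dev << 15 | fn << 12. The accessor and callback types (`cfg_read16_fn`, `found_fn`) and all function names are hypothetical; this is not Xen's actual scanning code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of a config register inside a segment's ECAM (MMCONFIG)
 * window: 1 MiB per bus, 32 KiB per device, 4 KiB per function. */
uint64_t ecam_offset(unsigned int bus, unsigned int dev,
                     unsigned int fn, unsigned int reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 4096);
    return ((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
           ((uint64_t)fn << 12) | reg;
}

/* Hypothetical hooks: a 16-bit config read at an ECAM offset, and a
 * callback for each function found (e.g. "add this device"). */
typedef uint16_t (*cfg_read16_fn)(uint64_t ecam_off);
typedef void (*found_fn)(unsigned int bus, unsigned int dev, unsigned int fn);

/* Brute-force scan of buses [start_bus, end_bus] of one segment;
 * returns the number of functions found. */
size_t scan_segment(unsigned int start_bus, unsigned int end_bus,
                    cfg_read16_fn read16, found_fn found)
{
    size_t count = 0;

    for (unsigned int bus = start_bus; bus <= end_bus; bus++)
        for (unsigned int dev = 0; dev < 32; dev++)
            for (unsigned int fn = 0; fn < 8; fn++) {
                /* Vendor ID 0xffff: no function responds here. */
                if (read16(ecam_offset(bus, dev, fn, 0x00)) == 0xffff) {
                    if (fn == 0)
                        break;      /* no function 0: device absent */
                    continue;
                }
                if (found)
                    found(bus, dev, fn);
                count++;
                /* Header Type (offset 0x0e, low byte of this little-endian
                 * read) bit 7 clear: single-function device. */
                if (fn == 0 &&
                    !(read16(ecam_offset(bus, dev, 0, 0x0e)) & 0x80))
                    break;
            }
    return count;
}

/* Example accessor for an empty segment: every read returns all-ones. */
uint16_t read16_empty(uint64_t ecam_off)
{
    (void)ecam_off;
    return 0xffff;
}
```

A real implementation would map the segment's MCFG window and plug in an MMIO read; `read16_empty` only stands in for a segment with no devices. Scanning all 256 buses probes up to 65536 (256 × 32 × 8) functions, which is one reason to do this only for segments Xen could not see at boot.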

> 
> 2) Dom0, unlike in the PV case, cannot access the screen (to use as a
>    console) when in a non-default mode (i.e. not 80x25 text), as the
>    necessary information (in particular about VESA-based LFB modes) is
>    not communicated. On the hypervisor side this looks like deliberate
>    behavior, but it is unclear to me what the intentions were towards
>    an alternative model. (X may be able to access the screen depending
>    on whether it has a suitable driver besides the presently unusable
>    /dev/fb<N> based one.)

I have to admit most of my boxes are headless servers; I do have one
NUC I can use to test gfx stuff, but I don't really use gfx output
with Xen.

As I understand it, such information is fetched from the BIOS and
passed into Xen, which should then hand it over to the dom0 kernel?

I guess the only way for Linux dom0 kernel to fetch that information
would be to emulate the BIOS or drop into realmode and issue the BIOS
calls?

Is that also an issue on UEFI, or can dom0 fetch the framebuffer info
there using the PV EFI interface?

Thanks, Roger.

Re: [PATCH 0/9] xen/x86: PVH Dom0 fixes and fallout adjustments

Posted by Jan Beulich 4 years, 4 months ago

On 14.09.2021 10:32, Roger Pau Monné wrote:
> On Tue, Sep 07, 2021 at 12:04:34PM +0200, Jan Beulich wrote:
>> In order to try to debug hypervisor side breakage from XSA-378 I found
>> myself urged to finally give PVH Dom0 a try. Sadly things didn't work
>> quite as expected. In the course of investigating these issues I actually
>> spotted one piece of PV Dom0 breakage as well, a fix for which is also
>> included here.
>>
>> There are two immediate remaining issues (also mentioned in affected
>> patches):
>>
>> 1) It is not clear to me how PCI device reporting is to work. PV Dom0
>>    reports devices as they're discovered, including ones the hypervisor
>>    may not have been able to discover itself (ones on segments other
>>    than 0 or hotplugged ones). The respective hypercall, however, is
>>    inaccessible to PVH Dom0. Depending on the answer to this, either
>>    the hypervisor will need changing (to permit the call) or patch 2
>>    here will need further refinement.
> 
> I would prefer if we could limit the hypercall usage to only
> report hotplugged segments to Xen. Xen would then have to scan the
> segment when it is reported and add any devices found.
> 
> Such a hypercall must be used before dom0 tries to access any device,
> as otherwise the BARs won't be mapped in the second stage translation
> and the traps for the MCFG area won't be set up either.

This might work if hotplugging would only ever be of segments, and not
of individual devices. Yet the latter is, I think, a common case (as
far as hotplugging itself is "common").

Also don't forget about SR-IOV VFs - they would typically not be there
when booting. They would materialize when the PF driver initializes
the device. This is, I think, something that can be dealt with by
intercepting writes to the SR-IOV capability. But I wonder whether
there might be other cases where devices become "visible" only while
the Dom0 kernel is already running.
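Intercepting the SR-IOV capability would let Xen compute which VFs appear once VF Enable is set. Below is a minimal C sketch, assuming the register layout and routing-ID rule of the PCIe SR-IOV specification: Control at offset 0x08 with VF Enable in bit 0; NumVFs, First VF Offset and VF Stride at 0x10, 0x14 and 0x16; VF n (1-based) has routing ID PF RID + First VF Offset + (n - 1) × VF Stride. The handler name and its parameters are hypothetical, and the caller is assumed to have already read NumVFs, First VF Offset and VF Stride from the capability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets within the SR-IOV extended capability. */
#define SRIOV_CTRL        0x08
#define SRIOV_CTRL_VFE    0x0001   /* VF Enable bit in SRIOV_CTRL */
#define SRIOV_NUM_VF      0x10     /* caller reads NumVFs from here */
#define SRIOV_VF_OFFSET   0x14     /* ... First VF Offset from here */
#define SRIOV_VF_STRIDE   0x16     /* ... VF Stride from here */

/* Routing ID (bus << 8 | devfn) of the n-th VF, n starting at 1. */
uint16_t vf_rid(uint16_t pf_rid, uint16_t first_vf_offset,
                uint16_t vf_stride, unsigned int n)
{
    assert(n >= 1);
    return (uint16_t)(pf_rid + first_vf_offset + (n - 1) * vf_stride);
}

/* Hypothetical handler for a trapped write to the SR-IOV Control
 * register. When VF Enable goes from 0 to 1, stores the routing IDs of
 * the VFs that now exist in out[] and returns how many were stored;
 * otherwise returns 0. */
size_t sriov_ctrl_write(uint16_t old_ctrl, uint16_t new_ctrl,
                        uint16_t pf_rid, uint16_t num_vfs,
                        uint16_t first_vf_offset, uint16_t vf_stride,
                        uint16_t *out, size_t out_len)
{
    bool enabling = !(old_ctrl & SRIOV_CTRL_VFE) &&
                    (new_ctrl & SRIOV_CTRL_VFE);
    size_t n = 0;

    if (!enabling)
        return 0;
    for (unsigned int i = 1; i <= num_vfs && n < out_len; i++)
        out[n++] = vf_rid(pf_rid, first_vf_offset, vf_stride, i);
    return n;
}
```

Since the routing ID is bus << 8 | devfn, adding the offset can carry into the bus number, so VFs may sit on buses above the PF's own bus, another way they escape boot-time discovery.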

>> 2) Dom0, unlike in the PV case, cannot access the screen (to use as a
>>    console) when in a non-default mode (i.e. not 80x25 text), as the
>>    necessary information (in particular about VESA-based LFB modes) is
>>    not communicated. On the hypervisor side this looks like deliberate
>>    behavior, but it is unclear to me what the intentions were towards
>>    an alternative model. (X may be able to access the screen depending
>>    on whether it has a suitable driver besides the presently unusable
>>    /dev/fb<N> based one.)
> 
> I have to admit most of my boxes are headless servers; I do have one
> NUC I can use to test gfx stuff, but I don't really use gfx output
> with Xen.
> 
> As I understand it, such information is fetched from the BIOS and
> passed into Xen, which should then hand it over to the dom0 kernel?

That's how PV Dom0 learns of the information, yes. See
fill_console_start_info(). (I'm in the process of eliminating the
need for some of the "fetch from BIOS" in Xen right now, but that's
not going to get us as far as being able to delete that code, no
matter how much in particular Andrew would like that to happen.)

> I guess the only way for Linux dom0 kernel to fetch that information
> would be to emulate the BIOS or drop into realmode and issue the BIOS
> calls?

Native Linux gets this information passed from the boot loader, I think
(except in the EFI case, as per below).

> Is that also an issue on UEFI, or can dom0 fetch the framebuffer info
> there using the PV EFI interface?

There it's EFI boot services functions which can be invoked before
leaving boot services (in the native case). AIUI the PVH entry point
lives logically past any EFI boot services interaction, and hence
using them is not an option (even assuming there was EFI firmware
present in Dom0 in the first place, which I consider difficult all by
itself: this can't be the physical system's firmware, but I also don't
see where virtual firmware would be taken from).

There is no PV EFI interface to obtain video information. With the
needed information getting passed via start_info, PV has no need for
one, and I would be hesitant to add a fundamentally redundant
interface for PVH, all the more so since the information needed isn't
EFI-specific at all.

Jan

Re: [PATCH 0/9] xen/x86: PVH Dom0 fixes and fallout adjustments

Posted by Roger Pau Monné 4 years, 4 months ago

On Tue, Sep 14, 2021 at 11:03:23AM +0200, Jan Beulich wrote:
> On 14.09.2021 10:32, Roger Pau Monné wrote:
> > On Tue, Sep 07, 2021 at 12:04:34PM +0200, Jan Beulich wrote:
> >> In order to try to debug hypervisor side breakage from XSA-378 I found
> >> myself urged to finally give PVH Dom0 a try. Sadly things didn't work
> >> quite as expected. In the course of investigating these issues I actually
> >> spotted one piece of PV Dom0 breakage as well, a fix for which is also
> >> included here.
> >>
> >> There are two immediate remaining issues (also mentioned in affected
> >> patches):
> >>
> >> 1) It is not clear to me how PCI device reporting is to work. PV Dom0
> >>    reports devices as they're discovered, including ones the hypervisor
> >>    may not have been able to discover itself (ones on segments other
> >>    than 0 or hotplugged ones). The respective hypercall, however, is
> >>    inaccessible to PVH Dom0. Depending on the answer to this, either
> >>    the hypervisor will need changing (to permit the call) or patch 2
> >>    here will need further refinement.
> > 
> > I would prefer if we could limit the hypercall usage to only
> > report hotplugged segments to Xen. Xen would then have to scan the
> > segment when it is reported and add any devices found.
> > 
> > Such a hypercall must be used before dom0 tries to access any device,
> > as otherwise the BARs won't be mapped in the second stage translation
> > and the traps for the MCFG area won't be set up either.
> 
> This might work if hotplugging would only ever be of segments, and not
> of individual devices. Yet the latter is, I think, a common case (as
> far as hotplugging itself is "common").

Right, I agree with using hypercalls to report either hotplugged
segments or devices. However, I would like to avoid mandating use of
the hypercall for non-hotplug devices, so that OSes without hotplug
support don't need to care about making use of those hypercalls.

> Also don't forget about SR-IOV VFs - they would typically not be there
> when booting. They would materialize when the PF driver initializes
> the device. This is, I think, something that can be dealt with by
> intercepting writes to the SR-IOV capability.

My plan was indeed to trap SR-IOV capability accesses, see:

https://lore.kernel.org/xen-devel/20180717094830.54806-1-roger.pau@citrix.com/

I just don't have time ATM to continue this work.

> But I wonder whether
> there might be other cases where devices become "visible" only while
> the Dom0 kernel is already running.

I would consider those a kind of hotplugged device, and hence they
would require the use of the hypercall in order to notify Xen about them.

> >> 2) Dom0, unlike in the PV case, cannot access the screen (to use as a
> >>    console) when in a non-default mode (i.e. not 80x25 text), as the
> >>    necessary information (in particular about VESA-based LFB modes) is
> >>    not communicated. On the hypervisor side this looks like deliberate
> >>    behavior, but it is unclear to me what the intentions were towards
> >>    an alternative model. (X may be able to access the screen depending
> >>    on whether it has a suitable driver besides the presently unusable
> >>    /dev/fb<N> based one.)
> > 
> > I have to admit most of my boxes are headless servers; I do have one
> > NUC I can use to test gfx stuff, but I don't really use gfx output
> > with Xen.
> > 
> > As I understand it, such information is fetched from the BIOS and
> > passed into Xen, which should then hand it over to the dom0 kernel?
> 
> That's how PV Dom0 learns of the information, yes. See
> fill_console_start_info(). (I'm in the process of eliminating the
> need for some of the "fetch from BIOS" in Xen right now, but that's
> not going to get us as far as being able to delete that code, no
> matter how much in particular Andrew would like that to happen.)
> 
> > I guess the only way for Linux dom0 kernel to fetch that information
> > would be to emulate the BIOS or drop into realmode and issue the BIOS
> > calls?
> 
> Native Linux gets this information passed from the boot loader, I think
> (except in the EFI case, as per below).
> 
> > Is that also an issue on UEFI, or can dom0 fetch the framebuffer info
> > there using the PV EFI interface?
> 
> There it's EFI boot services functions which can be invoked before
> leaving boot services (in the native case). AIUI the PVH entry point
> lives logically past any EFI boot services interaction, and hence
> using them is not an option (even assuming there was EFI firmware
> present in Dom0 in the first place, which I consider difficult all by
> itself: this can't be the physical system's firmware, but I also don't
> see where virtual firmware would be taken from).
> 
> There is no PV EFI interface to obtain video information. With the
> needed information getting passed via start_info, PV has no need for
> one, and I would be hesitant to add a fundamentally redundant
> interface for PVH, all the more so since the information needed isn't
> EFI-specific at all.

I think our only option is to extend the HVM start info to convey
that data from Xen into dom0.

Thanks, Roger.

Re: [PATCH 0/9] xen/x86: PVH Dom0 fixes and fallout adjustments

Posted by Jan Beulich 4 years, 4 months ago

On 14.09.2021 13:15, Roger Pau Monné wrote:
> On Tue, Sep 14, 2021 at 11:03:23AM +0200, Jan Beulich wrote:
>> On 14.09.2021 10:32, Roger Pau Monné wrote:
>>> On Tue, Sep 07, 2021 at 12:04:34PM +0200, Jan Beulich wrote:
>>>> In order to try to debug hypervisor side breakage from XSA-378 I found
>>>> myself urged to finally give PVH Dom0 a try. Sadly things didn't work
>>>> quite as expected. In the course of investigating these issues I actually
>>>> spotted one piece of PV Dom0 breakage as well, a fix for which is also
>>>> included here.
>>>>
>>>> There are two immediate remaining issues (also mentioned in affected
>>>> patches):
>>>>
>>>> 1) It is not clear to me how PCI device reporting is to work. PV Dom0
>>>>    reports devices as they're discovered, including ones the hypervisor
>>>>    may not have been able to discover itself (ones on segments other
>>>>    than 0 or hotplugged ones). The respective hypercall, however, is
>>>>    inaccessible to PVH Dom0. Depending on the answer to this, either
>>>>    the hypervisor will need changing (to permit the call) or patch 2
>>>>    here will need further refinement.
>>>
>>> I would prefer if we could limit the hypercall usage to only
>>> report hotplugged segments to Xen. Xen would then have to scan the
>>> segment when it is reported and add any devices found.
>>>
>>> Such a hypercall must be used before dom0 tries to access any device,
>>> as otherwise the BARs won't be mapped in the second stage translation
>>> and the traps for the MCFG area won't be set up either.
>>
>> This might work if hotplugging would only ever be of segments, and not
>> of individual devices. Yet the latter is, I think, a common case (as
>> far as hotplugging itself is "common").
> 
> Right, I agree with using hypercalls to report either hotplugged
> segments or devices. However, I would like to avoid mandating use of
> the hypercall for non-hotplug devices, so that OSes without hotplug
> support don't need to care about making use of those hypercalls.
> 
>> Also don't forget about SR-IOV VFs - they would typically not be there
>> when booting. They would materialize when the PF driver initializes
>> the device. This is, I think, something that can be dealt with by
>> intercepting writes to the SR-IOV capability.
> 
> My plan was indeed to trap SR-IOV capability accesses, see:
> 
> https://lore.kernel.org/xen-devel/20180717094830.54806-1-roger.pau@citrix.com/
> 
> I just don't have time ATM to continue this work.
> 
>> But I wonder whether
>> there might be other cases where devices become "visible" only while
>> the Dom0 kernel is already running.
> 
> I would consider those a kind of hotplugged device, and hence they
> would require the use of the hypercall in order to notify Xen about them.

So what does this mean for the one patch? Should drivers/xen/pci.c
then be built for PVH (and then have logic added to filter boot
time device discovery), or should I restrict this to be PV-only (and
PVH would get some completely different logic added later)?

>>>> 2) Dom0, unlike in the PV case, cannot access the screen (to use as a
>>>>    console) when in a non-default mode (i.e. not 80x25 text), as the
>>>>    necessary information (in particular about VESA-bases LFB modes) is
>>>>    not communicated. On the hypervisor side this looks like deliberate
>>>>    behavior, but it is unclear to me what the intentions were towards
>>>>    an alternative model. (X may be able to access the screen depending
>>>>    on whether it has a suitable driver besides the presently unusable
>>>>    /dev/fb<N> based one.)
>>>
>>> I have to admit most of my boxes are headless servers; I do have one
>>> NUC I can use to test gfx stuff, but I don't really use gfx output
>>> with Xen.
>>>
>>> As I understand it, such information is fetched from the BIOS and
>>> passed into Xen, which should then hand it over to the dom0 kernel?
>>
>> That's how PV Dom0 learns of the information, yes. See
>> fill_console_start_info(). (I'm in the process of eliminating the
>> need for some of the "fetch from BIOS" in Xen right now, but that's
>> not going to get us as far as being able to delete that code, no
>> matter how much in particular Andrew would like that to happen.)
>>
>>> I guess the only way for Linux dom0 kernel to fetch that information
>>> would be to emulate the BIOS or drop into realmode and issue the BIOS
>>> calls?
>>
>> Native Linux gets this information passed from the boot loader, I think
>> (except in the EFI case, as per below).
>>
>>> Is that also an issue on UEFI, or can dom0 fetch the framebuffer info
>>> there using the PV EFI interface?
>>
>> There it's EFI boot services functions which can be invoked before
>> leaving boot services (in the native case). AIUI the PVH entry point
>> lives logically past any EFI boot services interaction, and hence
>> using them is not an option (even assuming there was EFI firmware
>> present in Dom0 in the first place, which I consider difficult all by
>> itself: this can't be the physical system's firmware, but I also don't
>> see where virtual firmware would be taken from).
>>
>> There is no PV EFI interface to obtain video information. With the
>> needed information getting passed via start_info, PV has no need for
>> one, and I would be hesitant to add a fundamentally redundant
>> interface for PVH, all the more so since the information needed isn't
>> EFI-specific at all.
> 
> I think our only option is to extend the HVM start info to convey
> that data from Xen into dom0.

PVH doesn't use the ordinary start_info, does it?

Jan

Re: [PATCH 0/9] xen/x86: PVH Dom0 fixes and fallout adjustments

Posted by Roger Pau Monné 4 years, 4 months ago

On Tue, Sep 14, 2021 at 01:58:29PM +0200, Jan Beulich wrote:
> On 14.09.2021 13:15, Roger Pau Monné wrote:
> > On Tue, Sep 14, 2021 at 11:03:23AM +0200, Jan Beulich wrote:
> >> On 14.09.2021 10:32, Roger Pau Monné wrote:
> >>> On Tue, Sep 07, 2021 at 12:04:34PM +0200, Jan Beulich wrote:
> >>>> In order to try to debug hypervisor side breakage from XSA-378 I found
> >>>> myself urged to finally give PVH Dom0 a try. Sadly things didn't work
> >>>> quite as expected. In the course of investigating these issues I actually
> >>>> spotted one piece of PV Dom0 breakage as well, a fix for which is also
> >>>> included here.
> >>>>
> >>>> There are two immediate remaining issues (also mentioned in affected
> >>>> patches):
> >>>>
> >>>> 1) It is not clear to me how PCI device reporting is to work. PV Dom0
> >>>>    reports devices as they're discovered, including ones the hypervisor
> >>>>    may not have been able to discover itself (ones on segments other
> >>>>    than 0 or hotplugged ones). The respective hypercall, however, is
> >>>>    inaccessible to PVH Dom0. Depending on the answer to this, either
> >>>>    the hypervisor will need changing (to permit the call) or patch 2
> >>>>    here will need further refinement.
> >>>
> >>> I would prefer if we could limit the hypercall usage to only
> >>> report hotplugged segments to Xen. Xen would then have to scan the
> >>> segment when it is reported and add any devices found.
> >>>
> >>> Such a hypercall must be used before dom0 tries to access any device,
> >>> as otherwise the BARs won't be mapped in the second stage translation
> >>> and the traps for the MCFG area won't be set up either.
> >>
> >> This might work if hotplugging would only ever be of segments, and not
> >> of individual devices. Yet the latter is, I think, a common case (as
> >> far as hotplugging itself is "common").
> > 
> > Right, I agree with using hypercalls to report either hotplugged
> > segments or devices. However, I would like to avoid mandating use of
> > the hypercall for non-hotplug devices, so that OSes without hotplug
> > support don't need to care about making use of those hypercalls.
> > 
> >> Also don't forget about SR-IOV VFs - they would typically not be there
> >> when booting. They would materialize when the PF driver initializes
> >> the device. This is, I think, something that can be dealt with by
> >> intercepting writes to the SR-IOV capability.
> > 
> > My plan was indeed to trap SR-IOV capability accesses, see:
> > 
> > https://lore.kernel.org/xen-devel/20180717094830.54806-1-roger.pau@citrix.com/
> > 
> > I just don't have time ATM to continue this work.
> > 
> >> But I wonder whether
> >> there might be other cases where devices become "visible" only while
> >> the Dom0 kernel is already running.
> > 
> > I would consider those a kind of hotplugged device, and hence they
> > would require the use of the hypercall in order to notify Xen about them.
> 
> So what does this mean for the one patch? Should drivers/xen/pci.c
> then be built for PVH (and then have logic added to filter boot
> time device discovery), or should I restrict this to be PV-only (and
> PVH would get some completely different logic added later)?

I think we can reuse the same hypercalls for PVH, and maybe the same
code in Linux. For PVH we just need to be careful to make the
hypercalls before attempting to access the BARs (or the PCI
configuration space for the device), since there won't be any traps
set up, and the BARs won't be mapped in the p2m.

It might be easier for Linux to just report every device it finds to
Xen, as is currently done for PV dom0, instead of filtering on
whether the device has been hotplugged.
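The ordering constraint above can be illustrated with a minimal C sketch, assuming two hypothetical hooks: `report_to_xen` (standing in for the physdev device-add hypercall PV Dom0 already issues) and `probe_driver` (anything touching config space or BARs). It is not the drivers/xen/pci.c code; it only shows every device being reported before it is touched, and left alone if reporting fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Segment/bus/device-function tuple identifying a PCI function. */
struct pci_sbdf {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
};

/* Hypothetical hooks supplied by the Dom0 kernel. */
struct dom0_pci_ops {
    /* Tell Xen about the device (e.g. via the physdev "device add" call). */
    int (*report_to_xen)(const struct pci_sbdf *sbdf);
    /* Anything touching config space or BARs, e.g. binding a driver. */
    int (*probe_driver)(const struct pci_sbdf *sbdf);
};

/* Report each device before probing it; returns the number probed. */
size_t add_devices(const struct dom0_pci_ops *ops,
                   const struct pci_sbdf *devs, size_t n)
{
    size_t probed = 0;

    assert(ops && ops->report_to_xen && ops->probe_driver);
    for (size_t i = 0; i < n; i++) {
        /* Under PVH, a device Xen doesn't know about has no BARs mapped
         * in the p2m and no config space traps, so don't touch it. */
        if (ops->report_to_xen(&devs[i]) != 0)
            continue;
        if (ops->probe_driver(&devs[i]) == 0)
            probed++;
    }
    return probed;
}

/* Illustrative mocks recording call order: 'R' = report, 'P' = probe. */
static char call_log[16];
static size_t call_len;

static void log_call(char c)
{
    if (call_len < sizeof(call_log))
        call_log[call_len++] = c;
}

int mock_report(const struct pci_sbdf *sbdf)
{
    log_call('R');
    return sbdf->seg == 0xffff ? -1 : 0;  /* pretend Xen rejects seg 0xffff */
}

int mock_probe(const struct pci_sbdf *sbdf)
{
    (void)sbdf;
    log_call('P');
    return 0;
}
```

Reporting unconditionally, as suggested, keeps this loop free of any "was it hotplugged?" check; the cost is one hypercall per device at boot.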

> >>>> 2) Dom0, unlike in the PV case, cannot access the screen (to use as a
> >>>>    console) when in a non-default mode (i.e. not 80x25 text), as the
> >>>>    necessary information (in particular about VESA-based LFB modes) is
> >>>>    not communicated. On the hypervisor side this looks like deliberate
> >>>>    behavior, but it is unclear to me what the intentions were towards
> >>>>    an alternative model. (X may be able to access the screen depending
> >>>>    on whether it has a suitable driver besides the presently unusable
> >>>>    /dev/fb<N> based one.)
> >>>
> >>> I have to admit most of my boxes are headless servers; I do have one
> >>> NUC I can use to test gfx stuff, but I don't really use gfx output
> >>> with Xen.
> >>>
> >>> As I understand it, such information is fetched from the BIOS and
> >>> passed into Xen, which should then hand it over to the dom0 kernel?
> >>
> >> That's how PV Dom0 learns of the information, yes. See
> >> fill_console_start_info(). (I'm in the process of eliminating the
> >> need for some of the "fetch from BIOS" in Xen right now, but that's
> >> not going to get us as far as being able to delete that code, no
> >> matter how much in particular Andrew would like that to happen.)
> >>
> >>> I guess the only way for Linux dom0 kernel to fetch that information
> >>> would be to emulate the BIOS or drop into realmode and issue the BIOS
> >>> calls?
> >>
> >> Native Linux gets this information passed from the boot loader, I think
> >> (except in the EFI case, as per below).
> >>
> >>> Is that also an issue on UEFI, or can dom0 fetch the framebuffer info
> >>> there using the PV EFI interface?
> >>
> >> There it's EFI boot services functions which can be invoked before
> >> leaving boot services (in the native case). AIUI the PVH entry point
> >> lives logically past any EFI boot services interaction, and hence
> >> using them is not an option (even assuming there was EFI firmware
> >> present in Dom0 in the first place, which I consider difficult all by
> >> itself: this can't be the physical system's firmware, but I also don't
> >> see where virtual firmware would be taken from).
> >>
> >> There is no PV EFI interface to obtain video information. With the
> >> needed information getting passed via start_info, PV has no need for
> >> one, and I would be hesitant to add a fundamentally redundant
> >> interface for PVH, all the more so since the information needed isn't
> >> EFI-specific at all.
> > 
> > I think our only option is to extend the HVM start info to convey
> > that data from Xen into dom0.
> 
> PVH doesn't use the ordinary start_info, does it?

No, it's HVM start info as described in:

xen/include/public/arch-x86/hvm/start_info.h

We have already extended it once to add a memory map; we could extend
it another time to add the video information.
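A rough sketch of such a versioned extension, assuming the field layout of `struct hvm_start_info` in xen/include/public/arch-x86/hvm/start_info.h as recalled here (verify against the header): the version 0 fields, then the memory-map fields that version 1 added. The `video_info_paddr` field and the notion of a version 2 are purely hypothetical and do not exist in any Xen interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HVM_START_MAGIC_VALUE 0x336ec578u

/* Modeled on struct hvm_start_info; the last field is HYPOTHETICAL. */
struct hvm_start_info_sketch {
    uint32_t magic;
    uint32_t version;
    uint32_t flags;
    uint32_t nr_modules;
    uint64_t modlist_paddr;
    uint64_t cmdline_paddr;
    uint64_t rsdp_paddr;
    /* Only valid when version >= 1: the memory map extension. */
    uint64_t memmap_paddr;
    uint32_t memmap_entries;
    uint32_t reserved;
    /* HYPOTHETICAL, only valid when version >= 2: physical address of a
     * video/console info record. Not part of any existing Xen interface. */
    uint64_t video_info_paddr;
};

/* Consumers must check the version before reading newer fields, since an
 * older producer never wrote them. Returns 0 when no video info exists. */
uint64_t start_info_video_paddr(const struct hvm_start_info_sketch *si)
{
    assert(si != NULL);
    if (si->magic != HVM_START_MAGIC_VALUE || si->version < 2)
        return 0;
    return si->video_info_paddr;
}
```

The point of the sketch is the consumer-side version check: a kernel booted by an older Xen must not read fields the producer never filled in.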

Roger.

Re: [PATCH 0/9] xen/x86: PVH Dom0 fixes and fallout adjustments

Posted by Jan Beulich 4 years, 4 months ago

On 14.09.2021 14:41, Roger Pau Monné wrote:
> On Tue, Sep 14, 2021 at 01:58:29PM +0200, Jan Beulich wrote:
>> On 14.09.2021 13:15, Roger Pau Monné wrote:
>>> On Tue, Sep 14, 2021 at 11:03:23AM +0200, Jan Beulich wrote:
>>>> On 14.09.2021 10:32, Roger Pau Monné wrote:
>>>>> On Tue, Sep 07, 2021 at 12:04:34PM +0200, Jan Beulich wrote:
>>>>>> In order to try to debug hypervisor side breakage from XSA-378 I found
>>>>>> myself urged to finally give PVH Dom0 a try. Sadly things didn't work
>>>>>> quite as expected. In the course of investigating these issues I actually
>>>>>> spotted one piece of PV Dom0 breakage as well, a fix for which is also
>>>>>> included here.
>>>>>>
>>>>>> There are two immediate remaining issues (also mentioned in affected
>>>>>> patches):
>>>>>>
>>>>>> 1) It is not clear to me how PCI device reporting is to work. PV Dom0
>>>>>>    reports devices as they're discovered, including ones the hypervisor
>>>>>>    may not have been able to discover itself (ones on segments other
>>>>>>    than 0 or hotplugged ones). The respective hypercall, however, is
>>>>>>    inaccessible to PVH Dom0. Depending on the answer to this, either
>>>>>>    the hypervisor will need changing (to permit the call) or patch 2
>>>>>>    here will need further refinement.
>>>>>
>>>>> I would prefer if we could limit the hypercall usage to only
>>>>> report hotplugged segments to Xen. Xen would then have to scan the
>>>>> segment when it is reported and add any devices found.
>>>>>
>>>>> Such a hypercall must be used before dom0 tries to access any device,
>>>>> as otherwise the BARs won't be mapped in the second stage translation
>>>>> and the traps for the MCFG area won't be set up either.
>>>>
>>>> This might work if hotplugging would only ever be of segments, and not
>>>> of individual devices. Yet the latter is, I think, a common case (as
>>>> far as hotplugging itself is "common").
>>>
>>> Right, I agree with using hypercalls to report either hotplugged
>>> segments or devices. However, I would like to avoid mandating use of
>>> the hypercall for non-hotplug devices, so that OSes without hotplug
>>> support don't need to care about making use of those hypercalls.
>>>
>>>> Also don't forget about SR-IOV VFs - they would typically not be there
>>>> when booting. They would materialize when the PF driver initializes
>>>> the device. This is, I think, something that can be dealt with by
>>>> intercepting writes to the SR-IOV capability.
>>>
>>> My plan was indeed to trap SR-IOV capability accesses, see:
>>>
>>> https://lore.kernel.org/xen-devel/20180717094830.54806-1-roger.pau@citrix.com/
>>>
>>> I just don't have time ATM to continue this work.
>>>
>>>> But I wonder whether
>>>> there might be other cases where devices become "visible" only while
>>>> the Dom0 kernel is already running.
>>>
>>> I would consider those a kind of hotplugged device, and hence they
>>> would require the use of the hypercall in order to notify Xen about them.
>>
>> So what does this mean for the one patch? Should drivers/xen/pci.c
>> then be built for PVH (and then have logic added to filter boot
>> time device discovery), or should I restrict this to be PV-only (and
>> PVH would get some completely different logic added later)?
> 
> I think we can reuse the same hypercalls for PVH, and maybe the same
> code in Linux. For PVH we just need to be careful to make the
> hypercalls before attempting to access the BARs (or the PCI
> configuration space for the device), since there won't be any traps
> set up, and the BARs won't be mapped in the p2m.
> 
> It might be easier for Linux to just report every device it finds to
> Xen, as is currently done for PV dom0, instead of filtering on
> whether the device has been hotplugged.

Okay. I'll leave the Linux patch as is then and instead make a Xen
patch to actually let through the necessary function(s) in
hvm_physdev_op().
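A minimal C sketch of the kind of filtering change meant here: a switch that keeps rejecting PV-only sub-ops while letting the device-reporting ones through for the hardware domain only. The enum values are placeholders standing in for PHYSDEVOP_* sub-ops (the real names and values live in Xen's public/physdev.h), and the set of ops shown as generally permitted is illustrative rather than Xen's actual list.

```c
#include <stdbool.h>

/* Placeholder op codes standing in for PHYSDEVOP_* sub-ops. */
enum physdev_op_sketch {
    OP_EOI,               /* example of an op any HVM guest may use */
    OP_IRQ_STATUS_QUERY,  /* ditto */
    OP_PCI_DEVICE_ADD,    /* device reporting, as used by PV Dom0 */
    OP_PCI_DEVICE_REMOVE,
    OP_SET_IOPL,          /* example of a PV-only op */
};

/* Decide whether a physdev sub-op issued by an HVM/PVH domain is
 * forwarded to the common handler. */
bool hvm_physdev_op_permitted(enum physdev_op_sketch op, bool is_hwdom)
{
    switch (op) {
    case OP_EOI:
    case OP_IRQ_STATUS_QUERY:
        return true;
    case OP_PCI_DEVICE_ADD:
    case OP_PCI_DEVICE_REMOVE:
        /* Newly let through, but only for the hardware domain (PVH Dom0). */
        return is_hwdom;
    default:
        return false;
    }
}
```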

>>>>>> 2) Dom0, unlike in the PV case, cannot access the screen (to use as a
>>>>>>    console) when in a non-default mode (i.e. not 80x25 text), as the
>>>>>>    necessary information (in particular about VESA-based LFB modes) is
>>>>>>    not communicated. On the hypervisor side this looks like deliberate
>>>>>>    behavior, but it is unclear to me what the intentions were towards
>>>>>>    an alternative model. (X may be able to access the screen depending
>>>>>>    on whether it has a suitable driver besides the presently unusable
>>>>>>    /dev/fb<N> based one.)
>>>>>
>>>>> I have to admit most of my boxes are headless servers; I do have one
>>>>> NUC I can use to test gfx stuff, but I don't really use gfx output
>>>>> with Xen.
>>>>>
>>>>> As I understand it, such information is fetched from the BIOS and
>>>>> passed into Xen, which should then hand it over to the dom0 kernel?
>>>>
>>>> That's how PV Dom0 learns of the information, yes. See
>>>> fill_console_start_info(). (I'm in the process of eliminating the
>>>> need for some of the "fetch from BIOS" in Xen right now, but that's
>>>> not going to get us as far as being able to delete that code, no
>>>> matter how much in particular Andrew would like that to happen.)
>>>>
>>>>> I guess the only way for Linux dom0 kernel to fetch that information
>>>>> would be to emulate the BIOS or drop into realmode and issue the BIOS
>>>>> calls?
>>>>
>>>> Native Linux gets this information passed from the boot loader, I think
>>>> (except in the EFI case, as per below).
>>>>
>>>>> Is that an issue on UEFI also, or there dom0 can fetch the framebuffer
>>>>> info using the PV EFI interface?
>>>>
>>>> There it's EFI boot services functions which can be invoked before
>>>> leaving boot services (in the native case). Aiui the PVH entry point
>>>> lives logically past any EFI boot services interaction, and hence
>>>> using them is not an option (if there was EFI firmware present in Dom0
>>>> in the first place, which I consider difficult all by itself - this
>>>> can't be the physical system's firmware, but I also don't see where
>>>> virtual firmware would be taken from).
>>>>
>>>> There is no PV EFI interface to obtain video information. With the
>>>> needed information getting passed via start_info, PV has no need for
>>>> such, and I would be hesitant to add a fundamentally redundant
>>>> interface for PVH, all the more so since the information needed
>>>> isn't EFI-specific at all.
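To make concrete what "the information needed" is: PV Dom0 receives it as a `dom0_vga_console_info` structure (filled by fill_console_start_info()), tagged with a video type. The sketch below is a trimmed-down model of that structure that keeps only a few fields; the type tag values are the XEN_VGATYPE_* constants from Xen's public xen.h, while `console_info_model`, `classify_console()` and `lfb_bytes_needed()` are hypothetical names. It illustrates that the EFI LFB case uses the same fields as the VESA LFB case, so nothing about the data itself is EFI-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Video type tags as used by Xen's dom0_vga_console_info. */
#define XEN_VGATYPE_TEXT_MODE_3 0x03
#define XEN_VGATYPE_VESA_LFB    0x23
#define XEN_VGATYPE_EFI_LFB     0x70

/* Trimmed-down model: the real structure has more fields
 * (colour layout, VESA capabilities, and so on). */
struct console_info_model {
    uint8_t video_type;
    union {
        struct {
            uint16_t rows, columns;
        } text_mode_3;
        struct {
            uint16_t width, height;
            uint16_t bytes_per_line;
            uint16_t bits_per_pixel;
        } vesa_lfb; /* also used for XEN_VGATYPE_EFI_LFB */
    } u;
};

enum console_kind { CONSOLE_NONE, CONSOLE_TEXT, CONSOLE_LFB };

static enum console_kind classify_console(const struct console_info_model *ci)
{
    switch (ci->video_type) {
    case XEN_VGATYPE_TEXT_MODE_3:
        return CONSOLE_TEXT;
    case XEN_VGATYPE_VESA_LFB:
    case XEN_VGATYPE_EFI_LFB: /* same fields as VESA: nothing EFI-specific */
        return CONSOLE_LFB;
    default:
        return CONSOLE_NONE;
    }
}

/* Bytes of linear frame buffer the visible mode spans; 0 if not an LFB mode. */
static uint64_t lfb_bytes_needed(const struct console_info_model *ci)
{
    if (classify_console(ci) != CONSOLE_LFB)
        return 0;
    return (uint64_t)ci->u.vesa_lfb.bytes_per_line * ci->u.vesa_lfb.height;
}
```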
>>>
>>> I think our only option is to expand the HVM start info information to
>>> convey that data from Xen into dom0.
>>
>> PVH doesn't use the ordinary start_info, does it?
> 
> No, it's HVM start info as described in:
> 
> xen/include/public/arch-x86/hvm/start_info.h
> 
> We have already extended it once to add a memory map, we could extend
> it another time to add the video information.

Okay, I'll try to make a(nother) patch along these lines. Since there's
a DomU counterpart in PV's start_info - where does that information get
passed for PVH? (I'm mainly wondering whether there's another approach
to consider.)

Jan
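For reference, PV's start_info carries a `console` union: a DomU gets the console ring's frame number and event channel there, while Dom0 instead gets `info_off`/`info_size`, locating the console/video description within the start_info page. The sketch below shows the hvm_start_info layout up to version 1 (the memory-map extension Roger mentions) and a HYPOTHETICAL version 2 that appends a pointer to console information the same way. The v2 field names and layout are invented for illustration; only the v0/v1 fields and the magic value follow start_info.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XEN_HVM_START_MAGIC_VALUE 0x336ec578u

/* Layout of struct hvm_start_info up to version 1, following
 * xen/include/public/arch-x86/hvm/start_info.h. */
struct hvm_start_info_v1 {
    uint32_t magic;
    uint32_t version;
    uint32_t flags;
    uint32_t nr_modules;
    uint64_t modlist_paddr;
    uint64_t cmdline_paddr;
    uint64_t rsdp_paddr;
    /* Fields below are only present in version 1 and newer. */
    uint64_t memmap_paddr;
    uint32_t memmap_entries;
    uint32_t reserved;
};

/* HYPOTHETICAL version 2: console/video info appended in the same way the
 * memory map was appended for version 1. All new names are invented. */
struct hvm_start_info_v2_hypothetical {
    struct hvm_start_info_v1 v1;
    uint64_t console_info_paddr; /* hypothetical */
    uint32_t console_info_size;  /* hypothetical */
    uint32_t reserved2;          /* hypothetical */
};

_Static_assert(offsetof(struct hvm_start_info_v1, memmap_paddr) == 40,
               "version 0 part is 40 bytes");
_Static_assert(sizeof(struct hvm_start_info_v1) == 56, "version 1 size");

/* A consumer must check the version before touching appended fields. */
static uint64_t memmap_paddr_or_zero(const struct hvm_start_info_v1 *si)
{
    if (si->magic != XEN_HVM_START_MAGIC_VALUE || si->version < 1)
        return 0;
    return si->memmap_paddr;
}

static uint64_t
console_info_paddr_or_zero(const struct hvm_start_info_v2_hypothetical *si)
{
    if (si->v1.magic != XEN_HVM_START_MAGIC_VALUE || si->v1.version < 2)
        return 0;
    return si->console_info_paddr;
}
```

The version check is what makes appending fields compatible: a kernel that only understands version 1 never looks past `reserved`, and a newer kernel booted by an older Xen sees `version < 2` and falls back.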

Re: [PATCH 0/9] xen/x86: PVH Dom0 fixes and fallout adjustments

Posted by Roger Pau Monné 4 years, 4 months ago

On Tue, Sep 14, 2021 at 05:13:52PM +0200, Jan Beulich wrote:
> On 14.09.2021 14:41, Roger Pau Monné wrote:
> > On Tue, Sep 14, 2021 at 01:58:29PM +0200, Jan Beulich wrote:
> >> On 14.09.2021 13:15, Roger Pau Monné wrote:
> >>> On Tue, Sep 14, 2021 at 11:03:23AM +0200, Jan Beulich wrote:
> >>>> On 14.09.2021 10:32, Roger Pau Monné wrote:
> >>>>> On Tue, Sep 07, 2021 at 12:04:34PM +0200, Jan Beulich wrote:
> >>>>>> 2) Dom0, unlike in the PV case, cannot access the screen (to use as a
> >>>>>>    console) when in a non-default mode (i.e. not 80x25 text), as the
> >>>>>>    necessary information (in particular about VESA-based LFB modes) is
> >>>>>>    not communicated. On the hypervisor side this looks like deliberate
> >>>>>>    behavior, but it is unclear to me what the intentions were towards
> >>>>>>    an alternative model. (X may be able to access the screen depending
> >>>>>>    on whether it has a suitable driver besides the presently unusable
> >>>>>>    /dev/fb<N> based one.)
> >>>>>
> >>>>> I have to admit most of my boxes are headless servers, albeit I have
> >>>>> one NUC I can use to test gfx stuff, so I don't really use gfx output
> >>>>> with Xen.
> >>>>>
> >>>>> As I understand such information is fetched from the BIOS and passed
> >>>>> into Xen, which should then hand it over to the dom0 kernel?
> >>>>
> >>>> That's how PV Dom0 learns of the information, yes. See
> >>>> fill_console_start_info(). (I'm in the process of eliminating the
> >>>> need for some of the "fetch from BIOS" in Xen right now, but that's
> >>>> not going to get us as far as being able to delete that code, no
> >>>> matter how much in particular Andrew would like that to happen.)
> >>>>
> >>>>> I guess the only way for Linux dom0 kernel to fetch that information
> >>>>> would be to emulate the BIOS or drop into realmode and issue the BIOS
> >>>>> calls?
> >>>>
> >>>> Native Linux gets this information passed from the boot loader, I think
> >>>> (except in the EFI case, as per below).
> >>>>
> >>>>> Is that an issue on UEFI also, or there dom0 can fetch the framebuffer
> >>>>> info using the PV EFI interface?
> >>>>
> >>>> There it's EFI boot services functions which can be invoked before
> >>>> leaving boot services (in the native case). Aiui the PVH entry point
> >>>> lives logically past any EFI boot services interaction, and hence
> >>>> using them is not an option (if there was EFI firmware present in Dom0
> >>>> in the first place, which I consider difficult all by itself - this
> >>>> can't be the physical system's firmware, but I also don't see where
> >>>> virtual firmware would be taken from).
> >>>>
> >>>> There is no PV EFI interface to obtain video information. With the
> >>>> needed information getting passed via start_info, PV has no need for
> >>>> such, and I would be hesitant to add a fundamentally redundant
> >>>> interface for PVH, all the more so since the information needed
> >>>> isn't EFI-specific at all.
> >>>
> >>> I think our only option is to expand the HVM start info information to
> >>> convey that data from Xen into dom0.
> >>
> >> PVH doesn't use the ordinary start_info, does it?
> > 
> > No, it's HVM start info as described in:
> > 
> > xen/include/public/arch-x86/hvm/start_info.h
> > 
> > We have already extended it once to add a memory map, we could extend
> > it another time to add the video information.
> 
> Okay, I'll try to make a(nother) patch along these lines. Since there's
> a DomU counterpart in PV's start_info - where does that information get
> passed for PVH? (I'm mainly wondering whether there's another approach
> to consider.)

We don't pass the video information at all for PVH, neither in domU nor
in dom0 mode, if that's what you mean. Not sure what video information we
could pass for domU anyway, as at the moment that would have to be a PV
framebuffer, which would need to be set up. Maybe we could at some point
provide some kind of emulated or passed-through card.

The information contained in start_info that's needed for PVH is
passed using HVM params, just like it's done for plain HVM guests. We
could pass the video information in an HVM param, I guess, but it would
require stealing guest memory to store it (and mark as reserved in
the memory map). Not sure that's better than expanding HVM start info.

Maybe there's another hypercall interface I'm missing that we could use to
propagate that information to dom0?

Thanks, Roger.
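The HVM param alternative implies that the param would hold only an address, with the actual data in a page taken from the guest's RAM and marked reserved in the memory map handed to Dom0. The model below illustrates that cost. The entry layout mirrors `struct hvm_memmap_table_entry` from start_info.h, with type values following the E820 convention (1 = RAM, 2 = reserved); the page-stealing policy, `GUEST_PAGE_SIZE`, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the shape of struct hvm_memmap_table_entry. */
struct memmap_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
    uint32_t reserved;
};

#define MEMMAP_TYPE_RAM      1
#define MEMMAP_TYPE_RESERVED 2
#define GUEST_PAGE_SIZE      4096u

/* Hypothetical policy: steal the last page of the highest RAM entry,
 * append a RESERVED entry covering it, and return its address (the value
 * the HVM param would carry). Returns 0 on failure. The appended entry is
 * not inserted in sorted position; a real builder would have to re-sort. */
static uint64_t steal_page_for_param(struct memmap_entry *map, size_t *nr,
                                     size_t cap)
{
    struct memmap_entry *ram = NULL;

    for (size_t i = 0; i < *nr; i++)
        if (map[i].type == MEMMAP_TYPE_RAM && map[i].size >= GUEST_PAGE_SIZE &&
            (!ram || map[i].addr > ram->addr))
            ram = &map[i];
    if (!ram || *nr == cap)
        return 0;

    ram->size -= GUEST_PAGE_SIZE;
    uint64_t page = ram->addr + ram->size;
    map[(*nr)++] = (struct memmap_entry){ page, GUEST_PAGE_SIZE,
                                          MEMMAP_TYPE_RESERVED, 0 };
    return page;
}

/* Dom0-side sanity check: the range named by the param must be reserved. */
static bool range_is_reserved(const struct memmap_entry *map, size_t nr,
                              uint64_t addr, uint64_t size)
{
    for (size_t i = 0; i < nr; i++)
        if (map[i].type == MEMMAP_TYPE_RESERVED && addr >= map[i].addr &&
            addr + size <= map[i].addr + map[i].size)
            return true;
    return false;
}
```

Compared with extending hvm_start_info, this adds a memory-map edit and an extra indirection, which is the overhead weighed in the reply that follows.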

Re: [PATCH 0/9] xen/x86: PVH Dom0 fixes and fallout adjustments

Posted by Jan Beulich 4 years, 4 months ago

On 14.09.2021 18:27, Roger Pau Monné wrote:
> On Tue, Sep 14, 2021 at 05:13:52PM +0200, Jan Beulich wrote:
>> On 14.09.2021 14:41, Roger Pau Monné wrote:
>>> On Tue, Sep 14, 2021 at 01:58:29PM +0200, Jan Beulich wrote:
>>>> On 14.09.2021 13:15, Roger Pau Monné wrote:
>>>>> On Tue, Sep 14, 2021 at 11:03:23AM +0200, Jan Beulich wrote:
>>>>>> On 14.09.2021 10:32, Roger Pau Monné wrote:
>>>>>>> On Tue, Sep 07, 2021 at 12:04:34PM +0200, Jan Beulich wrote:
>>>>>>>> 2) Dom0, unlike in the PV case, cannot access the screen (to use as a
>>>>>>>>    console) when in a non-default mode (i.e. not 80x25 text), as the
>>>>>>>>    necessary information (in particular about VESA-based LFB modes) is
>>>>>>>>    not communicated. On the hypervisor side this looks like deliberate
>>>>>>>>    behavior, but it is unclear to me what the intentions were towards
>>>>>>>>    an alternative model. (X may be able to access the screen depending
>>>>>>>>    on whether it has a suitable driver besides the presently unusable
>>>>>>>>    /dev/fb<N> based one.)
>>>>>>>
>>>>>>> I have to admit most of my boxes are headless servers, albeit I have
>>>>>>> one NUC I can use to test gfx stuff, so I don't really use gfx output
>>>>>>> with Xen.
>>>>>>>
>>>>>>> As I understand such information is fetched from the BIOS and passed
>>>>>>> into Xen, which should then hand it over to the dom0 kernel?
>>>>>>
>>>>>> That's how PV Dom0 learns of the information, yes. See
>>>>>> fill_console_start_info(). (I'm in the process of eliminating the
>>>>>> need for some of the "fetch from BIOS" in Xen right now, but that's
>>>>>> not going to get us as far as being able to delete that code, no
>>>>>> matter how much in particular Andrew would like that to happen.)
>>>>>>
>>>>>>> I guess the only way for Linux dom0 kernel to fetch that information
>>>>>>> would be to emulate the BIOS or drop into realmode and issue the BIOS
>>>>>>> calls?
>>>>>>
>>>>>> Native Linux gets this information passed from the boot loader, I think
>>>>>> (except in the EFI case, as per below).
>>>>>>
>>>>>>> Is that an issue on UEFI also, or there dom0 can fetch the framebuffer
>>>>>>> info using the PV EFI interface?
>>>>>>
>>>>>> There it's EFI boot services functions which can be invoked before
>>>>>> leaving boot services (in the native case). Aiui the PVH entry point
>>>>>> lives logically past any EFI boot services interaction, and hence
>>>>>> using them is not an option (if there was EFI firmware present in Dom0
>>>>>> in the first place, which I consider difficult all by itself - this
>>>>>> can't be the physical system's firmware, but I also don't see where
>>>>>> virtual firmware would be taken from).
>>>>>>
>>>>>> There is no PV EFI interface to obtain video information. With the
>>>>>> needed information getting passed via start_info, PV has no need for
>>>>>> such, and I would be hesitant to add a fundamentally redundant
>>>>>> interface for PVH, all the more so since the information needed
>>>>>> isn't EFI-specific at all.
>>>>>
>>>>> I think our only option is to expand the HVM start info information to
>>>>> convey that data from Xen into dom0.
>>>>
>>>> PVH doesn't use the ordinary start_info, does it?
>>>
>>> No, it's HVM start info as described in:
>>>
>>> xen/include/public/arch-x86/hvm/start_info.h
>>>
>>> We have already extended it once to add a memory map, we could extend
>>> it another time to add the video information.
>>
>> Okay, I'll try to make a(nother) patch along these lines. Since there's
>> a DomU counterpart in PV's start_info - where does that information get
>> passed for PVH? (I'm mainly wondering whether there's another approach
>> to consider.)
> 
> We don't pass the video information at all for PVH, neither in domU nor
> in dom0 mode, if that's what you mean. Not sure what video information we
> could pass for domU anyway, as at the moment that would have to be a PV
> framebuffer, which would need to be set up. Maybe we could at some point
> provide some kind of emulated or passed-through card.
> 
> The information contained in start_info that's needed for PVH is
> passed using HVM params, just like it's done for plain HVM guests.

This is what I was referring to; I'm sorry for having been unclear.
It's not video _mode_ information, but information on how to get at
the console.

> We
> could pass the video information in an HVM param, I guess, but it would
> require stealing guest memory to store it (and mark as reserved in
> the memory map). Not sure that's better than expanding HVM start info.

I don't think it would be; a param doesn't seem a good fit here,
and I have to admit I'm not even convinced it's a good fit for
xenstore and console coordinates (that's fine for HVM; the only
reason I can see for PVH to use the same is the expectation that
the line between the two will become increasingly blurred).

> Maybe there's another hypercall interface I'm missing that we could use to
> propagate that information to dom0?

I don't think there is; if anything we'd have to add something.

Jan