[PATCH for-5.2 0/6] Continue booting in case the first device is not bootable

Thomas Huth posted 6 patches 1 week ago
Test docker-quick@centos7 failed
Test docker-mingw@fedora failed
Test checkpatch failed
Test FreeBSD failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20200728183734.7838-1-thuth@redhat.com
Maintainers: Thomas Huth <thuth@redhat.com>, Christian Borntraeger <borntraeger@de.ibm.com>, Cornelia Huck <cohuck@redhat.com>
pc-bios/s390-ccw/Makefile        |   7 +-
pc-bios/s390-ccw/bootmap.c       |  34 ++++--
pc-bios/s390-ccw/main.c          | 172 +++++++++++++++++++------------
pc-bios/s390-ccw/s390-ccw.h      |   2 +-
pc-bios/s390-ccw/virtio-blkdev.c |   7 +-
pc-bios/s390-ccw/virtio-scsi.c   |  25 +++--
pc-bios/s390-ccw/virtio-scsi.h   |   2 +-
7 files changed, 157 insertions(+), 92 deletions(-)

[PATCH for-5.2 0/6] Continue booting in case the first device is not bootable

Posted by Thomas Huth 1 week ago
If the user did not specify a "bootindex" property, the s390-ccw bios
tries to find a bootable device on its own. Unfortunately, it alwasy
stops at the very first device that it can find, no matter whether it's
bootable or not. That causes some weird behavior, for example while

 qemu-system-s390x -hda bootable.qcow2

boots perfectly fine, the bios refuses to work if you just specify
a virtio-scsi controller in front of it:

 qemu-system-s390x -device virtio-scsi -hda bootable.qcow2

Since this is quite uncomfortable and confusing for the users, and
all major firmwares on other architectures correctly boot in such
cases, too, let's also try to teach the s390-ccw bios how to boot
in such cases.

For this, we have to get rid of the various panic()s and IPL_assert()
statements at the "low-level" function and let the main code handle
the decision instead whether a boot from a device should fail or not,
so that the main code can continue searching in case it wants to.

 Thomas

Thomas Huth (6):
  pc-bios/s390-ccw/Makefile: Compile with -std=gnu99, -fwrapv and
    -fno-common
  pc-bios/s390-ccw: Move ipl-related code from main() into a separate
    function
  pc-bios/s390-ccw: Move the inner logic of find_subch() to a separate
    function
  pc-bios/s390-ccw: Do not bail out early if not finding a SCSI disk
  pc-bios/s390-ccw: Scan through all boot devices if none has been
    specified
  pc-bios/s390-ccw: Allow booting in case the first virtio-blk disk is
    bad

 pc-bios/s390-ccw/Makefile        |   7 +-
 pc-bios/s390-ccw/bootmap.c       |  34 ++++--
 pc-bios/s390-ccw/main.c          | 172 +++++++++++++++++++------------
 pc-bios/s390-ccw/s390-ccw.h      |   2 +-
 pc-bios/s390-ccw/virtio-blkdev.c |   7 +-
 pc-bios/s390-ccw/virtio-scsi.c   |  25 +++--
 pc-bios/s390-ccw/virtio-scsi.h   |   2 +-
 7 files changed, 157 insertions(+), 92 deletions(-)

-- 
2.18.1


Re: [PATCH for-5.2 0/6] Continue booting in case the first device is not bootable

Posted by Viktor Mihajlovski 1 week ago

On 7/28/20 8:37 PM, Thomas Huth wrote:
> If the user did not specify a "bootindex" property, the s390-ccw bios
> tries to find a bootable device on its own. Unfortunately, it alwasy
> stops at the very first device that it can find, no matter whether it's
> bootable or not. That causes some weird behavior, for example while
> 
>   qemu-system-s390x -hda bootable.qcow2
> 
> boots perfectly fine, the bios refuses to work if you just specify
> a virtio-scsi controller in front of it:
> 
>   qemu-system-s390x -device virtio-scsi -hda bootable.qcow2
> 
> Since this is quite uncomfortable and confusing for the users, and
> all major firmwares on other architectures correctly boot in such
> cases, too, let's also try to teach the s390-ccw bios how to boot
> in such cases.
> 
> For this, we have to get rid of the various panic()s and IPL_assert()
> statements at the "low-level" function and let the main code handle
> the decision instead whether a boot from a device should fail or not,
> so that the main code can continue searching in case it wants to.
> 

Looking at it from an architectural perspective: If an IPL Information 
Block specifying the boot device has been set and can be retrieved using 
Diagnose 308 it has to be respected, even if the device doesn't contain 
a bootable program. The boot has to fail in this case.

I had not the bandwidth to follow all code paths, but I gather that this 
is still the case with the series. So one can argue that these changes 
are taking care of an undefined situation (real hardware will always 
have the IPIB set).

As long as the architecture is not violated, I can live with the 
proposed changes. I however would like to point out that this only 
covers a corner case (no -boot or -device ..,bootindex specified). A VM 
defined and started with libvirt will always specify the boot device. 
Please don't create the impression that this patches will lead to the 
same behavior as on other platforms. It is still not possible to have an 
order list of potential boot devices in an architecture compliant way.

[...]

-- 
Kind Regards,
    Viktor

Re: [PATCH for-5.2 0/6] Continue booting in case the first device is not bootable

Posted by Thomas Huth 1 week ago
On 29/07/2020 13.42, Viktor Mihajlovski wrote:
> 
> 
> On 7/28/20 8:37 PM, Thomas Huth wrote:
>> If the user did not specify a "bootindex" property, the s390-ccw bios
>> tries to find a bootable device on its own. Unfortunately, it alwasy
>> stops at the very first device that it can find, no matter whether it's
>> bootable or not. That causes some weird behavior, for example while
>>
>>   qemu-system-s390x -hda bootable.qcow2
>>
>> boots perfectly fine, the bios refuses to work if you just specify
>> a virtio-scsi controller in front of it:
>>
>>   qemu-system-s390x -device virtio-scsi -hda bootable.qcow2
>>
>> Since this is quite uncomfortable and confusing for the users, and
>> all major firmwares on other architectures correctly boot in such
>> cases, too, let's also try to teach the s390-ccw bios how to boot
>> in such cases.
>>
>> For this, we have to get rid of the various panic()s and IPL_assert()
>> statements at the "low-level" function and let the main code handle
>> the decision instead whether a boot from a device should fail or not,
>> so that the main code can continue searching in case it wants to.
>>
> 
> Looking at it from an architectural perspective: If an IPL Information
> Block specifying the boot device has been set and can be retrieved using
> Diagnose 308 it has to be respected, even if the device doesn't contain
> a bootable program. The boot has to fail in this case.
> 
> I had not the bandwidth to follow all code paths, but I gather that this
> is still the case with the series.

Right. Just to be sure, I just double-checked with:

... -device virtio-blk,drive=baddrive,bootindex=1 \
    -device virtio-blk,drive=gooddrive

and indeed, the s390-ccw bios only checks the "baddrive" here and
refuses to boot.

> So one can argue that these changes
> are taking care of an undefined situation (real hardware will always
> have the IPIB set).
> 
> As long as the architecture is not violated, I can live with the
> proposed changes.

Thanks!

> I however would like to point out that this only
> covers a corner case (no -boot or -device ..,bootindex specified).

Sure. We™ should/could maybe also add some more documentation to

 https://www.qemu.org/docs/master/system/target-s390x.html

to make it more clear to the "unexperienced" qemu-system-s390x users
that "bootindex" is the preferred / architected way of booting there.

> Please don't create the impression that this patches will lead to the
> same behavior as on other platforms.
Ok, I'll try to state that more clearly in the cover letter of v2.

 Thomas


Re: [PATCH for-5.2 0/6] Continue booting in case the first device is not bootable

Posted by Cornelia Huck 1 week ago
[restored cc:s]

On Wed, 29 Jul 2020 13:42:05 +0200
Viktor Mihajlovski <mihajlov@linux.ibm.com> wrote:

> On 7/28/20 8:37 PM, Thomas Huth wrote:
> > If the user did not specify a "bootindex" property, the s390-ccw bios
> > tries to find a bootable device on its own. Unfortunately, it alwasy
> > stops at the very first device that it can find, no matter whether it's
> > bootable or not. That causes some weird behavior, for example while
> > 
> >   qemu-system-s390x -hda bootable.qcow2
> > 
> > boots perfectly fine, the bios refuses to work if you just specify
> > a virtio-scsi controller in front of it:
> > 
> >   qemu-system-s390x -device virtio-scsi -hda bootable.qcow2
> > 
> > Since this is quite uncomfortable and confusing for the users, and
> > all major firmwares on other architectures correctly boot in such
> > cases, too, let's also try to teach the s390-ccw bios how to boot
> > in such cases.
> > 
> > For this, we have to get rid of the various panic()s and IPL_assert()
> > statements at the "low-level" function and let the main code handle
> > the decision instead whether a boot from a device should fail or not,
> > so that the main code can continue searching in case it wants to.
> >   
> 
> Looking at it from an architectural perspective: If an IPL Information 
> Block specifying the boot device has been set and can be retrieved using 
> Diagnose 308 it has to be respected, even if the device doesn't contain 
> a bootable program. The boot has to fail in this case.
> 
> I had not the bandwidth to follow all code paths, but I gather that this 
> is still the case with the series. So one can argue that these changes 
> are taking care of an undefined situation (real hardware will always 
> have the IPIB set).
> 
> As long as the architecture is not violated, I can live with the 
> proposed changes. I however would like to point out that this only 
> covers a corner case (no -boot or -device ..,bootindex specified). A VM 
> defined and started with libvirt will always specify the boot device. 
> Please don't create the impression that this patches will lead to the 
> same behavior as on other platforms. It is still not possible to have an 
> order list of potential boot devices in an architecture compliant way.

Yes, libvirt will always add this parameter. Still, I've seen confusion
generated by this behaviour, so this change sounds like a good idea to
me.

(Is there any possibility to enhance the architecture to provide a list
of devices in the future?)


Re: [PATCH for-5.2 0/6] Continue booting in case the first device is not bootable

Posted by Cornelia Huck 1 week ago
On Tue, 28 Jul 2020 20:37:28 +0200
Thomas Huth <thuth@redhat.com> wrote:

> If the user did not specify a "bootindex" property, the s390-ccw bios
> tries to find a bootable device on its own. Unfortunately, it alwasy
> stops at the very first device that it can find, no matter whether it's
> bootable or not. That causes some weird behavior, for example while
> 
>  qemu-system-s390x -hda bootable.qcow2
> 
> boots perfectly fine, the bios refuses to work if you just specify
> a virtio-scsi controller in front of it:
> 
>  qemu-system-s390x -device virtio-scsi -hda bootable.qcow2
> 
> Since this is quite uncomfortable and confusing for the users, and
> all major firmwares on other architectures correctly boot in such
> cases, too, let's also try to teach the s390-ccw bios how to boot
> in such cases.

I think this makes sense: If you IPL an LPAR or a z/VM guest, you are
always required to explicitly specify a device to load from (at least
as far as I remember). For QEMU, we have decided to allow starting
without an explicitly specified boot device as well, so we have already
deviated from the other s390 procedures. Let's just make that case
behave the same as everywhere else.

> 
> For this, we have to get rid of the various panic()s and IPL_assert()
> statements at the "low-level" function and let the main code handle
> the decision instead whether a boot from a device should fail or not,
> so that the main code can continue searching in case it wants to.
> 
>  Thomas
> 
> Thomas Huth (6):
>   pc-bios/s390-ccw/Makefile: Compile with -std=gnu99, -fwrapv and
>     -fno-common
>   pc-bios/s390-ccw: Move ipl-related code from main() into a separate
>     function
>   pc-bios/s390-ccw: Move the inner logic of find_subch() to a separate
>     function
>   pc-bios/s390-ccw: Do not bail out early if not finding a SCSI disk
>   pc-bios/s390-ccw: Scan through all boot devices if none has been
>     specified
>   pc-bios/s390-ccw: Allow booting in case the first virtio-blk disk is
>     bad
> 
>  pc-bios/s390-ccw/Makefile        |   7 +-
>  pc-bios/s390-ccw/bootmap.c       |  34 ++++--
>  pc-bios/s390-ccw/main.c          | 172 +++++++++++++++++++------------
>  pc-bios/s390-ccw/s390-ccw.h      |   2 +-
>  pc-bios/s390-ccw/virtio-blkdev.c |   7 +-
>  pc-bios/s390-ccw/virtio-scsi.c   |  25 +++--
>  pc-bios/s390-ccw/virtio-scsi.h   |   2 +-
>  7 files changed, 157 insertions(+), 92 deletions(-)
> 


Re: [PATCH for-5.2 0/6] Continue booting in case the first device is not bootable

Posted by Janosch Frank 1 week ago
On 7/28/20 8:37 PM, Thomas Huth wrote:
> If the user did not specify a "bootindex" property, the s390-ccw bios
> tries to find a bootable device on its own. Unfortunately, it alwasy
> stops at the very first device that it can find, no matter whether it's
> bootable or not. That causes some weird behavior, for example while
> 
>  qemu-system-s390x -hda bootable.qcow2
> 
> boots perfectly fine, the bios refuses to work if you just specify
> a virtio-scsi controller in front of it:
> 
>  qemu-system-s390x -device virtio-scsi -hda bootable.qcow2
> 
> Since this is quite uncomfortable and confusing for the users, and
> all major firmwares on other architectures correctly boot in such
> cases, too, let's also try to teach the s390-ccw bios how to boot
> in such cases.
> 
> For this, we have to get rid of the various panic()s and IPL_assert()
> statements at the "low-level" function and let the main code handle
> the decision instead whether a boot from a device should fail or not,
> so that the main code can continue searching in case it wants to.

Are you planning to add a/re-use an existing test for this?
The PXE test already helped me quite a lot to find my mistakes for the
bios cleanup series.

> 
>  Thomas
> 
> Thomas Huth (6):
>   pc-bios/s390-ccw/Makefile: Compile with -std=gnu99, -fwrapv and
>     -fno-common
>   pc-bios/s390-ccw: Move ipl-related code from main() into a separate
>     function
>   pc-bios/s390-ccw: Move the inner logic of find_subch() to a separate
>     function
>   pc-bios/s390-ccw: Do not bail out early if not finding a SCSI disk
>   pc-bios/s390-ccw: Scan through all boot devices if none has been
>     specified
>   pc-bios/s390-ccw: Allow booting in case the first virtio-blk disk is
>     bad
> 
>  pc-bios/s390-ccw/Makefile        |   7 +-
>  pc-bios/s390-ccw/bootmap.c       |  34 ++++--
>  pc-bios/s390-ccw/main.c          | 172 +++++++++++++++++++------------
>  pc-bios/s390-ccw/s390-ccw.h      |   2 +-
>  pc-bios/s390-ccw/virtio-blkdev.c |   7 +-
>  pc-bios/s390-ccw/virtio-scsi.c   |  25 +++--
>  pc-bios/s390-ccw/virtio-scsi.h   |   2 +-
>  7 files changed, 157 insertions(+), 92 deletions(-)
> 


Re: [PATCH for-5.2 0/6] Continue booting in case the first device is not bootable

Posted by Thomas Huth 1 week ago
On 04/08/2020 16.49, Janosch Frank wrote:
> On 7/28/20 8:37 PM, Thomas Huth wrote:
>> If the user did not specify a "bootindex" property, the s390-ccw bios
>> tries to find a bootable device on its own. Unfortunately, it alwasy
>> stops at the very first device that it can find, no matter whether it's
>> bootable or not. That causes some weird behavior, for example while
>>
>>  qemu-system-s390x -hda bootable.qcow2
>>
>> boots perfectly fine, the bios refuses to work if you just specify
>> a virtio-scsi controller in front of it:
>>
>>  qemu-system-s390x -device virtio-scsi -hda bootable.qcow2
>>
>> Since this is quite uncomfortable and confusing for the users, and
>> all major firmwares on other architectures correctly boot in such
>> cases, too, let's also try to teach the s390-ccw bios how to boot
>> in such cases.
>>
>> For this, we have to get rid of the various panic()s and IPL_assert()
>> statements at the "low-level" function and let the main code handle
>> the decision instead whether a boot from a device should fail or not,
>> so that the main code can continue searching in case it wants to.
> 
> Are you planning to add a/re-use an existing test for this?

Not yet, but maybe the cdrom-test could be used for testing some
scenarios. I'll have a look...

 Thomas