fwcfg: Wrong callback behaviour after fw_cfg_modify_bytes_read

Christian A. Ehrhardt posted 1 patch 2 years ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/YldFMTbFLUcdFIfa@cae.in-ulm.de
Maintainers: "Philippe Mathieu-Daudé" <f4bug@amsat.org>, Gerd Hoffmann <kraxel@redhat.com>
fwcfg: Wrong callback behaviour after fw_cfg_modify_bytes_read
Posted by Christian A. Ehrhardt 2 years ago

Hi,

there's a long story behind this (see below). However, I'll start with
the result:

fw_cfg_modify_bytes_read() sets the callback data of an existing
fw_cfg file to NULL but leaves the actual callbacks in place.
Additioanlly, this function sets ->allow_write to false for no
good reason AFAICS.

For most callbacks, the callback will just crash on the NULL pointer
in ->callback_opaque if this path is ever hit. 

I think the following patch is required (I can properly format it
if you agree). I'm not 100% sure about the "allow_write" part, tough:

diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index e5f3c98184..b8b6d8fe10 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -742,8 +742,6 @@ static void *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key,
     ptr = s->entries[arch][key].data;
     s->entries[arch][key].data = data;
     s->entries[arch][key].len = len;
-    s->entries[arch][key].callback_opaque = NULL;
-    s->entries[arch][key].allow_write = false;
 
     return ptr;
 }

Oppinions?


For those interesed here's the somewhat longer story and the reason
why the diff actually matters:

We are running Windows in a Q35 based machine in UEFI mode with OVMF.
In some situations we saw that the Windows guest would hang in the
Windows boot loader after a guest initiated reboot of the virtual
machine. A hard "system_reset" would trigger the same bug.

The guest was hanging in a loop trying to read from unassigned
I/O port 0xb008. This is the default port used for the ACPI
PM timer on PIIX based machines (but remember that we use Q35 where
the PM timer lives at 0x608 instead).

It turned out that after the reboot OVMF would try to read the
ACPI tables from FWCFG but commands in the table-loader file
could not be executed correctly and OVMF falls back to some hard
coded PIIX based default.

ACPI tables and the table-loader data is initially generated
during setup but this data is re-generated via an FWCFG callback
(acpi_update_build) when the first of these files is accessed.
The tables generated at this later time differ slightly from those
generated during initial setup.

In our case these differences required a resize of the table-loader
romfile. This resize calls fw_cfg_modify_file() via the resize
hook of the memory region that contains the FWCFG file.
As described above this clears the ->callback_opaque data that
points to the build_state.

After a reboot rom_reset will restore the original contents of
the linker-loader file. In theory, this is only temporary. However,
due to the missing callback_opaque data the first call to
acpi_update_build() will do nothing. As a result the OVMF guest
reads an outdated version of the table-loader file. The actual
tables are properly re-generated on the next access to a different
FWCFG file that did not go through a resize. But at this point the
guest has already read the outdated table-loader data and trying to
apply this to the re-generated ACPI tables results in errors.

This results in broken ACPI tables as discussed above.

       regards    Christian