[PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read()

Shameer Kolothum via posted 1 patch 1 year, 8 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20220825161842.841-1-shameerali.kolothum.thodi@huawei.com
Maintainers: "Philippe Mathieu-Daudé" <f4bug@amsat.org>, Gerd Hoffmann <kraxel@redhat.com>
There is a newer version of this series
hw/nvram/fw_cfg.c | 1 -
1 file changed, 1 deletion(-)
[PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read()
Posted by Shameer Kolothum via 1 year, 8 months ago
Hi

On arm/virt platform, Chen Xiang reported a Guest crash while
attempting the below steps,

1. Launch the Guest with nvdimm=on
2. Hot-add a NVDIMM dev
3. Reboot
4. Guest boots fine.
5. Reboot again.
6. Guest boot fails.

QEMU_EFI reports the below error:
ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables"
OnRootBridgesConnected: InstallAcpiTables: Protocol Error

Debugging shows that on first reboot(after hot-adding NVDIMM),
Qemu updates the etc/table-loader len,

qemu_ram_resize()
  fw_cfg_modify_file()
     fw_cfg_modify_bytes_read()

And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for
the "key" entry to NULL. Because of this, on the second reboot,
virt_acpi_build_update() is called with a NULL "build_state" and
returns without updating the ACPI tables. This seems to be 
upsetting the firmware.

To fix this, don't change the callback_opaque in fw_cfg_modify_bytes_read().

Reported-by: chenxiang <chenxiang66@hisilicon.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
I am still not very convinced this is the root cause of the issue.
Though it looks like setting callback_opaque to NULL while updating
the file size is wrong, what puzzles me is that on the second reboot
we don't have any ACPI table size changes and ideally firmware should
see the updated tables from the first reboot itself.

Please take a look and let me know.

Thanks,
Shameer

---
 hw/nvram/fw_cfg.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index d605f3f45a..dfe8404c01 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -728,7 +728,6 @@ static void *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key,
     ptr = s->entries[arch][key].data;
     s->entries[arch][key].data = data;
     s->entries[arch][key].len = len;
-    s->entries[arch][key].callback_opaque = NULL;
     s->entries[arch][key].allow_write = false;
 
     return ptr;
-- 
2.17.1


Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read()
Posted by Igor Mammedov 1 year, 8 months ago
On Thu, 25 Aug 2022 17:18:42 +0100
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:

> Hi
> 
> On arm/virt platform, Chen Xiang reported a Guest crash while
> attempting the below steps,
> 
> 1. Launch the Guest with nvdimm=on
> 2. Hot-add a NVDIMM dev
> 3. Reboot
> 4. Guest boots fine.
> 5. Reboot again.
> 6. Guest boot fails.
> 
> QEMU_EFI reports the below error:
> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables"
> OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> 
> Debugging shows that on first reboot(after hot-adding NVDIMM),
> Qemu updates the etc/table-loader len,
> 
> qemu_ram_resize()
>   fw_cfg_modify_file()
>      fw_cfg_modify_bytes_read()
> 
> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for
> the "key" entry to NULL. Because of this, on the second reboot,
> virt_acpi_build_update() is called with a NULL "build_state" and
> returns without updating the ACPI tables. This seems to be 
> upsetting the firmware.
> 
> To fix this, don't change the callback_opaque in fw_cfg_modify_bytes_read().

Fixes: bdbb5b1706d165 ("fw_cfg: add fw_cfg_machine_reset function")
Acked-by: Igor Mammedov <imammedo@redhat.com>

CCing Gerd to have a second set of eyes on it

> Reported-by: chenxiang <chenxiang66@hisilicon.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
> I am still not very convinced this is the root cause of the issue.
> Though it looks like setting callback_opaque to NULL while updating
> the file size is wrong, what puzzles me is that on the second reboot
> we don't have any ACPI table size changes and ideally firmware should
> see the updated tables from the first reboot itself.
> 
> Please take a look and let me know.
> 
> Thanks,
> Shameer
> 
> ---
>  hw/nvram/fw_cfg.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> index d605f3f45a..dfe8404c01 100644
> --- a/hw/nvram/fw_cfg.c
> +++ b/hw/nvram/fw_cfg.c
> @@ -728,7 +728,6 @@ static void *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key,
>      ptr = s->entries[arch][key].data;
>      s->entries[arch][key].data = data;
>      s->entries[arch][key].len = len;
> -    s->entries[arch][key].callback_opaque = NULL;

>      s->entries[arch][key].allow_write = false;

As Christian have mentioned, this also looks bogus.
perhaps another patch to fix that as well.

>      return ptr;
Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read()
Posted by Gerd Hoffmann 1 year, 8 months ago
> > QEMU_EFI reports the below error:
> > ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables"
> > OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> > 
> > Debugging shows that on first reboot(after hot-adding NVDIMM),
> > Qemu updates the etc/table-loader len,
> > 
> > qemu_ram_resize()
> >   fw_cfg_modify_file()
> >      fw_cfg_modify_bytes_read()
> > 
> > And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for
> > the "key" entry to NULL. Because of this, on the second reboot,
> > virt_acpi_build_update() is called with a NULL "build_state" and
> > returns without updating the ACPI tables. This seems to be 
> > upsetting the firmware.
> > 
> > To fix this, don't change the callback_opaque in fw_cfg_modify_bytes_read().
> 
> Fixes: bdbb5b1706d165 ("fw_cfg: add fw_cfg_machine_reset function")
> Acked-by: Igor Mammedov <imammedo@redhat.com>
> 
> CCing Gerd to have a second set of eyes on it

Hmm.  Original patch clears both 'callback_opaque' and 'callback' (where
'callback' used to be what 'select_cb' is today I think).  Not fully
sure what the motivation was for that.  Maybe because using both
fw_cfg_modify*() calls and a callback for update-on-read for a given
entry looks pointless.  Should that be the case there are better ways to
catch that, like having fw_cfg_modify_bytes_read() throw an error in
case select_cb is not NULL instead of silently clearing the callback.

In any case clearing callback_opaque only is obviously wrong, so
Acked-by: Gerd Hoffmann <kraxel@redhat.com>

take care,
  Gerd
Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read()
Posted by Laszlo Ersek 1 year, 8 months ago
On 08/25/22 18:18, Shameer Kolothum wrote:
> Hi
> 
> On arm/virt platform, Chen Xiang reported a Guest crash while
> attempting the below steps,
> 
> 1. Launch the Guest with nvdimm=on
> 2. Hot-add a NVDIMM dev
> 3. Reboot
> 4. Guest boots fine.
> 5. Reboot again.
> 6. Guest boot fails.
> 
> QEMU_EFI reports the below error:
> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables"
> OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> 
> Debugging shows that on first reboot(after hot-adding NVDIMM),
> Qemu updates the etc/table-loader len,
> 
> qemu_ram_resize()
>   fw_cfg_modify_file()
>      fw_cfg_modify_bytes_read()
> 
> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for
> the "key" entry to NULL. Because of this, on the second reboot,
> virt_acpi_build_update() is called with a NULL "build_state" and
> returns without updating the ACPI tables. This seems to be 
> upsetting the firmware.
> 
> To fix this, don't change the callback_opaque in fw_cfg_modify_bytes_read().
> 
> Reported-by: chenxiang <chenxiang66@hisilicon.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
> I am still not very convinced this is the root cause of the issue.
> Though it looks like setting callback_opaque to NULL while updating
> the file size is wrong, what puzzles me is that on the second reboot
> we don't have any ACPI table size changes and ideally firmware should
> see the updated tables from the first reboot itself.
> 
> Please take a look and let me know.
> 
> Thanks,
> Shameer
> 
> ---
>  hw/nvram/fw_cfg.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> index d605f3f45a..dfe8404c01 100644
> --- a/hw/nvram/fw_cfg.c
> +++ b/hw/nvram/fw_cfg.c
> @@ -728,7 +728,6 @@ static void *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key,
>      ptr = s->entries[arch][key].data;
>      s->entries[arch][key].data = data;
>      s->entries[arch][key].len = len;
> -    s->entries[arch][key].callback_opaque = NULL;
>      s->entries[arch][key].allow_write = false;
>  
>      return ptr;
> 

I vaguely recall seeing the same issue report years ago (also in
relation to hot-adding NVDIMM). However, I have no capacity to
participate in the discussion. Making this remark just for clarity.

Laszlo


Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read()
Posted by Laszlo Ersek 1 year, 8 months ago
+Ard +Gerd, one pointer at the bottom

On 08/26/22 13:59, Laszlo Ersek wrote:
> On 08/25/22 18:18, Shameer Kolothum wrote:
>> Hi
>>
>> On arm/virt platform, Chen Xiang reported a Guest crash while
>> attempting the below steps,
>>
>> 1. Launch the Guest with nvdimm=on
>> 2. Hot-add a NVDIMM dev
>> 3. Reboot
>> 4. Guest boots fine.
>> 5. Reboot again.
>> 6. Guest boot fails.
>>
>> QEMU_EFI reports the below error:
>> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables"
>> OnRootBridgesConnected: InstallAcpiTables: Protocol Error
>>
>> Debugging shows that on first reboot(after hot-adding NVDIMM),
>> Qemu updates the etc/table-loader len,
>>
>> qemu_ram_resize()
>>   fw_cfg_modify_file()
>>      fw_cfg_modify_bytes_read()
>>
>> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for
>> the "key" entry to NULL. Because of this, on the second reboot,
>> virt_acpi_build_update() is called with a NULL "build_state" and
>> returns without updating the ACPI tables. This seems to be 
>> upsetting the firmware.
>>
>> To fix this, don't change the callback_opaque in fw_cfg_modify_bytes_read().
>>
>> Reported-by: chenxiang <chenxiang66@hisilicon.com>
>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>> ---
>> I am still not very convinced this is the root cause of the issue.
>> Though it looks like setting callback_opaque to NULL while updating
>> the file size is wrong, what puzzles me is that on the second reboot
>> we don't have any ACPI table size changes and ideally firmware should
>> see the updated tables from the first reboot itself.
>>
>> Please take a look and let me know.
>>
>> Thanks,
>> Shameer
>>
>> ---
>>  hw/nvram/fw_cfg.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
>> index d605f3f45a..dfe8404c01 100644
>> --- a/hw/nvram/fw_cfg.c
>> +++ b/hw/nvram/fw_cfg.c
>> @@ -728,7 +728,6 @@ static void *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key,
>>      ptr = s->entries[arch][key].data;
>>      s->entries[arch][key].data = data;
>>      s->entries[arch][key].len = len;
>> -    s->entries[arch][key].callback_opaque = NULL;
>>      s->entries[arch][key].allow_write = false;
>>  
>>      return ptr;
>>
> 
> I vaguely recall seeing the same issue report years ago (also in
> relation to hot-adding NVDIMM). However, I have no capacity to
> participate in the discussion. Making this remark just for clarity.

The earlier report I've had in mind was from Shameer as well:

http://mid.mail-archive.com/5FC3163CFD30C246ABAA99954A238FA83F3FB328@lhreml524-mbs.china.huawei.com


RE: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read()
Posted by Shameerali Kolothum Thodi via 1 year, 8 months ago

> -----Original Message-----
> From: Shameerali Kolothum Thodi
> Sent: 26 August 2022 13:15
> To: 'Laszlo Ersek' <lersek@redhat.com>; qemu-devel@nongnu.org;
> qemu-arm@nongnu.org
> Cc: imammedo@redhat.com; peter.maydell@linaro.org; Linuxarm
> <linuxarm@huawei.com>; chenxiang (M) <chenxiang66@hisilicon.com>; Ard
> Biesheuvel (kernel.org address) <ardb@kernel.org>; Gerd Hoffmann
> <kraxel@redhat.com>
> Subject: RE: [PATCH] fw_cfg: Don't set callback_opaque NULL in
> fw_cfg_modify_bytes_read()
> 
> 
> 
> > -----Original Message-----
> > From: Laszlo Ersek [mailto:lersek@redhat.com]
> > Sent: 26 August 2022 13:07
> > To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>;
> > qemu-devel@nongnu.org; qemu-arm@nongnu.org
> > Cc: imammedo@redhat.com; peter.maydell@linaro.org; Linuxarm
> > <linuxarm@huawei.com>; chenxiang (M) <chenxiang66@hisilicon.com>;
> Ard
> > Biesheuvel (kernel.org address) <ardb@kernel.org>; Gerd Hoffmann
> > <kraxel@redhat.com>
> > Subject: Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in
> > fw_cfg_modify_bytes_read()
> >
> > +Ard +Gerd, one pointer at the bottom
> >
> > On 08/26/22 13:59, Laszlo Ersek wrote:
> > > On 08/25/22 18:18, Shameer Kolothum wrote:
> > >> Hi
> > >>
> > >> On arm/virt platform, Chen Xiang reported a Guest crash while
> > >> attempting the below steps,
> > >>
> > >> 1. Launch the Guest with nvdimm=on
> > >> 2. Hot-add a NVDIMM dev
> > >> 3. Reboot
> > >> 4. Guest boots fine.
> > >> 5. Reboot again.
> > >> 6. Guest boot fails.
> > >>
> > >> QEMU_EFI reports the below error:
> > >> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables"
> > >> OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> > >>
> > >> Debugging shows that on first reboot(after hot-adding NVDIMM),
> > >> Qemu updates the etc/table-loader len,
> > >>
> > >> qemu_ram_resize()
> > >>   fw_cfg_modify_file()
> > >>      fw_cfg_modify_bytes_read()
> > >>
> > >> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for
> > >> the "key" entry to NULL. Because of this, on the second reboot,
> > >> virt_acpi_build_update() is called with a NULL "build_state" and
> > >> returns without updating the ACPI tables. This seems to be
> > >> upsetting the firmware.
> > >>
> > >> To fix this, don't change the callback_opaque in
> > fw_cfg_modify_bytes_read().
> > >>
> > >> Reported-by: chenxiang <chenxiang66@hisilicon.com>
> > >> Signed-off-by: Shameer Kolothum
> > <shameerali.kolothum.thodi@huawei.com>
> > >> ---
> > >> I am still not very convinced this is the root cause of the issue.
> > >> Though it looks like setting callback_opaque to NULL while updating
> > >> the file size is wrong, what puzzles me is that on the second reboot
> > >> we don't have any ACPI table size changes and ideally firmware should
> > >> see the updated tables from the first reboot itself.
> > >>
> > >> Please take a look and let me know.
> > >>
> > >> Thanks,
> > >> Shameer
> > >>
> > >> ---
> > >>  hw/nvram/fw_cfg.c | 1 -
> > >>  1 file changed, 1 deletion(-)
> > >>
> > >> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> > >> index d605f3f45a..dfe8404c01 100644
> > >> --- a/hw/nvram/fw_cfg.c
> > >> +++ b/hw/nvram/fw_cfg.c
> > >> @@ -728,7 +728,6 @@ static void
> > *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key,
> > >>      ptr = s->entries[arch][key].data;
> > >>      s->entries[arch][key].data = data;
> > >>      s->entries[arch][key].len = len;
> > >> -    s->entries[arch][key].callback_opaque = NULL;
> > >>      s->entries[arch][key].allow_write = false;
> > >>
> > >>      return ptr;
> > >>
> > >
> > > I vaguely recall seeing the same issue report years ago (also in
> > > relation to hot-adding NVDIMM). However, I have no capacity to
> > > participate in the discussion. Making this remark just for clarity.
> >
> > The earlier report I've had in mind was from Shameer as well:
> >
> >
> http://mid.mail-archive.com/5FC3163CFD30C246ABAA99954A238FA83F3F
> > B328@lhreml524-mbs.china.huawei.com
> 
> Right. That was a slightly different issue though. It was basically ACPI table
> size not
> getting updated on the first reboot of Guest after we hot-add NVDIMM dev.
> The error
> from firmware was different in that case,
> 
> ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables"
> OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> 
> And it was fixed with this series here,
> https://patchwork.kernel.org/project/qemu-devel/cover/20200403101827.3
> 0664-1-shameerali.kolothum.thodi@huawei.com/
> 
> The current issue only happens on the second reboot of the Guest as
> described in
> the steps above.
> 

[+Christian]

I just found that a similar issue was reported here sometime back on Q35/Windows
setup,
https://patchew.org/QEMU/YldFMTbFLUcdFIfa@cae.in-ulm.de/

But there are no further discussions on that thread.

Thanks,
Shameer

Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read()
Posted by Christian A. Ehrhardt 1 year, 8 months ago
Hi,

Shameer: Thanks for bringing this to my attention.

Some comments inline.

On Tue, Aug 30, 2022 at 06:43:56AM +0000, Shameerali Kolothum Thodi wrote:
> 
> 
> > -----Original Message-----
> > From: Shameerali Kolothum Thodi
> > Sent: 26 August 2022 13:15
> > To: 'Laszlo Ersek' <lersek@redhat.com>; qemu-devel@nongnu.org;
> > qemu-arm@nongnu.org
> > Cc: imammedo@redhat.com; peter.maydell@linaro.org; Linuxarm
> > <linuxarm@huawei.com>; chenxiang (M) <chenxiang66@hisilicon.com>; Ard
> > Biesheuvel (kernel.org address) <ardb@kernel.org>; Gerd Hoffmann
> > <kraxel@redhat.com>
> > Subject: RE: [PATCH] fw_cfg: Don't set callback_opaque NULL in
> > fw_cfg_modify_bytes_read()
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: Laszlo Ersek [mailto:lersek@redhat.com]
> > > Sent: 26 August 2022 13:07
> > > To: Shameerali Kolothum Thodi
> > <shameerali.kolothum.thodi@huawei.com>;
> > > qemu-devel@nongnu.org; qemu-arm@nongnu.org
> > > Cc: imammedo@redhat.com; peter.maydell@linaro.org; Linuxarm
> > > <linuxarm@huawei.com>; chenxiang (M) <chenxiang66@hisilicon.com>;
> > Ard
> > > Biesheuvel (kernel.org address) <ardb@kernel.org>; Gerd Hoffmann
> > > <kraxel@redhat.com>
> > > Subject: Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in
> > > fw_cfg_modify_bytes_read()
> > >
> > > +Ard +Gerd, one pointer at the bottom
> > >
> > > On 08/26/22 13:59, Laszlo Ersek wrote:
> > > > On 08/25/22 18:18, Shameer Kolothum wrote:
> > > >> Hi
> > > >>
> > > >> On arm/virt platform, Chen Xiang reported a Guest crash while
> > > >> attempting the below steps,
> > > >>
> > > >> 1. Launch the Guest with nvdimm=on
> > > >> 2. Hot-add a NVDIMM dev
> > > >> 3. Reboot
> > > >> 4. Guest boots fine.
> > > >> 5. Reboot again.
> > > >> 6. Guest boot fails.
> > > >>
> > > >> QEMU_EFI reports the below error:
> > > >> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables"
> > > >> OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> > > >>
> > > >> Debugging shows that on first reboot(after hot-adding NVDIMM),
> > > >> Qemu updates the etc/table-loader len,
> > > >>
> > > >> qemu_ram_resize()
> > > >>   fw_cfg_modify_file()
> > > >>      fw_cfg_modify_bytes_read()
> > > >>
> > > >> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for
> > > >> the "key" entry to NULL. Because of this, on the second reboot,
> > > >> virt_acpi_build_update() is called with a NULL "build_state" and
> > > >> returns without updating the ACPI tables. This seems to be
> > > >> upsetting the firmware.
> > > >>
> > > >> To fix this, don't change the callback_opaque in
> > > fw_cfg_modify_bytes_read().
> > > >>
> > > >> Reported-by: chenxiang <chenxiang66@hisilicon.com>
> > > >> Signed-off-by: Shameer Kolothum
> > > <shameerali.kolothum.thodi@huawei.com>
> > > >> ---
> > > >> I am still not very convinced this is the root cause of the issue.
> > > >> Though it looks like setting callback_opaque to NULL while updating
> > > >> the file size is wrong, what puzzles me is that on the second reboot
> > > >> we don't have any ACPI table size changes and ideally firmware should
> > > >> see the updated tables from the first reboot itself.
> > > >>
> > > >> Please take a look and let me know.
> > > >>
> > > >> Thanks,
> > > >> Shameer
> > > >>
> > > >> ---
> > > >>  hw/nvram/fw_cfg.c | 1 -
> > > >>  1 file changed, 1 deletion(-)
> > > >>
> > > >> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> > > >> index d605f3f45a..dfe8404c01 100644
> > > >> --- a/hw/nvram/fw_cfg.c
> > > >> +++ b/hw/nvram/fw_cfg.c
> > > >> @@ -728,7 +728,6 @@ static void
> > > *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key,
> > > >>      ptr = s->entries[arch][key].data;
> > > >>      s->entries[arch][key].data = data;
> > > >>      s->entries[arch][key].len = len;
> > > >> -    s->entries[arch][key].callback_opaque = NULL;
> > > >>      s->entries[arch][key].allow_write = false;
> > > >>
> > > >>      return ptr;
> > > >>


The code as it stands clears callback_opaque (the data pointer
of the callbacks) while leaving the actual callbacks in place.
I think it is obvious that this cannot be correct.

IMHO the change to allow_write is wrong for similar reasons but
I don't think that this matters in practice.

If this path is hit for the table-loader file the ACPI tables in the
guest will be corrupt.


> > > > I vaguely recall seeing the same issue report years ago (also in
> > > > relation to hot-adding NVDIMM). However, I have no capacity to
> > > > participate in the discussion. Making this remark just for clarity.
> > >
> > > The earlier report I've had in mind was from Shameer as well:
> > >
> > >
> > http://mid.mail-archive.com/5FC3163CFD30C246ABAA99954A238FA83F3F
> > > B328@lhreml524-mbs.china.huawei.com
> > 
> > Right. That was a slightly different issue though. It was basically ACPI table
> > size not
> > getting updated on the first reboot of Guest after we hot-add NVDIMM dev.
> > The error
> > from firmware was different in that case,
> > 
> > ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables"
> > OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> > 
> > And it was fixed with this series here,
> > https://patchwork.kernel.org/project/qemu-devel/cover/20200403101827.3
> > 0664-1-shameerali.kolothum.thodi@huawei.com/
> > 
> > The current issue only happens on the second reboot of the Guest as
> > described in
> > the steps above.
> > 
> 
> [+Christian]
> 
> I just found that a similar issue was reported here sometime back on Q35/Windows
> setup,
> https://patchew.org/QEMU/YldFMTbFLUcdFIfa@cae.in-ulm.de/
> 
> But there are no further discussions on that thread.

I convinced myself that this cannot happen upstream as the number of
entries in the table-loader is always small. However, it does happen
for us and the suggested patch fixes the issue for us.

Given that Shameer independantly came to the same conclusion
the patch should by considered for inclusion.

    thanks   Christian
RE: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read()
Posted by Shameerali Kolothum Thodi via 1 year, 8 months ago

> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: 26 August 2022 13:07
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> qemu-devel@nongnu.org; qemu-arm@nongnu.org
> Cc: imammedo@redhat.com; peter.maydell@linaro.org; Linuxarm
> <linuxarm@huawei.com>; chenxiang (M) <chenxiang66@hisilicon.com>; Ard
> Biesheuvel (kernel.org address) <ardb@kernel.org>; Gerd Hoffmann
> <kraxel@redhat.com>
> Subject: Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in
> fw_cfg_modify_bytes_read()
> 
> +Ard +Gerd, one pointer at the bottom
> 
> On 08/26/22 13:59, Laszlo Ersek wrote:
> > On 08/25/22 18:18, Shameer Kolothum wrote:
> >> Hi
> >>
> >> On arm/virt platform, Chen Xiang reported a Guest crash while
> >> attempting the below steps,
> >>
> >> 1. Launch the Guest with nvdimm=on
> >> 2. Hot-add a NVDIMM dev
> >> 3. Reboot
> >> 4. Guest boots fine.
> >> 5. Reboot again.
> >> 6. Guest boot fails.
> >>
> >> QEMU_EFI reports the below error:
> >> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables"
> >> OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> >>
> >> Debugging shows that on first reboot(after hot-adding NVDIMM),
> >> Qemu updates the etc/table-loader len,
> >>
> >> qemu_ram_resize()
> >>   fw_cfg_modify_file()
> >>      fw_cfg_modify_bytes_read()
> >>
> >> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for
> >> the "key" entry to NULL. Because of this, on the second reboot,
> >> virt_acpi_build_update() is called with a NULL "build_state" and
> >> returns without updating the ACPI tables. This seems to be
> >> upsetting the firmware.
> >>
> >> To fix this, don't change the callback_opaque in
> fw_cfg_modify_bytes_read().
> >>
> >> Reported-by: chenxiang <chenxiang66@hisilicon.com>
> >> Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> >> ---
> >> I am still not very convinced this is the root cause of the issue.
> >> Though it looks like setting callback_opaque to NULL while updating
> >> the file size is wrong, what puzzles me is that on the second reboot
> >> we don't have any ACPI table size changes and ideally firmware should
> >> see the updated tables from the first reboot itself.
> >>
> >> Please take a look and let me know.
> >>
> >> Thanks,
> >> Shameer
> >>
> >> ---
> >>  hw/nvram/fw_cfg.c | 1 -
> >>  1 file changed, 1 deletion(-)
> >>
> >> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> >> index d605f3f45a..dfe8404c01 100644
> >> --- a/hw/nvram/fw_cfg.c
> >> +++ b/hw/nvram/fw_cfg.c
> >> @@ -728,7 +728,6 @@ static void
> *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key,
> >>      ptr = s->entries[arch][key].data;
> >>      s->entries[arch][key].data = data;
> >>      s->entries[arch][key].len = len;
> >> -    s->entries[arch][key].callback_opaque = NULL;
> >>      s->entries[arch][key].allow_write = false;
> >>
> >>      return ptr;
> >>
> >
> > I vaguely recall seeing the same issue report years ago (also in
> > relation to hot-adding NVDIMM). However, I have no capacity to
> > participate in the discussion. Making this remark just for clarity.
> 
> The earlier report I've had in mind was from Shameer as well:
> 
> http://mid.mail-archive.com/5FC3163CFD30C246ABAA99954A238FA83F3F
> B328@lhreml524-mbs.china.huawei.com

Right. That was a slightly different issue though. It was basically ACPI table size not
getting updated on the first reboot of Guest after we hot-add NVDIMM dev. The error
from firmware was different in that case,

ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables"
OnRootBridgesConnected: InstallAcpiTables: Protocol Error

And it was fixed with this series here,
https://patchwork.kernel.org/project/qemu-devel/cover/20200403101827.30664-1-shameerali.kolothum.thodi@huawei.com/

The current issue only happens on the second reboot of the Guest as described in 
the steps above.

Thanks,
Shameer