[PATCH 0/3] migration/ram: Abort on unsupported migratable RAM changes

Akihiko Odaki posted 3 patches 1 week, 6 days ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20260611-ram-v1-0-a2dacf699718@rsg.ci.i.u-tokyo.ac.jp
Maintainers: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>, "Philippe Mathieu-Daudé" <philmd@mailo.com>, Zhao Liu <zhao1.liu@intel.com>, Stefano Stabellini <sstabellini@kernel.org>, Anthony PERARD <anthony@xenproject.org>, "Edgar E. Iglesias" <edgar.iglesias@gmail.com>, Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>, Paolo Bonzini <pbonzini@redhat.com>, Reinoud Zandijk <reinoud@netbsd.org>, Marcelo Tosatti <mtosatti@redhat.com>, Alex Williamson <alex@shazbot.org>, "Cédric Le Goater" <clg@redhat.com>
[PATCH 0/3] migration/ram: Abort on unsupported migratable RAM changes
Posted by Akihiko Odaki 1 week, 6 days ago
Supersedes: <20260604-migration-v1-1-cef4a5b1bbdd@rsg.ci.i.u-tokyo.ac.jp>
("[PATCH] system/physmem: Assert migration invariants")

ram_mig_ram_block_resized() already aborts migration when migratable RAM
is resized. Extend the same handling to other unsupported changes to the
migratable RAMBlock set, such as removing a migratable RAMBlock or
changing a RAMBlock's migratable state.

Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
---
Akihiko Odaki (3):
      system/physmem: Pass RAMBlock to RAMBlockNotifier callbacks
      system/physmem: Notify RAMBlock migratable and idstr changes
      migration/ram: Abort on unsupported migratable RAM changes

 include/migration/misc.h    |   2 +-
 include/system/ramlist.h    |  24 ++++++---
 block/block-ram-registrar.c |   8 +--
 hw/core/numa.c              |  54 ++++++++++++++++---
 hw/xen/xen-mapcache.c       |   6 +--
 migration/ram.c             | 125 ++++++++++++++++++++++++++++++++++++--------
 system/physmem.c            |  16 ++++--
 target/i386/nvmm/nvmm-all.c |   4 +-
 target/i386/sev.c           |   8 +--
 util/vfio-helpers.c         |   7 +--
 10 files changed, 194 insertions(+), 60 deletions(-)
---
base-commit: 2db91528542672cf0db78b3f2cc0e22b36302b38
change-id: 20260606-ram-dcef14f001fb

Best regards,
--  
Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Re: [PATCH 0/3] migration/ram: Abort on unsupported migratable RAM changes
Posted by Peter Xu 1 day, 21 hours ago
On Thu, Jun 11, 2026 at 03:35:47PM +0900, Akihiko Odaki wrote:
> Supersedes: <20260604-migration-v1-1-cef4a5b1bbdd@rsg.ci.i.u-tokyo.ac.jp>
> ("[PATCH] system/physmem: Assert migration invariants")
> 
> ram_mig_ram_block_resized() already aborts migration when migratable RAM
> is resized. Extend the same handling to other unsupported changes to the
> migratable RAMBlock set, such as removing a migratable RAMBlock or
> changing a RAMBlock's migratable state.
> 
> Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> ---
> Akihiko Odaki (3):
>       system/physmem: Pass RAMBlock to RAMBlockNotifier callbacks
>       system/physmem: Notify RAMBlock migratable and idstr changes
>       migration/ram: Abort on unsupported migratable RAM changes

Thanks for looking at this, Akihiko.

I understand this is meant to protect the system by trapping erroneous use
cases.  My question is whether we have any way to actually trigger these.

I worry that we add a bunch of code and notifiers and then there is no way
to trigger them, which essentially adds dead code.

Logically we could already add assert() on things we don't expect to
happen.  This case might be slightly risky, but still I think we can also
consider things like error_report_once() instead of introducing slightly
complex notifiers just to cover what we think shouldn't happen.

Or do you have a way to trigger any of these notifiers?

PS: today I went back to see how the existing resize() notifier would
trigger, and I can't even reproduce it with David's example here:

https://lore.kernel.org/qemu-devel/20210429112708.12291-1-david@redhat.com/#t

I can trap a qemu_ram_resize(), but that's invoked with newsize==rb->size,
so it didn't really notify a thing.  I don't really know how to trigger
ram_block_notify_resize().  If you know, please share.

Thanks,

-- 
Peter Xu
Re: [PATCH 0/3] migration/ram: Abort on unsupported migratable RAM changes
Posted by Akihiko Odaki 1 day, 5 hours ago
On 2026/06/23 5:23, Peter Xu wrote:
> On Thu, Jun 11, 2026 at 03:35:47PM +0900, Akihiko Odaki wrote:
>> Supersedes: <20260604-migration-v1-1-cef4a5b1bbdd@rsg.ci.i.u-tokyo.ac.jp>
>> ("[PATCH] system/physmem: Assert migration invariants")
>>
>> ram_mig_ram_block_resized() already aborts migration when migratable RAM
>> is resized. Extend the same handling to other unsupported changes to the
>> migratable RAMBlock set, such as removing a migratable RAMBlock or
>> changing a RAMBlock's migratable state.
>>
>> Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
>> ---
>> Akihiko Odaki (3):
>>        system/physmem: Pass RAMBlock to RAMBlockNotifier callbacks
>>        system/physmem: Notify RAMBlock migratable and idstr changes
>>        migration/ram: Abort on unsupported migratable RAM changes
> 
> Thanks for looking at this, Akihiko.
> 
> I understand this is a protection to the system to trap error use cases.
> The question I have is do we have any possible way to trigger these.
> 
> I worry we add a bunch of code and notifiers, and then there's zero way to
> trigger, essentially add dead code.
> 
> Logically we could already add assert() on things we don't expect to
> happen.  This case might be slightly risky, but still I think we can also
> consider things like error_report_once() instead of introducing slightly
> complex notifiers just to cover what we think shouldn't happen.
> 
> Or do you have way to trigger any of these notifiers?

I simply followed what is already done for resize(), on the expectation
that resize() does the correct thing and that following it won't introduce
a regression.

> 
> PS: today I went back and I wanted to try how the existing resize()
> notifier would trigger, I can't even reproduce it with David's example
> here:
> 
> https://lore.kernel.org/qemu-devel/20210429112708.12291-1-david@redhat.com/#t
> 
> I can trap a qemu_ram_resize(), but that's invoked with newsize==rb->size,
> so it didn't really notify a thing.  I don't really know how to trigger
> ram_block_notify_resize().  If you know, please share.
I made an LLM amend the reproducer. Below is its output.

Regards,
Akihiko Odaki

LLM output:

A synthetic but effective variant is to add custom ACPI filler tables so 
the initial `etc/acpi/tables` blob is just under the 128 KiB alignment 
bucket, then let the normal boot-time fw_cfg ACPI rebuild push it over.

I tested this shape:

```sh
truncate -s 65000 /tmp/fill1
truncate -s 50600 /tmp/fill2
```

Then add to the original-ish command:

```sh
-device pcie-root-port,id=rp0,chassis=1,slot=1 \
-acpitable sig=FI1A,data=/tmp/fill1 \
-acpitable sig=FI2A,data=/tmp/fill2
```

Observed via `info ramblock`:

```text
before cont:
/rom@etc/acpi/tables   Used 0x0000000000020000

after cont:
/rom@etc/acpi/tables   Used 0x0000000000040000
```

So this does produce a real RAMBlock used-size growth during boot in the 
current tree. With migration started before `cont` using a stalled 
`exec:` target, `info migrate` moved to `cancelling`, which is 
consistent with the current resize-during-precopy abort path.

The key is not the root port itself; the key is making the ACPI table 
rebuild cross `ACPI_BUILD_TABLE_SIZE` alignment. The filler is a bit 
artificial, but it is a good stress variant for the exact class of bug.
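
To make the "alignment bucket" arithmetic concrete, here is a minimal
sketch. It assumes the used size of the blob is rounded up to a multiple of
128 KiB (0x20000), which is consistent with the two `Used` values observed
above. The helper name `align_up` and the sizes of the non-filler tables
(10000 and 20000 bytes) are hypothetical, chosen only so that the total
starts just under one bucket and ends just over it.

```sh
#!/bin/sh
# Round a byte count up to the next multiple of an alignment.
# Assumption: the blob's used size grows in 128 KiB (0x20000) steps.
align_up() {
    size=$1
    align=$2
    echo $(( (size + align - 1) / align * align ))
}

ALIGN=$((0x20000))              # 128 KiB
filler=$((65000 + 50600))       # the two filler payloads created above

# Hypothetical sizes for the remaining tables, before and after the rebuild.
before=$((filler + 10000))
after=$((filler + 20000))

printf 'before: 0x%x\n' "$(align_up "$before" "$ALIGN")"
printf 'after:  0x%x\n' "$(align_up "$after" "$ALIGN")"
```

With these illustrative numbers the script prints 0x20000 for `before` and
0x40000 for `after`: a few extra kilobytes in the rebuilt tables are enough
to double the aligned size.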
Re: [PATCH 0/3] migration/ram: Abort on unsupported migratable RAM changes
Posted by Peter Xu 1 day, 1 hour ago
On Tue, Jun 23, 2026 at 09:05:22PM +0900, Akihiko Odaki wrote:
> On 2026/06/23 5:23, Peter Xu wrote:
> > On Thu, Jun 11, 2026 at 03:35:47PM +0900, Akihiko Odaki wrote:
> > > Supersedes: <20260604-migration-v1-1-cef4a5b1bbdd@rsg.ci.i.u-tokyo.ac.jp>
> > > ("[PATCH] system/physmem: Assert migration invariants")
> > > 
> > > ram_mig_ram_block_resized() already aborts migration when migratable RAM
> > > is resized. Extend the same handling to other unsupported changes to the
> > > migratable RAMBlock set, such as removing a migratable RAMBlock or
> > > changing a RAMBlock's migratable state.
> > > 
> > > Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> > > ---
> > > Akihiko Odaki (3):
> > >        system/physmem: Pass RAMBlock to RAMBlockNotifier callbacks
> > >        system/physmem: Notify RAMBlock migratable and idstr changes
> > >        migration/ram: Abort on unsupported migratable RAM changes
> > 
> > Thanks for looking at this, Akihiko.
> > 
> > I understand this is a protection to the system to trap error use cases.
> > The question I have is do we have any possible way to trigger these.
> > 
> > I worry we add a bunch of code and notifiers, and then there's zero way to
> > trigger, essentially add dead code.
> > 
> > Logically we could already add assert() on things we don't expect to
> > happen.  This case might be slightly risky, but still I think we can also
> > consider things like error_report_once() instead of introducing slightly
> > complex notifiers just to cover what we think shouldn't happen.
> > 
> > Or do you have way to trigger any of these notifiers?
> 
> I simply followed what's already done for resize(), expecting resize() does
> the correct thing and following it won't introduce a regression.
> 
> > 
> > PS: today I went back and I wanted to try how the existing resize()
> > notifier would trigger, I can't even reproduce it with David's example
> > here:
> > 
> > https://lore.kernel.org/qemu-devel/20210429112708.12291-1-david@redhat.com/#t
> > 
> > I can trap a qemu_ram_resize(), but that's invoked with newsize==rb->size,
> > so it didn't really notify a thing.  I don't really know how to trigger
> > ram_block_notify_resize().  If you know, please share.
> I made an LLM amend the reproducer. Below is its output.
> 
> Regards,
> Akihiko Odaki
> 
> LLM output:
> 
> A synthetic but effective variant is to add custom ACPI filler tables so the
> initial `etc/acpi/tables` blob is just under the 128 KiB alignment bucket,
> then let the normal boot-time fw_cfg ACPI rebuild push it over.
> 
> I tested this shape:
> 
> ```sh
> truncate -s 65000 /tmp/fill1
> truncate -s 50600 /tmp/fill2
> ```
> 
> Then add to the original-ish command:
> 
> ```sh
> -device pcie-root-port,id=rp0,chassis=1,slot=1 \
> -acpitable sig=FI1A,data=/tmp/fill1 \
> -acpitable sig=FI2A,data=/tmp/fill2
> ```

These lines should inject some sections into ACPI, but I don't see why the
ACPI table would change at runtime: they should be appended right when QEMU
boots, so I do expect the ACPI table to be larger than it is without these
lines, but not to be resized while the VM is running.  I wonder whether the
output below is a hallucination from the AI.

> 
> Observed via `info ramblock`:
> 
> ```text
> before cont:
> /rom@etc/acpi/tables   Used 0x0000000000020000
> 
> after cont:
> /rom@etc/acpi/tables   Used 0x0000000000040000
> ```
> 
> So this does produce a real RAMBlock used-size growth during boot in the
> current tree. With migration started before `cont` using a stalled `exec:`
> target, `info migrate` moved to `cancelling`, which is consistent with the
> current resize-during-precopy abort path.
> 
> The key is not the root port itself; the key is making the ACPI table
> rebuild cross `ACPI_BUILD_TABLE_SIZE` alignment. The filler is a bit
> artificial, but it is a good stress variant for the exact class of bug.

I did take a closer look at this whole "MR size can change" thing.

We have two users: ACPI (rom_add_blob()) and other firmwares (most of them
rom_add_file() users, very few using rom_add_blob()).

AFAIU, the real resize should only happen with the 2nd user, not ACPI.

ACPI seems to be able to change the ROM size (PS: it is tricky to call it a
ROM in the first place: I believe it's only a data blob in fw_cfg) when e.g.
it scans the PCI bus and things have changed.  That only happens during
reboot, and it can't happen during migration because qdev_add is forbidden.

Device ROMs can really change size if the dest host has newer firmware
packages than the source, but that's another use case and I _think_ we
support it fine, except that firmwares can only grow, not shrink, guarded
by the qemu_ram_resize() check on max_length.

That's a pretty niche use case, and I can't think of anything that would
trigger a change such as flipping migratable and so on.  So IMHO we will
need to understand the problem better before having more notifiers.

PS: I wish the three ACPI use cases of ROM were already part of device
state; then the MR resize complexity would be out of the question: the max
size is 128K as far as I know; it doesn't need iterability... we sometimes
migrate device state much larger than 128KB.  It can be a VMSD field.

Thanks,

-- 
Peter Xu
Re: [PATCH 0/3] migration/ram: Abort on unsupported migratable RAM changes
Posted by Akihiko Odaki 1 day ago
On 2026/06/24 0:45, Peter Xu wrote:
> On Tue, Jun 23, 2026 at 09:05:22PM +0900, Akihiko Odaki wrote:
>> On 2026/06/23 5:23, Peter Xu wrote:
>>> On Thu, Jun 11, 2026 at 03:35:47PM +0900, Akihiko Odaki wrote:
>>>> Supersedes: <20260604-migration-v1-1-cef4a5b1bbdd@rsg.ci.i.u-tokyo.ac.jp>
>>>> ("[PATCH] system/physmem: Assert migration invariants")
>>>>
>>>> ram_mig_ram_block_resized() already aborts migration when migratable RAM
>>>> is resized. Extend the same handling to other unsupported changes to the
>>>> migratable RAMBlock set, such as removing a migratable RAMBlock or
>>>> changing a RAMBlock's migratable state.
>>>>
>>>> Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
>>>> ---
>>>> Akihiko Odaki (3):
>>>>         system/physmem: Pass RAMBlock to RAMBlockNotifier callbacks
>>>>         system/physmem: Notify RAMBlock migratable and idstr changes
>>>>         migration/ram: Abort on unsupported migratable RAM changes
>>>
>>> Thanks for looking at this, Akihiko.
>>>
>>> I understand this is a protection to the system to trap error use cases.
>>> The question I have is do we have any possible way to trigger these.
>>>
>>> I worry we add a bunch of code and notifiers, and then there's zero way to
>>> trigger, essentially add dead code.
>>>
>>> Logically we could already add assert() on things we don't expect to
>>> happen.  This case might be slightly risky, but still I think we can also
>>> consider things like error_report_once() instead of introducing slightly
>>> complex notifiers just to cover what we think shouldn't happen.
>>>
>>> Or do you have way to trigger any of these notifiers?
>>
>> I simply followed what's already done for resize(), expecting resize() does
>> the correct thing and following it won't introduce a regression.
>>
>>>
>>> PS: today I went back and I wanted to try how the existing resize()
>>> notifier would trigger, I can't even reproduce it with David's example
>>> here:
>>>
>>> https://lore.kernel.org/qemu-devel/20210429112708.12291-1-david@redhat.com/#t
>>>
>>> I can trap a qemu_ram_resize(), but that's invoked with newsize==rb->size,
>>> so it didn't really notify a thing.  I don't really know how to trigger
>>> ram_block_notify_resize().  If you know, please share.
>> I made an LLM amend the reproducer. Below is its output.
>>
>> Regards,
>> Akihiko Odaki
>>
>> LLM output:
>>
>> A synthetic but effective variant is to add custom ACPI filler tables so the
>> initial `etc/acpi/tables` blob is just under the 128 KiB alignment bucket,
>> then let the normal boot-time fw_cfg ACPI rebuild push it over.
>>
>> I tested this shape:
>>
>> ```sh
>> truncate -s 65000 /tmp/fill1
>> truncate -s 50600 /tmp/fill2
>> ```
>>
>> Then add to the original-ish command:
>>
>> ```sh
>> -device pcie-root-port,id=rp0,chassis=1,slot=1 \
>> -acpitable sig=FI1A,data=/tmp/fill1 \
>> -acpitable sig=FI2A,data=/tmp/fill2
>> ```
> 
> These lines should inject some sections into ACPI, but I don't see why the
> acpi table would change: that should be appended right at QEMU boots, so I
> expect the ACPI table to grow indeed comparing to when without these lines,
> but not resize during VM running.  I wonder if below is hallucinations from
> the AI.

The resize happens because the ACPI fw_cfg blobs are built lazily when 
the guest firmware selects them. acpi_add_rom_blob() registers 
acpi_build_update() as the fw_cfg select callback; after `cont`, 
firmware reads the fw_cfg ACPI entries, QEMU builds the tables, and
acpi_ram_update() calls memory_region_ram_resize().

Below is the reproduction case (LLM-generated):

#!/bin/sh
set -eu

QEMU=${QEMU:-build/qemu-system-x86_64}
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

qmp_migrate()
{
     printf '%s%s%s\n' \
         '{"execute":"migrate","arguments":{"channels":[{' \
         '"channel-type":"main","addr":{"transport":"exec",' \
         '"args":["/bin/sleep","1000"]}}]}}'
}

truncate -s 65000 "$tmp/fill1"
truncate -s 50600 "$tmp/fill2"
truncate -s 256M "$tmp/nvdimm"

{
     echo '{"execute":"qmp_capabilities"}'
     echo '{"execute":"x-query-ramblock"}'
     qmp_migrate
     sleep 1
     echo '{"execute":"query-migrate"}'
     echo '{"execute":"cont"}'
     sleep 3
     echo '{"execute":"query-migrate"}'
     echo '{"execute":"x-query-ramblock"}'
     echo '{"execute":"quit"}'
} | "$QEMU" \
     -S \
     -machine q35,nvdimm=on,accel=tcg \
     -smp 1 \
     -cpu max \
     -m size=20G,slots=8,maxmem=22G \
     -object \
     memory-backend-file,id=mem0,mem-path="$tmp/nvdimm",size=256M \
     -device nvdimm,label-size=131072,memdev=mem0,id=nvdimm0,slot=1 \
     -nodefaults \
     -qmp stdio \
     -serial none \
     -device vmgenid \
     -device intel-iommu \
     -acpitable sig=FI1A,data="$tmp/fill1" \
     -acpitable sig=FI2A,data="$tmp/fill2" \
     -display none

Expected markers in the output:

/rom@etc/acpi/tables ... Used 0x0000000000020000
"status": "active"
"status": "cancelling", "error-desc": "RAM block '/rom@etc/acpi/tables' resized during precopy."
/rom@etc/acpi/tables ... Used 0x0000000000040000
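
If the script's output is saved to a file, the markers can be checked
mechanically. This is only a rough sketch: the function name
`check_markers` and the log file are hypothetical, and the patterns assume
the output is formatted exactly as in the markers above. Since grep matches
per line and QMP may put a whole ramblock listing on a single line, a
positive result is a hint rather than proof.

```sh
#!/bin/sh
# Return success only if every expected marker appears in the given log.
# Assumption: the caller saved the reproducer's stdout/stderr to this file.
check_markers() {
    log=$1
    grep -q "/rom@etc/acpi/tables.*Used 0x0000000000020000" "$log" &&
    grep -q '"status": "active"' "$log" &&
    grep -q '"status": "cancelling"' "$log" &&
    grep -q "resized during precopy" "$log" &&
    grep -q "/rom@etc/acpi/tables.*Used 0x0000000000040000" "$log"
}
```

For example, after saving the output to a log file of your choosing, run
`check_markers <that file> && echo "resize reproduced"`.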

Regards,
Akihiko Odaki

> 
>>
>> Observed via `info ramblock`:
>>
>> ```text
>> before cont:
>> /rom@etc/acpi/tables   Used 0x0000000000020000
>>
>> after cont:
>> /rom@etc/acpi/tables   Used 0x0000000000040000
>> ```
>>
>> So this does produce a real RAMBlock used-size growth during boot in the
>> current tree. With migration started before `cont` using a stalled `exec:`
>> target, `info migrate` moved to `cancelling`, which is consistent with the
>> current resize-during-precopy abort path.
>>
>> The key is not the root port itself; the key is making the ACPI table
>> rebuild cross `ACPI_BUILD_TABLE_SIZE` alignment. The filler is a bit
>> artificial, but it is a good stress variant for the exact class of bug.
> 
> I did have a closer look on this whole "MR size can change" thing.
> 
> We have two users: ACPI (rom_add_blob()) and other firmwares (most of them
> rom_add_file() users, very little used rom_add_blob()).
> 
> AFAIU, the real resize should only happen at the 2nd user, not ACPI.
> 
> ACPI seems to be able to change ROM size (PS: this is tricky to call it ROM
> in the first place: I believe it's only a data blob in fw_cfg) when e.g. it
> scans the pci bus and things changed, only happen during reboot, but it
> can't happen during migration because qdev_add is forbidden.
> 
> Device ROMs can really change size if dest host has newer firmware packages
> than source, but that's another use case and I _think_ we support fine,
> except that firmwares can only grow not shrink, guarded by
> qemu_ram_resize() check on max_length.
> 
> That's a pretty niche use case and nothing I can think of that on change of
> flipping migratable and so on.  So IMHO we will need to understand the
> problem better before having more notifiers.
> 
> PS: I wished ACPI three use cases of ROM can be part of device states
> already, then it is out of question on MR resize complexity: the max size
> is 128K as far as I know; it doesn't need iterability... we migrate devices
> sometimes much larger than 128KB on device states.  It can be a VMSD field.
> 
> Thanks,
>