[PATCH] system/physmem: Fix migration dirty bitmap coherency with TCG memory access

Nicholas Piggin posted 1 patch 2 months, 1 week ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20240219061731.232570-1-npiggin@gmail.com
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>, David Hildenbrand <david@redhat.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>
There is a newer version of this series
include/exec/ram_addr.h | 13 +++++++++++++
system/physmem.c        | 14 +++++++++-----
2 files changed, 22 insertions(+), 5 deletions(-)
[PATCH] system/physmem: Fix migration dirty bitmap coherency with TCG memory access
Posted by Nicholas Piggin 2 months, 1 week ago
The fastpath in cpu_physical_memory_sync_dirty_bitmap() to test large
aligned ranges forgot to bring the TCG TLB up to date after clearing
some of the dirty memory bitmap bits. This can result in stores though
the TCG TLB not setting the dirty memory bitmap and ultimately causes
memory corruption / lost updates during migration from a TCG host.

Fix this by exporting an abstracted function to call when dirty bits
have been cleared.

Fixes: aa8dc044772 ("migration: synchronize memory bitmap 64bits at a time")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---

I reproduced this with a kvm-unit-tests migration stress tester I'm
working on. Tree here with reproduce instructions in latest commit.

https://github.com/npiggin/kvm-unit-tests/tree/qemu-tcg-migration-bug

Thanks,
Nick

 include/exec/ram_addr.h | 13 +++++++++++++
 system/physmem.c        | 14 +++++++++-----
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 90676093f5..dadb2deb11 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -443,6 +443,16 @@ uint64_t cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
 }
 #endif /* not _WIN32 */
 
+void tcg_cpu_physical_memory_dirty_bits_cleared(ram_addr_t start,
+                                                ram_addr_t length);
+static inline void cpu_physical_memory_dirty_bits_cleared(ram_addr_t start,
+                                                          ram_addr_t length)
+{
+    if (tcg_enabled()) {
+        tcg_cpu_physical_memory_dirty_bits_cleared(start, length);
+    }
+
+}
 bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
                                               ram_addr_t length,
                                               unsigned client);
@@ -504,6 +514,9 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
                 idx++;
             }
         }
+        if (num_dirty) {
+            cpu_physical_memory_dirty_bits_cleared(start, length);
+        }
 
         if (rb->clear_bmap) {
             /*
diff --git a/system/physmem.c b/system/physmem.c
index 5e66d9ae36..dc0d8b16aa 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -839,6 +839,12 @@ static void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t length)
     }
 }
 
+void tcg_cpu_physical_memory_dirty_bits_cleared(ram_addr_t start,
+                                                ram_addr_t length)
+{
+    tlb_reset_dirty_range_all(start, length);
+}
+
 /* Note: start and end must be within the same ram block.  */
 bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
                                               ram_addr_t length,
@@ -881,8 +887,8 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
         memory_region_clear_dirty_bitmap(ramblock->mr, mr_offset, mr_size);
     }
 
-    if (dirty && tcg_enabled()) {
-        tlb_reset_dirty_range_all(start, length);
+    if (dirty) {
+        cpu_physical_memory_dirty_bits_cleared(start, length);
     }
 
     return dirty;
@@ -929,9 +935,7 @@ DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty
         }
     }
 
-    if (tcg_enabled()) {
-        tlb_reset_dirty_range_all(start, length);
-    }
+    cpu_physical_memory_dirty_bits_cleared(start, length);
 
     memory_region_clear_dirty_bitmap(mr, offset, length);
 
-- 
2.42.0
Re: [PATCH] system/physmem: Fix migration dirty bitmap coherency with TCG memory access
Posted by Thomas Huth 1 month, 2 weeks ago
  Hi Peter, Paolo, David,

this patch fixes a problem with the kvm-unit-tests ... could we get it 
included in QEMU 9.0 ?

  Thanks,
   Thomas


On 19/02/2024 07.17, Nicholas Piggin wrote:
> The fastpath in cpu_physical_memory_sync_dirty_bitmap() to test large
> aligned ranges forgot to bring the TCG TLB up to date after clearing
> some of the dirty memory bitmap bits. This can result in stores though
> the TCG TLB not setting the dirty memory bitmap and ultimately causes
> memory corruption / lost updates during migration from a TCG host.
> 
> Fix this by exporting an abstracted function to call when dirty bits
> have been cleared.
> 
> Fixes: aa8dc044772 ("migration: synchronize memory bitmap 64bits at a time")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> 
> I reproduced this with a kvm-unit-tests migration stress tester I'm
> working on. Tree here with reproduce instructions in latest commit.
> 
> https://github.com/npiggin/kvm-unit-tests/tree/qemu-tcg-migration-bug
> 
> Thanks,
> Nick
> 
>   include/exec/ram_addr.h | 13 +++++++++++++
>   system/physmem.c        | 14 +++++++++-----
>   2 files changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index 90676093f5..dadb2deb11 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -443,6 +443,16 @@ uint64_t cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
>   }
>   #endif /* not _WIN32 */
>   
> +void tcg_cpu_physical_memory_dirty_bits_cleared(ram_addr_t start,
> +                                                ram_addr_t length);
> +static inline void cpu_physical_memory_dirty_bits_cleared(ram_addr_t start,
> +                                                          ram_addr_t length)
> +{
> +    if (tcg_enabled()) {
> +        tcg_cpu_physical_memory_dirty_bits_cleared(start, length);
> +    }
> +
> +}
>   bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
>                                                 ram_addr_t length,
>                                                 unsigned client);
> @@ -504,6 +514,9 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
>                   idx++;
>               }
>           }
> +        if (num_dirty) {
> +            cpu_physical_memory_dirty_bits_cleared(start, length);
> +        }
>   
>           if (rb->clear_bmap) {
>               /*
> diff --git a/system/physmem.c b/system/physmem.c
> index 5e66d9ae36..dc0d8b16aa 100644
> --- a/system/physmem.c
> +++ b/system/physmem.c
> @@ -839,6 +839,12 @@ static void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t length)
>       }
>   }
>   
> +void tcg_cpu_physical_memory_dirty_bits_cleared(ram_addr_t start,
> +                                                ram_addr_t length)
> +{
> +    tlb_reset_dirty_range_all(start, length);
> +}
> +
>   /* Note: start and end must be within the same ram block.  */
>   bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
>                                                 ram_addr_t length,
> @@ -881,8 +887,8 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
>           memory_region_clear_dirty_bitmap(ramblock->mr, mr_offset, mr_size);
>       }
>   
> -    if (dirty && tcg_enabled()) {
> -        tlb_reset_dirty_range_all(start, length);
> +    if (dirty) {
> +        cpu_physical_memory_dirty_bits_cleared(start, length);
>       }
>   
>       return dirty;
> @@ -929,9 +935,7 @@ DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty
>           }
>       }
>   
> -    if (tcg_enabled()) {
> -        tlb_reset_dirty_range_all(start, length);
> -    }
> +    cpu_physical_memory_dirty_bits_cleared(start, length);
>   
>       memory_region_clear_dirty_bitmap(mr, offset, length);
>
Re: [PATCH] system/physmem: Fix migration dirty bitmap coherency with TCG memory access
Posted by Peter Xu 1 month, 2 weeks ago
On Tue, Mar 12, 2024 at 06:38:13PM +0100, Thomas Huth wrote:
> 
>  Hi Peter, Paolo, David,
> 
> this patch fixes a problem with the kvm-unit-tests ... could we get it
> included in QEMU 9.0 ?

Yes I think so.  Apologies for a long delay, I queued it for the next rc
pull.

Thanks,

-- 
Peter Xu
Re: [PATCH] system/physmem: Fix migration dirty bitmap coherency with TCG memory access
Posted by Philippe Mathieu-Daudé 1 month, 2 weeks ago
On 12/3/24 20:24, Peter Xu wrote:
> On Tue, Mar 12, 2024 at 06:38:13PM +0100, Thomas Huth wrote:
>>
>>   Hi Peter, Paolo, David,
>>
>> this patch fixes a problem with the kvm-unit-tests ... could we get it
>> included in QEMU 9.0 ?
> 
> Yes I think so.  Apologies for a long delay, I queued it for the next rc
> pull.

I was testing a v2, please consider it instead:
https://lore.kernel.org/qemu-devel/20240312201458.79532-1-philmd@linaro.org/
Re: [PATCH] system/physmem: Fix migration dirty bitmap coherency with TCG memory access
Posted by Thomas Huth 2 months, 1 week ago
On 19/02/2024 07.17, Nicholas Piggin wrote:
> The fastpath in cpu_physical_memory_sync_dirty_bitmap() to test large
> aligned ranges forgot to bring the TCG TLB up to date after clearing
> some of the dirty memory bitmap bits. This can result in stores though
> the TCG TLB not setting the dirty memory bitmap and ultimately causes
> memory corruption / lost updates during migration from a TCG host.
> 
> Fix this by exporting an abstracted function to call when dirty bits
> have been cleared.
> 
> Fixes: aa8dc044772 ("migration: synchronize memory bitmap 64bits at a time")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---

Sounds promising! ... but it doesn't seem to fix the migration-test qtest 
with s390x when it gets enabled again:

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -3385,15 +3385,6 @@ int main(int argc, char **argv)
          return g_test_run();
      }

-    /*
-     * Similar to ppc64, s390x seems to be touchy with TCG, so disable it
-     * there until the problems are resolved
-     */
-    if (g_str_equal(arch, "s390x") && !has_kvm) {
-        g_test_message("Skipping test: s390x host with KVM is required");
-        return g_test_run();
-    }
-
      tmpfs = g_dir_make_tmp("migration-test-XXXXXX", &err);
      if (!tmpfs) {
          g_test_message("Can't create temporary directory in %s: %s",

I wonder whether there is more stuff like this necessary somewhere?

Did you try to re-enable tests/qtest/migration-test.c for ppc64 with TCG to 
see whether that works fine now?

  Thomas
Re: [PATCH] system/physmem: Fix migration dirty bitmap coherency with TCG memory access
Posted by Nicholas Piggin 2 months, 1 week ago
On Tue Feb 20, 2024 at 12:10 AM AEST, Thomas Huth wrote:
> On 19/02/2024 07.17, Nicholas Piggin wrote:
> > The fastpath in cpu_physical_memory_sync_dirty_bitmap() to test large
> > aligned ranges forgot to bring the TCG TLB up to date after clearing
> > some of the dirty memory bitmap bits. This can result in stores though
> > the TCG TLB not setting the dirty memory bitmap and ultimately causes
> > memory corruption / lost updates during migration from a TCG host.
> > 
> > Fix this by exporting an abstracted function to call when dirty bits
> > have been cleared.
> > 
> > Fixes: aa8dc044772 ("migration: synchronize memory bitmap 64bits at a time")
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
>
> Sounds promising! ... but it doesn't seem to fix the migration-test qtest 
> with s390x when it gets enabled again:
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -3385,15 +3385,6 @@ int main(int argc, char **argv)
>           return g_test_run();
>       }
>
> -    /*
> -     * Similar to ppc64, s390x seems to be touchy with TCG, so disable it
> -     * there until the problems are resolved
> -     */
> -    if (g_str_equal(arch, "s390x") && !has_kvm) {
> -        g_test_message("Skipping test: s390x host with KVM is required");
> -        return g_test_run();
> -    }
> -
>       tmpfs = g_dir_make_tmp("migration-test-XXXXXX", &err);
>       if (!tmpfs) {
>           g_test_message("Can't create temporary directory in %s: %s",
>
> I wonder whether there is more stuff like this necessary somewhere?
>
> Did you try to re-enable tests/qtest/migration-test.c for ppc64 with TCG to 
> see whether that works fine now?

I'm seeing a hang about every 10 minutes with s390x. ppc64 is reliable
so far.

So both my patches didn't fix the problem for s390. It seems like the
test just stops running, so maybe it's a harness problem? I didn't
dig into what state the machine is in at this point.

I did fix a few ppc64 migration issues recently that came up with
testing reverse-debugging. That was very good for finding problems
(but very difficult to diagnose failures). Maybe that helped stability
on this test?

Thanks,
Nick
Re: [PATCH] system/physmem: Fix migration dirty bitmap coherency with TCG memory access
Posted by Nicholas Piggin 2 months, 1 week ago
On Tue Feb 20, 2024 at 12:10 AM AEST, Thomas Huth wrote:
> On 19/02/2024 07.17, Nicholas Piggin wrote:
> > The fastpath in cpu_physical_memory_sync_dirty_bitmap() to test large
> > aligned ranges forgot to bring the TCG TLB up to date after clearing
> > some of the dirty memory bitmap bits. This can result in stores though
> > the TCG TLB not setting the dirty memory bitmap and ultimately causes
> > memory corruption / lost updates during migration from a TCG host.
> > 
> > Fix this by exporting an abstracted function to call when dirty bits
> > have been cleared.
> > 
> > Fixes: aa8dc044772 ("migration: synchronize memory bitmap 64bits at a time")
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
>
> Sounds promising! ... but it doesn't seem to fix the migration-test qtest 
> with s390x when it gets enabled again:

Did it fix kvm-unit-tests for you?

> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -3385,15 +3385,6 @@ int main(int argc, char **argv)
>           return g_test_run();
>       }
>
> -    /*
> -     * Similar to ppc64, s390x seems to be touchy with TCG, so disable it
> -     * there until the problems are resolved
> -     */
> -    if (g_str_equal(arch, "s390x") && !has_kvm) {
> -        g_test_message("Skipping test: s390x host with KVM is required");
> -        return g_test_run();
> -    }
> -
>       tmpfs = g_dir_make_tmp("migration-test-XXXXXX", &err);
>       if (!tmpfs) {
>           g_test_message("Can't create temporary directory in %s: %s",
>
> I wonder whether there is more stuff like this necessary somewhere?

Possibly. That's what the commit logs for the TCG disable indicate. I
have found another dirty bitmap TCG race too. I'll send it out after
some more testing.

> Did you try to re-enable tests/qtest/migration-test.c for ppc64 with TCG to 
> see whether that works fine now?

Hmm, I did try and so far ppc64 is not failing even with upstream QEMU.
I'll try with s390x. Any additional build or runtime options to make it
break? How long does it take for breakage to be evident?

Thanks,
Nick
Re: [PATCH] system/physmem: Fix migration dirty bitmap coherency with TCG memory access
Posted by Thomas Huth 2 months ago
On 20/02/2024 02.13, Nicholas Piggin wrote:
> On Tue Feb 20, 2024 at 12:10 AM AEST, Thomas Huth wrote:
>> On 19/02/2024 07.17, Nicholas Piggin wrote:
>>> The fastpath in cpu_physical_memory_sync_dirty_bitmap() to test large
>>> aligned ranges forgot to bring the TCG TLB up to date after clearing
>>> some of the dirty memory bitmap bits. This can result in stores though
>>> the TCG TLB not setting the dirty memory bitmap and ultimately causes
>>> memory corruption / lost updates during migration from a TCG host.
>>>
>>> Fix this by exporting an abstracted function to call when dirty bits
>>> have been cleared.
>>>
>>> Fixes: aa8dc044772 ("migration: synchronize memory bitmap 64bits at a time")
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> ---
>>
>> Sounds promising! ... but it doesn't seem to fix the migration-test qtest
>> with s390x when it gets enabled again:
> 
> Did it fix kvm-unit-tests for you?

It does, indeed! With your QEMU patch here, your new selftest-migration test 
of the k-u-t is working reliably with TCG now, indeed. Thus feel free to add:

Tested-by: Thomas Huth <thuth@redhat.com>

>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> --- a/tests/qtest/migration-test.c
>> +++ b/tests/qtest/migration-test.c
>> @@ -3385,15 +3385,6 @@ int main(int argc, char **argv)
>>            return g_test_run();
>>        }
>>
>> -    /*
>> -     * Similar to ppc64, s390x seems to be touchy with TCG, so disable it
>> -     * there until the problems are resolved
>> -     */
>> -    if (g_str_equal(arch, "s390x") && !has_kvm) {
>> -        g_test_message("Skipping test: s390x host with KVM is required");
>> -        return g_test_run();
>> -    }
>> -
>>        tmpfs = g_dir_make_tmp("migration-test-XXXXXX", &err);
>>        if (!tmpfs) {
>>            g_test_message("Can't create temporary directory in %s: %s",
>>
>> I wonder whether there is more stuff like this necessary somewhere?
> 
> Possibly. That's what the commit logs for the TCG disable indicate. I
> have found another dirty bitmap TCG race too. I'll send it out after
> some more testing.
> 
>> Did you try to re-enable tests/qtest/migration-test.c for ppc64 with TCG to
>> see whether that works fine now?
> 
> Hmm, I did try and so far ppc64 is not failing even with upstream QEMU.

Oh, indeed! Actually, now that you mentioned it, I remembered that I checked 
it a couple of weeks ago already:

https://lore.kernel.org/qemu-devel/7d4f5624-83d2-4330-9315-b23869529e99@redhat.com/

> I'll try with s390x. Any additional build or runtime options to make it
> break? How long does it take for breakage to be evident?

For me, it normally breaks after running the migration test a couple of few 
times already, let's say one time out of ten runs?

  Thomas
Re: [PATCH] system/physmem: Fix migration dirty bitmap coherency with TCG memory access
Posted by Nicholas Piggin 2 months ago
On Fri Feb 23, 2024 at 6:59 AM AEST, Thomas Huth wrote:
> On 20/02/2024 02.13, Nicholas Piggin wrote:
> > On Tue Feb 20, 2024 at 12:10 AM AEST, Thomas Huth wrote:
> >> On 19/02/2024 07.17, Nicholas Piggin wrote:
> >>> The fastpath in cpu_physical_memory_sync_dirty_bitmap() to test large
> >>> aligned ranges forgot to bring the TCG TLB up to date after clearing
> >>> some of the dirty memory bitmap bits. This can result in stores though
> >>> the TCG TLB not setting the dirty memory bitmap and ultimately causes
> >>> memory corruption / lost updates during migration from a TCG host.
> >>>
> >>> Fix this by exporting an abstracted function to call when dirty bits
> >>> have been cleared.
> >>>
> >>> Fixes: aa8dc044772 ("migration: synchronize memory bitmap 64bits at a time")
> >>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >>> ---
> >>
> >> Sounds promising! ... but it doesn't seem to fix the migration-test qtest
> >> with s390x when it gets enabled again:
> > 
> > Did it fix kvm-unit-tests for you?
>
> It does, indeed! With your QEMU patch here, your new selftest-migration test 
> of the k-u-t is working reliably with TCG now, indeed. Thus feel free to add:
>
> Tested-by: Thomas Huth <thuth@redhat.com>

Great, thanks.

>
> >> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> >> --- a/tests/qtest/migration-test.c
> >> +++ b/tests/qtest/migration-test.c
> >> @@ -3385,15 +3385,6 @@ int main(int argc, char **argv)
> >>            return g_test_run();
> >>        }
> >>
> >> -    /*
> >> -     * Similar to ppc64, s390x seems to be touchy with TCG, so disable it
> >> -     * there until the problems are resolved
> >> -     */
> >> -    if (g_str_equal(arch, "s390x") && !has_kvm) {
> >> -        g_test_message("Skipping test: s390x host with KVM is required");
> >> -        return g_test_run();
> >> -    }
> >> -
> >>        tmpfs = g_dir_make_tmp("migration-test-XXXXXX", &err);
> >>        if (!tmpfs) {
> >>            g_test_message("Can't create temporary directory in %s: %s",
> >>
> >> I wonder whether there is more stuff like this necessary somewhere?
> > 
> > Possibly. That's what the commit logs for the TCG disable indicate. I
> > have found another dirty bitmap TCG race too. I'll send it out after
> > some more testing.
> > 
> >> Did you try to re-enable tests/qtest/migration-test.c for ppc64 with TCG to
> >> see whether that works fine now?
> > 
> > Hmm, I did try and so far ppc64 is not failing even with upstream QEMU.
>
> Oh, indeed! Actually, now that you mentioned it, I remembered that I checked 
> it a couple of weeks ago already:
>
> https://lore.kernel.org/qemu-devel/7d4f5624-83d2-4330-9315-b23869529e99@redhat.com/

Okay I'll look at re-enabling it then.

> > I'll try with s390x. Any additional build or runtime options to make it
> > break? How long does it take for breakage to be evident?
>
> For me, it normally breaks after running the migration test a couple of few 
> times already, let's say one time out of ten runs?

Seems like a tricky one to debug.

It looks like the migration qtest is just migrating while incrementing each
char in 99MB of memory? Interesting if that breaks but k-u-t multi
migration on s390x does not. Could be worth looking at the differences
between them.

It is also odd the qtest didn't trigger this TCG bug. I have another
multi-migration test for kvm-unit-tests (not yet submitted) which does
similar dirtying of memory and that *does* break TCG.

Thanks,
Nick