[PATCH] s390x: Clear RAM on diag308 subcode 3 reset

Nicholas Miehlbradt posted 1 patch 6 months, 2 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20250429052021.10789-1-nicholas@linux.ibm.com
Maintainers: Christian Borntraeger <borntraeger@linux.ibm.com>, Thomas Huth <thuth@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, David Hildenbrand <david@redhat.com>, Ilya Leoshkevich <iii@linux.ibm.com>, Halil Pasic <pasic@linux.ibm.com>, Eric Farman <farman@linux.ibm.com>
hw/s390x/ipl.h             | 1 +
hw/s390x/s390-virtio-ccw.c | 6 ++++++
target/s390x/diag.c        | 3 +--
target/s390x/kvm/kvm.c     | 6 +++++-
4 files changed, 13 insertions(+), 3 deletions(-)
[PATCH] s390x: Clear RAM on diag308 subcode 3 reset
Posted by Nicholas Miehlbradt 6 months, 2 weeks ago
The reset performed by subcode 3 of the diag308 instruction specifies
that system memory should be reset. This patch implements that
behaviour.

Introduce S390_RESET_REIPL_CLEAR to differentiate between subcode 3 and
subcode 4 resets.

When doing a clear reset, discard the ramblock containing the system
ram.

Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
---
 hw/s390x/ipl.h             | 1 +
 hw/s390x/s390-virtio-ccw.c | 6 ++++++
 target/s390x/diag.c        | 3 +--
 target/s390x/kvm/kvm.c     | 6 +++++-
 4 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/hw/s390x/ipl.h b/hw/s390x/ipl.h
index cb55101f06..9c38946363 100644
--- a/hw/s390x/ipl.h
+++ b/hw/s390x/ipl.h
@@ -38,6 +38,7 @@ enum s390_reset {
     /* default is a reset not triggered by a CPU e.g. issued by QMP */
     S390_RESET_EXTERNAL = 0,
     S390_RESET_REIPL,
+    S390_RESET_REIPL_CLEAR,
     S390_RESET_MODIFIED_CLEAR,
     S390_RESET_LOAD_NORMAL,
     S390_RESET_PV,
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 94edd42dd2..bc07158b16 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -455,6 +455,7 @@ static void s390_machine_reset(MachineState *machine, ResetType type)
     enum s390_reset reset_type;
     CPUState *cs, *t;
     S390CPU *cpu;
+    RAMBlock *rb = machine->ram->ram_block;
 
     /*
      * Temporarily drop the record/replay mutex to let rr_cpu_thread_fn()
@@ -479,6 +480,7 @@ static void s390_machine_reset(MachineState *machine, ResetType type)
     switch (reset_type) {
     case S390_RESET_EXTERNAL:
     case S390_RESET_REIPL:
+    case S390_RESET_REIPL_CLEAR:
         /*
          * Reset the subsystem which includes a AP reset. If a PV
          * guest had APQNs attached the AP reset is a prerequisite to
@@ -489,6 +491,10 @@ static void s390_machine_reset(MachineState *machine, ResetType type)
             s390_machine_unprotect(ms);
         }
 
+        if (reset_type == S390_RESET_REIPL_CLEAR) {
+            ram_block_discard_range(rb, 0 , qemu_ram_get_used_length(rb));
+        }
+
         /*
          * Device reset includes CPU clear resets so this has to be
          * done AFTER the unprotect call above.
diff --git a/target/s390x/diag.c b/target/s390x/diag.c
index da44b0133e..cff9fbc4b0 100644
--- a/target/s390x/diag.c
+++ b/target/s390x/diag.c
@@ -105,8 +105,7 @@ void handle_diag_308(CPUS390XState *env, uint64_t r1, uint64_t r3, uintptr_t ra)
         s390_ipl_reset_request(cs, S390_RESET_LOAD_NORMAL);
         break;
     case DIAG308_LOAD_CLEAR:
-        /* Well we still lack the clearing bit... */
-        s390_ipl_reset_request(cs, S390_RESET_REIPL);
+        s390_ipl_reset_request(cs, S390_RESET_REIPL_CLEAR);
         break;
     case DIAG308_SET:
     case DIAG308_PV_SET:
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index b9f1422197..f2d5f7ddc0 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -1915,7 +1915,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
             ret = handle_intercept(cpu);
             break;
         case KVM_EXIT_S390_RESET:
-            s390_ipl_reset_request(cs, S390_RESET_REIPL);
+            if (run->s390_reset_flags & KVM_S390_RESET_CLEAR) {
+                s390_ipl_reset_request(cs, S390_RESET_REIPL_CLEAR);
+            } else {
+                s390_ipl_reset_request(cs, S390_RESET_REIPL);
+            }
             break;
         case KVM_EXIT_S390_TSCH:
             ret = handle_tsch(cpu);
-- 
2.47.2
Re: [PATCH] s390x: Clear RAM on diag308 subcode 3 reset
Posted by Christian Borntraeger 4 days, 13 hours ago
Am 29.04.25 um 07:20 schrieb Nicholas Miehlbradt:
> The reset performed by subcode 3 of the diag308 instruction specifies
> that system memory should be reset. This patch implements that
> behaviour.
> 
> Introduce S390_RESET_REIPL_CLEAR to differentiate between subcode 3 and
> subcode 4 resets.
> 
> When doing a clear reset, discard the ramblock containing the system
> ram.
> 
> Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
> ---
>   hw/s390x/ipl.h             | 1 +
>   hw/s390x/s390-virtio-ccw.c | 6 ++++++
>   target/s390x/diag.c        | 3 +--
>   target/s390x/kvm/kvm.c     | 6 +++++-
>   4 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/s390x/ipl.h b/hw/s390x/ipl.h
> index cb55101f06..9c38946363 100644
> --- a/hw/s390x/ipl.h
> +++ b/hw/s390x/ipl.h
> @@ -38,6 +38,7 @@ enum s390_reset {
>       /* default is a reset not triggered by a CPU e.g. issued by QMP */
>       S390_RESET_EXTERNAL = 0,
>       S390_RESET_REIPL,
> +    S390_RESET_REIPL_CLEAR,
>       S390_RESET_MODIFIED_CLEAR,
>       S390_RESET_LOAD_NORMAL,
>       S390_RESET_PV,
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 94edd42dd2..bc07158b16 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -455,6 +455,7 @@ static void s390_machine_reset(MachineState *machine, ResetType type)
>       enum s390_reset reset_type;
>       CPUState *cs, *t;
>       S390CPU *cpu;
> +    RAMBlock *rb = machine->ram->ram_block;
>   
>       /*
>        * Temporarily drop the record/replay mutex to let rr_cpu_thread_fn()
> @@ -479,6 +480,7 @@ static void s390_machine_reset(MachineState *machine, ResetType type)
>       switch (reset_type) {
>       case S390_RESET_EXTERNAL:
>       case S390_RESET_REIPL:
> +    case S390_RESET_REIPL_CLEAR:
>           /*
>            * Reset the subsystem which includes a AP reset. If a PV
>            * guest had APQNs attached the AP reset is a prerequisite to
> @@ -489,6 +491,10 @@ static void s390_machine_reset(MachineState *machine, ResetType type)
>               s390_machine_unprotect(ms);
>           }
>   
> +        if (reset_type == S390_RESET_REIPL_CLEAR) {
> +            ram_block_discard_range(rb, 0 , qemu_ram_get_used_length(rb));
> +        }
> +
>           /*
>            * Device reset includes CPU clear resets so this has to be
>            * done AFTER the unprotect call above.
> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
> index da44b0133e..cff9fbc4b0 100644
> --- a/target/s390x/diag.c
> +++ b/target/s390x/diag.c
> @@ -105,8 +105,7 @@ void handle_diag_308(CPUS390XState *env, uint64_t r1, uint64_t r3, uintptr_t ra)
>           s390_ipl_reset_request(cs, S390_RESET_LOAD_NORMAL);
>           break;      case DIAG308_LOAD_CLEAR:
> -        /* Well we still lack the clearing bit... */
> -        s390_ipl_reset_request(cs, S390_RESET_REIPL);
> +        s390_ipl_reset_request(cs, S390_RESET_REIPL_CLEAR);
>           break;
>       case DIAG308_SET:
>       case DIAG308_PV_SET:
> diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
> index b9f1422197..f2d5f7ddc0 100644
> --- a/target/s390x/kvm/kvm.c
> +++ b/target/s390x/kvm/kvm.c
> @@ -1915,7 +1915,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>               ret = handle_intercept(cpu);
>               break;
>           case KVM_EXIT_S390_RESET:
> -            s390_ipl_reset_request(cs, S390_RESET_REIPL);
> +            if (run->s390_reset_flags & KVM_S390_RESET_CLEAR) {
> +                s390_ipl_reset_request(cs, S390_RESET_REIPL_CLEAR);
> +            } else {
> +                s390_ipl_reset_request(cs, S390_RESET_REIPL);
> +            }
>               break;
>           case KVM_EXIT_S390_TSCH:
>               ret = handle_tsch(cpu);



Do I see that right that this patch never made it into qemu master? IIRC Matt has clarified all concerns?
Re: [PATCH] s390x: Clear RAM on diag308 subcode 3 reset
Posted by Thomas Huth 4 days, 13 hours ago
On 11/11/2025 09.43, Christian Borntraeger wrote:
> Am 29.04.25 um 07:20 schrieb Nicholas Miehlbradt:
>> The reset performed by subcode 3 of the diag308 instruction specifies
>> that system memory should be reset. This patch implements that
>> behaviour.
>>
>> Introduce S390_RESET_REIPL_CLEAR to differentiate between subcode 3 and
>> subcode 4 resets.
>>
>> When doing a clear reset, discard the ramblock containing the system
>> ram.
>>
>> Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
>> ---
>>   hw/s390x/ipl.h             | 1 +
>>   hw/s390x/s390-virtio-ccw.c | 6 ++++++
>>   target/s390x/diag.c        | 3 +--
>>   target/s390x/kvm/kvm.c     | 6 +++++-
>>   4 files changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/s390x/ipl.h b/hw/s390x/ipl.h
>> index cb55101f06..9c38946363 100644
>> --- a/hw/s390x/ipl.h
>> +++ b/hw/s390x/ipl.h
>> @@ -38,6 +38,7 @@ enum s390_reset {
>>       /* default is a reset not triggered by a CPU e.g. issued by QMP */
>>       S390_RESET_EXTERNAL = 0,
>>       S390_RESET_REIPL,
>> +    S390_RESET_REIPL_CLEAR,
>>       S390_RESET_MODIFIED_CLEAR,
>>       S390_RESET_LOAD_NORMAL,
>>       S390_RESET_PV,
>> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
>> index 94edd42dd2..bc07158b16 100644
>> --- a/hw/s390x/s390-virtio-ccw.c
>> +++ b/hw/s390x/s390-virtio-ccw.c
>> @@ -455,6 +455,7 @@ static void s390_machine_reset(MachineState *machine, 
>> ResetType type)
>>       enum s390_reset reset_type;
>>       CPUState *cs, *t;
>>       S390CPU *cpu;
>> +    RAMBlock *rb = machine->ram->ram_block;
>>       /*
>>        * Temporarily drop the record/replay mutex to let rr_cpu_thread_fn()
>> @@ -479,6 +480,7 @@ static void s390_machine_reset(MachineState *machine, 
>> ResetType type)
>>       switch (reset_type) {
>>       case S390_RESET_EXTERNAL:
>>       case S390_RESET_REIPL:
>> +    case S390_RESET_REIPL_CLEAR:
>>           /*
>>            * Reset the subsystem which includes a AP reset. If a PV
>>            * guest had APQNs attached the AP reset is a prerequisite to
>> @@ -489,6 +491,10 @@ static void s390_machine_reset(MachineState *machine, 
>> ResetType type)
>>               s390_machine_unprotect(ms);
>>           }
>> +        if (reset_type == S390_RESET_REIPL_CLEAR) {
>> +            ram_block_discard_range(rb, 0 , qemu_ram_get_used_length(rb));
>> +        }
>> +
>>           /*
>>            * Device reset includes CPU clear resets so this has to be
>>            * done AFTER the unprotect call above.
>> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
>> index da44b0133e..cff9fbc4b0 100644
>> --- a/target/s390x/diag.c
>> +++ b/target/s390x/diag.c
>> @@ -105,8 +105,7 @@ void handle_diag_308(CPUS390XState *env, uint64_t r1, 
>> uint64_t r3, uintptr_t ra)
>>           s390_ipl_reset_request(cs, S390_RESET_LOAD_NORMAL);
>>           break;      case DIAG308_LOAD_CLEAR:
>> -        /* Well we still lack the clearing bit... */
>> -        s390_ipl_reset_request(cs, S390_RESET_REIPL);
>> +        s390_ipl_reset_request(cs, S390_RESET_REIPL_CLEAR);
>>           break;
>>       case DIAG308_SET:
>>       case DIAG308_PV_SET:
>> diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
>> index b9f1422197..f2d5f7ddc0 100644
>> --- a/target/s390x/kvm/kvm.c
>> +++ b/target/s390x/kvm/kvm.c
>> @@ -1915,7 +1915,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct 
>> kvm_run *run)
>>               ret = handle_intercept(cpu);
>>               break;
>>           case KVM_EXIT_S390_RESET:
>> -            s390_ipl_reset_request(cs, S390_RESET_REIPL);
>> +            if (run->s390_reset_flags & KVM_S390_RESET_CLEAR) {
>> +                s390_ipl_reset_request(cs, S390_RESET_REIPL_CLEAR);
>> +            } else {
>> +                s390_ipl_reset_request(cs, S390_RESET_REIPL);
>> +            }
>>               break;
>>           case KVM_EXIT_S390_TSCH:
>>               ret = handle_tsch(cpu);
> 
> 
> 
> Do I see that right that this patch never made it into qemu master? IIRC 
> Matt has clarified all concerns?

I was hoping to see a reply from David that he's fine with the patch now... 
David?

  Thomas


Re: [PATCH] s390x: Clear RAM on diag308 subcode 3 reset
Posted by David Hildenbrand (Red Hat) 4 days, 8 hours ago
>>>        /*
>>>         * Temporarily drop the record/replay mutex to let rr_cpu_thread_fn()
>>> @@ -479,6 +480,7 @@ static void s390_machine_reset(MachineState *machine,
>>> ResetType type)
>>>        switch (reset_type) {
>>>        case S390_RESET_EXTERNAL:
>>>        case S390_RESET_REIPL:
>>> +    case S390_RESET_REIPL_CLEAR:
>>>            /*
>>>             * Reset the subsystem which includes a AP reset. If a PV
>>>             * guest had APQNs attached the AP reset is a prerequisite to
>>> @@ -489,6 +491,10 @@ static void s390_machine_reset(MachineState *machine,
>>> ResetType type)
>>>                s390_machine_unprotect(ms);
>>>            }
>>> +        if (reset_type == S390_RESET_REIPL_CLEAR) {
>>> +            ram_block_discard_range(rb, 0 , qemu_ram_get_used_length(rb));
>>> +        }
>>> +

...

>>
>>
>>
>> Do I see that right that this patch never made it into qemu master? IIRC
>> Matt has clarified all concerns?
> 
> I was hoping to see a reply from David that he's fine with the patch now...
> David?

Staring at this again, one more point regarding userfaultfd: doing the 
discard on the destination while postcopy is active might be problematic.

I don't remember all details, but I think that if we have the following:

1) Migrate page X to dst
2) Discard page X on dst
3) Access page X on dst

that postcopy_request_page()->migrate_send_rp_req_pages() would assume 
that the page was already transferred (marked received in the receive 
bitmap during 1) ) and essentially never place a fresh zeropage during 
3) to be stuck forever.

-- 
Cheers

David

Re: [PATCH] s390x: Clear RAM on diag308 subcode 3 reset
Posted by Christian Borntraeger 4 days, 7 hours ago
Am 11.11.25 um 14:37 schrieb David Hildenbrand (Red Hat):
>>>>        /*
>>>>         * Temporarily drop the record/replay mutex to let rr_cpu_thread_fn()
>>>> @@ -479,6 +480,7 @@ static void s390_machine_reset(MachineState *machine,
>>>> ResetType type)
>>>>        switch (reset_type) {
>>>>        case S390_RESET_EXTERNAL:
>>>>        case S390_RESET_REIPL:
>>>> +    case S390_RESET_REIPL_CLEAR:
>>>>            /*
>>>>             * Reset the subsystem which includes a AP reset. If a PV
>>>>             * guest had APQNs attached the AP reset is a prerequisite to
>>>> @@ -489,6 +491,10 @@ static void s390_machine_reset(MachineState *machine,
>>>> ResetType type)
>>>>                s390_machine_unprotect(ms);
>>>>            }
>>>> +        if (reset_type == S390_RESET_REIPL_CLEAR) {
>>>> +            ram_block_discard_range(rb, 0 , qemu_ram_get_used_length(rb));
>>>> +        }
>>>> +
> 
> ...
> 
>>>
>>>
>>>
>>> Do I see that right that this patch never made it into qemu master? IIRC
>>> Matt has clarified all concerns?
>>
>> I was hoping to see a reply from David that he's fine with the patch now...
>> David?
> 
> Staring at this again, one more point regarding userfaultfd: doing the discard on the destination while postcopy is active might be problematic.
> 
> I don't remember all details, but I think that if we have the following:
> 
> 1) Migrate page X to dst
> 2) Discard page X on dst
> 3) Access page X on dst
> 
> that postcopy_request_page()->migrate_send_rp_req_pages() would assume that the page was already transferred (marked received in the receive bitmap during 1) ) and essentially never place a fresh zeropage during 3) to be stuck forever.

Can we have a postcopy running while we are in system reset? Or as an alternative can we check for postcopy running and not discard in that case.

Re: [PATCH] s390x: Clear RAM on diag308 subcode 3 reset
Posted by David Hildenbrand (Red Hat) 4 days, 6 hours ago
On 11.11.25 15:55, Christian Borntraeger wrote:
> 
> Am 11.11.25 um 14:37 schrieb David Hildenbrand (Red Hat):
>>>>>         /*
>>>>>          * Temporarily drop the record/replay mutex to let rr_cpu_thread_fn()
>>>>> @@ -479,6 +480,7 @@ static void s390_machine_reset(MachineState *machine,
>>>>> ResetType type)
>>>>>         switch (reset_type) {
>>>>>         case S390_RESET_EXTERNAL:
>>>>>         case S390_RESET_REIPL:
>>>>> +    case S390_RESET_REIPL_CLEAR:
>>>>>             /*
>>>>>              * Reset the subsystem which includes a AP reset. If a PV
>>>>>              * guest had APQNs attached the AP reset is a prerequisite to
>>>>> @@ -489,6 +491,10 @@ static void s390_machine_reset(MachineState *machine,
>>>>> ResetType type)
>>>>>                 s390_machine_unprotect(ms);
>>>>>             }
>>>>> +        if (reset_type == S390_RESET_REIPL_CLEAR) {
>>>>> +            ram_block_discard_range(rb, 0 , qemu_ram_get_used_length(rb));
>>>>> +        }
>>>>> +
>>
>> ...
>>
>>>>
>>>>
>>>>
>>>> Do I see that right that this patch never made it into qemu master? IIRC
>>>> Matt has clarified all concerns?
>>>
>>> I was hoping to see a reply from David that he's fine with the patch now...
>>> David?
>>
>> Staring at this again, one more point regarding userfaultfd: doing the discard on the destination while postcopy is active might be problematic.
>>
>> I don't remember all details, but I think that if we have the following:
>>
>> 1) Migrate page X to dst
>> 2) Discard page X on dst
>> 3) Access page X on dst
>>
>> that postcopy_request_page()->migrate_send_rp_req_pages() would assume that the page was already transferred (marked received in the receive bitmap during 1) ) and essentially never place a fresh zeropage during 3) to be stuck forever.
> 
> Can we have a postcopy running while we are in system reset? 

Yes, that should be possible. Start postcopy and then trigger a system reset on the
destination (e.g., from the guest).

> Or as an alternative can we check for postcopy running and not discard in that case.

Another interaction might be with background snapshots (another form of migration)
running concurrently. If we discard after populating all memory+registering
userfaultfd-wp I think we might not get write events for all changes,
possibly corrupting the snapshot (not 100% sure but that's what I remember).


What virtio-mem does to workaround all that is the following:

static bool virtio_mem_is_busy(void)
{
     /*
      * Postcopy cannot handle concurrent discards and we don't want to migrate
      * pages on-demand with stale content when plugging new blocks.
      *
      * For precopy, we don't want unplugged blocks in our migration stream, and
      * when plugging new blocks, the page content might differ between source
      * and destination (observable by the guest when not initializing pages
      * after plugging them) until we're running on the destination (as we didn't
      * migrate these blocks when they were unplugged).
      */
     return migration_in_incoming_postcopy() || migration_is_running();
}



-- 
Cheers

David

Re: [PATCH] s390x: Clear RAM on diag308 subcode 3 reset
Posted by David Hildenbrand 6 months, 2 weeks ago
On 29.04.25 07:20, Nicholas Miehlbradt wrote:
> The reset performed by subcode 3 of the diag308 instruction specifies
> that system memory should be reset. This patch implements that
> behaviour.
> 
> Introduce S390_RESET_REIPL_CLEAR to differentiate between subcode 3 and
> subcode 4 resets.
> 
> When doing a clear reset, discard the ramblock containing the system
> ram.
> 
> Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
> ---
>   hw/s390x/ipl.h             | 1 +
>   hw/s390x/s390-virtio-ccw.c | 6 ++++++
>   target/s390x/diag.c        | 3 +--
>   target/s390x/kvm/kvm.c     | 6 +++++-
>   4 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/s390x/ipl.h b/hw/s390x/ipl.h
> index cb55101f06..9c38946363 100644
> --- a/hw/s390x/ipl.h
> +++ b/hw/s390x/ipl.h
> @@ -38,6 +38,7 @@ enum s390_reset {
>       /* default is a reset not triggered by a CPU e.g. issued by QMP */
>       S390_RESET_EXTERNAL = 0,
>       S390_RESET_REIPL,
> +    S390_RESET_REIPL_CLEAR,
>       S390_RESET_MODIFIED_CLEAR,
>       S390_RESET_LOAD_NORMAL,
>       S390_RESET_PV,
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 94edd42dd2..bc07158b16 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -455,6 +455,7 @@ static void s390_machine_reset(MachineState *machine, ResetType type)
>       enum s390_reset reset_type;
>       CPUState *cs, *t;
>       S390CPU *cpu;
> +    RAMBlock *rb = machine->ram->ram_block;
>   
>       /*
>        * Temporarily drop the record/replay mutex to let rr_cpu_thread_fn()
> @@ -479,6 +480,7 @@ static void s390_machine_reset(MachineState *machine, ResetType type)
>       switch (reset_type) {
>       case S390_RESET_EXTERNAL:
>       case S390_RESET_REIPL:
> +    case S390_RESET_REIPL_CLEAR:
>           /*
>            * Reset the subsystem which includes a AP reset. If a PV
>            * guest had APQNs attached the AP reset is a prerequisite to
> @@ -489,6 +491,10 @@ static void s390_machine_reset(MachineState *machine, ResetType type)
>               s390_machine_unprotect(ms);
>           }
>   
> +        if (reset_type == S390_RESET_REIPL_CLEAR) {
> +            ram_block_discard_range(rb, 0 , qemu_ram_get_used_length(rb));
> +        }
> +

I suspect that we don't care about virtio-mem devices, because the 
memory semantics are different (and usually they will get reset to "no 
memory" during reboots either way -> discarded)

The only problem I see is with vfio devices is the new "memory pinned" 
mode. [1]

There, we'd have to check if any such device is around (discarding of 
ram is disabled?), and fallback to actual zeroing of memory.


[1] https://lkml.kernel.org/r/20250226210013.238349-1-mjrosato@linux.ibm.com

-- 
Cheers,

David / dhildenb
Re: [PATCH] s390x: Clear RAM on diag308 subcode 3 reset
Posted by Christian Borntraeger 6 months, 2 weeks ago
Am 29.04.25 um 09:37 schrieb David Hildenbrand:
[...]
> The only problem I see is with vfio devices is the new "memory pinned" mode. [1]
> 
> There, we'd have to check if any such device is around (discarding of ram is disabled?), and fallback to actual zeroing of memory.

CC Matt to double check.

> 
> [1] https://lkml.kernel.org/r/20250226210013.238349-1-mjrosato@linux.ibm.com
Re: [PATCH] s390x: Clear RAM on diag308 subcode 3 reset
Posted by Matthew Rosato 6 months, 2 weeks ago
On 4/29/25 3:45 AM, Christian Borntraeger wrote:
> Am 29.04.25 um 09:37 schrieb David Hildenbrand:
> [...]
>> The only problem I see is with vfio devices is the new "memory pinned" mode. [1]
>>
>> There, we'd have to check if any such device is around (discarding of ram is disabled?), and fallback to actual zeroing of memory.
> 
> CC Matt to double check.

When triggering the "relaxed translation" mode via iommu.passthrough in the guest, we now take the default (for other platforms) memory_region_is_ram() path in vfio_listener_region_add/del() which handles the pin/unpin from vfio common code.  As for ram discarding, we then also use the vfio common path and only uncoordinated discards are disabled via:

vfio_ram_block_discard_disable() -> ram_block_uncoordinated_discard_disable()


> 
>>
>> [1] https://lkml.kernel.org/r/20250226210013.238349-1-mjrosato@linux.ibm.com
>
Re: [PATCH] s390x: Clear RAM on diag308 subcode 3 reset
Posted by Christian Borntraeger 6 months ago
Am 29.04.25 um 16:09 schrieb Matthew Rosato:
> On 4/29/25 3:45 AM, Christian Borntraeger wrote:
>> Am 29.04.25 um 09:37 schrieb David Hildenbrand:
>> [...]
>>> The only problem I see is with vfio devices is the new "memory pinned" mode. [1]
>>>
>>> There, we'd have to check if any such device is around (discarding of ram is disabled?), and fallback to actual zeroing of memory.
>>
>> CC Matt to double check.
> 
> When triggering the "relaxed translation" mode via iommu.passthrough in the guest, we now take the default (for other platforms) memory_region_is_ram() path in vfio_listener_region_add/del() which handles the pin/unpin from vfio common code.  As for ram discarding, we then also use the vfio common path and only uncoordinated discards are disabled via:
> 
> vfio_ram_block_discard_disable() -> ram_block_uncoordinated_discard_disable()

So this patch is good?
Re: [PATCH] s390x: Clear RAM on diag308 subcode 3 reset
Posted by Matthew Rosato 6 months ago
On 5/13/25 2:50 AM, Christian Borntraeger wrote:
> Am 29.04.25 um 16:09 schrieb Matthew Rosato:
>> On 4/29/25 3:45 AM, Christian Borntraeger wrote:
>>> Am 29.04.25 um 09:37 schrieb David Hildenbrand:
>>> [...]
>>>> The only problem I see is with vfio devices is the new "memory pinned" mode. [1]
>>>>
>>>> There, we'd have to check if any such device is around (discarding of ram is disabled?), and fallback to actual zeroing of memory.
>>>
>>> CC Matt to double check.
>>
>> When triggering the "relaxed translation" mode via iommu.passthrough in the guest, we now take the default (for other platforms) memory_region_is_ram() path in vfio_listener_region_add/del() which handles the pin/unpin from vfio common code.  As for ram discarding, we then also use the vfio common path and only uncoordinated discards are disabled via:
>>
>> vfio_ram_block_discard_disable() -> ram_block_uncoordinated_discard_disable()
> 
> So this patch is good?
> 

Worked fine in my testing in combination with iommu.passthrough=1.  I traced to verify I was hitting the S390_RESET_REIPL_CLEAR path in s390_machine_reset() and observed the pinned memory of the guest shrink / grow.  Device worked fine afterwards.

Re: [PATCH] s390x: Clear RAM on diag308 subcode 3 reset
Posted by Thomas Huth 6 months ago
On 13/05/2025 15.42, Matthew Rosato wrote:
> On 5/13/25 2:50 AM, Christian Borntraeger wrote:
>> Am 29.04.25 um 16:09 schrieb Matthew Rosato:
>>> On 4/29/25 3:45 AM, Christian Borntraeger wrote:
>>>> Am 29.04.25 um 09:37 schrieb David Hildenbrand:
>>>> [...]
>>>>> The only problem I see is with vfio devices is the new "memory pinned" mode. [1]
>>>>>
>>>>> There, we'd have to check if any such device is around (discarding of ram is disabled?), and fallback to actual zeroing of memory.
>>>>
>>>> CC Matt to double check.
>>>
>>> When triggering the "relaxed translation" mode via iommu.passthrough in the guest, we now take the default (for other platforms) memory_region_is_ram() path in vfio_listener_region_add/del() which handles the pin/unpin from vfio common code.  As for ram discarding, we then also use the vfio common path and only uncoordinated discards are disabled via:
>>>
>>> vfio_ram_block_discard_disable() -> ram_block_uncoordinated_discard_disable()
>>
>> So this patch is good?
>>
> 
> Worked fine in my testing in combination with iommu.passthrough=1.  I traced to verify I was hitting the S390_RESET_REIPL_CLEAR path in s390_machine_reset() and observed the pinned memory of the guest shrink / grow.  Device worked fine afterwards.

What about David's concern here: 
https://lore.kernel.org/qemu-devel/489d0473-579a-4850-a6d5-be38bf2954b9@redhat.com/ 
?

  Thomas


Re: [PATCH] s390x: Clear RAM on diag308 subcode 3 reset
Posted by Matthew Rosato 6 months ago
On 5/14/25 5:32 AM, Thomas Huth wrote:
> On 13/05/2025 15.42, Matthew Rosato wrote:
>> On 5/13/25 2:50 AM, Christian Borntraeger wrote:
>>> Am 29.04.25 um 16:09 schrieb Matthew Rosato:
>>>> On 4/29/25 3:45 AM, Christian Borntraeger wrote:
>>>>> Am 29.04.25 um 09:37 schrieb David Hildenbrand:
>>>>> [...]
>>>>>> The only problem I see is with vfio devices is the new "memory pinned" mode. [1]
>>>>>>
>>>>>> There, we'd have to check if any such device is around (discarding of ram is disabled?), and fallback to actual zeroing of memory.
>>>>>
>>>>> CC Matt to double check.
>>>>
>>>> When triggering the "relaxed translation" mode via iommu.passthrough in the guest, we now take the default (for other platforms) memory_region_is_ram() path in vfio_listener_region_add/del() which handles the pin/unpin from vfio common code.  As for ram discarding, we then also use the vfio common path and only uncoordinated discards are disabled via:
>>>>
>>>> vfio_ram_block_discard_disable() -> ram_block_uncoordinated_discard_disable()
>>>
>>> So this patch is good?
>>>
>>
>> Worked fine in my testing in combination with iommu.passthrough=1.  I traced to verify I was hitting the S390_RESET_REIPL_CLEAR path in s390_machine_reset() and observed the pinned memory of the guest shrink / grow.  Device worked fine afterwards.
> 
> What about David's concern here: https://lore.kernel.org/qemu-devel/489d0473-579a-4850-a6d5-be38bf2954b9@redhat.com/ ?
> 

With the current code + this patch applied and testing with a vfio-pci device in the pinned mode, I can observe the pinned memory be discarded / pinned memory shrinks until finally ram_discard_block_range returns rc=0.

We don't disable discarding of ram when using the vfio-pci pinned memory mode, we only disable uncoordinated discarding of ram - maybe David's concern was based on an older version of my implementation that required disabling both coordinated and uncoordinated discards?

Or am I missing some other concern?