[PATCH v1 1/3] s390/uv: don't return 0 from make_hva_secure() if the operation was not successful

David Hildenbrand posted 3 patches 7 months, 1 week ago
Posted by David Hildenbrand 7 months, 1 week ago
If s390_wiggle_split_folio() returns 0 because splitting a large folio
succeeded, we will return 0 from make_hva_secure() even though a retry
is required. Return -EAGAIN in that case.

Otherwise, we'll return 0 from gmap_make_secure(), and consequently from
unpack_one(). In kvm_s390_pv_unpack(), we assume that unpacking
succeeded and skip unpacking this page. Later on, we run into issues
and the VM fails to boot.
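The failure mode described above can be reproduced in a small user-space model. This is a hypothetical sketch, not kernel code. Every toy_* name is invented, toy_unpack_all() only loosely follows the retry-on-`-EAGAIN` loop of kvm_s390_pv_unpack(), and the -E2BIG detection and the split are collapsed into one step. With `fixed == false`, pages that start out in a large folio are split but never made secure, and the loop still finishes with 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model of the unpack path; none of these names exist in the kernel. */
#define TOY_NR_PAGES 4

static bool toy_large[TOY_NR_PAGES];  /* page still backed by a large folio */
static bool toy_secure[TOY_NR_PAGES]; /* page actually made secure */

static void toy_reset(void)
{
	for (int i = 0; i < TOY_NR_PAGES; i++) {
		toy_large[i] = (i % 2 == 0); /* pages 0 and 2 start out in large folios */
		toy_secure[i] = false;
	}
}

/* Stands in for s390_wiggle_split_folio(): the split succeeds and returns 0. */
static int toy_wiggle_split(int page)
{
	assert(page >= 0 && page < TOY_NR_PAGES);
	toy_large[page] = false;
	return 0;
}

/* Stands in for make_hva_secure(); 'fixed' selects the patched behavior. */
static int toy_make_secure(int page, bool fixed)
{
	int rc;

	assert(page >= 0 && page < TOY_NR_PAGES);
	if (!toy_large[page]) {
		toy_secure[page] = true;
		return 0;
	}
	/* Large folio: the real code sees -E2BIG here and tries to split. */
	rc = toy_wiggle_split(page);
	if (fixed && !rc)
		rc = -EAGAIN; /* the split made progress, but the page is not secure yet */
	return rc;
}

/* Loosely modeled on kvm_s390_pv_unpack(): retry on -EAGAIN, move on after 0. */
static int toy_unpack_all(bool fixed)
{
	int page = 0;

	while (page < TOY_NR_PAGES) {
		int rc = toy_make_secure(page, fixed);

		if (rc == -EAGAIN)
			continue;
		if (rc)
			return rc;
		page++;
	}
	return 0;
}

static int toy_count_secure(void)
{
	int n = 0;

	for (int i = 0; i < TOY_NR_PAGES; i++)
		n += toy_secure[i];
	return n;
}
```

After toy_reset(), toy_unpack_all(false) returns 0 but leaves only the two small pages secure. With toy_unpack_all(true), the -EAGAIN makes the loop revisit each split page, so all four end up secure.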

So far, this issue was only observed with follow-up patches where we
split large pagecache XFS folios. Maybe it can also be triggered with
shmem?

We'll clean up s390_wiggle_split_folio() a bit in the next patch, so that
it also returns 0 if no split was required.
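With that cleanup, a return value of 0 from s390_wiggle_split_folio() covers two cases (split succeeded, or no split was required), and in both the page is not secure yet. The single `if (!rc) rc = -EAGAIN;` in this patch handles both. The following is a hypothetical user-space model of that contract only: toy_wiggle() and toy_after_wiggle() are invented names, and the split result is passed in as a parameter instead of being computed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for s390_wiggle_split_folio() after the cleanup:
 * returns 0 if no split was required, otherwise the result of the split
 * attempt (0 on success, a negative errno on failure).
 */
static int toy_wiggle(bool split_required, int split_rc)
{
	assert(split_rc <= 0); /* errno style: 0 or negative */
	if (!split_required)
		return 0;
	return split_rc;
}

/* Caller side, mirroring the patch: 0 still means "retry", never "done". */
static int toy_after_wiggle(bool split_required, int split_rc)
{
	int rc = toy_wiggle(split_required, split_rc);

	if (!rc)
		rc = -EAGAIN;
	return rc;
}
```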

Fixes: d8dfda5af0be ("KVM: s390: pv: fix race when making a page secure")
Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/kernel/uv.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 9a5d5be8acf41..2cc3b599c7fe3 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -393,8 +393,11 @@ int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header
 	folio_walk_end(&fw, vma);
 	mmap_read_unlock(mm);
 
-	if (rc == -E2BIG || rc == -EBUSY)
+	if (rc == -E2BIG || rc == -EBUSY) {
 		rc = s390_wiggle_split_folio(mm, folio, rc == -E2BIG);
+		if (!rc)
+			rc = -EAGAIN;
+	}
 	folio_put(folio);
 
 	return rc;
-- 
2.49.0
Re: [PATCH v1 1/3] s390/uv: don't return 0 from make_hva_secure() if the operation was not successful
Posted by Zi Yan 7 months, 1 week ago
On 16 May 2025, at 8:39, David Hildenbrand wrote:

> [...]
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index 9a5d5be8acf41..2cc3b599c7fe3 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -393,8 +393,11 @@ int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header
>  	folio_walk_end(&fw, vma);
>  	mmap_read_unlock(mm);
>
> -	if (rc == -E2BIG || rc == -EBUSY)
> +	if (rc == -E2BIG || rc == -EBUSY) {
>  		rc = s390_wiggle_split_folio(mm, folio, rc == -E2BIG);
> +		if (!rc)
> +			rc = -EAGAIN;

Why not just call folio_put() and then jump back to the beginning of the
function to retry? That would avoid going all the way back out to
kvm_s390_pv_unpack().
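For comparison, the in-function retry suggested here can be sketched as a bounded goto loop, similar in spirit to the pre-refactoring code. This is a hypothetical user-space toy: toy_try_secure(), toy_split() and TOY_MAX_RETRIES are invented, and the retry bound is an added assumption. The real make_hva_secure() would additionally have to drop its folio reference, re-take mmap_read_lock() and redo the folio walk before retrying, which this toy omits.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy state: one page that starts out in a large folio. */
static bool toy_large = true;
static int toy_attempts;

static int toy_try_secure(void)
{
	toy_attempts++;
	return toy_large ? -E2BIG : 0;
}

static int toy_split(void)
{
	assert(toy_large); /* only large folios get split */
	toy_large = false;
	return 0;
}

/* Assumed bound so the local retry cannot spin forever. */
#define TOY_MAX_RETRIES 3

static int toy_make_secure_retry(void)
{
	int retries = 0;
	int rc;

again:
	rc = toy_try_secure();
	if (rc == -E2BIG) {
		/* the real code would folio_put() and re-walk here */
		rc = toy_split();
		if (!rc && ++retries <= TOY_MAX_RETRIES)
			goto again;
		if (!rc)
			rc = -EAGAIN; /* give up locally; let the caller retry */
	}
	return rc;
}
```

The trade-off is where the retry lives: looping locally saves a round trip through the caller, while returning -EAGAIN keeps a single retry loop in kvm_s390_pv_unpack().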

> +	}
>  	folio_put(folio);
>
>  	return rc;


--
Best Regards,
Yan, Zi
Re: [PATCH v1 1/3] s390/uv: don't return 0 from make_hva_secure() if the operation was not successful
Posted by David Hildenbrand 7 months, 1 week ago
On 16.05.25 23:08, Zi Yan wrote:
> On 16 May 2025, at 8:39, David Hildenbrand wrote:
> 
>> [...]
>> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
>> index 9a5d5be8acf41..2cc3b599c7fe3 100644
>> --- a/arch/s390/kernel/uv.c
>> +++ b/arch/s390/kernel/uv.c
>> @@ -393,8 +393,11 @@ int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header
>>   	folio_walk_end(&fw, vma);
>>   	mmap_read_unlock(mm);
>>
>> -	if (rc == -E2BIG || rc == -EBUSY)
>> +	if (rc == -E2BIG || rc == -EBUSY) {
>>   		rc = s390_wiggle_split_folio(mm, folio, rc == -E2BIG);
>> +		if (!rc)
>> +			rc = -EAGAIN;
> 
> Why not just call folio_put() and then jump back to the beginning of the
> function to retry? That would avoid going all the way back out to
> kvm_s390_pv_unpack().

Hi, thanks for the review.

We had a pretty optimized version with such tricks before Claudio 
refactored it in:

commit 5cbe24350b7d8ef6d466a37d56b07ae643c622ca
Author: Claudio Imbrenda <imbrenda@linux.ibm.com>
Date:   Thu Jan 23 15:46:17 2025 +0100

     KVM: s390: move pv gmap functions into kvm



In particular, one relevant hunk was:

-       switch (rc) {
-       case -E2BIG:
-               folio_lock(folio);
-               rc = split_folio(folio);
-               folio_unlock(folio);
-               folio_put(folio);
-
-               switch (rc) {
-               case 0:
-                       /* Splitting succeeded, try again immediately. */
-                       goto again;
-               case -EAGAIN:
-                       /* Additional folio references. */
-                       if (drain_lru(&drain_lru_called))
-                               goto again;
-                       return -EAGAIN;



Claudio probably had a good reason to rewrite the code -- and I hope 
we'll be able to rip all of that out soon, so ...

... minimal changes until then :)


-- 
Cheers,

David / dhildenb
Re: [PATCH v1 1/3] s390/uv: don't return 0 from make_hva_secure() if the operation was not successful
Posted by Zi Yan 7 months ago
On 16 May 2025, at 17:20, David Hildenbrand wrote:

> On 16.05.25 23:08, Zi Yan wrote:
>> On 16 May 2025, at 8:39, David Hildenbrand wrote:
>>
>>> [...]
>>> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
>>> index 9a5d5be8acf41..2cc3b599c7fe3 100644
>>> --- a/arch/s390/kernel/uv.c
>>> +++ b/arch/s390/kernel/uv.c
>>> @@ -393,8 +393,11 @@ int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header
>>>   	folio_walk_end(&fw, vma);
>>>   	mmap_read_unlock(mm);
>>>
>>> -	if (rc == -E2BIG || rc == -EBUSY)
>>> +	if (rc == -E2BIG || rc == -EBUSY) {
>>>   		rc = s390_wiggle_split_folio(mm, folio, rc == -E2BIG);
>>> +		if (!rc)
>>> +			rc = -EAGAIN;
>>
>> Why not just call folio_put() and then jump back to the beginning of the
>> function to retry? That would avoid going all the way back out to
>> kvm_s390_pv_unpack().
>
> Hi, thanks for the review.
>
> We had a pretty optimized version with such tricks before Claudio refactored it in:
>
> commit 5cbe24350b7d8ef6d466a37d56b07ae643c622ca
> Author: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Date:   Thu Jan 23 15:46:17 2025 +0100
>
>     KVM: s390: move pv gmap functions into kvm
>
>
>
> In particular, one relevant hunk was:
>
> -       switch (rc) {
> -       case -E2BIG:
> -               folio_lock(folio);
> -               rc = split_folio(folio);
> -               folio_unlock(folio);
> -               folio_put(folio);
> -
> -               switch (rc) {
> -               case 0:
> -                       /* Splitting succeeded, try again immediately. */
> -                       goto again;
> -               case -EAGAIN:
> -                       /* Additional folio references. */
> -                       if (drain_lru(&drain_lru_called))
> -                               goto again;
> -                       return -EAGAIN;
>
>
>
> Claudio probably had a good reason to rewrite the code -- and I hope we'll be able to rip all of that out soon, so ...
>
> ... minimal changes until then :)

Got it.

Acked-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi
Re: [PATCH v1 1/3] s390/uv: don't return 0 from make_hva_secure() if the operation was not successful
Posted by David Hildenbrand 7 months, 1 week ago
On 16.05.25 14:39, David Hildenbrand wrote:
> [...]
> So far, this issue was only observed with follow-up patches where we
> split large pagecache XFS folios. Maybe it can also be triggered with
> shmem?

Yes! I can reproduce it with shmem when the guest memory is preallocated outside of the QEMU process:

$ echo force > /sys/kernel/mm/transparent_hugepage/shmem_enabled
$ rm /dev/shm/vm-ram
$ fallocate -l 4G /dev/shm/vm-ram
$ /usr/libexec/qemu-kvm ... \
    -object memory-backend-file,id=mem0,size=4g,share=on,mem-path=/dev/shm/vm-ram \
    -M memory-backend=mem0

LOADPARM=[        ]

Using virtio-blk.
Using SCSI scheme.
.........................................................................................................................
qemu-kvm: KVM PV command 4 (KVM_PV_VERIFY) failed: header rc 102 rrc 1a IOCTL rc: -22
Protected boot has failed: 0xa02
Guest crashed on cpu 0: disabled-wait
PSW: 0x0002000080000000 0x0000000000004608


-- 
Cheers,

David / dhildenb