[PATCH 8/9] gnttab: bail from GFN-storing loops early in case of error

Jan Beulich posted 9 patches 4 years, 5 months ago
[PATCH 8/9] gnttab: bail from GFN-storing loops early in case of error
Posted by Jan Beulich 4 years, 5 months ago
The contents of the output arrays are undefined in both cases anyway
when the operation itself gets marked as failed. There's no value in
trying to continue after a guest memory access failure.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
There's also a curious difference between the two sub-ops wrt the use of
SHARED_M2P().

--- a/xen/common/compat/grant_table.c
+++ b/xen/common/compat/grant_table.c
@@ -170,17 +170,14 @@ int compat_grant_table_op(unsigned int c
             if ( rc == 0 )
             {
 #define XLAT_gnttab_setup_table_HNDL_frame_list(_d_, _s_) \
-                do \
-                { \
-                    if ( (_s_)->status == GNTST_okay ) \
+                do { \
+                    for ( i = 0; (_s_)->status == GNTST_okay && \
+                                 i < (_s_)->nr_frames; ++i ) \
                     { \
-                        for ( i = 0; i < (_s_)->nr_frames; ++i ) \
-                        { \
-                            unsigned int frame = (_s_)->frame_list.p[i]; \
-                            if ( __copy_to_compat_offset((_d_)->frame_list, \
-                                                         i, &frame, 1) ) \
-                                (_s_)->status = GNTST_bad_virt_addr; \
-                        } \
+                        compat_pfn_t frame = (_s_)->frame_list.p[i]; \
+                        if ( __copy_to_compat_offset((_d_)->frame_list, \
+                                                     i, &frame, 1) ) \
+                            (_s_)->status = GNTST_bad_virt_addr; \
                     } \
                 } while (0)
                 XLAT_gnttab_setup_table(&cmp.setup, nat.setup);
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -2103,7 +2103,10 @@ gnttab_setup_table(
         BUG_ON(SHARED_M2P(gmfn));
 
         if ( __copy_to_guest_offset(op.frame_list, i, &gmfn, 1) )
+        {
             op.status = GNTST_bad_virt_addr;
+            break;
+        }
     }
 
  unlock:
@@ -3289,17 +3292,15 @@ gnttab_get_status_frames(XEN_GUEST_HANDL
                  "status frames, but has only %u\n",
                  d->domain_id, op.nr_frames, nr_status_frames(gt));
         op.status = GNTST_general_error;
-        goto unlock;
     }
 
-    for ( i = 0; i < op.nr_frames; i++ )
+    for ( i = 0; op.status == GNTST_okay && i < op.nr_frames; i++ )
     {
         gmfn = gfn_x(gnttab_status_gfn(d, gt, i));
         if ( __copy_to_guest_offset(op.frame_list, i, &gmfn, 1) )
             op.status = GNTST_bad_virt_addr;
     }
 
- unlock:
     grant_read_unlock(gt);
  out2:
     rcu_unlock_domain(d);


Re: [PATCH 8/9] gnttab: bail from GFN-storing loops early in case of error
Posted by Andrew Cooper 3 years, 4 months ago
On 26/08/2021 11:14, Jan Beulich wrote:
> The contents of the output arrays are undefined in both cases anyway
> when the operation itself gets marked as failed. There's no value in
> trying to continue after a guest memory access failure.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Not really Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

This is an example of a bad loop adjustment.  Taking just one example to
demonstrate with,

> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -3289,17 +3292,15 @@ gnttab_get_status_frames(XEN_GUEST_HANDL
>                   "status frames, but has only %u\n",
>                   d->domain_id, op.nr_frames, nr_status_frames(gt));
>          op.status = GNTST_general_error;
> -        goto unlock;
>      }
>  
> -    for ( i = 0; i < op.nr_frames; i++ )
> +    for ( i = 0; op.status == GNTST_okay && i < op.nr_frames; i++ )
>      {
>          gmfn = gfn_x(gnttab_status_gfn(d, gt, i));
>          if ( __copy_to_guest_offset(op.frame_list, i, &gmfn, 1) )
>              op.status = GNTST_bad_virt_addr;
>      }
>  
> - unlock:
>      grant_read_unlock(gt);
>   out2:
>      rcu_unlock_domain(d);
>


If instead, this were written

    for ( i = 0; i < op.nr_frames; i++ )
    {
        gmfn = gfn_x(gnttab_status_gfn(d, gt, i));
        if ( __copy_to_guest_offset(op.frame_list, i, &gmfn, 1) )
        {
            op.status = GNTST_bad_virt_addr;
            goto unlock;
        }
    }

then the delta vs your version is -36 bytes, and faster to run because
the loop doesn't need a memory read and compare on every iteration when
you can exit based purely on existing control flow.

Furthermore, the version with a goto is clearer to follow, because the
exit condition is much more obvious.  The compat change can do the same
with breaks rather than gotos, for a slightly more modest -11 saving.

A form with the op.status changes adjustments *not* added to the loop
condition, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


In reference to the hypercall ABI adjustments, it occurs to me that
loops like this (which we have loads of, even in hypercall hotpaths) are
horrifying for performance.  For HVM, we're redoing the nested pagewalk
for every uint64_t element of an array. 

A "copy array to guest" primitive would more efficient still than a
plain virt->phys translation, because we'd be able to drop the p2m walks
too.

Obviously, we don't want every instance like this to be doing its own
manual bounce buffering, so perhaps we should invest in some buffered
copy helpers as part of trying to improve hypercall performance.

~Andrew
Re: [PATCH 8/9] gnttab: bail from GFN-storing loops early in case of error
Posted by Jan Beulich 3 years, 4 months ago
On 07.10.2022 21:36, Andrew Cooper wrote:
> On 26/08/2021 11:14, Jan Beulich wrote:
>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
>> @@ -3289,17 +3292,15 @@ gnttab_get_status_frames(XEN_GUEST_HANDL
>>                   "status frames, but has only %u\n",
>>                   d->domain_id, op.nr_frames, nr_status_frames(gt));
>>          op.status = GNTST_general_error;
>> -        goto unlock;
>>      }
>>  
>> -    for ( i = 0; i < op.nr_frames; i++ )
>> +    for ( i = 0; op.status == GNTST_okay && i < op.nr_frames; i++ )
>>      {
>>          gmfn = gfn_x(gnttab_status_gfn(d, gt, i));
>>          if ( __copy_to_guest_offset(op.frame_list, i, &gmfn, 1) )
>>              op.status = GNTST_bad_virt_addr;
>>      }
>>  
>> - unlock:
>>      grant_read_unlock(gt);
>>   out2:
>>      rcu_unlock_domain(d);
>>
> 
> 
> If instead, this were written
> 
>     for ( i = 0; i < op.nr_frames; i++ )
>     {
>         gmfn = gfn_x(gnttab_status_gfn(d, gt, i));
>         if ( __copy_to_guest_offset(op.frame_list, i, &gmfn, 1) )
>         {
>             op.status = GNTST_bad_virt_addr;
>             goto unlock;
>         }
>     }
> 
> then the delta vs your version is -36 bytes, and faster to run because
> the loop doesn't need a memory read and compare on every iteration when
> you can exit based purely on existing control flow.
> 
> Furthermore, the version with a goto is clearer to follow, because the
> exit condition is much more obvious.

I know you and others deem "goto" okay to use; where possible (and where
the resulting code remains readable/maintainable) I'm aiming at avoiding
them. Nevertheless I'll follow the request here.

>  The compat change can do the same
> with breaks rather than gotos, for a slightly more modest -11 saving.

There of course the original if() around the the loop then also needs
retaining; on the positive side this means a smaller diff.

> A form with the op.status changes adjustments *not* added to the loop
> condition, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Thanks.

> In reference to the hypercall ABI adjustments, it occurs to me that
> loops like this (which we have loads of, even in hypercall hotpaths) are
> horrifying for performance.  For HVM, we're redoing the nested pagewalk
> for every uint64_t element of an array. 
> 
> A "copy array to guest" primitive would more efficient still than a
> plain virt->phys translation, because we'd be able to drop the p2m walks
> too.

Generally we can copy arrays (the last parameter of the copying primitives
is a count, after all) but ...

> Obviously, we don't want every instance like this to be doing its own
> manual bounce buffering, so perhaps we should invest in some buffered
> copy helpers as part of trying to improve hypercall performance.

... avoiding bounce buffering is possible only where the data to copy out
is available in ready-to-copy form. Here in the native cases we need to
retrieve the GFN (from e.g. gnttab_status_gfn()), and in the compat case
we need to translate from 64 to 32 bits. Neither really lends itself to
the use of a generic helper, I think.

Jan