[PATCH v2] x86emul: de-duplicate scatters to the same linear address
Posted by Jan Beulich 2 years, 11 months ago
The SDM specifically allows for earlier writes to fully overlapping
ranges to be dropped. If a guest did so, hvmemul_phys_mmio_access()
would crash it if varying data was written to the same address. Detect
overlaps early, as doing so in hvmemul_{linear,phys}_mmio_access() would
be quite a bit more difficult. To maintain proper faulting behavior,
instead of dropping earlier write instances of fully overlapping slots
altogether, write the data of the final of these slots multiple times.
(We also can't pull ahead the [single] write of the data of the last of
the slots, clearing all involved slots' op_mask bits together, as this
would yield incorrect results if there were intervening partially
overlapping ones.)

Note that due to cache slot use being linear address based, there's no
similar issue with multiple writes to the same physical address (mapped
through different linear addresses).
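
As an illustration only (not part of the change), a minimal guest-side
sequence exercising the case being fixed might look like the sketch
below, using AVX-512 intrinsics; "mmio" stands for some hypothetical
MMIO range mapped by the guest. Lanes 0 and 2 use the same index, so
the SDM permits the lane-0 write to be dropped in favor of the lane-2
one:

    #include <immintrin.h>
    #include <stdint.h>

    /* Hypothetical MMIO window mapped into the guest. */
    extern uint32_t mmio[16];

    void scatter_dup(void)
    {
        /* Byte offsets are index * 4: lanes 0 and 2 both target mmio[4]. */
        __m512i idx = _mm512_setr_epi32(4, 8, 4, 0, 0, 0, 0, 0,
                                        0, 0, 0, 0, 0, 0, 0, 0);
        __m512 val = _mm512_setr_ps(1.0f, 2.0f, 3.0f, 0, 0, 0, 0, 0,
                                    0, 0, 0, 0, 0, 0, 0, 0);

        /* Only the low three lanes are active in the write mask. */
        _mm512_mask_i32scatter_ps(mmio, 0x7, idx, val, 4);
    }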

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Maintain correct faulting behavior.

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -10073,15 +10073,36 @@ x86_emulate(
 
         for ( i = 0; op_mask; ++i )
         {
-            long idx = b & 1 ? index.qw[i] : index.dw[i];
+            long idx = (b & 1 ? index.qw[i]
+                              : index.dw[i]) * (1 << state->sib_scale);
+            unsigned long offs = truncate_ea(ea.mem.off + idx);
+            unsigned int j, slot;
 
             if ( !(op_mask & (1 << i)) )
                 continue;
 
-            rc = ops->write(ea.mem.seg,
-                            truncate_ea(ea.mem.off +
-                                        idx * (1 << state->sib_scale)),
-                            (void *)mmvalp + i * op_bytes, op_bytes, ctxt);
+            /*
+             * hvmemul_linear_mmio_access() will find a cache slot based on
+             * linear address.  hvmemul_phys_mmio_access() will crash the
+             * domain if observing varying data getting written to the same
+             * cache slot.  Utilize that squashing earlier writes to fully
+             * overlapping addresses is permitted by the spec.  We can't,
+             * however, drop the writes altogether, to maintain correct
+             * faulting behavior.  Instead write the data from the last of
+             * the fully overlapping slots multiple times.
+             */
+            for ( j = (slot = i) + 1; j < n; ++j )
+            {
+                long idx2 = (b & 1 ? index.qw[j]
+                                   : index.dw[j]) * (1 << state->sib_scale);
+
+                if ( (op_mask & (1 << j)) &&
+                     truncate_ea(ea.mem.off + idx2) == offs )
+                    slot = j;
+            }
+
+            rc = ops->write(ea.mem.seg, offs,
+                            (void *)mmvalp + slot * op_bytes, op_bytes, ctxt);
             if ( rc != X86EMUL_OKAY )
             {
                 /* See comment in gather emulation. */

Ping: [PATCH v2] x86emul: de-duplicate scatters to the same linear address
Posted by Jan Beulich 2 years, 9 months ago
On 20.05.2021 15:34, Jan Beulich wrote:
> The SDM specifically allows for earlier writes to fully overlapping
> ranges to be dropped. If a guest did so, hvmemul_phys_mmio_access()
> would crash it if varying data was written to the same address. Detect
> overlaps early, as doing so in hvmemul_{linear,phys}_mmio_access() would
> be quite a bit more difficult. To maintain proper faulting behavior,
> instead of dropping earlier write instances of fully overlapping slots
> altogether, write the data of the final of these slots multiple times.
> (We also can't pull ahead the [single] write of the data of the last of
> the slots, clearing all involved slots' op_mask bits together, as this
> would yield incorrect results if there were intervening partially
> overlapping ones.)
> 
> Note that due to cache slot use being linear address based, there's no
> similar issue with multiple writes to the same physical address (mapped
> through different linear addresses).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

As indicated before, this is an issue which - afaict - would be a
security issue if introspection were security-supported. As such I
find it highly irritating that this has now been pending for well
over half a year (counting from the submission of v1).

Jan

> ---
> v2: Maintain correct faulting behavior.
> 
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -10073,15 +10073,36 @@ x86_emulate(
>  
>          for ( i = 0; op_mask; ++i )
>          {
> -            long idx = b & 1 ? index.qw[i] : index.dw[i];
> +            long idx = (b & 1 ? index.qw[i]
> +                              : index.dw[i]) * (1 << state->sib_scale);
> +            unsigned long offs = truncate_ea(ea.mem.off + idx);
> +            unsigned int j, slot;
>  
>              if ( !(op_mask & (1 << i)) )
>                  continue;
>  
> -            rc = ops->write(ea.mem.seg,
> -                            truncate_ea(ea.mem.off +
> -                                        idx * (1 << state->sib_scale)),
> -                            (void *)mmvalp + i * op_bytes, op_bytes, ctxt);
> +            /*
> +             * hvmemul_linear_mmio_access() will find a cache slot based on
> +             * linear address.  hvmemul_phys_mmio_access() will crash the
> +             * domain if observing varying data getting written to the same
> +             * cache slot.  Utilize that squashing earlier writes to fully
> +             * overlapping addresses is permitted by the spec.  We can't,
> +             * however, drop the writes altogether, to maintain correct
> +             * faulting behavior.  Instead write the data from the last of
> +             * the fully overlapping slots multiple times.
> +             */
> +            for ( j = (slot = i) + 1; j < n; ++j )
> +            {
> +                long idx2 = (b & 1 ? index.qw[j]
> +                                   : index.dw[j]) * (1 << state->sib_scale);
> +
> +                if ( (op_mask & (1 << j)) &&
> +                     truncate_ea(ea.mem.off + idx2) == offs )
> +                    slot = j;
> +            }
> +
> +            rc = ops->write(ea.mem.seg, offs,
> +                            (void *)mmvalp + slot * op_bytes, op_bytes, ctxt);
>              if ( rc != X86EMUL_OKAY )
>              {
>                  /* See comment in gather emulation. */
> 


Ping²: [PATCH v2] x86emul: de-duplicate scatters to the same linear address
Posted by Jan Beulich 2 years, 6 months ago
On 28.06.2021 14:13, Jan Beulich wrote:
> On 20.05.2021 15:34, Jan Beulich wrote:
>> The SDM specifically allows for earlier writes to fully overlapping
>> ranges to be dropped. If a guest did so, hvmemul_phys_mmio_access()
>> would crash it if varying data was written to the same address. Detect
>> overlaps early, as doing so in hvmemul_{linear,phys}_mmio_access() would
>> be quite a bit more difficult. To maintain proper faulting behavior,
>> instead of dropping earlier write instances of fully overlapping slots
>> altogether, write the data of the final of these slots multiple times.
>> (We also can't pull ahead the [single] write of the data of the last of
>> the slots, clearing all involved slots' op_mask bits together, as this
>> would yield incorrect results if there were intervening partially
>> overlapping ones.)
>>
>> Note that due to cache slot use being linear address based, there's no
>> similar issue with multiple writes to the same physical address (mapped
>> through different linear addresses).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> As indicated before, this is an issue which - afaict - would be a
> security issue if introspection were security-supported. As such I
> find it highly irritating that this has now been pending for well
> over half a year (counting from the submission of v1).

This continues to be pending, despite having been submitted well in time
for 4.15 (Nov 2020). Do we mean to ship another major release with this
bug unaddressed?

Jan

>> ---
>> v2: Maintain correct faulting behavior.
>>
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -10073,15 +10073,36 @@ x86_emulate(
>>  
>>          for ( i = 0; op_mask; ++i )
>>          {
>> -            long idx = b & 1 ? index.qw[i] : index.dw[i];
>> +            long idx = (b & 1 ? index.qw[i]
>> +                              : index.dw[i]) * (1 << state->sib_scale);
>> +            unsigned long offs = truncate_ea(ea.mem.off + idx);
>> +            unsigned int j, slot;
>>  
>>              if ( !(op_mask & (1 << i)) )
>>                  continue;
>>  
>> -            rc = ops->write(ea.mem.seg,
>> -                            truncate_ea(ea.mem.off +
>> -                                        idx * (1 << state->sib_scale)),
>> -                            (void *)mmvalp + i * op_bytes, op_bytes, ctxt);
>> +            /*
>> +             * hvmemul_linear_mmio_access() will find a cache slot based on
>> +             * linear address.  hvmemul_phys_mmio_access() will crash the
>> +             * domain if observing varying data getting written to the same
>> +             * cache slot.  Utilize that squashing earlier writes to fully
>> +             * overlapping addresses is permitted by the spec.  We can't,
>> +             * however, drop the writes altogether, to maintain correct
>> +             * faulting behavior.  Instead write the data from the last of
>> +             * the fully overlapping slots multiple times.
>> +             */
>> +            for ( j = (slot = i) + 1; j < n; ++j )
>> +            {
>> +                long idx2 = (b & 1 ? index.qw[j]
>> +                                   : index.dw[j]) * (1 << state->sib_scale);
>> +
>> +                if ( (op_mask & (1 << j)) &&
>> +                     truncate_ea(ea.mem.off + idx2) == offs )
>> +                    slot = j;
>> +            }
>> +
>> +            rc = ops->write(ea.mem.seg, offs,
>> +                            (void *)mmvalp + slot * op_bytes, op_bytes, ctxt);
>>              if ( rc != X86EMUL_OKAY )
>>              {
>>                  /* See comment in gather emulation. */
>>
> 
> 


Re: [PATCH v2] x86emul: de-duplicate scatters to the same linear address
Posted by Roger Pau Monné 2 years, 6 months ago
On Thu, May 20, 2021 at 03:34:28PM +0200, Jan Beulich wrote:
> The SDM specifically allows for earlier writes to fully overlapping
> ranges to be dropped. If a guest did so, hvmemul_phys_mmio_access()
> would crash it if varying data was written to the same address. Detect
> overlaps early, as doing so in hvmemul_{linear,phys}_mmio_access() would
> be quite a bit more difficult. To maintain proper faulting behavior,
> instead of dropping earlier write instances of fully overlapping slots
> altogether, write the data of the final of these slots multiple times.

Is it possible for a later (non-duplicated) slot to cause a fault that
ends the instruction before reaching the final slot that contains the
data being written?

Thanks, Roger.

Re: [PATCH v2] x86emul: de-duplicate scatters to the same linear address
Posted by Jan Beulich 2 years, 6 months ago
On 18.10.2021 13:19, Roger Pau Monné wrote:
> On Thu, May 20, 2021 at 03:34:28PM +0200, Jan Beulich wrote:
>> The SDM specifically allows for earlier writes to fully overlapping
>> ranges to be dropped. If a guest did so, hvmemul_phys_mmio_access()
>> would crash it if varying data was written to the same address. Detect
>> overlaps early, as doing so in hvmemul_{linear,phys}_mmio_access() would
>> be quite a bit more difficult. To maintain proper faulting behavior,
>> instead of dropping earlier write instances of fully overlapping slots
>> altogether, write the data of the final of these slots multiple times.
> 
> Is it possible for a later (non-duplicated) slot to cause a fault that
> ends the instruction before reaching the final slot that contains the
> data being written?

Yes, but that's not a problem: Only faults are required to be ordered,
and when a fault occurs guarantees are made only towards lower indices
(read: all lower index writes would have completed, while nothing can
be said about higher indices). All non-faulting writes can go out in
any order (unless there are [partial] overlaps, but afaict that case
still gets dealt with within spec by the proposed new logic).
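
Expressed as a rough sketch (not the emulator's actual code; the
helpers element_write_faults() and deliver_fault() are made up for
illustration), the architectural model relied upon is:

    /* Sketch of the architectural scatter completion/fault model. */
    static void scatter_model(unsigned int n_elems, unsigned int *mask)
    {
        unsigned int i;

        for ( i = 0; i < n_elems; ++i )
        {
            if ( !(*mask & (1u << i)) )
                continue;

            /*
             * Non-faulting element writes may become visible in any
             * order.  The only ordering guarantee concerns faults: when
             * element i faults, all lower-indexed elements have
             * completed (their mask bits are clear), while nothing is
             * implied about higher-indexed ones.
             */
            if ( element_write_faults(i) )
            {
                deliver_fault(i); /* Bit i stays set; higher ones may too. */
                return;
            }

            *mask &= ~(1u << i); /* Element i is architecturally complete. */
        }
    }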

Jan


Re: [PATCH v2] x86emul: de-duplicate scatters to the same linear address
Posted by Roger Pau Monné 2 years, 6 months ago
On Mon, Oct 18, 2021 at 02:17:39PM +0200, Jan Beulich wrote:
> On 18.10.2021 13:19, Roger Pau Monné wrote:
> > On Thu, May 20, 2021 at 03:34:28PM +0200, Jan Beulich wrote:
> >> The SDM specifically allows for earlier writes to fully overlapping
> >> ranges to be dropped. If a guest did so, hvmemul_phys_mmio_access()
> >> would crash it if varying data was written to the same address. Detect
> >> overlaps early, as doing so in hvmemul_{linear,phys}_mmio_access() would
> >> be quite a bit more difficult. To maintain proper faulting behavior,
> >> instead of dropping earlier write instances of fully overlapping slots
> >> altogether, write the data of the final of these slots multiple times.
> > 
> > Is it possible for a later (non-duplicated) slot to cause a fault that
> > ends the instruction before reaching the final slot that contains the
> > data being written?
> 
> Yes, but that's not a problem: Only faults are required to be ordered,
> and when a fault occurs guarantees are made only towards lower indices
> (read: all lower index writes would have completed, while nothing can
> be said about higher indices). All non-faulting writes can go out in
> any order (unless there are [partial] overlaps, but afaict that case
> still gets dealt with within spec by the proposed new logic).

Oh, OK, so it's fine for a later write to be 'completed' even if one
of the previous ones faulted. In that case:

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.