The implementation of MMUEXT_FLUSH_CACHE is bogus, as it doesn't account to
flush the cache of any previous pCPU where the current vCPU might have run,
and hence is likely to not work as expected.
Fix this by resorting to use the same logic as MMUEXT_FLUSH_CACHE_GLOBAL,
which will be correct in all cases.
Fixes: 534ffcb416bb ("Fix WBINVD by adding a new hypercall.")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Alternatively, the hypercall could be made correct by keeping track of the
pCPUs the vCPU has run on since the last cache flush.  That's however more
work.  See later in the series.
---
 xen/arch/x86/mm.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 66c15a3c864f..38e214352201 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3805,14 +3805,11 @@ long do_mmuext_op(
             break;
 
         case MMUEXT_FLUSH_CACHE:
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(!cache_flush_permitted(currd)) )
-                rc = -EACCES;
-            else
-                wbinvd();
-            break;
-
+            /*
+             * Dirty pCPU caches where the current vCPU has been scheduled are
+             * not tracked, and hence we need to resort to a global cache
+             * flush for correctness.
+             */
         case MMUEXT_FLUSH_CACHE_GLOBAL:
             if ( unlikely(currd != pg_owner) )
                 rc = -EPERM;
-- 
2.48.1
On 06.05.2025 10:31, Roger Pau Monne wrote:
> The implementation of MMUEXT_FLUSH_CACHE is bogus, as it doesn't account to
> flush the cache of any previous pCPU where the current vCPU might have run,
> and hence is likely to not work as expected.
> 
> Fix this by resorting to use the same logic as MMUEXT_FLUSH_CACHE_GLOBAL,
> which will be correct in all cases.
> 
> Fixes: 534ffcb416bb ("Fix WBINVD by adding a new hypercall.")
Oh, and: I've looked up this hash, and found a "trivial merge". Are you sure
here?
Jan
                
            On Mon, May 12, 2025 at 04:11:51PM +0200, Jan Beulich wrote:
> On 06.05.2025 10:31, Roger Pau Monne wrote:
> > The implementation of MMUEXT_FLUSH_CACHE is bogus, as it doesn't account to
> > flush the cache of any previous pCPU where the current vCPU might have run,
> > and hence is likely to not work as expected.
> > 
> > Fix this by resorting to use the same logic as MMUEXT_FLUSH_CACHE_GLOBAL,
> > which will be correct in all cases.
> > 
> > Fixes: 534ffcb416bb ("Fix WBINVD by adding a new hypercall.")
> 
> Oh, and: I've looked up this hash, and found a "trivial merge". Are you sure
> here?
Indeed, sorry.  I've made a mistake and copied the wrong hash, it
should be 8e90e37e6db8.  The change subject is correct however.
Thanks, Roger.
                
            On 06.05.2025 10:31, Roger Pau Monne wrote:
> The implementation of MMUEXT_FLUSH_CACHE is bogus, as it doesn't account to
> flush the cache of any previous pCPU where the current vCPU might have run,
> and hence is likely to not work as expected.
> 
> Fix this by resorting to use the same logic as MMUEXT_FLUSH_CACHE_GLOBAL,
> which will be correct in all cases.
> 
> Fixes: 534ffcb416bb ("Fix WBINVD by adding a new hypercall.")
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Alternatively, the hypercall could be made correct by keeping track of the
> pCPUs the vCPU has run on since the last cache flush.  That's however more
> work.  See later in the series.
Since, as iirc you indicated elsewhere, there's no actual user of this sub-op,
doing as you do here is likely good enough. Just one concern:
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -3805,14 +3805,11 @@ long do_mmuext_op(
>              break;
>  
>          case MMUEXT_FLUSH_CACHE:
> -            if ( unlikely(currd != pg_owner) )
> -                rc = -EPERM;
> -            else if ( unlikely(!cache_flush_permitted(currd)) )
> -                rc = -EACCES;
This error code will change to ...
> -            else
> -                wbinvd();
> -            break;
> -
> +            /*
> +             * Dirty pCPU caches where the current vCPU has been scheduled are
> +             * not tracked, and hence we need to resort to a global cache
> +             * flush for correctness.
> +             */
>          case MMUEXT_FLUSH_CACHE_GLOBAL:
>              if ( unlikely(currd != pg_owner) )
>                  rc = -EPERM;
... -EINVAL (sitting out of context). If we accept any error code change here,
I think it wants to be the other way around, as EINVAL isn't quite appropriate
to signal !cache_flush_permitted() to the caller. With that extra adjustment:
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan
                
            On Mon, May 12, 2025 at 04:09:28PM +0200, Jan Beulich wrote:
> On 06.05.2025 10:31, Roger Pau Monne wrote:
> > The implementation of MMUEXT_FLUSH_CACHE is bogus, as it doesn't account to
> > flush the cache of any previous pCPU where the current vCPU might have run,
> > and hence is likely to not work as expected.
> > 
> > Fix this by resorting to use the same logic as MMUEXT_FLUSH_CACHE_GLOBAL,
> > which will be correct in all cases.
> > 
> > Fixes: 534ffcb416bb ("Fix WBINVD by adding a new hypercall.")
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Alternatively, the hypercall could be made correct by keeping track of the
> > pCPUs the vCPU has run on since the last cache flush.  That's however more
> > work.  See later in the series.
> 
> Since, as iirc you indicated elsewhere, there's no actual user of this sub-op,
> doing as you do here is likely good enough. Just one concern:
> 
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -3805,14 +3805,11 @@ long do_mmuext_op(
> >              break;
> >  
> >          case MMUEXT_FLUSH_CACHE:
> > -            if ( unlikely(currd != pg_owner) )
> > -                rc = -EPERM;
> > -            else if ( unlikely(!cache_flush_permitted(currd)) )
> > -                rc = -EACCES;
> 
> This error code will change to ...
> 
> > -            else
> > -                wbinvd();
> > -            break;
> > -
> > +            /*
> > +             * Dirty pCPU caches where the current vCPU has been scheduled are
> > +             * not tracked, and hence we need to resort to a global cache
> > +             * flush for correctness.
> > +             */
> >          case MMUEXT_FLUSH_CACHE_GLOBAL:
> >              if ( unlikely(currd != pg_owner) )
> >                  rc = -EPERM;
> 
> ... -EINVAL (sitting out of context). If we accept any error code change here,
> I think it wants to be the other way around, as EINVAL isn't quite appropriate
> to signal !cache_flush_permitted() to the caller. With that extra adjustment:
> Acked-by: Jan Beulich <jbeulich@suse.com>
Oh, good catch.  I didn't realize the return code change.
Thanks, Roger.
                
            © 2016 - 2025 Red Hat, Inc.