[v1] xen/grant-table: Simplify the update to the per-vCPU maptrack freelist

[PATCH] xen/grant-table: Simplify the update to the per-vCPU maptrack freelist

Posted by Julien Grall 4 years, 8 months ago

From: Julien Grall <jgrall@amazon.com>

Since XSA-288 (commit 02cbeeb62075 "gnttab: split maptrack lock to make
it fulfill its purpose again"), v->maptrack_head and v->maptrack_tail
are with the lock v->maptrack_freelist_lock held.

Therefore it is not necessary to update the fields using cmpxchg()
and also read them atomically.

Note that there are two cases where v->maptrack_tail is accessed without
the lock. They both happen _get_maptrack_handle() when the current vCPU
list is empty. Therefore there is no possible race.

The code is now reworked to remove any use of cmpxch() and read_atomic()
when accessing the fields v->maptrack_{head, tail}.

Take the opportunity to add a comment on top of the lock definition
and explain what it protects.

Signed-off-by: Julien Grall <jgrall@amazon.com>

----

I am not sure whether we should try to protect the remaining unprotected
access with the lock or maybe add a comment?
---
 xen/common/grant_table.c | 60 +++++++++++++++-------------------------
 xen/include/xen/sched.h  |  5 +++-
 2 files changed, 27 insertions(+), 38 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index ab30e2e8cfb6..cac9d1d73446 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -543,34 +543,26 @@ double_gt_unlock(struct grant_table *lgt, struct grant_table *rgt)
 static inline grant_handle_t
 _get_maptrack_handle(struct grant_table *t, struct vcpu *v)
 {
-    unsigned int head, next, prev_head;
+    unsigned int head, next;
 
     spin_lock(&v->maptrack_freelist_lock);
 
-    do {
-        /* No maptrack pages allocated for this VCPU yet? */
-        head = read_atomic(&v->maptrack_head);
-        if ( unlikely(head == MAPTRACK_TAIL) )
-        {
-            spin_unlock(&v->maptrack_freelist_lock);
-            return INVALID_MAPTRACK_HANDLE;
-        }
-
-        /*
-         * Always keep one entry in the free list to make it easier to
-         * add free entries to the tail.
-         */
-        next = read_atomic(&maptrack_entry(t, head).ref);
-        if ( unlikely(next == MAPTRACK_TAIL) )
-        {
-            spin_unlock(&v->maptrack_freelist_lock);
-            return INVALID_MAPTRACK_HANDLE;
-        }
+    /* No maptrack pages allocated for this VCPU yet? */
+    head = v->maptrack_head;
+    if ( unlikely(head == MAPTRACK_TAIL) )
+        goto out;
 
-        prev_head = head;
-        head = cmpxchg(&v->maptrack_head, prev_head, next);
-    } while ( head != prev_head );
+    /*
+     * Always keep one entry in the free list to make it easier to
+     * add free entries to the tail.
+     */
+    next = read_atomic(&maptrack_entry(t, head).ref);
+    if ( unlikely(next == MAPTRACK_TAIL) )
+        head = MAPTRACK_TAIL;
+    else
+        v->maptrack_head = next;
 
+out:
     spin_unlock(&v->maptrack_freelist_lock);
 
     return head;
@@ -623,7 +615,7 @@ put_maptrack_handle(
 {
     struct domain *currd = current->domain;
     struct vcpu *v;
-    unsigned int prev_tail, cur_tail;
+    unsigned int prev_tail;
 
     /* 1. Set entry to be a tail. */
     maptrack_entry(t, handle).ref = MAPTRACK_TAIL;
@@ -633,11 +625,8 @@ put_maptrack_handle(
 
     spin_lock(&v->maptrack_freelist_lock);
 
-    cur_tail = read_atomic(&v->maptrack_tail);
-    do {
-        prev_tail = cur_tail;
-        cur_tail = cmpxchg(&v->maptrack_tail, prev_tail, handle);
-    } while ( cur_tail != prev_tail );
+    prev_tail = v->maptrack_tail;
+    v->maptrack_tail = handle;
 
     /* 3. Update the old tail entry to point to the new entry. */
     write_atomic(&maptrack_entry(t, prev_tail).ref, handle);
@@ -650,7 +639,7 @@ get_maptrack_handle(
     struct grant_table *lgt)
 {
     struct vcpu          *curr = current;
-    unsigned int          i, head;
+    unsigned int          i;
     grant_handle_t        handle;
     struct grant_mapping *new_mt = NULL;
 
@@ -686,7 +675,7 @@ get_maptrack_handle(
             maptrack_entry(lgt, handle).ref = MAPTRACK_TAIL;
             curr->maptrack_tail = handle;
             if ( curr->maptrack_head == MAPTRACK_TAIL )
-                write_atomic(&curr->maptrack_head, handle);
+                curr->maptrack_head = handle;
             spin_unlock(&curr->maptrack_freelist_lock);
         }
         return steal_maptrack_handle(lgt, curr);
@@ -716,13 +705,10 @@ get_maptrack_handle(
     lgt->maptrack_limit += MAPTRACK_PER_PAGE;
 
     spin_unlock(&lgt->maptrack_lock);
-    spin_lock(&curr->maptrack_freelist_lock);
-
-    do {
-        new_mt[i - 1].ref = read_atomic(&curr->maptrack_head);
-        head = cmpxchg(&curr->maptrack_head, new_mt[i - 1].ref, handle + 1);
-    } while ( head != new_mt[i - 1].ref );
 
+    spin_lock(&curr->maptrack_freelist_lock);
+    new_mt[i - 1].ref = curr->maptrack_head;
+    curr->maptrack_head = handle + 1;
     spin_unlock(&curr->maptrack_freelist_lock);
 
     return handle;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 3982167144c6..bd1cb08266d8 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -255,7 +255,10 @@ struct vcpu
     /* VCPU paused by system controller. */
     int              controller_pause_count;
 
-    /* Grant table map tracking. */
+    /*
+     * Grant table map tracking. The lock maptrack_freelist_lock protect
+     * the access to maptrack_head and maptrack_tail.
+     */
     spinlock_t       maptrack_freelist_lock;
     unsigned int     maptrack_head;
     unsigned int     maptrack_tail;
-- 
2.17.1

Re: [PATCH] xen/grant-table: Simplify the update to the per-vCPU maptrack freelist

Posted by Jan Beulich 4 years, 8 months ago

On 26.05.2021 17:21, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Since XSA-288 (commit 02cbeeb62075 "gnttab: split maptrack lock to make

XSA-228 I suppose?

> it fulfill its purpose again"), v->maptrack_head and v->maptrack_tail
> are with the lock v->maptrack_freelist_lock held.

Nit: missing "accessed" or alike?

> Therefore it is not necessary to update the fields using cmpxchg()
> and also read them atomically.

Ah yes, very good observation. Should have noticed this back at the
time, for an immediate follow-up change.

> Note that there are two cases where v->maptrack_tail is accessed without
> the lock. They both happen _get_maptrack_handle() when the current vCPU
> list is empty. Therefore there is no possible race.

I think you mean the other function here, without a leading underscore
in its name. And if you want to explain the absence of a race, wouldn't
you then better also mention that the list can get initially filled
only on the local vCPU?

> I am not sure whether we should try to protect the remaining unprotected
> access with the lock or maybe add a comment?

As per above I don't view adding locking as sensible. If you feel like
adding a helpful comment, perhaps. I will admit that it took me more
than just a moment to recall that "local vCPU only" argument.

> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -543,34 +543,26 @@ double_gt_unlock(struct grant_table *lgt, struct grant_table *rgt)
>  static inline grant_handle_t
>  _get_maptrack_handle(struct grant_table *t, struct vcpu *v)
>  {
> -    unsigned int head, next, prev_head;
> +    unsigned int head, next;
>  
>      spin_lock(&v->maptrack_freelist_lock);
>  
> -    do {
> -        /* No maptrack pages allocated for this VCPU yet? */
> -        head = read_atomic(&v->maptrack_head);
> -        if ( unlikely(head == MAPTRACK_TAIL) )
> -        {
> -            spin_unlock(&v->maptrack_freelist_lock);
> -            return INVALID_MAPTRACK_HANDLE;

Where did this and ...

> -        }
> -
> -        /*
> -         * Always keep one entry in the free list to make it easier to
> -         * add free entries to the tail.
> -         */
> -        next = read_atomic(&maptrack_entry(t, head).ref);
> -        if ( unlikely(next == MAPTRACK_TAIL) )
> -        {
> -            spin_unlock(&v->maptrack_freelist_lock);
> -            return INVALID_MAPTRACK_HANDLE;

... this use of INVALID_MAPTRACK_HANDLE go? It is at present merely
coincidence that INVALID_MAPTRACK_HANDLE == MAPTRACK_TAIL. If you
want to fold them, you will need to do so properly (by eliminating
one of the two constants). But I think they're separate on purpose.

> -        }
> +    /* No maptrack pages allocated for this VCPU yet? */
> +    head = v->maptrack_head;
> +    if ( unlikely(head == MAPTRACK_TAIL) )
> +        goto out;
>  
> -        prev_head = head;
> -        head = cmpxchg(&v->maptrack_head, prev_head, next);
> -    } while ( head != prev_head );
> +    /*
> +     * Always keep one entry in the free list to make it easier to
> +     * add free entries to the tail.
> +     */
> +    next = read_atomic(&maptrack_entry(t, head).ref);

Since the lock protects the entire free list, why do you need to
keep read_atomic() here?

> +    if ( unlikely(next == MAPTRACK_TAIL) )
> +        head = MAPTRACK_TAIL;
> +    else
> +        v->maptrack_head = next;
>  
> +out:

Please indent labels by at least one blank, to avoid issues with
diff's -p option. In fact if you didn't introduce a goto here in
the first place, there'd be less code churn overall, as you'd
need to alter the indentation of fewer lines.

> @@ -623,7 +615,7 @@ put_maptrack_handle(
>  {
>      struct domain *currd = current->domain;
>      struct vcpu *v;
> -    unsigned int prev_tail, cur_tail;
> +    unsigned int prev_tail;
>  
>      /* 1. Set entry to be a tail. */
>      maptrack_entry(t, handle).ref = MAPTRACK_TAIL;
> @@ -633,11 +625,8 @@ put_maptrack_handle(
>  
>      spin_lock(&v->maptrack_freelist_lock);
>  
> -    cur_tail = read_atomic(&v->maptrack_tail);
> -    do {
> -        prev_tail = cur_tail;
> -        cur_tail = cmpxchg(&v->maptrack_tail, prev_tail, handle);
> -    } while ( cur_tail != prev_tail );
> +    prev_tail = v->maptrack_tail;
> +    v->maptrack_tail = handle;
>  
>      /* 3. Update the old tail entry to point to the new entry. */
>      write_atomic(&maptrack_entry(t, prev_tail).ref, handle);

Since the write_atomic() here can then also be converted, may I
ask that you then rename the local variable to just "tail" as
well?

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -255,7 +255,10 @@ struct vcpu
>      /* VCPU paused by system controller. */
>      int              controller_pause_count;
>  
> -    /* Grant table map tracking. */
> +    /*
> +     * Grant table map tracking. The lock maptrack_freelist_lock protect

Nit: protects

> +     * the access to maptrack_head and maptrack_tail.
> +     */

I'm inclined to suggest this doesn't need spelling out, considering ...

>      spinlock_t       maptrack_freelist_lock;
>      unsigned int     maptrack_head;
>      unsigned int     maptrack_tail;

... both the name of the lock and its placement next to the two
fields it protects. Also as per the docs change of the XSA-228 change,
the lock protects more than just these two fields, so the comment may
be misleading the way you have it now.

Jan

Re: [PATCH] xen/grant-table: Simplify the update to the per-vCPU maptrack freelist

Posted by Julien Grall 4 years, 8 months ago

Hi Jan,

On 28/05/2021 14:29, Jan Beulich wrote:
> On 26.05.2021 17:21, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> Since XSA-288 (commit 02cbeeb62075 "gnttab: split maptrack lock to make
> 
> XSA-228 I suppose?

Yes, I will update in the next version.

>> it fulfill its purpose again"), v->maptrack_head and v->maptrack_tail
>> are with the lock v->maptrack_freelist_lock held.
> 
> Nit: missing "accessed" or alike?
I have added "accessed".

> 
>> Therefore it is not necessary to update the fields using cmpxchg()
>> and also read them atomically.
> 
> Ah yes, very good observation. Should have noticed this back at the
> time, for an immediate follow-up change.
> 
>> Note that there are two cases where v->maptrack_tail is accessed without
>> the lock. They both happen _get_maptrack_handle() when the current vCPU
>> list is empty. Therefore there is no possible race.
> 
> I think you mean the other function here, without a leading underscore
> in its name. 

Hmmm... Yes. I will update it.

> And if you want to explain the absence of a race, wouldn't
> you then better also mention that the list can get initially filled
> only on the local vCPU?

Sure. I will reword it.

> 
>> I am not sure whether we should try to protect the remaining unprotected
>> access with the lock or maybe add a comment?
> 
> As per above I don't view adding locking as sensible. If you feel like
> adding a helpful comment, perhaps. I will admit that it took me more
> than just a moment to recall that "local vCPU only" argument.

I will try to come up with an helpful comment.

>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
>> @@ -543,34 +543,26 @@ double_gt_unlock(struct grant_table *lgt, struct grant_table *rgt)
>>   static inline grant_handle_t
>>   _get_maptrack_handle(struct grant_table *t, struct vcpu *v)
>>   {
>> -    unsigned int head, next, prev_head;
>> +    unsigned int head, next;
>>   
>>       spin_lock(&v->maptrack_freelist_lock);
>>   
>> -    do {
>> -        /* No maptrack pages allocated for this VCPU yet? */
>> -        head = read_atomic(&v->maptrack_head);
>> -        if ( unlikely(head == MAPTRACK_TAIL) )
>> -        {
>> -            spin_unlock(&v->maptrack_freelist_lock);
>> -            return INVALID_MAPTRACK_HANDLE;
> 
> Where did this and ...
> 
>> -        }
>> -
>> -        /*
>> -         * Always keep one entry in the free list to make it easier to
>> -         * add free entries to the tail.
>> -         */
>> -        next = read_atomic(&maptrack_entry(t, head).ref);
>> -        if ( unlikely(next == MAPTRACK_TAIL) )
>> -        {
>> -            spin_unlock(&v->maptrack_freelist_lock);
>> -            return INVALID_MAPTRACK_HANDLE;
> 
> ... this use of INVALID_MAPTRACK_HANDLE go? It is at present merely
> coincidence that INVALID_MAPTRACK_HANDLE == MAPTRACK_TAIL. If you
> want to fold them, you will need to do so properly (by eliminating
> one of the two constants). But I think they're separate on purpose.

Hmmm... Somehow I thought one was an alias to the other. But they are 
clearly not. I will update it on the next version.


> 
>> -        }
>> +    /* No maptrack pages allocated for this VCPU yet? */
>> +    head = v->maptrack_head;
>> +    if ( unlikely(head == MAPTRACK_TAIL) )
>> +        goto out;
>>   
>> -        prev_head = head;
>> -        head = cmpxchg(&v->maptrack_head, prev_head, next);
>> -    } while ( head != prev_head );
>> +    /*
>> +     * Always keep one entry in the free list to make it easier to
>> +     * add free entries to the tail.
>> +     */
>> +    next = read_atomic(&maptrack_entry(t, head).ref);
> 
> Since the lock protects the entire free list, why do you need to
> keep read_atomic() here?

Because I wasn't sure whether dropping {write, read}_atomic() when 
accessing the freelist would be fine.

Anyway, I can drop it in the next version.

> 
>> +    if ( unlikely(next == MAPTRACK_TAIL) )
>> +        head = MAPTRACK_TAIL;
>> +    else
>> +        v->maptrack_head = next;
>>   
>> +out:
> 
> Please indent labels by at least one blank, to avoid issues with
> diff's -p option. In fact if you didn't introduce a goto here in
> the first place, there'd be less code churn overall, as you'd
> need to alter the indentation of fewer lines.

I will have a look.

> 
>> @@ -623,7 +615,7 @@ put_maptrack_handle(
>>   {
>>       struct domain *currd = current->domain;
>>       struct vcpu *v;
>> -    unsigned int prev_tail, cur_tail;
>> +    unsigned int prev_tail;
>>   
>>       /* 1. Set entry to be a tail. */
>>       maptrack_entry(t, handle).ref = MAPTRACK_TAIL;
>> @@ -633,11 +625,8 @@ put_maptrack_handle(
>>   
>>       spin_lock(&v->maptrack_freelist_lock);
>>   
>> -    cur_tail = read_atomic(&v->maptrack_tail);
>> -    do {
>> -        prev_tail = cur_tail;
>> -        cur_tail = cmpxchg(&v->maptrack_tail, prev_tail, handle);
>> -    } while ( cur_tail != prev_tail );
>> +    prev_tail = v->maptrack_tail;
>> +    v->maptrack_tail = handle;
>>   
>>       /* 3. Update the old tail entry to point to the new entry. */
>>       write_atomic(&maptrack_entry(t, prev_tail).ref, handle);
> 
> Since the write_atomic() here can then also be converted, may I
> ask that you then rename the local variable to just "tail" as
> well?

Sure.

> 
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -255,7 +255,10 @@ struct vcpu
>>       /* VCPU paused by system controller. */
>>       int              controller_pause_count;
>>   
>> -    /* Grant table map tracking. */
>> +    /*
>> +     * Grant table map tracking. The lock maptrack_freelist_lock protect
> 
> Nit: protects

I will fix it.

> 
>> +     * the access to maptrack_head and maptrack_tail.
>> +     */
> 
> I'm inclined to suggest this doesn't need spelling out, considering ...
> 
>>       spinlock_t       maptrack_freelist_lock;
>>       unsigned int     maptrack_head;
>>       unsigned int     maptrack_tail;
> 
> ... both the name of the lock and its placement next to the two
> fields it protects. Also as per the docs change of the XSA-228 change,
> the lock protects more than just these two fields, so the comment may
> be misleading the way you have it now.

So I think it would be good to document above the lock what it actually 
protects. I agree this is fairly clear that it protect maptrack_{head, 
tail} but this wasn't very clear to me that it would also protect the 
content of the freelist (so read_atomic()/write_atomic() could be dropped).

I will try to come up with a better comment.

Cheers,

-- 
Julien Grall