[PATCH for-4.16 v2] Revert "domctl: improve locking during domain destruction"

Roger Pau Monne posted 1 patch 3 years ago
Test gitlab-ci failed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20211112120208.74387-1-roger.pau@citrix.com
xen/common/domain.c | 12 ++----------
xen/common/domctl.c |  5 +----
2 files changed, 3 insertions(+), 14 deletions(-)
[PATCH for-4.16 v2] Revert "domctl: improve locking during domain destruction"
Posted by Roger Pau Monne 3 years ago
This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.

Performance analysis has shown that dropping the domctl lock during
domain destruction greatly increases the contention in the heap_lock,
thus making parallel destruction of domains slower.

The following lockperf data shows the difference between the current
code and the reverted one:

lock:  3342357(2.268295505s), block:  3263853(18.556650797s)
lock:  2788704(0.362311723s), block:   222681( 0.091152276s)

Those figures are from Dmitry Isaikin, and are gathered after
destroying 5 2GB HVM guests in parallel:

https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg01515.html

Given the current point in the release, revert the commit and
reinstate holding the domctl lock during domain destruction. Further
work should be done in order to re-add more fine grained locking to
the domain destruction path once a proper solution to avoid the
heap_lock contention is found.

Reported-by: Hongyan Xia <hongyxia@amazon.com>
Reported-by: Dmitry Isaikin <isaikin-dmitry@yandex.ru>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
Cc: Ian Jackson <iwj@xenproject.org>

Since this is a revert and not new code I think the risk is lower.
There's however some risk, as the original commit was from 2017, and
hence the surrounding code has changed a bit. It's also a possibility
that some other parts of the domain destruction code now rely on this
more fine grained locking. Local tests however haven't shown issues.
---
Changes since v1:
 - Expand commit message.
---
 xen/common/domain.c | 12 ++----------
 xen/common/domctl.c |  5 +----
 2 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 56d47dd664..093bb4403f 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -917,21 +917,13 @@ int domain_kill(struct domain *d)
     if ( d == current->domain )
         return -EINVAL;
 
-    /* Protected by d->domain_lock. */
+    /* Protected by domctl_lock. */
     switch ( d->is_dying )
     {
     case DOMDYING_alive:
-        domain_unlock(d);
         domain_pause(d);
-        domain_lock(d);
-        /*
-         * With the domain lock dropped, d->is_dying may have changed. Call
-         * ourselves recursively if so, which is safe as then we won't come
-         * back here.
-         */
-        if ( d->is_dying != DOMDYING_alive )
-            return domain_kill(d);
         d->is_dying = DOMDYING_dying;
+        spin_barrier(&d->domain_lock);
         argo_destroy(d);
         vnuma_destroy(d->vnuma);
         domain_set_outstanding_pages(d, 0);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 271862ae58..879a2adcbe 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -497,14 +497,11 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         break;
 
     case XEN_DOMCTL_destroydomain:
-        domctl_lock_release();
-        domain_lock(d);
         ret = domain_kill(d);
-        domain_unlock(d);
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(
                 __HYPERVISOR_domctl, "h", u_domctl);
-        goto domctl_out_unlock_domonly;
+        break;
 
     case XEN_DOMCTL_setnodeaffinity:
     {
-- 
2.33.0


Re: [PATCH for-4.16 v2] Revert "domctl: improve locking during domain destruction"
Posted by Julien Grall 3 years ago
Hi Roger,

On 12/11/2021 12:02, Roger Pau Monne wrote:
> This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.
> 
> Performance analysis has shown that dropping the domctl lock during
> domain destruction greatly increases the contention in the heap_lock,
> thus making parallel destruction of domains slower.
> 
> The following lockperf data shows the difference between the current
> code and the reverted one:
> 
> lock:  3342357(2.268295505s), block:  3263853(18.556650797s)
> lock:  2788704(0.362311723s), block:   222681( 0.091152276s)
> 
> Those figures are from Dmitry Isaikin, and are gathered after
> destroying 5 2GB HVM guests in parallel:
> 
> https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg01515.html
> 
> Given the current point in the release, revert the commit and
> reinstate holding the domctl lock during domain destruction. Further
> work should be done in order to re-add more fine grained locking to
> the domain destruction path once a proper solution to avoid the
> heap_lock contention is found.
> 
> Reported-by: Hongyan Xia <hongyxia@amazon.com>
> Reported-by: Dmitry Isaikin <isaikin-dmitry@yandex.ru>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Julien Grall <jgrall@amazon.com>

Cheers,

-- 
Julien Grall