[Xen-devel] [PATCH v2] xen/mm: Avoid assuming the page is inuse in assign_pages()

Julien Grall posted 1 patch 1 week ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/xen tags/patchew/20200206103833.15355-1-julien@xen.org
xen/common/page_alloc.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)

[Xen-devel] [PATCH v2] xen/mm: Avoid assuming the page is inuse in assign_pages()

Posted by Julien Grall 1 week ago
From: Julien Grall <jgrall@amazon.com>

At the moment, assign_pages() on the page to be inuse (PGC_state_inuse)
and the state value to be 0.

However, the code may race with the page offlining code (see
offline_page()). Depending on the ordering, the page may be in offlining
state (PGC_state_offlining) before it is assigned to a domain.

On debug build, this may result to hit the assert or just clobber the
state. On non-debug build, the state will get clobbered.

Incidentally the flag PGC_broken will get clobbered as well.

Grab the heap_lock to prevent a race with offline_page() and keep the
state and broken flag around.

Signed-off-by: Julien Grall <jgrall@amazon.com>

---
    Changes in v2:
        - Superseed <20200204133357.32101-1-julien@xen.org>
        - Fix the race with offline_page()
---
 xen/common/page_alloc.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 97902d42c1..a684dbf37c 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2283,15 +2283,27 @@ int assign_pages(
             get_knownalive_domain(d);
     }
 
+    spin_lock(&heap_lock);
     for ( i = 0; i < (1 << order); i++ )
     {
+        /*
+         * We should only be here if the page is inuse or offlining.
+         * The latter happen if we race with mark_page_offline() as we
+         * don't hold the heap_lock.
+         */
+        ASSERT(page_state_is(&pg[i], inuse) ||
+               page_state_is(&pg[i], offlining));
+        ASSERT(!(pg[i].count_info & ~(PGC_state | PGC_broken)));
         ASSERT(page_get_owner(&pg[i]) == NULL);
-        ASSERT(!pg[i].count_info);
         page_set_owner(&pg[i], d);
         smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
-        pg[i].count_info = PGC_allocated | 1;
+
+        pg[i].count_info &= PGC_state | PGC_broken;
+        pg[i].count_info |= PGC_allocated | 1;
+
         page_list_add_tail(&pg[i], &d->page_list);
     }
+    spin_unlock(&heap_lock);
 
  out:
     spin_unlock(&d->page_alloc_lock);
-- 
2.17.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] xen/mm: Avoid assuming the page is inuse in assign_pages()

Posted by Jan Beulich 1 week ago
On 06.02.2020 11:38, Julien Grall wrote:
> However, the code may race with the page offlining code (see
> offline_page()). Depending on the ordering, the page may be in offlining
> state (PGC_state_offlining) before it is assigned to a domain.
> 
> On debug build, this may result to hit the assert or just clobber the
> state. On non-debug build, the state will get clobbered.
> 
> Incidentally the flag PGC_broken will get clobbered as well.

As mentioned when I first pointed out this issue, it is wider than
just assign_pages() afaict, which is specifically why I said I
wouldn't expect you to want to deal with it alongside the "implicit
inuse" aspect. Fixing just one instance of it without also
addressing the others isn't going to help. IOW you could leave the
code the way it was in v1 in this regard, and then we (you, me, or
yet someone else) take care of the race aspect globally for the
tree.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel