[PATCH] memory: bail from page scrubbing when CPU is no longer online

Jan Beulich posted 1 patch 3 years, 2 months ago
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/251a14b5-01a5-0a9d-d269-f463a0759f1d@suse.com
Posted by Jan Beulich 3 years, 2 months ago
Scrubbing can significantly delay the offlining (parking) of a CPU (e.g.
because of booting into smt=0 mode), to a degree that the "CPU <n>
still not dead..." messages logged on x86 in 1s intervals can be seen
multiple times. There are no softirqs involved in this process, so
extend the existing preemption check in the scrubbing logic to also exit
when the CPU is no longer observed online.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1324,9 +1324,11 @@ bool scrub_free_pages(void)
                      * Scrub a few (8) pages before becoming eligible for
                      * preemption. But also count non-scrubbing loop iterations
                      * so that we don't get stuck here with an almost clean
-                     * heap.
+                     * heap. Consider the CPU no longer being seen as online as
+                     * a request to preempt immediately, to not unduly delay
+                     * its offlining.
                      */
-                    if ( cnt > 800 && softirq_pending(cpu) )
+                    if ( !cpu_online(cpu) || (cnt > 800 && softirq_pending(cpu)) )
                     {
                         preempt = true;
                         break;
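
For readers without the surrounding code at hand, the changed condition can be exercised in isolation. The following is a minimal stand-alone sketch, not the Xen implementation: `struct sim_cpu`, `should_preempt()`, `sim_scrub()` and the per-page weight `SCRUB_WEIGHT` are hypothetical names introduced for illustration, and the two flags merely stand in for what `cpu_online(cpu)` and `softirq_pending(cpu)` would report. Only the boolean expression itself is taken from the hunk above.

```c
#include <stdbool.h>

/* Assumption: each scrubbed page adds this much to cnt, so that roughly
 * eight pages are scrubbed before the "cnt > 800" test can fire. */
#define SCRUB_WEIGHT 100

/* Hypothetical stand-in for the per-CPU state the real check consults. */
struct sim_cpu {
    bool online;   /* what cpu_online(cpu) would return */
    bool softirq;  /* what softirq_pending(cpu) would return */
};

/* Same expression as the patched condition: an offline CPU preempts at
 * once, regardless of cnt or of pending softirqs. */
static bool should_preempt(const struct sim_cpu *c, unsigned int cnt)
{
    return !c->online || (cnt > 800 && c->softirq);
}

/*
 * Toy scrub loop (illustration only). Scrubs up to nr_pages pages; the
 * simulated CPU is marked offline once offline_after pages are done.
 * Returns the number of pages scrubbed before the loop exited.
 */
static unsigned int sim_scrub(struct sim_cpu *c, unsigned int nr_pages,
                              unsigned int offline_after)
{
    unsigned int cnt = 0, done = 0;

    while ( done < nr_pages )
    {
        if ( done == offline_after )
            c->online = false;      /* simulate the CPU being parked */

        if ( should_preempt(c, cnt) )
            break;

        done++;                     /* "scrub" one page */
        cnt += SCRUB_WEIGHT;
    }

    return done;
}
```

In this sketch a CPU that goes offline with no softirq pending leaves the loop on the very next check, whereas with the pre-patch condition (`cnt > 800 && softirq_pending(cpu)`) it would keep scrubbing until the work ran out.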

Re: [PATCH] memory: bail from page scrubbing when CPU is no longer online
Posted by Andrew Cooper 3 years, 2 months ago
On 28/01/2021 10:35, Jan Beulich wrote:
> Scrubbing can significantly delay the offlining (parking) of a CPU (e.g.
> because of booting into smt=0 mode), to a degree that the "CPU <n>
> still not dead..." messages logged on x86 in 1s intervals can be seen
> multiple times. There are no softirqs involved in this process, so
> extend the existing preemption check in the scrubbing logic to also exit
> when the CPU is no longer observed online.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>