Replace the manual sleep-and-retry logic in test_kmem_dead_cgroups()
with the new helper `cg_read_key_long_poll()`. This change improves
the robustness of the test by polling the "nr_dying_descendants"
counter in `cgroup.stat` until it reaches 0 or the timeout is exceeded.

Additionally, increase the retry timeout to 8 seconds (from 5 seconds)
based on testing results:
  - With 5-second timeout: 4/20 runs passed.
  - With 8-second timeout: 20/20 runs passed.

The 8 second timeout is based on stress testing of test_kmem_dead_cgroups()
under load: 5 seconds was occasionally not enough for reclaim of dying
descendants to complete, whereas 8 seconds consistently covered the observed
latencies. This value is intended as a generous upper bound for the
asynchronous reclaim and is not tied to any specific kernel constant, so it
can be adjusted in the future if reclaim behavior changes.

Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
---
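Note (illustration only, not part of the patch): the poll-until-expected
pattern described above can be sketched as a standalone program fragment.
This is a minimal sketch under stated assumptions, not the selftest
helper's actual implementation. The names poll_long(), countdown() and
sleep_us() are hypothetical, and the reader callback stands in for
reading "nr_dying_descendants" from `cgroup.stat`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical reader callback: returns the current value of a counter. */
typedef long (*read_fn)(void *ctx);

/* Sleep for `us` microseconds using nanosleep(). */
static void sleep_us(long us)
{
	struct timespec ts = {
		.tv_sec = us / 1000000L,
		.tv_nsec = (us % 1000000L) * 1000L,
	};

	nanosleep(&ts, NULL);
}

/*
 * Call reader() until it returns `expected` or `retries` reads have been
 * made, sleeping `interval_us` between attempts. Returns the last value
 * read so the caller can compare it against `expected`. The worst-case
 * wait is roughly retries * interval_us.
 */
static long poll_long(read_fn reader, void *ctx, long expected,
		      int retries, long interval_us)
{
	long val = -1;
	int i;

	assert(reader != NULL);

	for (i = 0; i < retries; i++) {
		val = reader(ctx);
		if (val == expected)
			break;
		sleep_us(interval_us);
	}
	return val;
}

/* Toy stand-in for the dying-descendants counter: drops by one per read. */
static long countdown(void *ctx)
{
	long *remaining = ctx;

	if (*remaining > 0)
		(*remaining)--;
	return *remaining;
}
```

With 80 retries, a total wait of about 8 seconds corresponds to an
interval of 100000 us per retry (80 * 100 ms). That interval is inferred
here from the "~8s" figure in the patch, not quoted from the header that
defines DEFAULT_WAIT_INTERVAL_US.
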
 tools/testing/selftests/cgroup/test_kmem.c | 33 ++++++++++------------
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index ca38525484e3..eeabd34bf083 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -26,6 +26,7 @@
  */
 #define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())
 
+#define KMEM_DEAD_WAIT_RETRIES 80
 
 static int alloc_dcache(const char *cgroup, void *arg)
 {
@@ -306,9 +307,7 @@ static int test_kmem_dead_cgroups(const char *root)
 {
 	int ret = KSFT_FAIL;
 	char *parent;
-	long dead;
-	int i;
-	int max_time = 20;
+	long dead = -1;
 
 	parent = cg_name(root, "kmem_dead_cgroups_test");
 	if (!parent)
@@ -323,21 +322,19 @@ static int test_kmem_dead_cgroups(const char *root)
 	if (cg_run_in_subcgroups(parent, alloc_dcache, (void *)100, 30))
 		goto cleanup;
 
-	for (i = 0; i < max_time; i++) {
-		dead = cg_read_key_long(parent, "cgroup.stat",
-					"nr_dying_descendants ");
-		if (dead == 0) {
-			ret = KSFT_PASS;
-			break;
-		}
-		/*
-		 * Reclaiming cgroups might take some time,
-		 * let's wait a bit and repeat.
-		 */
-		sleep(1);
-		if (i > 5)
-			printf("Waiting time longer than 5s; wait: %ds (dead: %ld)\n", i, dead);
-	}
+	/*
+	 * Allow up to ~8s for reclaim of dying descendants to complete.
+	 * This is a generous upper bound derived from stress testing, not
+	 * from a specific kernel constant, and can be adjusted if reclaim
+	 * behavior changes in the future.
+	 */
+	dead = cg_read_key_long_poll(parent, "cgroup.stat",
+				     "nr_dying_descendants ", 0, KMEM_DEAD_WAIT_RETRIES,
+				     DEFAULT_WAIT_INTERVAL_US);
+	if (dead)
+		goto cleanup;
+
+	ret = KSFT_PASS;
 
 cleanup:
 	cg_destroy(parent);
--
2.25.1

On Wed, Dec 03, 2025 at 07:56:31PM +0800, Guopeng Zhang <zhangguopeng@kylinos.cn> wrote:
> Replace the manual sleep-and-retry logic in test_kmem_dead_cgroups()
> with the new helper `cg_read_key_long_poll()`. This change improves
> the robustness of the test by polling the "nr_dying_descendants"
> counter in `cgroup.stat` until it reaches 0 or the timeout is exceeded.
>
> Additionally, increase the retry timeout to 8 seconds (from 5 seconds)
> based on testing results:
>   - With 5-second timeout: 4/20 runs passed.
>   - With 8-second timeout: 20/20 runs passed.
>
> The 8 second timeout is based on stress testing of test_kmem_dead_cgroups()
> under load: 5 seconds was occasionally not enough for reclaim of dying
> descendants to complete, whereas 8 seconds consistently covered the observed
> latencies. This value is intended as a generous upper bound for the
> asynchronous reclaim and is not tied to any specific kernel constant, so it
> can be adjusted in the future if reclaim behavior changes.

Great!

>
> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
>  tools/testing/selftests/cgroup/test_kmem.c | 33 ++++++++++------------
>  1 file changed, 15 insertions(+), 18 deletions(-)

Acked-by: Michal Koutný <mkoutny@suse.com>