Replace the manual sleep-and-retry loop in test_kmem_dead_cgroups() with the new
helper `cg_read_key_long_poll()`. This makes the test more robust by polling the
"nr_dying_descendants" counter in `cgroup.stat` until it reaches 0 or the timeout expires.

Also increase the retry timeout to 8 seconds (from 5 seconds), based on test results:
- With a 5-second timeout: 4/20 runs passed.
- With an 8-second timeout: 20/20 runs passed.

Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
---
tools/testing/selftests/cgroup/test_kmem.c | 31 +++++++++-------------
1 file changed, 13 insertions(+), 18 deletions(-)
diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index ca38525484e3..299d8e332e42 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -26,6 +26,7 @@
*/
#define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())
+#define KMEM_DEAD_WAIT_RETRIES 80 /* 8s total */
static int alloc_dcache(const char *cgroup, void *arg)
{
@@ -306,9 +307,7 @@ static int test_kmem_dead_cgroups(const char *root)
{
int ret = KSFT_FAIL;
char *parent;
- long dead;
- int i;
- int max_time = 20;
+ long dead = -1;
parent = cg_name(root, "kmem_dead_cgroups_test");
if (!parent)
@@ -323,21 +322,17 @@ static int test_kmem_dead_cgroups(const char *root)
if (cg_run_in_subcgroups(parent, alloc_dcache, (void *)100, 30))
goto cleanup;
- for (i = 0; i < max_time; i++) {
- dead = cg_read_key_long(parent, "cgroup.stat",
- "nr_dying_descendants ");
- if (dead == 0) {
- ret = KSFT_PASS;
- break;
- }
- /*
- * Reclaiming cgroups might take some time,
- * let's wait a bit and repeat.
- */
- sleep(1);
- if (i > 5)
- printf("Waiting time longer than 5s; wait: %ds (dead: %ld)\n", i, dead);
- }
+ /*
+ * Reclaiming cgroups may take some time,
+ * so let's wait a bit and retry.
+ */
+ dead = cg_read_key_long_poll(parent, "cgroup.stat",
+ "nr_dying_descendants ", 0, KMEM_DEAD_WAIT_RETRIES,
+ DEFAULT_WAIT_INTERVAL_US);
+ if (dead)
+ goto cleanup;
+
+ ret = KSFT_PASS;
cleanup:
cg_destroy(parent);
--
2.25.1

On Mon, Nov 24, 2025 at 08:38:16PM +0800, Guopeng Zhang wrote:
> Replaced the manual sleep and retry logic in test_kmem_dead_cgroups() with the new
> helper `cg_read_key_long_poll()`. This change improves the robustness of the test by
> polling the "nr_dying_descendants" counter in `cgroup.stat` until it reaches 0 or the timeout is exceeded.
>
> Additionally, increased the retry timeout to 8 seconds (from 5 seconds) based on testing results:

Why 8 seconds? What does it depend on? For memcg stats I see the 3
seconds driven from the 2 sec periodic rstat flush. Mainly how can we
make this more future proof?

> - With 5-second timeout: 4/20 runs passed.
> - With 8-second timeout: 20/20 runs passed.
>
> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>

Anyways, just add a sentence in the commit message on the reasoning
behind 8 seconds and a comment in code as well. With that, you can add:

Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>

On 12/3/25 07:18, Shakeel Butt wrote:
> On Mon, Nov 24, 2025 at 08:38:16PM +0800, Guopeng Zhang wrote:
>> Replaced the manual sleep and retry logic in test_kmem_dead_cgroups() with the new
>> helper `cg_read_key_long_poll()`. This change improves the robustness of the test by
>> polling the "nr_dying_descendants" counter in `cgroup.stat` until it reaches 0 or the timeout is exceeded.
>>
>> Additionally, increased the retry timeout to 8 seconds (from 5 seconds) based on testing results:
>
> Why 8 seconds? What does it depend on? For memcg stats I see the 3
> seconds driven from the 2 sec periodic rstat flush. Mainly how can we
> make this more future proof?
>

Hi Shakeel,

Thanks a lot for the review and for the guidance.

The 8s timeout was chosen based on stress testing of test_kmem_dead_cgroups()
on my setup: 5s was not always sufficient under load, while 8s consistently
covered the reclaim of dying descendants. It is intended as a generous upper
bound for the asynchronous reclaim and is not tied to any specific kernel
constant. If the reclaim behavior changes significantly in the future, this
timeout can be adjusted along with the test.

>> - With 5-second timeout: 4/20 runs passed.
>> - With 8-second timeout: 20/20 runs passed.
>>
>> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
>
> Anyways, just add a sentence in the commit message on the reasoning
> behind 8 seconds and a comment in code as well. With that, you can add:
>
> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>

I’ll add a short sentence to the commit message and a comment next to
KMEM_DEAD_WAIT_RETRIES explaining this rationale, and will include your:

Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>

in the next version.

Thanks,
Guopeng
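
For readers following the thread, the bounded-polling pattern under discussion can be sketched roughly as below. This is a minimal illustration, not the selftest's actual `cg_read_key_long_poll()`: `poll_long()`, `fake_dying_descendants()`, and the `SKETCH_*` macros are hypothetical names, and the 100 ms interval is an assumption inferred from the patch comment that 80 retries cover 8s in total (the value of `DEFAULT_WAIT_INTERVAL_US` is not shown in this thread).

```c
#define _DEFAULT_SOURCE /* for usleep() under strict -std modes */
#include <assert.h>
#include <unistd.h>

/* Assumed values: 80 retries * 100 ms = 8 s, matching "8s total". */
#define SKETCH_DEAD_WAIT_RETRIES 80
#define SKETCH_WAIT_INTERVAL_US  100000

/* Hypothetical reader callback: returns the current counter value. */
typedef long (*read_long_fn)(void *ctx);

/*
 * Call read_fn until it returns `expected` or `retries` attempts have
 * been made, sleeping interval_us between attempts.
 * Returns the last value read (-1 if retries <= 0).
 */
static long poll_long(read_long_fn read_fn, void *ctx, long expected,
		      int retries, unsigned int interval_us)
{
	long val = -1;
	int i;

	assert(read_fn != NULL);

	for (i = 0; i < retries; i++) {
		val = read_fn(ctx);
		if (val == expected)
			break;
		/* No need to sleep after the final attempt. */
		if (i < retries - 1)
			usleep(interval_us);
	}
	return val;
}

/*
 * Hypothetical stand-in for reading "nr_dying_descendants":
 * each call reclaims one pending cgroup until none remain.
 */
static long fake_dying_descendants(void *ctx)
{
	long *remaining = ctx;

	if (*remaining > 0)
		(*remaining)--;
	return *remaining;
}
```

A caller would treat any nonzero return as "still dying after the timeout", which mirrors the `if (dead) goto cleanup;` check in the patch. Because the retry count and interval are separate parameters, the upper bound can be adjusted without touching the loop.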