[PATCH v4 3/3] selftests: cgroup: Replace sleep with cg_read_key_long_poll() for waiting on nr_dying_descendants

Guopeng Zhang posted 3 patches 2 months, 2 weeks ago
There is a newer version of this series
Posted by Guopeng Zhang 2 months, 2 weeks ago
Replace the manual sleep-and-retry logic in test_kmem_dead_cgroups() with the new
helper `cg_read_key_long_poll()`.  This makes the test more robust: it polls the
"nr_dying_descendants" counter in `cgroup.stat` until it reaches 0 or the timeout expires.

Additionally, increase the retry timeout to 8 seconds (from 5 seconds), based on the following test results:
  - With 5-second timeout: 4/20 runs passed.
  - With 8-second timeout: 20/20 runs passed.

Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
---
 tools/testing/selftests/cgroup/test_kmem.c | 31 +++++++++-------------
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index ca38525484e3..299d8e332e42 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -26,6 +26,7 @@
  */
 #define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())
 
+#define KMEM_DEAD_WAIT_RETRIES        80              /* 8s total */
 
 static int alloc_dcache(const char *cgroup, void *arg)
 {
@@ -306,9 +307,7 @@ static int test_kmem_dead_cgroups(const char *root)
 {
 	int ret = KSFT_FAIL;
 	char *parent;
-	long dead;
-	int i;
-	int max_time = 20;
+	long dead = -1;
 
 	parent = cg_name(root, "kmem_dead_cgroups_test");
 	if (!parent)
@@ -323,21 +322,17 @@ static int test_kmem_dead_cgroups(const char *root)
 	if (cg_run_in_subcgroups(parent, alloc_dcache, (void *)100, 30))
 		goto cleanup;
 
-	for (i = 0; i < max_time; i++) {
-		dead = cg_read_key_long(parent, "cgroup.stat",
-					"nr_dying_descendants ");
-		if (dead == 0) {
-			ret = KSFT_PASS;
-			break;
-		}
-		/*
-		 * Reclaiming cgroups might take some time,
-		 * let's wait a bit and repeat.
-		 */
-		sleep(1);
-		if (i > 5)
-			printf("Waiting time longer than 5s; wait: %ds (dead: %ld)\n", i, dead);
-	}
+	/*
+	 * Reclaiming cgroups may take some time,
+	 * so let's wait a bit and retry.
+	 */
+	dead = cg_read_key_long_poll(parent, "cgroup.stat",
+					"nr_dying_descendants ", 0, KMEM_DEAD_WAIT_RETRIES,
+					DEFAULT_WAIT_INTERVAL_US);
+	if (dead)
+		goto cleanup;
+
+	ret = KSFT_PASS;
 
 cleanup:
 	cg_destroy(parent);
-- 
2.25.1
Re: [PATCH v4 3/3] selftests: cgroup: Replace sleep with cg_read_key_long_poll() for waiting on nr_dying_descendants
Posted by Shakeel Butt 2 months, 1 week ago
On Mon, Nov 24, 2025 at 08:38:16PM +0800, Guopeng Zhang wrote:
> Replaced the manual sleep and retry logic in test_kmem_dead_cgroups() with the new
> helper `cg_read_key_long_poll()`.  This change improves the robustness of the test by
> polling the "nr_dying_descendants" counter in `cgroup.stat` until it reaches 0 or the timeout is exceeded.
> 
> Additionally, increased the retry timeout to 8 seconds (from 5 seconds) based on testing results:

Why 8 seconds? What does it depend on? For memcg stats I see the 3
seconds driven from the 2 sec periodic rstat flush. Mainly how can we
make this more future proof?

>   - With 5-second timeout: 4/20 runs passed.
>   - With 8-second timeout: 20/20 runs passed.
> 
> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>

Anyways, just add a sentence in the commit message on the reasoning
behind 8 seconds and a comment in code as well. With that, you can add:

Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Re: [PATCH v4 3/3] selftests: cgroup: Replace sleep with cg_read_key_long_poll() for waiting on nr_dying_descendants
Posted by Guopeng Zhang 2 months, 1 week ago

On 12/3/25 07:18, Shakeel Butt wrote:
> On Mon, Nov 24, 2025 at 08:38:16PM +0800, Guopeng Zhang wrote:
>> Replaced the manual sleep and retry logic in test_kmem_dead_cgroups() with the new
>> helper `cg_read_key_long_poll()`.  This change improves the robustness of the test by
>> polling the "nr_dying_descendants" counter in `cgroup.stat` until it reaches 0 or the timeout is exceeded.
>>
>> Additionally, increased the retry timeout to 8 seconds (from 5 seconds) based on testing results:
> 
> Why 8 seconds? What does it depend on? For memcg stats I see the 3
> seconds driven from the 2 sec periodic rstat flush. Mainly how can we
> make this more future proof?
> 
Hi Shakeel,

Thanks a lot for the review and for the guidance.

The 8s timeout was chosen based on stress testing of test_kmem_dead_cgroups()
on my setup: 5s was not always sufficient under load, while 8s consistently
covered the reclaim of dying descendants. It is intended as a generous upper
bound for the asynchronous reclaim and is not tied to any specific kernel
constant. If the reclaim behavior changes significantly in the future, this
timeout can be adjusted along with the test.

>>   - With 5-second timeout: 4/20 runs passed.
>>   - With 8-second timeout: 20/20 runs passed.
>>
>> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
> 
> Anyways, just add a sentence in the commit message on the reasoning
> behind 8 seconds and a comment in code as well. With that, you can add:
> 
> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>

I’ll add a short sentence to the commit message and a comment next to
KMEM_DEAD_WAIT_RETRIES explaining this rationale, and will include your:

Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>

in the next version.

Thanks,
Guopeng