[PATCH v2] zsmalloc: return -EBUSY for zspage migration lock contention

Hui Zhu posted 1 patch 2 weeks, 4 days ago
There is a newer version of this series
Posted by Hui Zhu 2 weeks, 4 days ago
From: teawater <zhuhui@kylinos.cn>

movable_operations::migrate_page() should return an appropriate error
code for temporary migration failures so the migration core can handle
them correctly.

zs_page_migrate() currently returns -EINVAL when zspage_write_trylock()
fails. That path reflects transient lock contention, not invalid input,
so -EINVAL is clearly wrong.

However, -EAGAIN is also inappropriate here: the zspage's reader-lock
owner may hold the lock for an unbounded duration due to slow
decompression. Since migration retries are bounded by
NR_MAX_MIGRATE_PAGES_RETRY and performed with virtually no delay between
attempts, there is no guarantee the lock will be released in time for a
retry to succeed. -EAGAIN implies "try again soon", which does not hold
in this case.

Return -EBUSY instead, which more accurately conveys that the resource
is occupied and migration cannot proceed at this time.

Changelog:
v2:
Following Sergey Senozhatsky's review, return -EBUSY instead of -EAGAIN
and add a comment explaining the choice.

Signed-off-by: teawater <zhuhui@kylinos.cn>
---
 mm/zsmalloc.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 2c1430bf8d57..db42ec4ffcfa 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1727,7 +1727,19 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	if (!zspage_write_trylock(zspage)) {
 		spin_unlock(&class->lock);
 		write_unlock(&pool->lock);
-		return -EINVAL;
+		/*
+		 * Return -EBUSY rather than -EAGAIN: the zspage's reader-lock
+		 * owner may hold the lock for an unbounded duration due to a
+		 * slow decompression.
+		 * Since migration retries are bounded by
+		 * NR_MAX_MIGRATE_PAGES_RETRY and performed with virtually no
+		 * delay between attempts, there is no guarantee the lock will
+		 * be released in time for a retry to succeed.
+		 * -EAGAIN implies "try again soon", which does not hold here.
+		 * -EBUSY more accurately conveys "resource is occupied,
+		 * migration cannot proceed".
+		 */
+		return -EBUSY;
 	}
 
 	/* We're committed, tell the world that this is a Zsmalloc page. */
-- 
2.43.0
Re: [PATCH v2] zsmalloc: return -EBUSY for zspage migration lock contention
Posted by Sergey Senozhatsky 2 weeks, 4 days ago
On (26/03/19 11:07), Hui Zhu wrote:
> movable_operations::migrate_page() should return an appropriate error
> code for temporary migration failures so the migration core can handle
> them correctly.
> 
> zs_page_migrate() currently returns -EINVAL when zspage_write_trylock()
> fails. That path reflects transient lock contention, not invalid input,
> so -EINVAL is clearly wrong.
> 
> However, -EAGAIN is also inappropriate here: the zspage's reader-lock
> owner may hold the lock for an unbounded duration due to slow
> decompression. Since migration retries are bounded by
> NR_MAX_MIGRATE_PAGES_RETRY and performed with virtually no delay between
> attempts, there is no guarantee the lock will be released in time for a
> retry to succeed. -EAGAIN implies "try again soon", which does not hold
> in this case.
> 
> Return -EBUSY instead, which more accurately conveys that the resource
> is occupied and migration cannot proceed at this time.
> 
> Changelog:
> v2:
> Following Sergey Senozhatsky's review, return -EBUSY instead of -EAGAIN
> and add a comment explaining the choice.

I assume this Changelog was supposed to be under "---".

> +		/*
> +		 * Return -EBUSY rather than -EAGAIN: the zspage's reader-lock
> +		 * owner may hold the lock for an unbounded duration due to a
> +		 * slow decompression.
		^^^ or reader-lock owner preemption

> +		 * Since migration retries are bounded by
> +		 * NR_MAX_MIGRATE_PAGES_RETRY and performed with virtually no
> +		 * delay between attempts, there is no guarantee the lock will
> +		 * be released in time for a retry to succeed.
> +		 * -EAGAIN implies "try again soon", which does not hold here.
> +		 * -EBUSY more accurately conveys "resource is occupied,
> +		 * migration cannot proceed".
> +		 */
> +		return -EBUSY;
>  	}

Otherwise,
Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org>
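
The retry argument discussed in this thread can be illustrated with a
small userspace model. This is a sketch under stated assumptions, not
the migration core's code: it assumes that only -EAGAIN is retried, that
retries happen back to back, and that the budget is 10 passes
(DEMO_MAX_RETRY stands in for NR_MAX_MIGRATE_PAGES_RETRY). All demo_*
names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Assumed retry budget, standing in for NR_MAX_MIGRATE_PAGES_RETRY. */
#define DEMO_MAX_RETRY 10

/* Hypothetical page whose lock stays held for the next busy_calls tries. */
struct demo_page {
	int busy_calls;
	int attempts;
};

/* Hypothetical migrate callback: fails with busy_errno while locked. */
static int demo_migrate_page(struct demo_page *page, int busy_errno)
{
	page->attempts++;
	if (page->busy_calls > 0) {
		page->busy_calls--;
		return busy_errno;
	}
	return 0;
}

/*
 * Simplified bounded retry loop: only -EAGAIN is retried, with no delay
 * between passes; any other error ends the attempt immediately.
 */
static int demo_migrate_with_retries(struct demo_page *page, int busy_errno)
{
	int rc = -EAGAIN;

	for (int pass = 0; pass < DEMO_MAX_RETRY && rc == -EAGAIN; pass++)
		rc = demo_migrate_page(page, busy_errno);
	return rc;
}

/* Number of callback invocations made before the loop stops. */
static int demo_attempts(int busy_calls, int busy_errno)
{
	struct demo_page page = { .busy_calls = busy_calls, .attempts = 0 };

	assert(busy_calls >= 0);
	demo_migrate_with_retries(&page, busy_errno);
	return page.attempts;
}
```

In this model, a lock that stays held across the whole budget makes an
-EAGAIN callback spend every pass without success, while an -EBUSY
callback is invoked once and the failure is reported right away. A
preempted reader-lock owner, as noted in the review, behaves like the
long-held case.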