We use maxpages from read_swap_header() to initialize swap_info_struct,
however the maxpages might be reduced in setup_swap_extents() and
si->max is assigned the reduced maxpages from setup_swap_extents().
Obviously, this could lead to memory waste as we allocated memory based
on the larger maxpages; besides, this could lead to a potential deadloop
as follows:
1) When calling setup_clusters() with the larger maxpages, unavailable
pages within range [si->max, larger maxpages) are not accounted with
inc_cluster_info_page(). As a result, these pages are assumed available
but can not be allocated. The cluster containing these pages can be
moved to the frag_clusters list after all its available pages were
allocated.
2) When the cluster mentioned in 1) is the only cluster in the
frag_clusters list, cluster_alloc_swap_entry() assumes an order 0
allocation will never fail and enters a deadloop by repeatedly trying to
allocate a page from the only cluster in frag_clusters, which contains
no actually available page.
Call setup_swap_extents() to get the final maxpages before swap_info_struct
initialization to fix the issue.
Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/swapfile.c | 47 ++++++++++++++++++++---------------------------
1 file changed, 20 insertions(+), 27 deletions(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 75b69213c2e7..a82f4ebefca3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si,
return maxpages;
}
-static int setup_swap_map_and_extents(struct swap_info_struct *si,
- union swap_header *swap_header,
- unsigned char *swap_map,
- unsigned long maxpages,
- sector_t *span)
+static int setup_swap_map(struct swap_info_struct *si,
+ union swap_header *swap_header,
+ unsigned char *swap_map,
+ unsigned long maxpages)
{
- unsigned int nr_good_pages;
unsigned long i;
- int nr_extents;
-
- nr_good_pages = maxpages - 1; /* omit header page */
+ swap_map[0] = SWAP_MAP_BAD; /* omit header page */
for (i = 0; i < swap_header->info.nr_badpages; i++) {
unsigned int page_nr = swap_header->info.badpages[i];
if (page_nr == 0 || page_nr > swap_header->info.last_page)
return -EINVAL;
if (page_nr < maxpages) {
swap_map[page_nr] = SWAP_MAP_BAD;
- nr_good_pages--;
+ si->pages--;
}
}
- if (nr_good_pages) {
- swap_map[0] = SWAP_MAP_BAD;
- si->max = maxpages;
- si->pages = nr_good_pages;
- nr_extents = setup_swap_extents(si, span);
- if (nr_extents < 0)
- return nr_extents;
- nr_good_pages = si->pages;
- }
- if (!nr_good_pages) {
+ if (!si->pages) {
pr_warn("Empty swap-file\n");
return -EINVAL;
}
- return nr_extents;
+ return 0;
}
#define SWAP_CLUSTER_INFO_COLS \
@@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
* Mark unusable pages as unavailable. The clusters aren't
* marked free yet, so no list operations are involved yet.
*
- * See setup_swap_map_and_extents(): header page, bad pages,
+ * See setup_swap_map(): header page, bad pages,
* and the EOF part of the last cluster.
*/
inc_cluster_info_page(si, cluster_info, 0);
@@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
goto bad_swap_unlock_inode;
}
+ si->max = maxpages;
+ si->pages = maxpages - 1;
+ nr_extents = setup_swap_extents(si, &span);
+ if (nr_extents < 0) {
+ error = nr_extents;
+ goto bad_swap_unlock_inode;
+ }
+ maxpages = si->max;
+
/* OK, set up the swap map and apply the bad block list */
swap_map = vzalloc(maxpages);
if (!swap_map) {
@@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
if (error)
goto bad_swap_unlock_inode;
- nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map,
- maxpages, &span);
- if (unlikely(nr_extents < 0)) {
- error = nr_extents;
+ error = setup_swap_map(si, swap_header, swap_map, maxpages);
+ if (error)
goto bad_swap_unlock_inode;
- }
/*
* Use kvmalloc_array instead of bitmap_zalloc as the allocation order might
--
2.30.0
On 05/22/25 at 08:25pm, Kemeng Shi wrote:
> We use maxpages from read_swap_header() to initialize swap_info_struct,
> however the maxpages might be reduced in setup_swap_extents() and the
> si->max is assigned with the reduced maxpages from the
> setup_swap_extents().
> Obviously, this could lead to memory waste as we allocated memory based on
> larger maxpages, besides, this could lead to a potensial deadloop as
^ typo, potential
> following:
> 1) When calling setup_clusters() with larger maxpages, unavailable pages
> within range [si->max, larger maxpages) are not accounted with
> inc_cluster_info_page(). As a result, these pages are assumed available
> but can not be allocated. The cluster contains these pages can be moved
> to frag_clusters list after it's all available pages were allocated.
> 2) When the cluster mentioned in 1) is the only cluster in frag_clusters
> list, cluster_alloc_swap_entry() assume order 0 allocation will never
> failed and will enter a deadloop by keep trying to allocate page from the
> only cluster in frag_clusters which contains no actually available page.
>
> Call setup_swap_extents() to get the final maxpages before swap_info_struct
> initialization to fix the issue.
>
> Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned")
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> mm/swapfile.c | 47 ++++++++++++++++++++---------------------------
> 1 file changed, 20 insertions(+), 27 deletions(-)
Reviewed-by: Baoquan He <bhe@redhat.com>
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 75b69213c2e7..a82f4ebefca3 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si,
> return maxpages;
> }
>
> -static int setup_swap_map_and_extents(struct swap_info_struct *si,
> - union swap_header *swap_header,
> - unsigned char *swap_map,
> - unsigned long maxpages,
> - sector_t *span)
> +static int setup_swap_map(struct swap_info_struct *si,
> + union swap_header *swap_header,
> + unsigned char *swap_map,
> + unsigned long maxpages)
> {
> - unsigned int nr_good_pages;
> unsigned long i;
> - int nr_extents;
> -
> - nr_good_pages = maxpages - 1; /* omit header page */
>
> + swap_map[0] = SWAP_MAP_BAD; /* omit header page */
> for (i = 0; i < swap_header->info.nr_badpages; i++) {
> unsigned int page_nr = swap_header->info.badpages[i];
> if (page_nr == 0 || page_nr > swap_header->info.last_page)
> return -EINVAL;
> if (page_nr < maxpages) {
> swap_map[page_nr] = SWAP_MAP_BAD;
> - nr_good_pages--;
> + si->pages--;
> }
> }
>
> - if (nr_good_pages) {
> - swap_map[0] = SWAP_MAP_BAD;
> - si->max = maxpages;
> - si->pages = nr_good_pages;
> - nr_extents = setup_swap_extents(si, span);
> - if (nr_extents < 0)
> - return nr_extents;
> - nr_good_pages = si->pages;
> - }
> - if (!nr_good_pages) {
> + if (!si->pages) {
> pr_warn("Empty swap-file\n");
> return -EINVAL;
> }
>
> - return nr_extents;
> + return 0;
> }
>
> #define SWAP_CLUSTER_INFO_COLS \
> @@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
> * Mark unusable pages as unavailable. The clusters aren't
> * marked free yet, so no list operations are involved yet.
> *
> - * See setup_swap_map_and_extents(): header page, bad pages,
> + * See setup_swap_map(): header page, bad pages,
> * and the EOF part of the last cluster.
> */
> inc_cluster_info_page(si, cluster_info, 0);
> @@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> goto bad_swap_unlock_inode;
> }
>
> + si->max = maxpages;
> + si->pages = maxpages - 1;
> + nr_extents = setup_swap_extents(si, &span);
> + if (nr_extents < 0) {
> + error = nr_extents;
> + goto bad_swap_unlock_inode;
> + }
> + maxpages = si->max;
> +
> /* OK, set up the swap map and apply the bad block list */
> swap_map = vzalloc(maxpages);
> if (!swap_map) {
> @@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> if (error)
> goto bad_swap_unlock_inode;
>
> - nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map,
> - maxpages, &span);
> - if (unlikely(nr_extents < 0)) {
> - error = nr_extents;
> + error = setup_swap_map(si, swap_header, swap_map, maxpages);
> + if (error)
> goto bad_swap_unlock_inode;
> - }
>
> /*
> * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might
> --
> 2.30.0
>
on 5/30/2025 10:50 AM, Baoquan He wrote:
> On 05/22/25 at 08:25pm, Kemeng Shi wrote:
>> We use maxpages from read_swap_header() to initialize swap_info_struct,
>> however the maxpages might be reduced in setup_swap_extents() and the
>> si->max is assigned with the reduced maxpages from the
>> setup_swap_extents().
>> Obviously, this could lead to memory waste as we allocated memory based on
>> larger maxpages, besides, this could lead to a potensial deadloop as
> ^ typo, potential
Thanks, will fix this in next version.
>> following:
>> 1) When calling setup_clusters() with larger maxpages, unavailable pages
>> within range [si->max, larger maxpages) are not accounted with
>> inc_cluster_info_page(). As a result, these pages are assumed available
>> but can not be allocated. The cluster contains these pages can be moved
>> to frag_clusters list after it's all available pages were allocated.
>> 2) When the cluster mentioned in 1) is the only cluster in frag_clusters
>> list, cluster_alloc_swap_entry() assume order 0 allocation will never
>> failed and will enter a deadloop by keep trying to allocate page from the
>> only cluster in frag_clusters which contains no actually available page.
>>
>> Call setup_swap_extents() to get the final maxpages before swap_info_struct
>> initialization to fix the issue.
>>
>> Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned")
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>> mm/swapfile.c | 47 ++++++++++++++++++++---------------------------
>> 1 file changed, 20 insertions(+), 27 deletions(-)
>
> Reviewed-by: Baoquan He <bhe@redhat.com>
>
>>
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 75b69213c2e7..a82f4ebefca3 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si,
>> return maxpages;
>> }
>>
>> -static int setup_swap_map_and_extents(struct swap_info_struct *si,
>> - union swap_header *swap_header,
>> - unsigned char *swap_map,
>> - unsigned long maxpages,
>> - sector_t *span)
>> +static int setup_swap_map(struct swap_info_struct *si,
>> + union swap_header *swap_header,
>> + unsigned char *swap_map,
>> + unsigned long maxpages)
>> {
>> - unsigned int nr_good_pages;
>> unsigned long i;
>> - int nr_extents;
>> -
>> - nr_good_pages = maxpages - 1; /* omit header page */
>>
>> + swap_map[0] = SWAP_MAP_BAD; /* omit header page */
>> for (i = 0; i < swap_header->info.nr_badpages; i++) {
>> unsigned int page_nr = swap_header->info.badpages[i];
>> if (page_nr == 0 || page_nr > swap_header->info.last_page)
>> return -EINVAL;
>> if (page_nr < maxpages) {
>> swap_map[page_nr] = SWAP_MAP_BAD;
>> - nr_good_pages--;
>> + si->pages--;
>> }
>> }
>>
>> - if (nr_good_pages) {
>> - swap_map[0] = SWAP_MAP_BAD;
>> - si->max = maxpages;
>> - si->pages = nr_good_pages;
>> - nr_extents = setup_swap_extents(si, span);
>> - if (nr_extents < 0)
>> - return nr_extents;
>> - nr_good_pages = si->pages;
>> - }
>> - if (!nr_good_pages) {
>> + if (!si->pages) {
>> pr_warn("Empty swap-file\n");
>> return -EINVAL;
>> }
>>
>> - return nr_extents;
>> + return 0;
>> }
>>
>> #define SWAP_CLUSTER_INFO_COLS \
>> @@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
>> * Mark unusable pages as unavailable. The clusters aren't
>> * marked free yet, so no list operations are involved yet.
>> *
>> - * See setup_swap_map_and_extents(): header page, bad pages,
>> + * See setup_swap_map(): header page, bad pages,
>> * and the EOF part of the last cluster.
>> */
>> inc_cluster_info_page(si, cluster_info, 0);
>> @@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>> goto bad_swap_unlock_inode;
>> }
>>
>> + si->max = maxpages;
>> + si->pages = maxpages - 1;
>> + nr_extents = setup_swap_extents(si, &span);
>> + if (nr_extents < 0) {
>> + error = nr_extents;
>> + goto bad_swap_unlock_inode;
>> + }
>> + maxpages = si->max;
>> +
>> /* OK, set up the swap map and apply the bad block list */
>> swap_map = vzalloc(maxpages);
>> if (!swap_map) {
>> @@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>> if (error)
>> goto bad_swap_unlock_inode;
>>
>> - nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map,
>> - maxpages, &span);
>> - if (unlikely(nr_extents < 0)) {
>> - error = nr_extents;
>> + error = setup_swap_map(si, swap_header, swap_map, maxpages);
>> + if (error)
>> goto bad_swap_unlock_inode;
>> - }
>>
>> /*
>> * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might
>> --
>> 2.30.0
>>
>
On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@huaweicloud.com> wrote:
>
> We use maxpages from read_swap_header() to initialize swap_info_struct,
> however the maxpages might be reduced in setup_swap_extents() and the
> si->max is assigned with the reduced maxpages from the
> setup_swap_extents().
>
> Obviously, this could lead to memory waste as we allocated memory based on
> larger maxpages, besides, this could lead to a potensial deadloop as
> following:
> 1) When calling setup_clusters() with larger maxpages, unavailable pages
> within range [si->max, larger maxpages) are not accounted with
> inc_cluster_info_page(). As a result, these pages are assumed available
> but can not be allocated. The cluster contains these pages can be moved
> to frag_clusters list after it's all available pages were allocated.
> 2) When the cluster mentioned in 1) is the only cluster in frag_clusters
> list, cluster_alloc_swap_entry() assume order 0 allocation will never
> failed and will enter a deadloop by keep trying to allocate page from the
> only cluster in frag_clusters which contains no actually available page.
>
> Call setup_swap_extents() to get the final maxpages before swap_info_struct
> initialization to fix the issue.
>
> Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned")
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> mm/swapfile.c | 47 ++++++++++++++++++++---------------------------
> 1 file changed, 20 insertions(+), 27 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 75b69213c2e7..a82f4ebefca3 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si,
> return maxpages;
> }
>
> -static int setup_swap_map_and_extents(struct swap_info_struct *si,
> - union swap_header *swap_header,
> - unsigned char *swap_map,
> - unsigned long maxpages,
> - sector_t *span)
> +static int setup_swap_map(struct swap_info_struct *si,
> + union swap_header *swap_header,
> + unsigned char *swap_map,
> + unsigned long maxpages)
> {
> - unsigned int nr_good_pages;
> unsigned long i;
> - int nr_extents;
> -
> - nr_good_pages = maxpages - 1; /* omit header page */
>
> + swap_map[0] = SWAP_MAP_BAD; /* omit header page */
> for (i = 0; i < swap_header->info.nr_badpages; i++) {
> unsigned int page_nr = swap_header->info.badpages[i];
> if (page_nr == 0 || page_nr > swap_header->info.last_page)
> return -EINVAL;
> if (page_nr < maxpages) {
> swap_map[page_nr] = SWAP_MAP_BAD;
> - nr_good_pages--;
> + si->pages--;
> }
> }
>
> - if (nr_good_pages) {
> - swap_map[0] = SWAP_MAP_BAD;
> - si->max = maxpages;
> - si->pages = nr_good_pages;
> - nr_extents = setup_swap_extents(si, span);
> - if (nr_extents < 0)
> - return nr_extents;
> - nr_good_pages = si->pages;
> - }
> - if (!nr_good_pages) {
> + if (!si->pages) {
> pr_warn("Empty swap-file\n");
> return -EINVAL;
> }
>
>
> - return nr_extents;
> + return 0;
> }
>
> #define SWAP_CLUSTER_INFO_COLS \
> @@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
> * Mark unusable pages as unavailable. The clusters aren't
> * marked free yet, so no list operations are involved yet.
> *
> - * See setup_swap_map_and_extents(): header page, bad pages,
> + * See setup_swap_map(): header page, bad pages,
> * and the EOF part of the last cluster.
> */
> inc_cluster_info_page(si, cluster_info, 0);
> @@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> goto bad_swap_unlock_inode;
> }
>
> + si->max = maxpages;
> + si->pages = maxpages - 1;
> + nr_extents = setup_swap_extents(si, &span);
> + if (nr_extents < 0) {
> + error = nr_extents;
> + goto bad_swap_unlock_inode;
> + }
> + maxpages = si->max;
There seems to be a trivial problem here: previously, si->pages was
seen by swap_activate after bad blocks had been counted, so si->pages
meant the actual available slots. But now si->pages will be seen by
swap_activate as `maxpages - 1`.
One current side effect is that the span value will not be updated
properly, so the pr_info in swapon may print a larger value if the
swap header contains badblocks and the swapfile is on nfs/cifs.
This should not be a problem, but it's better to mention it or add
comments about it.
And I think it's better to add a sanity check here to verify that
si->pages still equals si->max - 1, since setup_swap_map_and_extents /
setup_swap_map assumes the header page was already counted. This also
helps indicate that setup_swap_extents() may shrink and modify these
two values.
BTW, I was thinking that we should get rid of the whole extents design
after the swap table series is ready, so mTHP allocation will be
usable for swap over fs too.
> /* OK, set up the swap map and apply the bad block list */
> swap_map = vzalloc(maxpages);
> if (!swap_map) {
> @@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> if (error)
> goto bad_swap_unlock_inode;
>
> - nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map,
> - maxpages, &span);
> - if (unlikely(nr_extents < 0)) {
> - error = nr_extents;
> + error = setup_swap_map(si, swap_header, swap_map, maxpages);
> + if (error)
> goto bad_swap_unlock_inode;
> - }
>
> /*
> * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might
> --
> 2.30.0
>
Other than that:
Reviewed-by: Kairui Song <kasong@tencent.com>
on 5/26/2025 1:08 AM, Kairui Song wrote:
> On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@huaweicloud.com> wrote:
>>
>> We use maxpages from read_swap_header() to initialize swap_info_struct,
>> however the maxpages might be reduced in setup_swap_extents() and the
>> si->max is assigned with the reduced maxpages from the
>> setup_swap_extents().
>>
>> Obviously, this could lead to memory waste as we allocated memory based on
>> larger maxpages, besides, this could lead to a potensial deadloop as
>> following:
>> 1) When calling setup_clusters() with larger maxpages, unavailable pages
>> within range [si->max, larger maxpages) are not accounted with
>> inc_cluster_info_page(). As a result, these pages are assumed available
>> but can not be allocated. The cluster contains these pages can be moved
>> to frag_clusters list after it's all available pages were allocated.
>> 2) When the cluster mentioned in 1) is the only cluster in frag_clusters
>> list, cluster_alloc_swap_entry() assume order 0 allocation will never
>> failed and will enter a deadloop by keep trying to allocate page from the
>> only cluster in frag_clusters which contains no actually available page.
>>
>> Call setup_swap_extents() to get the final maxpages before swap_info_struct
>> initialization to fix the issue.
>>
>> Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned")
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>> mm/swapfile.c | 47 ++++++++++++++++++++---------------------------
>> 1 file changed, 20 insertions(+), 27 deletions(-)
>>
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 75b69213c2e7..a82f4ebefca3 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si,
>> return maxpages;
>> }
>>
>> -static int setup_swap_map_and_extents(struct swap_info_struct *si,
>> - union swap_header *swap_header,
>> - unsigned char *swap_map,
>> - unsigned long maxpages,
>> - sector_t *span)
>> +static int setup_swap_map(struct swap_info_struct *si,
>> + union swap_header *swap_header,
>> + unsigned char *swap_map,
>> + unsigned long maxpages)
>> {
>> - unsigned int nr_good_pages;
>> unsigned long i;
>> - int nr_extents;
>> -
>> - nr_good_pages = maxpages - 1; /* omit header page */
>>
>> + swap_map[0] = SWAP_MAP_BAD; /* omit header page */
>> for (i = 0; i < swap_header->info.nr_badpages; i++) {
>> unsigned int page_nr = swap_header->info.badpages[i];
>> if (page_nr == 0 || page_nr > swap_header->info.last_page)
>> return -EINVAL;
>> if (page_nr < maxpages) {
>> swap_map[page_nr] = SWAP_MAP_BAD;
>> - nr_good_pages--;
>> + si->pages--;
>> }
>> }
>>
>> - if (nr_good_pages) {
>> - swap_map[0] = SWAP_MAP_BAD;
>> - si->max = maxpages;
>> - si->pages = nr_good_pages;
>> - nr_extents = setup_swap_extents(si, span);
>> - if (nr_extents < 0)
>> - return nr_extents;
>> - nr_good_pages = si->pages;
>> - }
>> - if (!nr_good_pages) {
>> + if (!si->pages) {
>> pr_warn("Empty swap-file\n");
>> return -EINVAL;
>> }
>>
>>
>> - return nr_extents;
>> + return 0;
>> }
>>
>> #define SWAP_CLUSTER_INFO_COLS \
>> @@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
>> * Mark unusable pages as unavailable. The clusters aren't
>> * marked free yet, so no list operations are involved yet.
>> *
>> - * See setup_swap_map_and_extents(): header page, bad pages,
>> + * See setup_swap_map(): header page, bad pages,
>> * and the EOF part of the last cluster.
>> */
>> inc_cluster_info_page(si, cluster_info, 0);
>> @@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>> goto bad_swap_unlock_inode;
>> }
>>
>> + si->max = maxpages;
>> + si->pages = maxpages - 1;
>> + nr_extents = setup_swap_extents(si, &span);
>> + if (nr_extents < 0) {
>> + error = nr_extents;
>> + goto bad_swap_unlock_inode;
>> + }
>> + maxpages = si->max;
>
Hello,
> There seems to be a trivial problem here, previously the si->pages
> will be seen by swap_activate after bad blocks have been counted and
> si->pages means the actual available slots. But now si->pages will be
> seen by swap_activate as `maxpages - 1`.
>
> One current side effect now is the span value will not be updated
> properly so the pr_info in swap on may print a larger value, if the
> swap header contains badblocks and swapfile is on nfs/cifs.
Thanks for pointing this out. But I think the larger value is actually
the correct result.
In summary, there are two kinds of swapfile_activate operations:
1. Filesystem style: treat all blocks as logically contiguous and find
usable physical extents in the logical range. In this way, si->pages
will be the actual usable physical blocks and span will be "1 +
highest_block - lowest_block".
2. Block device style: treat all blocks as physically contiguous and
only one single extent is added. In this way, si->pages will be
si->max and span will be "si->pages - 1".
Actually, si->pages and si->max are only used in block device style,
and the span value is set with si->pages. As a result, the span value
in block device style will become a larger value as you mentioned.
I think the larger value is correct based on:
1. The span value in filesystem style is "1 + highest_block -
lowest_block", which is the range covering all possible physical
blocks, including the badblocks.
2. For block device style, si->pages is the actual usable block
number and is already in pr_info. The original span value before
this patch also referred to the usable block number, which is
redundant in pr_info.
>
> This should not be a problem but it's better to mention or add
> comments about it
I'd like to mention this change as a fix in the changelog in the next version.
>
> And I think it's better to add a sanity check here to check if
> si->pages still equal to si->max - 1, setup_swap_map_and_extents /
> setup_swap_map assumes the header section was already counted. This
> also helps indicate the setup_swap_extents may shrink and modify these
> two values.
Sure, will add this in next version.
>
> BTW, I was thinking that we should get rid of the whole extents design
> after the swap table series is ready, so mTHP allocation will be
> usable for swap over fs too.
I also noticed this limitation but have not taken a deep look. Looking
forward to your solution in the future.
>
>> /* OK, set up the swap map and apply the bad block list */
>> swap_map = vzalloc(maxpages);
>> if (!swap_map) {
>> @@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>> if (error)
>> goto bad_swap_unlock_inode;
>>
>> - nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map,
>> - maxpages, &span);
>> - if (unlikely(nr_extents < 0)) {
>> - error = nr_extents;
>> + error = setup_swap_map(si, swap_header, swap_map, maxpages);
>> + if (error)
>> goto bad_swap_unlock_inode;
>> - }
>>
>> /*
>> * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might
>> --
>> 2.30.0
>>
>
> Other than that:
>
> Reviewed-by: Kairui Song <kasong@tencent.com>
>
On Wed, 11 Jun 2025 15:54:21 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote:
>
> on 5/26/2025 1:08 AM, Kairui Song wrote:

Nearly two months!

> Sure, will add this in next version.

Do we actually need a new version? Having rescanned the v1 review I'm
inclined to merge this series as-is?
on 7/18/2025 7:21 AM, Andrew Morton wrote:
> On Wed, 11 Jun 2025 15:54:21 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote:
>
>> on 5/26/2025 1:08 AM, Kairui Song wrote:
>
> Nearly two months!
>
>> Sure, will add this in next version.
>
> Do we actually need a new version? Having rescanned the v1 review I'm
> inclined to merge this series as-is?

So sorry for the late reply. I will write and send a v2 version soon.