From: Nhat Pham <nphamcs@gmail.com>
To: kasong@tencent.com
Cc: Liam.Howlett@oracle.com, akpm@linux-foundation.org, apopple@nvidia.com,
    axelrasmussen@google.com, baohua@kernel.org, baolin.wang@linux.alibaba.com,
    bhe@redhat.com, byungchul@sk.com, cgroups@vger.kernel.org,
    chengming.zhou@linux.dev, chrisl@kernel.org, corbet@lwn.net,
    david@kernel.org, dev.jain@arm.com, gourry@gourry.net, hannes@cmpxchg.org,
    hughd@google.com, jannh@google.com, joshua.hahnjy@gmail.com,
    lance.yang@linux.dev, lenb@kernel.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-pm@vger.kernel.org, lorenzo.stoakes@oracle.com,
    matthew.brost@intel.com, mhocko@suse.com, muchun.song@linux.dev,
    npache@redhat.com, nphamcs@gmail.com, pavel@kernel.org, peterx@redhat.com,
    peterz@infradead.org, pfalcato@suse.de, rafael@kernel.org,
    rakie.kim@sk.com, roman.gushchin@linux.dev, rppt@kernel.org,
    ryan.roberts@arm.com, shakeel.butt@linux.dev, shikemeng@huaweicloud.com,
    surenb@google.com, tglx@kernel.org, vbabka@suse.cz, weixugc@google.com,
    ying.huang@linux.alibaba.com, yosry.ahmed@linux.dev, yuanchu@google.com,
    zhengqi.arch@bytedance.com, ziy@nvidia.com, kernel-team@meta.com,
    riel@surriel.com
Subject: [PATCH v5 20/21] swapfile: replace the swap map with bitmaps
Date: Fri, 20 Mar 2026 12:27:34 -0700
Message-ID: <20260320192735.748051-21-nphamcs@gmail.com>
In-Reply-To: <20260320192735.748051-1-nphamcs@gmail.com>
References: <20260320192735.748051-1-nphamcs@gmail.com>
Now that we have moved the swap count state to the virtual swap layer,
each swap map entry has only 3 possible states: free, allocated, and
bad. Replace the swap map with 2 bitmaps (one for the allocated state
and one for the bad state), saving 6 bits per swap entry.

Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
 include/linux/swap.h |  3 +-
 mm/swapfile.c        | 81 +++++++++++++++++++++++---------------------
 2 files changed, 44 insertions(+), 40 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 21e528d8d3480..3c789149996c5 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -259,7 +259,8 @@ struct swap_info_struct {
 	struct plist_node list;		/* entry in swap_active_head */
 	signed char type;		/* strange name for an index */
 	unsigned int max;		/* extent of the swap_map */
-	unsigned char *swap_map;	/* vmalloc'ed array of usage counts */
+	unsigned long *swap_map;	/* bitmap for allocated state */
+	unsigned long *bad_map;		/* bitmap for bad state */
 	struct swap_cluster_info *cluster_info; /* cluster info. Only for SSD */
 	struct list_head free_clusters; /* free clusters list */
 	struct list_head full_clusters; /* full clusters list */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index b553652125d11..3e2bfcf1aa789 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -760,25 +760,19 @@ static bool cluster_reclaim_range(struct swap_info_struct *si,
 				  struct swap_cluster_info *ci,
 				  unsigned long start, unsigned long end)
 {
-	unsigned char *map = si->swap_map;
 	unsigned long offset = start;
 	int nr_reclaim;
 
 	spin_unlock(&ci->lock);
 	do {
-		switch (READ_ONCE(map[offset])) {
-		case 0:
+		if (!test_bit(offset, si->swap_map)) {
 			offset++;
-			break;
-		case SWAP_MAP_ALLOCATED:
+		} else {
 			nr_reclaim = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
 			if (nr_reclaim > 0)
 				offset += nr_reclaim;
 			else
 				goto out;
-			break;
-		default:
-			goto out;
 		}
 	} while (offset < end);
 out:
@@ -787,11 +781,7 @@ static bool cluster_reclaim_range(struct swap_info_struct *si,
 	 * Recheck the range no matter reclaim succeeded or not, the slot
 	 * could have been be freed while we are not holding the lock.
 	 */
-	for (offset = start; offset < end; offset++)
-		if (READ_ONCE(map[offset]))
-			return false;
-
-	return true;
+	return find_next_bit(si->swap_map, end, start) >= end;
 }
 
 static bool cluster_scan_range(struct swap_info_struct *si,
@@ -800,15 +790,16 @@ static bool cluster_scan_range(struct swap_info_struct *si,
 			       bool *need_reclaim)
 {
 	unsigned long offset, end = start + nr_pages;
-	unsigned char *map = si->swap_map;
-	unsigned char count;
 
 	if (cluster_is_empty(ci))
 		return true;
 
 	for (offset = start; offset < end; offset++) {
-		count = READ_ONCE(map[offset]);
-		if (!count)
+		/* Bad slots cannot be used for allocation */
+		if (test_bit(offset, si->bad_map))
+			return false;
+
+		if (!test_bit(offset, si->swap_map))
 			continue;
 
 		if (swap_cache_only(si, offset)) {
@@ -841,7 +832,7 @@ static bool cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster
 	if (cluster_is_empty(ci))
 		ci->order = order;
 
-	memset(si->swap_map + start, usage, nr_pages);
+	bitmap_set(si->swap_map, start, nr_pages);
 	swap_range_alloc(si, nr_pages);
 	ci->count += nr_pages;
 
@@ -1407,7 +1398,7 @@ static struct swap_info_struct *_swap_info_get(swp_slot_t slot)
 	offset = swp_slot_offset(slot);
 	if (offset >= si->max)
 		goto bad_offset;
-	if (data_race(!si->swap_map[swp_slot_offset(slot)]))
+	if (data_race(!test_bit(offset, si->swap_map)))
 		goto bad_free;
 	return si;
 
@@ -1521,8 +1512,7 @@ static void swap_slots_free(struct swap_info_struct *si,
 			    swp_slot_t slot, unsigned int nr_pages)
 {
 	unsigned long offset = swp_slot_offset(slot);
-	unsigned char *map = si->swap_map + offset;
-	unsigned char *map_end = map + nr_pages;
+	unsigned long end = offset + nr_pages;
 
 	/* It should never free entries across different clusters */
 	VM_BUG_ON(ci != __swap_offset_to_cluster(si, offset + nr_pages - 1));
@@ -1530,10 +1520,8 @@ static void swap_slots_free(struct swap_info_struct *si,
 	VM_BUG_ON(ci->count < nr_pages);
 
 	ci->count -= nr_pages;
-	do {
-		VM_BUG_ON(!swap_is_last_ref(*map));
-		*map = 0;
-	} while (++map < map_end);
+	VM_BUG_ON(find_next_zero_bit(si->swap_map, end, offset) < end);
+	bitmap_clear(si->swap_map, offset, nr_pages);
 
 	swap_range_free(si, offset, nr_pages);
 
@@ -1744,9 +1732,7 @@ unsigned int count_swap_pages(int type, int free)
 static bool swap_slot_allocated(struct swap_info_struct *si,
 				unsigned long offset)
 {
-	unsigned char count = READ_ONCE(si->swap_map[offset]);
-
-	return count && swap_count(count) != SWAP_MAP_BAD;
+	return test_bit(offset, si->swap_map);
 }
 
 /*
@@ -2067,7 +2053,7 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)
 }
 
 static void setup_swap_info(struct swap_info_struct *si, int prio,
-			    unsigned char *swap_map,
+			    unsigned long *swap_map,
 			    struct swap_cluster_info *cluster_info)
 {
 	si->prio = prio;
@@ -2095,7 +2081,7 @@ static void _enable_swap_info(struct swap_info_struct *si)
 }
 
 static void enable_swap_info(struct swap_info_struct *si, int prio,
-			     unsigned char *swap_map,
+			     unsigned long *swap_map,
 			     struct swap_cluster_info *cluster_info)
 {
 	spin_lock(&swap_lock);
@@ -2188,7 +2174,8 @@ static void flush_percpu_swap_cluster(struct swap_info_struct *si)
 SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 {
 	struct swap_info_struct *p = NULL;
-	unsigned char *swap_map;
+	unsigned long *swap_map;
+	unsigned long *bad_map;
 	struct swap_cluster_info *cluster_info;
 	struct file *swap_file, *victim;
 	struct address_space *mapping;
@@ -2283,6 +2270,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	p->swap_file = NULL;
 	swap_map = p->swap_map;
 	p->swap_map = NULL;
+	bad_map = p->bad_map;
+	p->bad_map = NULL;
 	maxpages = p->max;
 	cluster_info = p->cluster_info;
 	p->max = 0;
@@ -2293,7 +2282,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	mutex_unlock(&swapon_mutex);
 	kfree(p->global_cluster);
 	p->global_cluster = NULL;
-	vfree(swap_map);
+	kvfree(swap_map);
+	kvfree(bad_map);
 	free_cluster_info(cluster_info, maxpages);
 
 	inode = mapping->host;
@@ -2641,18 +2631,20 @@ static unsigned long read_swap_header(struct swap_info_struct *si,
 
 static int setup_swap_map(struct swap_info_struct *si,
 			  union swap_header *swap_header,
-			  unsigned char *swap_map,
+			  unsigned long *swap_map,
+			  unsigned long *bad_map,
 			  unsigned long maxpages)
 {
 	unsigned long i;
 
-	swap_map[0] = SWAP_MAP_BAD; /* omit header page */
+	set_bit(0, bad_map); /* omit header page */
+
 	for (i = 0; i < swap_header->info.nr_badpages; i++) {
 		unsigned int page_nr = swap_header->info.badpages[i];
 		if (page_nr == 0 || page_nr > swap_header->info.last_page)
 			return -EINVAL;
 		if (page_nr < maxpages) {
-			swap_map[page_nr] = SWAP_MAP_BAD;
+			set_bit(page_nr, bad_map);
 			si->pages--;
 		}
 	}
@@ -2756,7 +2748,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	int nr_extents;
 	sector_t span;
 	unsigned long maxpages;
-	unsigned char *swap_map = NULL;
+	unsigned long *swap_map = NULL, *bad_map = NULL;
 	struct swap_cluster_info *cluster_info = NULL;
 	struct folio *folio = NULL;
 	struct inode *inode = NULL;
@@ -2852,16 +2844,24 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	maxpages = si->max;
 
 	/* OK, set up the swap map and apply the bad block list */
-	swap_map = vzalloc(maxpages);
+	swap_map = kvcalloc(BITS_TO_LONGS(maxpages), sizeof(long), GFP_KERNEL);
 	if (!swap_map) {
 		error = -ENOMEM;
 		goto bad_swap_unlock_inode;
 	}
 
-	error = setup_swap_map(si, swap_header, swap_map, maxpages);
+	bad_map = kvcalloc(BITS_TO_LONGS(maxpages), sizeof(long), GFP_KERNEL);
+	if (!bad_map) {
+		error = -ENOMEM;
+		goto bad_swap_unlock_inode;
+	}
+
+	error = setup_swap_map(si, swap_header, swap_map, bad_map, maxpages);
 	if (error)
 		goto bad_swap_unlock_inode;
 
+	si->bad_map = bad_map;
+
 	if (si->bdev && bdev_stable_writes(si->bdev))
 		si->flags |= SWP_STABLE_WRITES;
 
@@ -2955,7 +2955,10 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	si->swap_file = NULL;
 	si->flags = 0;
 	spin_unlock(&swap_lock);
-	vfree(swap_map);
+	if (swap_map)
+		kvfree(swap_map);
+	if (bad_map)
+		kvfree(bad_map);
 	if (cluster_info)
 		free_cluster_info(cluster_info, maxpages);
 	if (inced_nr_rotate_swap)
-- 
2.52.0