From nobody Wed Feb 11 08:12:21 2026
From: Nhat Pham <nphamcs@gmail.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, hughd@google.com,
	yosry.ahmed@linux.dev, mhocko@kernel.org, roman.gushchin@linux.dev,
	shakeel.butt@linux.dev, muchun.song@linux.dev, len.brown@intel.com,
	chengming.zhou@linux.dev, kasong@tencent.com, chrisl@kernel.org,
	huang.ying.caritas@gmail.com, ryan.roberts@arm.com,
	shikemeng@huaweicloud.com, viro@zeniv.linux.org.uk, baohua@kernel.org,
	bhe@redhat.com, osalvador@suse.de, lorenzo.stoakes@oracle.com,
	christophe.leroy@csgroup.eu, pavel@kernel.org, kernel-team@meta.com,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	linux-pm@vger.kernel.org, peterx@redhat.com, riel@surriel.com,
	joshua.hahnjy@gmail.com, npache@redhat.com, gourry@gourry.net,
	axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
	rafael@kernel.org, jannh@google.com, pfalcato@suse.de,
	zhengqi.arch@bytedance.com
Subject: [PATCH v3 06/20] mm: swap: add a separate type for physical swap slots
Date: Sun, 8 Feb 2026 13:58:19 -0800
Message-ID: <20260208215839.87595-7-nphamcs@gmail.com>
X-Mailer: git-send-email 2.47.3
In-Reply-To: <20260208215839.87595-1-nphamcs@gmail.com>
References: <20260208215839.87595-1-nphamcs@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

In preparation for swap virtualization, add a new type to represent the
physical swap slots of a swapfile. This allows us to separate:

1. The logical view of the swap entry (i.e., what is stored in page
   table entries and used to index into the swap cache), represented by
   the old swp_entry_t type.

from:

2. Its physical backing state (i.e., the actual backing slot on the
   swap device), represented by the new swp_slot_t type.

The functions that operate at the physical level (i.e., on swp_slot_t
values) are also renamed where appropriate (prefixed with swp_slot_*,
for example).

Note that this patch makes no behavioral change: the mapping between
the two types is the identity mapping.

In later patches, we will dynamically allocate a virtual swap slot (of
type swp_entry_t) for each swapped-out page to store in the page table
entry, and associate it with a backing store. A physical swap slot
(i.e., a slot on a physical swap device) is one of the backing options.
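As an aside for reviewers, here is a minimal standalone userspace sketch
of the two types and the identity conversion this patch introduces. It
is illustrative only, not kernel code: the SWP_TYPE_SHIFT and
SWP_OFFSET_MASK values below are placeholders, not the kernel's
architecture-specific encoding.

#include <stdio.h>

/* Mirrors the kernel's two views of a swapped-out page. */
typedef struct { unsigned long val; } swp_entry_t; /* logical (virtual) view */
typedef struct { unsigned long val; } swp_slot_t;  /* physical slot on a swapfile */

#define SWP_TYPE_SHIFT  58 /* placeholder, not the real arch value */
#define SWP_OFFSET_MASK ((1UL << SWP_TYPE_SHIFT) - 1)

/* Pack a device type + offset into a physical slot handle. */
static swp_slot_t swp_slot(unsigned long type, unsigned long offset)
{
	return (swp_slot_t){ (type << SWP_TYPE_SHIFT) |
			     (offset & SWP_OFFSET_MASK) };
}

/* For now, both directions of the mapping are the identity. */
static swp_slot_t swp_entry_to_swp_slot(swp_entry_t entry)
{
	return (swp_slot_t){ entry.val };
}

static swp_entry_t swp_slot_to_swp_entry(swp_slot_t slot)
{
	return (swp_entry_t){ slot.val };
}

int main(void)
{
	swp_slot_t slot = swp_slot(1, 42);
	swp_entry_t entry = swp_slot_to_swp_entry(slot);

	/* Round-trip is lossless while the mapping is the identity. */
	printf("entry %lx -> slot %lx\n",
	       entry.val, swp_entry_to_swp_slot(entry).val);
	return 0;
}

Once virtual slots are allocated dynamically, only these two conversion
helpers need to change; callers keep operating on whichever view they
logically own.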
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
 include/linux/mm_types.h |  16 +++
 include/linux/swap.h     |  47 ++++--
 include/linux/swapops.h  |  25 ++++
 kernel/power/swap.c      |   6 +-
 mm/internal.h            |  10 +-
 mm/page_io.c             |  33 +++--
 mm/shmem.c               |  19 ++-
 mm/swap.h                |  52 +++----
 mm/swap_cgroup.c         |  18 +--
 mm/swap_state.c          |  32 +++--
 mm/swapfile.c            | 300 ++++++++++++++++++++++-----------------
 11 files changed, 352 insertions(+), 206 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 78950eb8926dc..bffde812decc5 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -279,6 +279,13 @@ static __always_inline unsigned long encoded_nr_pages(struct encoded_page *page)
 }
 
 /*
+ * Virtual swap slot.
+ *
+ * This type is used to represent a virtual swap slot, i.e an identifier of
+ * a swap entry. This is stored in PTEs that originally refer to the swapped
+ * out page, and is used to index into various swap architectures (swap cache,
+ * zswap tree, swap cgroup array, etc.).
+ *
  * A swap entry has to fit into a "unsigned long", as the entry is hidden
  * in the "index" field of the swapper address space.
  */
@@ -286,6 +293,15 @@ typedef struct {
 	unsigned long val;
 } swp_entry_t;
 
+/*
+ * Physical swap slot.
+ *
+ * This type is used to represent a PAGE_SIZED slot on a swapfile.
+ */
+typedef struct {
+	unsigned long val;
+} swp_slot_t;
+
 /**
  * typedef softleaf_t - Describes a page table software leaf entry, abstracted
  * from its architecture-specific encoding.
diff --git a/include/linux/swap.h b/include/linux/swap.h
index f91a442ac0e82..918b47da55f44 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -241,7 +241,7 @@ enum {
  * cluster to which it belongs being marked free. Therefore 0 is safe to use as
  * a sentinel to indicate an entry is not valid.
  */
-#define SWAP_ENTRY_INVALID	0
+#define SWAP_SLOT_INVALID	0
 
 #ifdef CONFIG_THP_SWAP
 #define SWAP_NR_ORDERS		(PMD_ORDER + 1)
@@ -442,11 +442,14 @@ static inline unsigned long total_swapcache_pages(void)
 {
 	return global_node_page_state(NR_SWAPCACHE);
 }
+
 void free_folio_and_swap_cache(struct folio *folio);
 void free_pages_and_swap_cache(struct encoded_page **, int);
 void free_swap_cache(struct folio *folio);
 
 /* Physical swap allocator and swap device API (mm/swapfile.c) */
+void swap_slot_free_nr(swp_slot_t slot, int nr_pages);
+
 int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
 		unsigned long nr_pages, sector_t start_block);
 int generic_swapfile_activate(struct swap_info_struct *, struct file *,
@@ -468,28 +471,28 @@ static inline long get_nr_swap_pages(void)
 }
 
 void si_swapinfo(struct sysinfo *);
-swp_entry_t get_swap_page_of_type(int);
+swp_slot_t swap_slot_alloc_of_type(int);
 int add_swap_count_continuation(swp_entry_t, gfp_t);
 int swap_type_of(dev_t device, sector_t offset);
 int find_first_swap(dev_t *device);
 unsigned int count_swap_pages(int, int);
 sector_t swapdev_block(int, pgoff_t);
 struct backing_dev_info;
-struct swap_info_struct *get_swap_device(swp_entry_t entry);
+struct swap_info_struct *swap_slot_tryget_swap_info(swp_slot_t slot);
 sector_t swap_folio_sector(struct folio *folio);
 
-static inline void put_swap_device(struct swap_info_struct *si)
+static inline void swap_slot_put_swap_info(struct swap_info_struct *si)
 {
 	percpu_ref_put(&si->users);
 }
 
 #else /* CONFIG_SWAP */
-static inline struct swap_info_struct *get_swap_device(swp_entry_t entry)
+static inline struct swap_info_struct *swap_slot_tryget_swap_info(swp_slot_t slot)
 {
 	return NULL;
 }
 
-static inline void put_swap_device(struct swap_info_struct *si)
+static inline void swap_slot_put_swap_info(struct swap_info_struct *si)
 {
 }
 
@@ -536,7 +539,7 @@ static inline void swap_free_nr(swp_entry_t entry, int nr_pages)
 {
 }
 
-static inline void put_swap_folio(struct folio *folio, swp_entry_t swp)
+static inline void put_swap_folio(struct folio *folio, swp_entry_t entry)
 {
 }
 
@@ -576,6 +579,7 @@ static inline int add_swap_extent(struct swap_info_struct *sis,
 {
 	return -EINVAL;
 }
+
 #endif /* CONFIG_SWAP */
 
 static inline void free_swap_and_cache(swp_entry_t entry)
@@ -665,10 +669,35 @@ static inline bool mem_cgroup_swap_full(struct folio *folio)
 }
 #endif
 
+/**
+ * swp_entry_to_swp_slot - look up the physical swap slot corresponding to a
+ * virtual swap slot.
+ * @entry: the virtual swap slot.
+ *
+ * Return: the physical swap slot corresponding to the virtual swap slot.
+ */
+static inline swp_slot_t swp_entry_to_swp_slot(swp_entry_t entry)
+{
+	return (swp_slot_t) { entry.val };
+}
+
+/**
+ * swp_slot_to_swp_entry - look up the virtual swap slot corresponding to a
+ * physical swap slot.
+ * @slot: the physical swap slot.
+ *
+ * Return: the virtual swap slot corresponding to the physical swap slot.
+ */
+static inline swp_entry_t swp_slot_to_swp_entry(swp_slot_t slot)
+{
+	return (swp_entry_t) { slot.val };
+}
+
 static inline bool tryget_swap_entry(swp_entry_t entry,
 				     struct swap_info_struct **sip)
 {
-	struct swap_info_struct *si = get_swap_device(entry);
+	swp_slot_t slot = swp_entry_to_swp_slot(entry);
+	struct swap_info_struct *si = swap_slot_tryget_swap_info(slot);
 
 	if (sip)
 		*sip = si;
@@ -679,7 +708,7 @@ static inline bool tryget_swap_entry(swp_entry_t entry,
 static inline void put_swap_entry(swp_entry_t entry,
 				  struct swap_info_struct *si)
 {
-	put_swap_device(si);
+	swap_slot_put_swap_info(si);
 }
 
 #endif /* __KERNEL__*/
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 8cfc966eae48e..9e41c35664a95 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -360,5 +360,30 @@ static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
 
 #endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */
 
+/* Physical swap slots operations */
+
+/*
+ * Store a swap device type + offset into a swp_slot_t handle.
+ */
+static inline swp_slot_t swp_slot(unsigned long type, pgoff_t offset)
+{
+	swp_slot_t ret;
+
+	ret.val = (type << SWP_TYPE_SHIFT) | (offset & SWP_OFFSET_MASK);
+	return ret;
+}
+
+/* Extract the `type' field from a swp_slot_t. */
+static inline unsigned swp_slot_type(swp_slot_t slot)
+{
+	return (slot.val >> SWP_TYPE_SHIFT);
+}
+
+/* Extract the `offset' field from a swp_slot_t. */
+static inline pgoff_t swp_slot_offset(swp_slot_t slot)
+{
+	return slot.val & SWP_OFFSET_MASK;
+}
+
 #endif /* CONFIG_MMU */
 #endif /* _LINUX_SWAPOPS_H */
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 8050e51828351..0129c5ffa649d 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -174,10 +174,10 @@ sector_t alloc_swapdev_block(int swap)
 	 * Allocate a swap page and register that it has been allocated, so that
 	 * it can be freed in case of an error.
 	 */
-	offset = swp_offset(get_swap_page_of_type(swap));
+	offset = swp_slot_offset(swap_slot_alloc_of_type(swap));
 	if (offset) {
 		if (swsusp_extents_insert(offset))
-			swap_free(swp_entry(swap, offset));
+			swap_slot_free_nr(swp_slot(swap, offset), 1);
 		else
 			return swapdev_block(swap, offset);
 	}
@@ -197,7 +197,7 @@ void free_all_swap_pages(int swap)
 
 		ext = rb_entry(node, struct swsusp_extent, node);
 		rb_erase(node, &swsusp_extents);
-		swap_free_nr(swp_entry(swap, ext->start),
+		swap_slot_free_nr(swp_slot(swap, ext->start),
 			     ext->end - ext->start + 1);
 
 		kfree(ext);
diff --git a/mm/internal.h b/mm/internal.h
index f35dbcf99a86b..e739e8cac5b55 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -334,9 +334,13 @@ unsigned int folio_pte_batch(struct folio *folio, pte_t *ptep, pte_t pte,
  */
 static inline pte_t pte_move_swp_offset(pte_t pte, long delta)
 {
-	const softleaf_t entry = softleaf_from_pte(pte);
-	pte_t new = __swp_entry_to_pte(__swp_entry(swp_type(entry),
-						   (swp_offset(entry) + delta)));
+	softleaf_t entry = softleaf_from_pte(pte), new_entry;
+	swp_slot_t slot = swp_entry_to_swp_slot(entry);
+	pte_t new;
+
+	new_entry = swp_slot_to_swp_entry(swp_slot(swp_slot_type(slot),
+						   swp_slot_offset(slot) + delta));
+	new = swp_entry_to_pte(new_entry);
 
 	if (pte_swp_soft_dirty(pte))
 		new = pte_swp_mksoft_dirty(new);
diff --git a/mm/page_io.c b/mm/page_io.c
index 3c342db77ce38..0b02bcc85e2a8 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -204,14 +204,17 @@ static bool is_folio_zero_filled(struct folio *folio)
 static void swap_zeromap_folio_set(struct folio *folio)
 {
 	struct obj_cgroup *objcg = get_obj_cgroup_from_folio(folio);
-	struct swap_info_struct *sis = __swap_entry_to_info(folio->swap);
+	struct swap_info_struct *sis =
+		__swap_slot_to_info(swp_entry_to_swp_slot(folio->swap));
 	int nr_pages = folio_nr_pages(folio);
 	swp_entry_t entry;
+	swp_slot_t slot;
 	unsigned int i;
 
 	for (i = 0; i < folio_nr_pages(folio); i++) {
 		entry = page_swap_entry(folio_page(folio, i));
-		set_bit(swp_offset(entry), sis->zeromap);
+		slot = swp_entry_to_swp_slot(entry);
+		set_bit(swp_slot_offset(slot), sis->zeromap);
 	}
 
 	count_vm_events(SWPOUT_ZERO, nr_pages);
@@ -223,13 +226,16 @@ static void swap_zeromap_folio_set(struct folio *folio)
 
 static void swap_zeromap_folio_clear(struct folio *folio)
 {
-	struct swap_info_struct *sis = __swap_entry_to_info(folio->swap);
+	struct swap_info_struct *sis =
+		__swap_slot_to_info(swp_entry_to_swp_slot(folio->swap));
 	swp_entry_t entry;
+	swp_slot_t slot;
 	unsigned int i;
 
 	for (i = 0; i < folio_nr_pages(folio); i++) {
 		entry = page_swap_entry(folio_page(folio, i));
-		clear_bit(swp_offset(entry), sis->zeromap);
+		slot = swp_entry_to_swp_slot(entry);
+		clear_bit(swp_slot_offset(slot), sis->zeromap);
 	}
 }
 
@@ -357,7 +363,8 @@ static void sio_write_complete(struct kiocb *iocb, long ret)
 		 * messages.
 		 */
 		pr_err_ratelimited("Write error %ld on dio swapfile (%llu)\n",
-				   ret, swap_dev_pos(page_swap_entry(page)));
+				   ret,
+				   swap_slot_pos(swp_entry_to_swp_slot(page_swap_entry(page))));
 		for (p = 0; p < sio->pages; p++) {
 			page = sio->bvec[p].bv_page;
 			set_page_dirty(page);
@@ -374,9 +381,10 @@ static void sio_write_complete(struct kiocb *iocb, long ret)
 static void swap_writepage_fs(struct folio *folio, struct swap_iocb **swap_plug)
 {
 	struct swap_iocb *sio = swap_plug ? *swap_plug : NULL;
-	struct swap_info_struct *sis = __swap_entry_to_info(folio->swap);
+	swp_slot_t slot = swp_entry_to_swp_slot(folio->swap);
+	struct swap_info_struct *sis = __swap_slot_to_info(slot);
 	struct file *swap_file = sis->swap_file;
-	loff_t pos = swap_dev_pos(folio->swap);
+	loff_t pos = swap_slot_pos(slot);
 
 	count_swpout_vm_event(folio);
 	folio_start_writeback(folio);
@@ -446,7 +454,8 @@ static void swap_writepage_bdev_async(struct folio *folio,
 
 void __swap_writepage(struct folio *folio, struct swap_iocb **swap_plug)
 {
-	struct swap_info_struct *sis = __swap_entry_to_info(folio->swap);
+	struct swap_info_struct *sis =
+		__swap_slot_to_info(swp_entry_to_swp_slot(folio->swap));
 
 	VM_BUG_ON_FOLIO(!folio_test_swapcache(folio), folio);
 	/*
@@ -537,9 +546,10 @@ static bool swap_read_folio_zeromap(struct folio *folio)
 
 static void swap_read_folio_fs(struct folio *folio, struct swap_iocb **plug)
 {
-	struct swap_info_struct *sis = __swap_entry_to_info(folio->swap);
+	swp_slot_t slot = swp_entry_to_swp_slot(folio->swap);
+	struct swap_info_struct *sis = __swap_slot_to_info(slot);
 	struct swap_iocb *sio = NULL;
-	loff_t pos = swap_dev_pos(folio->swap);
+	loff_t pos = swap_slot_pos(slot);
 
 	if (plug)
 		sio = *plug;
@@ -608,7 +618,8 @@ static void swap_read_folio_bdev_async(struct folio *folio,
 
 void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
 {
-	struct swap_info_struct *sis = __swap_entry_to_info(folio->swap);
+	struct swap_info_struct *sis =
+		__swap_slot_to_info(swp_entry_to_swp_slot(folio->swap));
 	bool synchronous = sis->flags & SWP_SYNCHRONOUS_IO;
 	bool workingset = folio_test_workingset(folio);
 	unsigned long pflags;
diff --git a/mm/shmem.c b/mm/shmem.c
index b40be22fa5f09..400e2fa8e77cb 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1442,6 +1442,7 @@ static unsigned int shmem_find_swap_entries(struct address_space *mapping,
 	XA_STATE(xas, &mapping->i_pages, start);
 	struct folio *folio;
 	swp_entry_t entry;
+	swp_slot_t slot;
 
 	rcu_read_lock();
 	xas_for_each(&xas, folio, ULONG_MAX) {
@@ -1452,11 +1453,13 @@ static unsigned int shmem_find_swap_entries(struct address_space *mapping,
 			continue;
 
 		entry = radix_to_swp_entry(folio);
+		slot = swp_entry_to_swp_slot(entry);
+
 		/*
 		 * swapin error entries can be found in the mapping. But they're
 		 * deliberately ignored here as we've done everything we can do.
 		 */
-		if (swp_type(entry) != type)
+		if (swp_slot_type(slot) != type)
 			continue;
 
 		indices[folio_batch_count(fbatch)] = xas.xa_index;
@@ -2224,6 +2227,7 @@ static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
 	XA_STATE_ORDER(xas, &mapping->i_pages, index, 0);
 	int split_order = 0;
 	int i;
+	swp_slot_t slot = swp_entry_to_swp_slot(swap);
 
 	/* Convert user data gfp flags to xarray node gfp flags */
 	gfp &= GFP_RECLAIM_MASK;
@@ -2264,13 +2268,16 @@ static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
 			 */
 			for (i = 0; i < 1 << cur_order; i += (1 << split_order)) {
-				swp_entry_t tmp;
+				swp_entry_t tmp_entry;
+				swp_slot_t tmp_slot;
+
+				tmp_slot =
+					swp_slot(swp_slot_type(slot),
+						 swp_slot_offset(slot) + swap_offset + i);
+				tmp_entry = swp_slot_to_swp_entry(tmp_slot);
 
-				tmp = swp_entry(swp_type(swap),
-						swp_offset(swap) + swap_offset +
-						i);
 				__xa_store(&mapping->i_pages, aligned_index + i,
-					   swp_to_radix_entry(tmp), 0);
+					   swp_to_radix_entry(tmp_entry), 0);
 			}
 			cur_order = split_order;
 			split_order = xas_try_split_min_order(split_order);
diff --git a/mm/swap.h b/mm/swap.h
index 8726b587a5b5d..bdf7aca146643 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -10,10 +10,10 @@ extern int page_cluster;
 
 #ifdef CONFIG_THP_SWAP
 #define SWAPFILE_CLUSTER	HPAGE_PMD_NR
-#define swap_entry_order(order)	(order)
+#define swap_slot_order(order)	(order)
 #else
 #define SWAPFILE_CLUSTER	256
-#define swap_entry_order(order)	0
+#define swap_slot_order(order)	0
 #endif
 
 extern struct swap_info_struct *swap_info[];
@@ -57,9 +57,9 @@ enum swap_cluster_flags {
 #include <linux/swapops.h> /* for swp_offset */
 #include <linux/blk_types.h> /* for bio_end_io_t */
 
-static inline unsigned int swp_cluster_offset(swp_entry_t entry)
+static inline unsigned int swp_cluster_offset(swp_slot_t slot)
 {
-	return swp_offset(entry) % SWAPFILE_CLUSTER;
+	return swp_slot_offset(slot) % SWAPFILE_CLUSTER;
 }
 
 /*
@@ -75,9 +75,9 @@ static inline struct swap_info_struct *__swap_type_to_info(int type)
 	return si;
 }
 
-static inline struct swap_info_struct *__swap_entry_to_info(swp_entry_t entry)
+static inline struct swap_info_struct *__swap_slot_to_info(swp_slot_t slot)
 {
-	return __swap_type_to_info(swp_type(entry));
+	return __swap_type_to_info(swp_slot_type(slot));
 }
 
 static inline struct swap_cluster_info *__swap_offset_to_cluster(
@@ -88,10 +88,10 @@ static inline struct swap_cluster_info *__swap_offset_to_cluster(
 	return &si->cluster_info[offset / SWAPFILE_CLUSTER];
 }
 
-static inline struct swap_cluster_info *__swap_entry_to_cluster(swp_entry_t entry)
+static inline struct swap_cluster_info *__swap_slot_to_cluster(swp_slot_t slot)
 {
-	return __swap_offset_to_cluster(__swap_entry_to_info(entry),
-					swp_offset(entry));
+	return __swap_offset_to_cluster(__swap_slot_to_info(slot),
+					swp_slot_offset(slot));
 }
 
 static __always_inline struct swap_cluster_info *__swap_cluster_lock(
@@ -120,7 +120,7 @@ static __always_inline struct swap_cluster_info *__swap_cluster_lock(
 /**
  * swap_cluster_lock - Lock and return the swap cluster of given offset.
  * @si: swap device the cluster belongs to.
- * @offset: the swap entry offset, pointing to a valid slot.
+ * @offset: the swap slot offset, pointing to a valid slot.
  *
  * Context: The caller must ensure the offset is in the valid range and
  * protect the swap device with reference count or locks.
@@ -134,10 +134,12 @@ static inline struct swap_cluster_info *swap_cluster_lock(
 static inline struct swap_cluster_info *__swap_cluster_get_and_lock(
 		const struct folio *folio, bool irq)
 {
+	swp_slot_t slot = swp_entry_to_swp_slot(folio->swap);
+
 	VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
 	VM_WARN_ON_ONCE_FOLIO(!folio_test_swapcache(folio), folio);
-	return __swap_cluster_lock(__swap_entry_to_info(folio->swap),
-				   swp_offset(folio->swap), irq);
+	return __swap_cluster_lock(__swap_slot_to_info(slot),
+				   swp_slot_offset(slot), irq);
 }
 
 /*
@@ -209,12 +211,10 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
 	return &swap_space;
 }
 
-/*
- * Return the swap device position of the swap entry.
- */
-static inline loff_t swap_dev_pos(swp_entry_t entry)
+/* Return the swap device position of the swap slot. */
+static inline loff_t swap_slot_pos(swp_slot_t slot)
 {
-	return ((loff_t)swp_offset(entry)) << PAGE_SHIFT;
+	return ((loff_t)swp_slot_offset(slot)) << PAGE_SHIFT;
 }
 
 /**
@@ -276,7 +276,9 @@ void swap_update_readahead(struct folio *folio, struct vm_area_struct *vma,
 
 static inline unsigned int folio_swap_flags(struct folio *folio)
 {
-	return __swap_entry_to_info(folio->swap)->flags;
+	swp_slot_t swp_slot = swp_entry_to_swp_slot(folio->swap);
+
+	return __swap_slot_to_info(swp_slot)->flags;
 }
 
 /*
@@ -287,8 +289,9 @@ static inline unsigned int folio_swap_flags(struct folio *folio)
 static inline int swap_zeromap_batch(swp_entry_t entry, int max_nr,
 		bool *is_zeromap)
 {
-	struct swap_info_struct *sis = __swap_entry_to_info(entry);
-	unsigned long start = swp_offset(entry);
+	swp_slot_t slot = swp_entry_to_swp_slot(entry);
+	struct swap_info_struct *sis = __swap_slot_to_info(slot);
+	unsigned long start = swp_slot_offset(slot);
 	unsigned long end = start + max_nr;
 	bool first_bit;
 
@@ -306,8 +309,9 @@ static inline int swap_zeromap_batch(swp_entry_t entry, int max_nr,
 
 static inline int non_swapcache_batch(swp_entry_t entry, int max_nr)
 {
-	struct swap_info_struct *si = __swap_entry_to_info(entry);
-	pgoff_t offset = swp_offset(entry);
+	swp_slot_t slot = swp_entry_to_swp_slot(entry);
+	struct swap_info_struct *si = __swap_slot_to_info(slot);
+	pgoff_t offset = swp_slot_offset(slot);
 	int i;
 
 	/*
@@ -326,7 +330,7 @@ static inline int non_swapcache_batch(swp_entry_t entry, int max_nr)
 #else /* CONFIG_SWAP */
 struct swap_iocb;
 static inline struct swap_cluster_info *swap_cluster_lock(
-		struct swap_info_struct *si, pgoff_t offset, bool irq)
+		struct swap_info_struct *si, unsigned long offset)
 {
 	return NULL;
 }
@@ -351,7 +355,7 @@ static inline void swap_cluster_unlock_irq(struct swap_cluster_info *ci)
 {
 }
 
-static inline struct swap_info_struct *__swap_entry_to_info(swp_entry_t entry)
+static inline struct swap_info_struct *__swap_slot_to_info(swp_slot_t slot)
 {
 	return NULL;
 }
diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c
index de779fed8c210..77ce1d66c318d 100644
--- a/mm/swap_cgroup.c
+++ b/mm/swap_cgroup.c
@@ -65,13 +65,14 @@ void swap_cgroup_record(struct folio *folio, unsigned short id,
 			swp_entry_t ent)
 {
 	unsigned int nr_ents = folio_nr_pages(folio);
+	swp_slot_t slot = swp_entry_to_swp_slot(ent);
 	struct swap_cgroup *map;
 	pgoff_t offset, end;
 	unsigned short old;
 
-	offset = swp_offset(ent);
+	offset = swp_slot_offset(slot);
 	end = offset + nr_ents;
-	map = swap_cgroup_ctrl[swp_type(ent)].map;
+	map = swap_cgroup_ctrl[swp_slot_type(slot)].map;
 
 	do {
 		old = __swap_cgroup_id_xchg(map, offset, id);
@@ -92,13 +93,13 @@ void swap_cgroup_record(struct folio *folio, unsigned short id,
  */
 unsigned short swap_cgroup_clear(swp_entry_t ent, unsigned int nr_ents)
 {
-	pgoff_t offset, end;
+	swp_slot_t slot = swp_entry_to_swp_slot(ent);
+	pgoff_t offset = swp_slot_offset(slot);
+	pgoff_t end = offset + nr_ents;
 	struct swap_cgroup *map;
 	unsigned short old, iter = 0;
 
-	offset = swp_offset(ent);
-	end = offset + nr_ents;
-	map = swap_cgroup_ctrl[swp_type(ent)].map;
+	map = swap_cgroup_ctrl[swp_slot_type(slot)].map;
 
 	do {
 		old = __swap_cgroup_id_xchg(map, offset, 0);
@@ -119,12 +120,13 @@ unsigned short swap_cgroup_clear(swp_entry_t ent, unsigned int nr_ents)
 unsigned short lookup_swap_cgroup_id(swp_entry_t ent)
 {
 	struct swap_cgroup_ctrl *ctrl;
+	swp_slot_t slot = swp_entry_to_swp_slot(ent);
 
 	if (mem_cgroup_disabled())
 		return 0;
 
-	ctrl = &swap_cgroup_ctrl[swp_type(ent)];
-	return __swap_cgroup_id_lookup(ctrl->map, swp_offset(ent));
+	ctrl = &swap_cgroup_ctrl[swp_slot_type(slot)];
+	return __swap_cgroup_id_lookup(ctrl->map, swp_slot_offset(slot));
 }
 
 int swap_cgroup_swapon(int type, unsigned long max_pages)
diff --git a/mm/swap_state.c b/mm/swap_state.c
index bece18eb540fa..e2e9f55bea3bb 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -421,7 +421,8 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		struct mempolicy *mpol, pgoff_t ilx, bool *new_page_allocated,
 		bool skip_if_exists)
 {
-	struct swap_info_struct *si = __swap_entry_to_info(entry);
+	struct swap_info_struct *si =
+		__swap_slot_to_info(swp_entry_to_swp_slot(entry));
 	struct folio *folio;
 	struct folio *new_folio = NULL;
 	struct folio *result = NULL;
@@ -636,11 +637,12 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 		struct mempolicy *mpol, pgoff_t ilx)
 {
 	struct folio *folio;
-	unsigned long entry_offset = swp_offset(entry);
-	unsigned long offset = entry_offset;
+	swp_slot_t slot = swp_entry_to_swp_slot(entry);
+	unsigned long slot_offset = swp_slot_offset(slot);
+	unsigned long offset = slot_offset;
 	unsigned long start_offset, end_offset;
 	unsigned long mask;
-	struct swap_info_struct *si = __swap_entry_to_info(entry);
+	struct swap_info_struct *si = __swap_slot_to_info(slot);
 	struct blk_plug plug;
 	struct swap_iocb *splug = NULL;
 	bool page_allocated;
@@ -661,13 +663,13 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	for (offset = start_offset; offset <= end_offset ; offset++) {
 		/* Ok, do the async read-ahead now */
 		folio = __read_swap_cache_async(
-				swp_entry(swp_type(entry), offset),
+				swp_slot_to_swp_entry(swp_slot(swp_slot_type(slot), offset)),
 				gfp_mask, mpol, ilx, &page_allocated, false);
 		if (!folio)
 			continue;
 		if (page_allocated) {
 			swap_read_folio(folio, &splug);
-			if (offset != entry_offset) {
+			if (offset != slot_offset) {
 				folio_set_readahead(folio);
 				count_vm_event(SWAP_RA);
 			}
@@ -779,16 +781,20 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 		/*
 		 * Readahead entry may come from a device that we are not
 		 * holding a reference to, try to grab a reference, or skip.
+		 *
+		 * XXX: for now, always try to pin the swap entries in the
+		 * readahead window to avoid the annoying conversion to physical
+		 * swap slots. Once we move all swap metadata to virtual swap
+		 * layer, we can simply compare the clusters of the target
+		 * swap entry and the current swap entry, and pin the latter
+		 * swap entry's cluster if it differs from the former's.
 		 */
-		if (swp_type(entry) != swp_type(targ_entry)) {
-			swapoff_locked = tryget_swap_entry(entry, &si);
-			if (!swapoff_locked)
-				continue;
-		}
+		swapoff_locked = tryget_swap_entry(entry, &si);
+		if (!swapoff_locked)
+			continue;
 		folio = __read_swap_cache_async(entry, gfp_mask, mpol,
 						ilx, &page_allocated, false);
-		if (swapoff_locked)
-			put_swap_entry(entry, si);
+		put_swap_entry(entry, si);
 		if (!folio)
 			continue;
 		if (page_allocated) {
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 3c89dedbd5718..4b4126d4e2769 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -53,9 +53,9 @@
 static bool swap_count_continued(struct swap_info_struct *, pgoff_t,
 				 unsigned char);
 static void free_swap_count_continuations(struct swap_info_struct *);
-static void swap_entries_free(struct swap_info_struct *si,
-			      struct swap_cluster_info *ci,
-			      swp_entry_t entry, unsigned int nr_pages);
+static void swap_slots_free(struct swap_info_struct *si,
+			    struct swap_cluster_info *ci,
+			    swp_slot_t slot, unsigned int nr_pages);
 static void swap_range_alloc(struct swap_info_struct *si,
 			     unsigned int nr_entries);
 static bool folio_swapcache_freeable(struct folio *folio);
@@ -126,7 +126,7 @@ struct percpu_swap_cluster {
 
 static DEFINE_PER_CPU(struct percpu_swap_cluster, percpu_swap_cluster) = {
 	.si = { NULL },
-	.offset = { SWAP_ENTRY_INVALID },
+	.offset = { SWAP_SLOT_INVALID },
 	.lock = INIT_LOCAL_LOCK(),
 };
 
@@ -139,9 +139,9 @@ static struct swap_info_struct *swap_type_to_info(int type)
 }
 
 /* May return NULL on invalid entry, caller must check for NULL return */
-static struct swap_info_struct *swap_entry_to_info(swp_entry_t entry)
+static struct swap_info_struct *swap_slot_to_info(swp_slot_t slot)
 {
-	return swap_type_to_info(swp_type(entry));
+	return swap_type_to_info(swp_slot_type(slot));
 }
 
 static inline unsigned char swap_count(unsigned char ent)
@@ -204,9 +204,11 @@ static bool swap_only_has_cache(struct swap_info_struct *si,
  */
 bool is_swap_cached(swp_entry_t entry)
 {
-	struct swap_info_struct *si = __swap_entry_to_info(entry);
+	swp_slot_t slot = swp_entry_to_swp_slot(entry);
+	struct swap_info_struct *si = swap_slot_to_info(slot);
+	unsigned long offset = swp_slot_offset(slot);
 
-	return READ_ONCE(si->swap_map[swp_offset(entry)]) & SWAP_HAS_CACHE;
+	return READ_ONCE(si->swap_map[offset]) & SWAP_HAS_CACHE;
 }
 
 static bool swap_is_last_map(struct swap_info_struct *si,
@@ -236,7 +238,9 @@ static bool swap_is_last_map(struct swap_info_struct *si,
 static int __try_to_reclaim_swap(struct swap_info_struct *si,
 				 unsigned long offset, unsigned long flags)
 {
-	const swp_entry_t entry = swp_entry(si->type, offset);
+	const swp_entry_t entry =
+		swp_slot_to_swp_entry(swp_slot(si->type, offset));
+	swp_slot_t slot;
 	struct swap_cluster_info *ci;
 	struct folio *folio;
 	int ret, nr_pages;
@@ -268,7 +272,8 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
 		folio_put(folio);
 		goto again;
 	}
-	offset = swp_offset(folio->swap);
+	slot = swp_entry_to_swp_slot(folio->swap);
+	offset = swp_slot_offset(slot);
 
 	need_reclaim = ((flags & TTRS_ANYWAY) ||
 			((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
@@ -368,12 +373,12 @@ offset_to_swap_extent(struct swap_info_struct *sis, unsigned long offset)
 
 sector_t swap_folio_sector(struct folio *folio)
 {
-	struct swap_info_struct *sis = __swap_entry_to_info(folio->swap);
+	swp_slot_t slot = swp_entry_to_swp_slot(folio->swap);
+	struct swap_info_struct *sis = __swap_slot_to_info(slot);
 	struct swap_extent *se;
 	sector_t sector;
-	pgoff_t offset;
+	pgoff_t offset = swp_slot_offset(slot);
 
-	offset = swp_offset(folio->swap);
 	se = offset_to_swap_extent(sis, offset);
 	sector = se->start_block + (offset - se->start_page);
 	return sector << (PAGE_SHIFT - 9);
@@ -890,7 +895,7 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
 					    unsigned int order,
 					    unsigned char usage)
 {
-	unsigned int next = SWAP_ENTRY_INVALID, found = SWAP_ENTRY_INVALID;
+	unsigned int next = SWAP_SLOT_INVALID, found = SWAP_SLOT_INVALID;
 	unsigned long start = ALIGN_DOWN(offset, SWAPFILE_CLUSTER);
 	unsigned long end = min(start + SWAPFILE_CLUSTER, si->max);
 	unsigned int nr_pages = 1 << order;
@@ -947,7 +952,7 @@ static unsigned int alloc_swap_scan_list(struct swap_info_struct *si,
 					 unsigned char usage,
 					 bool scan_all)
 {
-	unsigned int found = SWAP_ENTRY_INVALID;
+	unsigned int found = SWAP_SLOT_INVALID;
 
 	do {
 		struct swap_cluster_info *ci = isolate_lock_cluster(si, list);
@@ -1017,11 +1022,11 @@ static void swap_reclaim_work(struct work_struct *work)
  * Try to allocate swap entries with specified order and try set a new
  * cluster for current CPU too.
  */
-static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int order,
+static unsigned long cluster_alloc_swap_slot(struct swap_info_struct *si, int order,
 					      unsigned char usage)
 {
 	struct swap_cluster_info *ci;
-	unsigned int offset = SWAP_ENTRY_INVALID, found = SWAP_ENTRY_INVALID;
+	unsigned int offset = SWAP_SLOT_INVALID, found = SWAP_SLOT_INVALID;
 
 	/*
 	 * Swapfile is not block device so unable
@@ -1034,7 +1039,7 @@ static unsigned long cluster_alloc_swap_slot(struct swap_info_struct *si, int o
 		/* Serialize HDD SWAP allocation for each device. */
 		spin_lock(&si->global_cluster_lock);
 		offset = si->global_cluster->next[order];
-		if (offset == SWAP_ENTRY_INVALID)
+		if (offset == SWAP_SLOT_INVALID)
 			goto new_cluster;
 
 		ci = swap_cluster_lock(si, offset);
@@ -1255,7 +1260,7 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 	 */
 	for (i = 0; i < nr_entries; i++) {
 		clear_bit(offset + i, si->zeromap);
-		zswap_invalidate(swp_entry(si->type, offset + i));
+		zswap_invalidate(swp_slot_to_swp_entry(swp_slot(si->type, offset + i)));
 	}
 
 	if (si->flags & SWP_BLKDEV)
@@ -1300,12 +1305,11 @@ static bool get_swap_device_info(struct swap_info_struct *si)
  * Fast path try to get swap entries with specified order from current
  * CPU's swap entry pool (a cluster).
  */
-static bool swap_alloc_fast(swp_entry_t *entry,
-			    int order)
+static bool swap_alloc_fast(swp_slot_t *slot, int order)
 {
 	struct swap_cluster_info *ci;
 	struct swap_info_struct *si;
-	unsigned int offset, found = SWAP_ENTRY_INVALID;
+	unsigned int offset, found = SWAP_SLOT_INVALID;
 
 	/*
 	 * Once allocated, swap_info_struct will never be completely freed,
@@ -1322,18 +1326,17 @@ static bool swap_alloc_fast(swp_entry_t *entry,
 		offset = cluster_offset(si, ci);
 		found = alloc_swap_scan_cluster(si, ci, offset, order, SWAP_HAS_CACHE);
 		if (found)
-			*entry = swp_entry(si->type, found);
+			*slot = swp_slot(si->type, found);
 	} else {
 		swap_cluster_unlock(ci);
 	}
 
-	put_swap_device(si);
+	swap_slot_put_swap_info(si);
 	return !!found;
 }
 
 /* Rotate the device and switch to a new cluster */
-static void swap_alloc_slow(swp_entry_t *entry,
-			    int order)
+static void swap_alloc_slow(swp_slot_t *slot, int order)
 {
 	unsigned long offset;
 	struct swap_info_struct *si, *next;
@@ -1345,10 +1348,10 @@ static void swap_alloc_slow(swp_entry_t *entry,
 		plist_requeue(&si->avail_list, &swap_avail_head);
 		spin_unlock(&swap_avail_lock);
 		if (get_swap_device_info(si)) {
-			offset = cluster_alloc_swap_entry(si, order, SWAP_HAS_CACHE);
-			put_swap_device(si);
+			offset = cluster_alloc_swap_slot(si, order, SWAP_HAS_CACHE);
+			swap_slot_put_swap_info(si);
 			if (offset) {
-				*entry = swp_entry(si->type, offset);
+				*slot = swp_slot(si->type, offset);
 				return;
 			}
 			if (order)
@@ -1388,7 +1391,7 @@ static bool swap_sync_discard(void)
 		if (get_swap_device_info(si)) {
 			if (si->flags & SWP_PAGE_DISCARD)
 				ret = swap_do_scheduled_discard(si);
-			put_swap_device(si);
+			swap_slot_put_swap_info(si);
 		}
 		if (ret)
 			return true;
@@ -1402,25 +1405,9 @@ static bool swap_sync_discard(void)
 	return false;
 }
 
-/**
- * folio_alloc_swap - allocate swap space for a folio
- * @folio: folio we want to move to swap
- *
- * Allocate swap space for the folio and add the folio to the
- * swap cache.
- *
- * Context: Caller needs to hold the folio lock.
- * Return: Whether the folio was added to the swap cache.
- */
-int folio_alloc_swap(struct folio *folio)
+static int swap_slot_alloc(swp_slot_t *slot, unsigned int order)
 {
-	unsigned int order = folio_order(folio);
 	unsigned int size = 1 << order;
-	swp_entry_t entry = {};
-	int err;
-
-	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
-	VM_BUG_ON_FOLIO(!folio_test_uptodate(folio), folio);
 
 	if (order) {
 		/*
@@ -1442,22 +1429,52 @@ int folio_alloc_swap(struct folio *folio)
 
 again:
 	local_lock(&percpu_swap_cluster.lock);
-	if (!swap_alloc_fast(&entry, order))
-		swap_alloc_slow(&entry, order);
+	if (!swap_alloc_fast(slot, order))
+		swap_alloc_slow(slot, order);
 	local_unlock(&percpu_swap_cluster.lock);
 
-	if (unlikely(!order && !entry.val)) {
+	if (unlikely(!order && !slot->val)) {
 		if (swap_sync_discard())
 			goto again;
 	}
 
+	return 0;
+}
+
+/**
+ * folio_alloc_swap - allocate swap space for a folio
+ * @folio: folio we want to move to swap
+ *
+ * Allocate swap space for the folio and add the folio to the
+ * swap cache.
+ *
+ * Context: Caller needs to hold the folio lock.
+ * Return: Whether the folio was added to the swap cache.
+ */
+int folio_alloc_swap(struct folio *folio)
+{
+	unsigned int order = folio_order(folio);
+	swp_slot_t slot = { 0 };
+	swp_entry_t entry = {};
+	int err = 0, ret;
+
+	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+	VM_BUG_ON_FOLIO(!folio_test_uptodate(folio), folio);
+
+	ret = swap_slot_alloc(&slot, order);
+	if (ret)
+		return ret;
+
+	/* XXX: for now, physical and virtual swap slots are identical */
+	entry.val = slot.val;
+
 	/* Need to call this even if allocation failed, for MEMCG_SWAP_FAIL. */
 	if (mem_cgroup_try_charge_swap(folio, entry)) {
 		err = -ENOMEM;
 		goto out_free;
 	}
 
-	if (!entry.val)
+	if (!slot.val)
 		return -ENOMEM;
 
 	err = swap_cache_add_folio(folio, entry, __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN, NULL);
@@ -1471,46 +1488,46 @@ int folio_alloc_swap(struct folio *folio)
 	return err;
 }
 
-static struct swap_info_struct *_swap_info_get(swp_entry_t entry)
+static struct swap_info_struct *_swap_info_get(swp_slot_t slot)
 {
 	struct swap_info_struct *si;
 	unsigned long offset;
 
-	if (!entry.val)
+	if (!slot.val)
 		goto out;
-	si = swap_entry_to_info(entry);
+	si = swap_slot_to_info(slot);
 	if (!si)
 		goto bad_nofile;
 	if (data_race(!(si->flags & SWP_USED)))
 		goto bad_device;
-	offset = swp_offset(entry);
+	offset = swp_slot_offset(slot);
 	if (offset >= si->max)
 		goto bad_offset;
-	if (data_race(!si->swap_map[swp_offset(entry)]))
+	if (data_race(!si->swap_map[swp_slot_offset(slot)]))
 		goto bad_free;
 	return si;
 
 bad_free:
-	pr_err("%s: %s%08lx\n", __func__, Unused_offset, entry.val);
+	pr_err("%s: %s%08lx\n", __func__, Unused_offset, slot.val);
 	goto out;
 bad_offset:
-	pr_err("%s: %s%08lx\n", __func__, Bad_offset, entry.val);
+	pr_err("%s: %s%08lx\n", __func__, Bad_offset, slot.val);
 	goto out;
 bad_device:
-	pr_err("%s: %s%08lx\n", __func__, Unused_file, entry.val);
+	pr_err("%s: %s%08lx\n", __func__, Unused_file, slot.val);
 	goto out;
 bad_nofile:
-	pr_err("%s: %s%08lx\n", __func__, Bad_file, entry.val);
+	pr_err("%s: %s%08lx\n", __func__, Bad_file, slot.val);
 out:
 	return NULL;
 }
 
-static unsigned char swap_entry_put_locked(struct swap_info_struct *si,
+static unsigned char swap_slot_put_locked(struct swap_info_struct *si,
 					   struct swap_cluster_info *ci,
-					   swp_entry_t entry,
+					   swp_slot_t slot,
 					   unsigned char usage)
 {
-	unsigned long offset = swp_offset(entry);
+	unsigned long offset = swp_slot_offset(slot);
 	unsigned char count;
 	unsigned char has_cache;
 
@@ -1542,7 +1559,7 @@ static unsigned char swap_entry_put_locked(struct swap_info_struct *si,
 	if (usage)
 		WRITE_ONCE(si->swap_map[offset], usage);
 	else
-		swap_entries_free(si, ci, entry, 1);
+		swap_slots_free(si, ci, slot, 1);
 
 	return usage;
 }
@@ -1552,8 +1569,9 @@ static unsigned char swap_entry_put_locked(struct swap_info_struct *si,
  * prevent swapoff, such as the folio in swap cache is locked, RCU
  * reader side is locked, etc., the swap entry may become invalid
  * because of swapoff. Then, we need to enclose all swap related
- * functions with get_swap_device() and put_swap_device(), unless the
- * swap functions call get/put_swap_device() by themselves.
+ * functions with swap_slot_tryget_swap_info() and
+ * swap_slot_put_swap_info(), unless the swap functions call
+ * swap_slot_(tryget|put)_swap_info by themselves.
  *
  * RCU reader side lock (including any spinlock) is sufficient to
  * prevent swapoff, because synchronize_rcu() is called in swapoff()
@@ -1562,11 +1580,11 @@ static unsigned char swap_entry_put_locked(struct swap_info_struct *si,
  * Check whether swap entry is valid in the swap device. If so,
  * return pointer to swap_info_struct, and keep the swap entry valid
  * via preventing the swap device from being swapoff, until
- * put_swap_device() is called. Otherwise return NULL.
+ * swap_slot_put_swap_info() is called. Otherwise return NULL.
  *
  * Notice that swapoff or swapoff+swapon can still happen before the
- * percpu_ref_tryget_live() in get_swap_device() or after the
- * percpu_ref_put() in put_swap_device() if there isn't any other way
+ * percpu_ref_tryget_live() in swap_slot_tryget_swap_info() or after the
+ * percpu_ref_put() in swap_slot_put_swap_info() if there isn't any other way
  * to prevent swapoff. The caller must be prepared for that. For
  * example, the following situation is possible.
  *
@@ -1586,53 +1604,53 @@ static unsigned char swap_entry_put_locked(struct swap_info_struct *si,
  * changed with the page table locked to check whether the swap device
  * has been swapoff or swapoff+swapon.
  */
-struct swap_info_struct *get_swap_device(swp_entry_t entry)
+struct swap_info_struct *swap_slot_tryget_swap_info(swp_slot_t slot)
 {
 	struct swap_info_struct *si;
 	unsigned long offset;
 
-	if (!entry.val)
+	if (!slot.val)
 		goto out;
-	si = swap_entry_to_info(entry);
+	si = swap_slot_to_info(slot);
 	if (!si)
 		goto bad_nofile;
 	if (!get_swap_device_info(si))
 		goto out;
-	offset = swp_offset(entry);
+	offset = swp_slot_offset(slot);
 	if (offset >= si->max)
 		goto put_out;
 
 	return si;
 bad_nofile:
-	pr_err("%s: %s%08lx\n", __func__, Bad_file, entry.val);
+	pr_err("%s: %s%08lx\n", __func__, Bad_file, slot.val);
 out:
 	return NULL;
 put_out:
-	pr_err("%s: %s%08lx\n", __func__, Bad_offset, entry.val);
+	pr_err("%s: %s%08lx\n", __func__, Bad_offset, slot.val);
 	percpu_ref_put(&si->users);
 	return NULL;
 }
 
-static void swap_entries_put_cache(struct swap_info_struct *si,
-				   swp_entry_t entry, int nr)
+static void swap_slots_put_cache(struct swap_info_struct *si,
+				 swp_slot_t slot, int nr)
 {
-	unsigned long offset = swp_offset(entry);
+	unsigned long offset = swp_slot_offset(slot);
 	struct swap_cluster_info *ci;
 
 	ci = swap_cluster_lock(si, offset);
 	if (swap_only_has_cache(si, offset, nr)) {
-		swap_entries_free(si, ci, entry, nr);
+		swap_slots_free(si, ci, slot, nr);
 	} else {
-		for (int i = 0; i < nr; i++, entry.val++)
-			swap_entry_put_locked(si, ci, entry, SWAP_HAS_CACHE);
+		for (int i = 0; i < nr; i++, slot.val++)
+			swap_slot_put_locked(si, ci, slot, SWAP_HAS_CACHE);
 	}
 	swap_cluster_unlock(ci);
 }
 
-static bool swap_entries_put_map(struct swap_info_struct *si,
-				 swp_entry_t entry, int nr)
+static bool swap_slots_put_map(struct swap_info_struct *si,
+			       swp_slot_t slot, int nr)
 {
-	unsigned long offset = swp_offset(entry);
+	unsigned long offset = swp_slot_offset(slot);
 	struct swap_cluster_info *ci;
 	bool has_cache = false;
 	unsigned char count;
@@ -1649,7 +1667,7 @@ static bool swap_entries_put_map(struct swap_info_struct *si,
 		goto locked_fallback;
 	}
 	if (!has_cache)
-		swap_entries_free(si, ci, entry, nr);
+		swap_slots_free(si, ci, slot, nr);
 	else
 		for (i = 0; i < nr; i++)
 			WRITE_ONCE(si->swap_map[offset + i], SWAP_HAS_CACHE);
@@ -1660,8 +1678,8 @@ static bool swap_entries_put_map(struct swap_info_struct *si,
 fallback:
 	ci = swap_cluster_lock(si, offset);
 locked_fallback:
-	for (i = 0; i < nr; i++, entry.val++) {
-		count = swap_entry_put_locked(si, ci, entry, 1);
+	for (i = 0; i < nr; i++, slot.val++) {
+		count = swap_slot_put_locked(si, ci, slot, 1);
 		if (count == SWAP_HAS_CACHE)
 			has_cache = true;
 	}
@@ -1674,20 +1692,20 @@ static bool swap_entries_put_map(struct swap_info_struct *si,
  * cross multi clusters, so ensure the range is within a single cluster
  * when freeing entries with functions without "_nr" suffix.
  */
-static bool swap_entries_put_map_nr(struct swap_info_struct *si,
-				    swp_entry_t entry, int nr)
+static bool swap_slots_put_map_nr(struct swap_info_struct *si,
+				  swp_slot_t slot, int nr)
 {
 	int cluster_nr, cluster_rest;
-	unsigned long offset = swp_offset(entry);
+	unsigned long offset = swp_slot_offset(slot);
 	bool has_cache = false;
 
 	cluster_rest = SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER;
 	while (nr) {
 		cluster_nr = min(nr, cluster_rest);
-		has_cache |= swap_entries_put_map(si, entry, cluster_nr);
+		has_cache |= swap_slots_put_map(si, slot, cluster_nr);
 		cluster_rest = SWAPFILE_CLUSTER;
 		nr -= cluster_nr;
-		entry.val += cluster_nr;
+		slot.val += cluster_nr;
 	}
 
 	return has_cache;
@@ -1707,13 +1725,14 @@ static inline bool __maybe_unused swap_is_last_ref(unsigned char count)
 * Drop the last ref of swap entries, caller have to ensure all entries
 * belong to the same cgroup and cluster.
 */
-static void swap_entries_free(struct swap_info_struct *si,
-			      struct swap_cluster_info *ci,
-			      swp_entry_t entry, unsigned int nr_pages)
+static void swap_slots_free(struct swap_info_struct *si,
+			    struct swap_cluster_info *ci,
+			    swp_slot_t slot, unsigned int nr_pages)
 {
-	unsigned long offset = swp_offset(entry);
+	unsigned long offset = swp_slot_offset(slot);
 	unsigned char *map = si->swap_map + offset;
 	unsigned char *map_end = map + nr_pages;
+	swp_entry_t entry = swp_slot_to_swp_entry(slot);
 
 	/* It should never free entries across different clusters */
 	VM_BUG_ON(ci != __swap_offset_to_cluster(si, offset + nr_pages - 1));
@@ -1739,43 +1758,54 @@ static void swap_entries_free(struct swap_info_struct *si,
 * Caller has made sure that the swap device corresponding to entry
 * is still around or has not been recycled.
 */
-void swap_free_nr(swp_entry_t entry, int nr_pages)
+void swap_slot_free_nr(swp_slot_t slot, int nr_pages)
 {
 	int nr;
 	struct swap_info_struct *sis;
-	unsigned long offset = swp_offset(entry);
+	unsigned long offset = swp_slot_offset(slot);
 
-	sis = _swap_info_get(entry);
+	sis = _swap_info_get(slot);
 	if (!sis)
 		return;
 
 	while (nr_pages) {
 		nr = min_t(int, nr_pages, SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
-		swap_entries_put_map(sis, swp_entry(sis->type, offset), nr);
+		swap_slots_put_map(sis, swp_slot(sis->type, offset), nr);
 		offset += nr;
 		nr_pages -= nr;
 	}
}
 
+/*
+ * Caller has made sure that the swap device corresponding to entry
+ * is still around or has not been recycled.
+ */
+void swap_free_nr(swp_entry_t entry, int nr_pages)
+{
+	swap_slot_free_nr(swp_entry_to_swp_slot(entry), nr_pages);
+}
+
 /*
 * Called after dropping swapcache to decrease refcnt to swap entries.
 */
 void put_swap_folio(struct folio *folio, swp_entry_t entry)
 {
+	swp_slot_t slot = swp_entry_to_swp_slot(entry);
 	struct swap_info_struct *si;
-	int size = 1 << swap_entry_order(folio_order(folio));
+	int size = 1 << swap_slot_order(folio_order(folio));
 
-	si = _swap_info_get(entry);
+	si = _swap_info_get(slot);
 	if (!si)
 		return;
 
-	swap_entries_put_cache(si, entry, size);
+	swap_slots_put_cache(si, slot, size);
 }
 
 int __swap_count(swp_entry_t entry)
 {
-	struct swap_info_struct *si = __swap_entry_to_info(entry);
-	pgoff_t offset = swp_offset(entry);
+	swp_slot_t slot = swp_entry_to_swp_slot(entry);
+	struct swap_info_struct *si = __swap_slot_to_info(slot);
+	pgoff_t offset = swp_slot_offset(slot);
 
 	return swap_count(si->swap_map[offset]);
 }
@@ -1787,7 +1817,8 @@ int __swap_count(swp_entry_t entry)
 */
 bool swap_entry_swapped(struct swap_info_struct *si, swp_entry_t entry)
 {
-	pgoff_t offset = swp_offset(entry);
+	swp_slot_t slot = swp_entry_to_swp_slot(entry);
+	pgoff_t offset = swp_slot_offset(slot);
 	struct swap_cluster_info *ci;
 	int count;
 
@@ -1803,6 +1834,7 @@ bool swap_entry_swapped(struct swap_info_struct *si, swp_entry_t entry)
 */
 int swp_swapcount(swp_entry_t entry)
 {
+	swp_slot_t slot = swp_entry_to_swp_slot(entry);
 	int count, tmp_count, n;
 	struct swap_info_struct *si;
 	struct swap_cluster_info *ci;
@@ -1810,11 +1842,11 @@ int swp_swapcount(swp_entry_t entry)
 	pgoff_t offset;
 	unsigned char *map;
 
-	si = _swap_info_get(entry);
+	si = _swap_info_get(slot);
 	if (!si)
 		return 0;
 
-	offset = swp_offset(entry);
+	offset = swp_slot_offset(slot);
 
 	ci = swap_cluster_lock(si, offset);
 
@@ -1846,10 +1878,11 @@ int swp_swapcount(swp_entry_t entry)
 static bool swap_page_trans_huge_swapped(struct swap_info_struct *si,
 					 swp_entry_t entry, int order)
 {
+	swp_slot_t slot = swp_entry_to_swp_slot(entry);
 	struct swap_cluster_info *ci;
 	unsigned char *map = si->swap_map;
 	unsigned int nr_pages = 1 << order;
-	unsigned long roffset = swp_offset(entry);
+	unsigned long roffset = swp_slot_offset(slot);
 	unsigned long offset = round_down(roffset, nr_pages);
 	int i;
 	bool ret = false;
@@ -1874,7 +1907,8 @@ static bool swap_page_trans_huge_swapped(struct swap_info_struct *si,
 static bool folio_swapped(struct folio *folio)
 {
 	swp_entry_t entry = folio->swap;
-	struct swap_info_struct *si = _swap_info_get(entry);
+	swp_slot_t slot = swp_entry_to_swp_slot(entry);
+	struct swap_info_struct *si = _swap_info_get(slot);
 
 	if (!si)
 		return false;
@@ -1948,13 +1982,14 @@ bool folio_free_swap(struct folio *folio)
 */
 void free_swap_and_cache_nr(swp_entry_t entry, int nr)
 {
-	const unsigned long start_offset = swp_offset(entry);
+	swp_slot_t slot = swp_entry_to_swp_slot(entry);
+	const unsigned long start_offset = swp_slot_offset(slot);
 	const unsigned long end_offset = start_offset + nr;
 	struct swap_info_struct *si;
 	bool any_only_cache = false;
 	unsigned long offset;
 
-	si = get_swap_device(entry);
+	si = swap_slot_tryget_swap_info(slot);
 	if (!si)
 		return;
 
@@ -1964,7 +1999,7 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr)
 	/*
 	 * First free all entries in the range.
 	 */
-	any_only_cache = swap_entries_put_map_nr(si, entry, nr);
+	any_only_cache = swap_slots_put_map_nr(si, slot, nr);
 
 	/*
 	 * Short-circuit the below loop if none of the entries had their
@@ -1998,16 +2033,16 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr)
 	}
 
 out:
-	put_swap_device(si);
+	swap_slot_put_swap_info(si);
 }
 
 #ifdef CONFIG_HIBERNATION
 
-swp_entry_t get_swap_page_of_type(int type)
+swp_slot_t swap_slot_alloc_of_type(int type)
 {
 	struct swap_info_struct *si = swap_type_to_info(type);
 	unsigned long offset;
-	swp_entry_t entry = {0};
+	swp_slot_t slot = {0};
 
 	if (!si)
 		goto fail;
@@ -2020,15 +2055,15 @@ swp_entry_t get_swap_page_of_type(int type)
 			 * with swap table allocation.
 			 */
 			local_lock(&percpu_swap_cluster.lock);
-			offset = cluster_alloc_swap_entry(si, 0, 1);
+			offset = cluster_alloc_swap_slot(si, 0, 1);
 			local_unlock(&percpu_swap_cluster.lock);
 			if (offset)
-				entry = swp_entry(si->type, offset);
+				slot = swp_slot(si->type, offset);
 		}
-		put_swap_device(si);
+		swap_slot_put_swap_info(si);
 	}
 fail:
-	return entry;
+	return slot;
 }
 
 /*
@@ -2257,6 +2292,7 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 	unsigned long offset;
 	unsigned char swp_count;
 	softleaf_t entry;
+	swp_slot_t slot;
 	int ret;
 	pte_t ptent;
 
@@ -2271,10 +2307,12 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 
 		if (!softleaf_is_swap(entry))
 			continue;
-		if (swp_type(entry) != type)
+
+		slot = swp_entry_to_swp_slot(entry);
+		if (swp_slot_type(slot) != type)
 			continue;
 
-		offset = swp_offset(entry);
+		offset = swp_slot_offset(slot);
 		pte_unmap(pte);
 		pte = NULL;
 
@@ -2459,6 +2497,7 @@ static int try_to_unuse(unsigned int type)
 	struct swap_info_struct *si = swap_info[type];
 	struct folio *folio;
 	swp_entry_t entry;
+	swp_slot_t slot;
 	unsigned int i;
 
 	if (!swap_usage_in_pages(si))
@@ -2506,7 +2545,8 @@ static int try_to_unuse(unsigned int type)
 	       !signal_pending(current) &&
 	       (i = find_next_to_unuse(si, i)) != 0) {
 
-		entry = swp_entry(type, i);
+		slot = swp_slot(type, i);
+		entry = swp_slot_to_swp_entry(slot);
 		folio = swap_cache_get_folio(entry);
 		if (!folio)
 			continue;
@@ -2890,7 +2930,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	}
 
 	/*
-	 * Wait for swap operations protected by get/put_swap_device()
+	 * Wait for swap operations protected by swap_slot_(tryget|put)_swap_info()
 	 * to complete. Because of synchronize_rcu() here, all swap
 	 * operations protected by RCU reader side lock (including any
	 * spinlock) will be waited too. This makes it easy to
@@ -3331,7 +3371,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
 		if (!si->global_cluster)
 			goto err;
 		for (i = 0; i < SWAP_NR_ORDERS; i++)
-			si->global_cluster->next[i] = SWAP_ENTRY_INVALID;
+			si->global_cluster->next[i] = SWAP_SLOT_INVALID;
 		spin_lock_init(&si->global_cluster_lock);
 	}
 
@@ -3669,6 +3709,7 @@ void si_swapinfo(struct sysinfo *val)
 */
 static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
 {
+	swp_slot_t slot = swp_entry_to_swp_slot(entry);
 	struct swap_info_struct *si;
 	struct swap_cluster_info *ci;
 	unsigned long offset;
@@ -3676,13 +3717,13 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
 	unsigned char has_cache;
 	int err, i;
 
-	si = swap_entry_to_info(entry);
+	si = swap_slot_to_info(slot);
 	if (WARN_ON_ONCE(!si)) {
 		pr_err("%s%08lx\n", Bad_file, entry.val);
 		return -EINVAL;
 	}
 
-	offset = swp_offset(entry);
+	offset = swp_slot_offset(slot);
 	VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
 	VM_WARN_ON(usage == 1 && nr > 1);
 	ci = swap_cluster_lock(si, offset);
@@ -3788,7 +3829,7 @@ int swapcache_prepare(swp_entry_t entry, int nr)
 */
 void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry, int nr)
 {
-	swap_entries_put_cache(si, entry, nr);
+	swap_slots_put_cache(si, swp_entry_to_swp_slot(entry), nr);
 }
 
 /*
@@ -3815,6 +3856,7 @@ int add_swap_count_continuation(swp_entry_t entry, gfp_t gfp_mask)
 	struct page *list_page;
 	pgoff_t offset;
 	unsigned char count;
+	swp_slot_t slot = swp_entry_to_swp_slot(entry);
 	int ret = 0;
 
 	/*
@@ -3823,7 +3865,7 @@ int add_swap_count_continuation(swp_entry_t entry, gfp_t gfp_mask)
 	 */
 	page = alloc_page(gfp_mask | __GFP_HIGHMEM);
 
-	si = get_swap_device(entry);
+	si = swap_slot_tryget_swap_info(slot);
 	if (!si) {
 		/*
 		 * An acceptable race has occurred since the failing
@@ -3832,7 +3874,7 @@ int add_swap_count_continuation(swp_entry_t entry, gfp_t gfp_mask)
 		goto outer;
 	}
 
-	offset = swp_offset(entry);
+	offset = swp_slot_offset(slot);
 
 	ci = swap_cluster_lock(si, offset);
 
@@ -3895,7 +3937,7 @@ int add_swap_count_continuation(swp_entry_t entry, gfp_t gfp_mask)
 	spin_unlock(&si->cont_lock);
 out:
 	swap_cluster_unlock(ci);
-	put_swap_device(si);
+	swap_slot_put_swap_info(si);
 outer:
 	if (page)
 		__free_page(page);
-- 
2.47.3