From nobody Wed Dec 24 03:16:50 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Hugh Dickins, Johannes Weiner,
    Matthew Wilcox, Michal Hocko, Yosry Ahmed, David Hildenbrand,
    linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 1/7] mm/swapfile.c: add back some comments
Date: Tue, 30 Jan 2024 01:54:16 +0800
Message-ID: <20240129175423.1987-2-ryncsn@gmail.com>
In-Reply-To: <20240129175423.1987-1-ryncsn@gmail.com>
References: <20240129175423.1987-1-ryncsn@gmail.com>
Reply-To: Kairui Song

From: Kairui Song

Some useful comments were dropped in commit b56a2d8af914 ("mm: rid
swapoff of quadratic complexity"); add them back.

Signed-off-by: Kairui Song
---
 mm/swapfile.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0008cd39af42..606d95b56304 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1881,6 +1881,17 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			folio = page_folio(page);
 		}
 		if (!folio) {
+			/*
+			 * The entry could have been freed, and will not
+			 * be reused since swapoff() already disabled
+			 * allocation from here, or alloc_page() failed.
+			 *
+			 * We don't hold a lock here, so the swap entry could be
+			 * SWAP_MAP_BAD (when the cluster is discarding).
+			 * Instead of failing out, we can just skip the swap
+			 * entry because swapoff will wait for the discard to
+			 * finish anyway.
+			 */
 			swp_count = READ_ONCE(si->swap_map[offset]);
 			if (swp_count == 0 || swp_count == SWAP_MAP_BAD)
 				continue;
-- 
2.43.0
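
[ For quick reference, a condensed sketch of the path the restored
  comments document, simplified from the unuse_pte_range() hunk above.
  It is illustrative only, not compilable on its own, and the final
  -ENOMEM fallback is paraphrased from surrounding code that is not
  part of this diff. ]

	if (!folio) {
		/*
		 * Unlocked read: the entry may have been freed, or may
		 * read as SWAP_MAP_BAD while its cluster is discarding.
		 */
		swp_count = READ_ONCE(si->swap_map[offset]);
		if (swp_count == 0 || swp_count == SWAP_MAP_BAD)
			continue;	/* swapoff waits out the discard */
		return -ENOMEM;		/* a real alloc_page() failure */
	}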
From nobody Wed Dec 24 03:16:50 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Hugh Dickins, Johannes Weiner,
    Matthew Wilcox, Michal Hocko, Yosry Ahmed, David Hildenbrand,
    linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 2/7] mm/swap: move no readahead swapin code to a stand-alone helper
Date: Tue, 30 Jan 2024 01:54:17 +0800
Message-ID: <20240129175423.1987-3-ryncsn@gmail.com>
In-Reply-To: <20240129175423.1987-1-ryncsn@gmail.com>
References: <20240129175423.1987-1-ryncsn@gmail.com>
Reply-To: Kairui Song

From: Kairui Song

No functional change; simply move the routine into a standalone
function so it can be reused later. The error path handling is copied
from the "out_page" label to keep the code change minimal and easier
to review.

Signed-off-by: Kairui Song
---
 mm/memory.c     | 32 ++++----------------------------
 mm/swap.h       |  8 ++++++++
 mm/swap_state.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 28 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7e1f4849463a..81dc9d467f4e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3803,7 +3803,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	swp_entry_t entry;
 	pte_t pte;
 	vm_fault_t ret = 0;
-	void *shadow = NULL;
 
 	if (!pte_unmap_same(vmf))
 		goto out;
@@ -3867,33 +3866,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!folio) {
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
 		    __swap_count(entry) == 1) {
-			/* skip swapcache */
-			folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
-						vma, vmf->address, false);
-			page = &folio->page;
-			if (folio) {
-				__folio_set_locked(folio);
-				__folio_set_swapbacked(folio);
-
-				if (mem_cgroup_swapin_charge_folio(folio,
-							vma->vm_mm, GFP_KERNEL,
-							entry)) {
-					ret = VM_FAULT_OOM;
-					goto out_page;
-				}
-				mem_cgroup_swapin_uncharge_swap(entry);
-
-				shadow = get_shadow_from_swap_cache(entry);
-				if (shadow)
-					workingset_refault(folio, shadow);
-
-				folio_add_lru(folio);
-
-				/* To provide entry to swap_read_folio() */
-				folio->swap = entry;
-				swap_read_folio(folio, true, NULL);
-				folio->private = NULL;
-			}
+			/* skip swapcache and readahead */
+			folio = swapin_direct(entry, GFP_HIGHUSER_MOVABLE, vmf);
+			if (folio)
+				page = &folio->page;
 		} else {
 			page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
 						vmf);
diff --git a/mm/swap.h b/mm/swap.h
index 758c46ca671e..83eab7b67e77 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -56,6 +56,8 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
 		struct mempolicy *mpol, pgoff_t ilx);
 struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
 			      struct vm_fault *vmf);
+struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
+			    struct vm_fault *vmf);
 
 static inline unsigned int folio_swap_flags(struct folio *folio)
 {
@@ -86,6 +88,12 @@ static inline struct folio *swap_cluster_readahead(swp_entry_t entry,
 	return NULL;
 }
 
+static inline struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
+					  struct vm_fault *vmf)
+{
+	return NULL;
+}
+
 static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
 			struct vm_fault *vmf)
 {
diff --git a/mm/swap_state.c b/mm/swap_state.c
index e671266ad772..645f5bcad123 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -861,6 +861,53 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 	return folio;
 }
 
+/**
+ * swapin_direct - swap in a folio, skipping swap cache and readahead
+ * @entry: swap entry of this memory
+ * @gfp_mask: memory allocation flags
+ * @vmf: fault information
+ *
+ * Returns the struct folio for entry and addr after the swap entry is read
+ * in.
+ */
+struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
+			    struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct folio *folio;
+	void *shadow = NULL;
+
+	/* skip swapcache */
+	folio = vma_alloc_folio(gfp_mask, 0,
+				vma, vmf->address, false);
+	if (folio) {
+		__folio_set_locked(folio);
+		__folio_set_swapbacked(folio);
+
+		if (mem_cgroup_swapin_charge_folio(folio,
+					vma->vm_mm, GFP_KERNEL,
+					entry)) {
+			folio_unlock(folio);
+			folio_put(folio);
+			return NULL;
+		}
+		mem_cgroup_swapin_uncharge_swap(entry);
+
+		shadow = get_shadow_from_swap_cache(entry);
+		if (shadow)
+			workingset_refault(folio, shadow);
+
+		folio_add_lru(folio);
+
+		/* To provide entry to swap_read_folio() */
+		folio->swap = entry;
+		swap_read_folio(folio, true, NULL);
+		folio->private = NULL;
+	}
+
+	return folio;
+}
+
 /**
  * swapin_readahead - swap in pages in hope we need them soon
  * @entry: swap entry of this memory
-- 
2.43.0
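
[ The resulting split in do_swap_page() after this patch, condensed
  from the mm/memory.c hunk above (error handling elided; a sketch,
  not standalone-compilable code): ]

	if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
	    __swap_count(entry) == 1) {
		/* skip swapcache and readahead */
		folio = swapin_direct(entry, GFP_HIGHUSER_MOVABLE, vmf);
		if (folio)
			page = &folio->page;
	} else {
		page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, vmf);
		if (page)
			folio = page_folio(page);
		swapcache = folio;
	}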
From nobody Wed Dec 24 03:16:50 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Hugh Dickins, Johannes Weiner,
    Matthew Wilcox, Michal Hocko, Yosry Ahmed, David Hildenbrand,
    linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 3/7] mm/swap: always account swapped in page into current memcg
Date: Tue, 30 Jan 2024 01:54:18 +0800
Message-ID: <20240129175423.1987-4-ryncsn@gmail.com>
In-Reply-To: <20240129175423.1987-1-ryncsn@gmail.com>
References: <20240129175423.1987-1-ryncsn@gmail.com>
Reply-To: Kairui Song

From: Kairui Song

Currently, mem_cgroup_swapin_charge_folio is always called with
mm == NULL, except in swapin_direct. swapin_direct is only used when
swapin should skip readahead and swapcache (SWP_SYNCHRONOUS_IO). All
other callers of mem_cgroup_swapin_charge_folio are for swapin that
should not skip readahead and cache.

This could cause swapin charging to behave differently depending on
the swap device, which is unexpected.
This is currently not happening, because the only caller of
swapin_direct is the direct anon page fault path, where mm always
equals current->mm. That will no longer be true if swapin_direct is
shared and gains other callers (e.g. swapoff) that reuse the
readahead-skipping logic.

So make swapin_direct also pass NULL for mm, so swapin charging will
behave consistently and not be affected by the type of swap device or
the readahead policy.

After this, the second parameter of mem_cgroup_swapin_charge_folio is
never used, so it can be safely dropped.

Signed-off-by: Kairui Song
---
 include/linux/memcontrol.h | 4 ++--
 mm/memcontrol.c            | 5 ++---
 mm/swap_state.c            | 7 +++----
 3 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 20ff87f8e001..540590d80958 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -693,7 +693,7 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm,
 int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
 		long nr_pages);
 
-int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
+int mem_cgroup_swapin_charge_folio(struct folio *folio,
 				  gfp_t gfp, swp_entry_t entry);
 void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry);
 
@@ -1281,7 +1281,7 @@ static inline int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg,
 }
 
 static inline int mem_cgroup_swapin_charge_folio(struct folio *folio,
-			struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
+			gfp_t gfp, swp_entry_t entry)
 {
 	return 0;
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e4c8735e7c85..5852742df958 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7306,8 +7306,7 @@ int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
  *
  * Returns 0 on success. Otherwise, an error code is returned.
  */
-int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
-				  gfp_t gfp, swp_entry_t entry)
+int mem_cgroup_swapin_charge_folio(struct folio *folio, gfp_t gfp, swp_entry_t entry)
 {
 	struct mem_cgroup *memcg;
 	unsigned short id;
@@ -7320,7 +7319,7 @@ int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
 	rcu_read_lock();
 	memcg = mem_cgroup_from_id(id);
 	if (!memcg || !css_tryget_online(&memcg->css))
-		memcg = get_mem_cgroup_from_mm(mm);
+		memcg = get_mem_cgroup_from_current();
 	rcu_read_unlock();
 
 	ret = charge_memcg(folio, memcg, gfp);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 645f5bcad123..a450d09fc0db 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -495,7 +495,7 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	__folio_set_locked(folio);
 	__folio_set_swapbacked(folio);
 
-	if (mem_cgroup_swapin_charge_folio(folio, NULL, gfp_mask, entry))
+	if (mem_cgroup_swapin_charge_folio(folio, gfp_mask, entry))
 		goto fail_unlock;
 
 	/* May fail (-ENOMEM) if XArray node allocation failed. */
@@ -884,9 +884,8 @@ struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
 	__folio_set_locked(folio);
 	__folio_set_swapbacked(folio);
 
-	if (mem_cgroup_swapin_charge_folio(folio,
-				vma->vm_mm, GFP_KERNEL,
-				entry)) {
+	if (mem_cgroup_swapin_charge_folio(folio, GFP_KERNEL,
+				entry)) {
 		folio_unlock(folio);
 		folio_put(folio);
 		return NULL;
-- 
2.43.0
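
[ After this patch, every swapin path resolves the memcg to charge in
  the same way; condensed from the mm/memcontrol.c hunk above (a
  sketch using only lines visible in the diff context, not standalone
  code): ]

	rcu_read_lock();
	memcg = mem_cgroup_from_id(id);		/* memcg recorded at swapout */
	if (!memcg || !css_tryget_online(&memcg->css))
		memcg = get_mem_cgroup_from_current();	/* fall back to current task */
	rcu_read_unlock();

	ret = charge_memcg(folio, memcg, gfp);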
From nobody Wed Dec 24 03:16:50 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Hugh Dickins, Johannes Weiner,
    Matthew Wilcox, Michal Hocko, Yosry Ahmed, David Hildenbrand,
    linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 4/7] mm/swap: introduce swapin_entry for unified readahead policy
Date: Tue, 30 Jan 2024 01:54:19 +0800
Message-ID: <20240129175423.1987-5-ryncsn@gmail.com>
In-Reply-To: <20240129175423.1987-1-ryncsn@gmail.com>
References: <20240129175423.1987-1-ryncsn@gmail.com>
Reply-To: Kairui Song

From: Kairui Song

Introduce swapin_entry, which merges swapin_readahead and
swapin_direct, making it the main entry point for swapping in pages,
with a unified readahead policy.

This commit also makes swapoff use the new helper and skip readahead
for SWP_SYNCHRONOUS_IO devices, since readahead is not helpful there.
Swapping off a 10G ZRAM (lzo-rle) after the same workload is now
faster because readahead is skipped and overhead is reduced.
Before:
time swapoff /dev/zram0
real    0m12.337s
user    0m0.001s
sys     0m12.329s

After:
time swapoff /dev/zram0
real    0m9.728s
user    0m0.001s
sys     0m9.719s

Signed-off-by: Kairui Song
Reviewed-by: "Huang, Ying"
---
 mm/memory.c     | 18 +++---------------
 mm/swap.h       | 16 ++++------------
 mm/swap_state.c | 40 ++++++++++++++++++++++++----------------
 mm/swapfile.c   |  7 ++-----
 4 files changed, 33 insertions(+), 48 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 81dc9d467f4e..8711f8a07039 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3864,20 +3864,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	swapcache = folio;
 
 	if (!folio) {
-		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
-		    __swap_count(entry) == 1) {
-			/* skip swapcache and readahead */
-			folio = swapin_direct(entry, GFP_HIGHUSER_MOVABLE, vmf);
-			if (folio)
-				page = &folio->page;
-		} else {
-			page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
-						vmf);
-			if (page)
-				folio = page_folio(page);
-			swapcache = folio;
-		}
-
+		folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
+				     vmf, &swapcache);
 		if (!folio) {
 			/*
 			 * Back out if somebody else faulted in this pte
@@ -3890,11 +3878,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			ret = VM_FAULT_OOM;
 			goto unlock;
 		}
-
 		/* Had to read the page from swap area: Major fault */
 		ret = VM_FAULT_MAJOR;
 		count_vm_event(PGMAJFAULT);
 		count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
+		page = folio_file_page(folio, swp_offset(entry));
 	} else if (PageHWPoison(page)) {
 		/*
 		 * hwpoisoned dirty swapcache pages are kept for killing
diff --git a/mm/swap.h b/mm/swap.h
index 83eab7b67e77..8f8185d3865c 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -54,10 +54,8 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_flags,
 		bool skip_if_exists);
 struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
 		struct mempolicy *mpol, pgoff_t ilx);
-struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
-			      struct vm_fault *vmf);
-struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
-			    struct vm_fault *vmf);
+struct folio *swapin_entry(swp_entry_t entry, gfp_t flag,
+			   struct vm_fault *vmf, struct folio **swapcached);
 
 static inline unsigned int folio_swap_flags(struct folio *folio)
 {
@@ -88,14 +86,8 @@ static inline struct folio *swap_cluster_readahead(swp_entry_t entry,
 	return NULL;
 }
 
-static inline struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
-					  struct vm_fault *vmf)
-{
-	return NULL;
-}
-
-static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
-			struct vm_fault *vmf)
+static inline struct folio *swapin_entry(swp_entry_t swp, gfp_t gfp_mask,
+			struct vm_fault *vmf, struct folio **swapcached)
 {
 	return NULL;
 }
diff --git a/mm/swap_state.c b/mm/swap_state.c
index a450d09fc0db..5e06b2e140d4 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -870,8 +870,8 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 * Returns the struct folio for entry and addr after the swap entry is read
 * in.
 */
-struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
-			    struct vm_fault *vmf)
+static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
+				   struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct folio *folio;
@@ -908,33 +908,41 @@ struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
 }
 
 /**
- * swapin_readahead - swap in pages in hope we need them soon
+ * swapin_entry - swap in a folio from swap entry
  * @entry: swap entry of this memory
 * @gfp_mask: memory allocation flags
 * @vmf: fault information
+ * @swapcache: set to the swapcache folio if swapcache is used
 *
 * Returns the struct page for entry and addr, after queueing swapin.
 *
- * It's a main entry function for swap readahead. By the configuration,
+ * It's the main entry function for swap in. By the configuration,
 * it will read ahead blocks by cluster-based(ie, physical disk based)
- * or vma-based(ie, virtual address based on faulty address) readahead.
+ * or vma-based(ie, virtual address based on faulty address) readahead,
+ * or skip the readahead(ie, ramdisk based swap device).
 */
-struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
-			      struct vm_fault *vmf)
+struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
+			   struct vm_fault *vmf, struct folio **swapcache)
 {
 	struct mempolicy *mpol;
-	pgoff_t ilx;
 	struct folio *folio;
+	pgoff_t ilx;
 
-	mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
-	folio = swap_use_vma_readahead() ?
-		swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf) :
-		swap_cluster_readahead(entry, gfp_mask, mpol, ilx);
-	mpol_cond_put(mpol);
+	if (data_race(swp_swap_info(entry)->flags & SWP_SYNCHRONOUS_IO) &&
+	    __swap_count(entry) == 1) {
+		folio = swapin_direct(entry, gfp_mask, vmf);
+	} else {
+		mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
+		if (swap_use_vma_readahead())
+			folio = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf);
+		else
+			folio = swap_cluster_readahead(entry, gfp_mask, mpol, ilx);
+		mpol_cond_put(mpol);
+		if (swapcache)
+			*swapcache = folio;
+	}
 
-	if (!folio)
-		return NULL;
-	return folio_file_page(folio, swp_offset(entry));
+	return folio;
 }
 
 #ifdef CONFIG_SYSFS
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 606d95b56304..1cf7e72e19e3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1867,7 +1867,6 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 
 	folio = swap_cache_get_folio(entry, vma, addr);
 	if (!folio) {
-		struct page *page;
 		struct vm_fault vmf = {
 			.vma = vma,
 			.address = addr,
@@ -1875,10 +1874,8 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			.pmd = pmd,
 		};
 
-		page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
-					&vmf);
-		if (page)
-			folio = page_folio(page);
+		folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
+				     &vmf, NULL);
 	}
 	if (!folio) {
 		/*
-- 
2.43.0
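
[ The unified policy inside swapin_entry() after this patch, condensed
  from the mm/swap_state.c hunk above (a sketch, not standalone code): ]

	if (data_race(swp_swap_info(entry)->flags & SWP_SYNCHRONOUS_IO) &&
	    __swap_count(entry) == 1) {
		/* ramdisk-like device, single reference: read directly */
		folio = swapin_direct(entry, gfp_mask, vmf);
	} else {
		mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
		if (swap_use_vma_readahead())
			folio = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf);
		else
			folio = swap_cluster_readahead(entry, gfp_mask, mpol, ilx);
		mpol_cond_put(mpol);
		if (swapcache)
			*swapcache = folio;	/* only set when swapcache is used */
	}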
From nobody Wed Dec 24 03:16:50 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Hugh Dickins, Johannes Weiner,
    Matthew Wilcox, Michal Hocko, Yosry Ahmed, David Hildenbrand,
    linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 5/7] mm/swap: avoid a duplicated swap cache lookup for SWP_SYNCHRONOUS_IO
Date: Tue, 30 Jan 2024 01:54:20 +0800
Message-ID: <20240129175423.1987-6-ryncsn@gmail.com>
In-Reply-To: <20240129175423.1987-1-ryncsn@gmail.com>
References: <20240129175423.1987-1-ryncsn@gmail.com>
Reply-To: Kairui Song

From: Kairui Song

When an xa_value is returned by the swap cache lookup, keep it to be
used later for the workingset refault check instead of doing the
lookup again in swapin_direct.

The shadow lookup and workingset check are skipped for swapoff to
reduce overhead; workingset checking for anon pages upon swapoff is
not helpful, and simply considering all pages inactive makes more
sense, since swapoff doesn't mean the pages are being accessed.

After this commit, swapin is about 4% faster for ZRAM. Result of a
micro-benchmark that uses madvise to swap out 10G of zero-filled data
to ZRAM, then reads it back in:

Before: 11143285 us
After:  10692644 us (+4.1%)

Signed-off-by: Kairui Song
Reviewed-by: "Huang, Ying"
---
 mm/memory.c     |  5 +++--
 mm/shmem.c      |  2 +-
 mm/swap.h       | 11 ++++++-----
 mm/swap_state.c | 23 +++++++++++++----------
 mm/swapfile.c   |  4 ++--
 5 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 8711f8a07039..349946899f8d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3800,6 +3800,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	struct swap_info_struct *si = NULL;
 	rmap_t rmap_flags = RMAP_NONE;
 	bool exclusive = false;
+	void *shadow = NULL;
 	swp_entry_t entry;
 	pte_t pte;
 	vm_fault_t ret = 0;
@@ -3858,14 +3859,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (unlikely(!si))
 		goto out;
 
-	folio = swap_cache_get_folio(entry, vma, vmf->address);
+	folio = swap_cache_get_folio(entry, vma, vmf->address, &shadow);
 	if (folio)
 		page = folio_file_page(folio, swp_offset(entry));
 	swapcache = folio;
 
 	if (!folio) {
 		folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
-				     vmf, &swapcache);
+				     vmf, &swapcache, shadow);
 		if (!folio) {
 			/*
 			 * Back out if somebody else faulted in this pte
diff --git a/mm/shmem.c b/mm/shmem.c
index d7c84ff62186..698a31bf7baa 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1873,7 +1873,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	}
 
 	/* Look it up and read it in.. */
-	folio = swap_cache_get_folio(swap, NULL, 0);
+	folio = swap_cache_get_folio(swap, NULL, 0, NULL);
 	if (!folio) {
 		/* Or update major stats only when swapin succeeds?? */
 		if (fault_type) {
diff --git a/mm/swap.h b/mm/swap.h
index 8f8185d3865c..ca9cb472a263 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -42,7 +42,8 @@ void delete_from_swap_cache(struct folio *folio);
 void clear_shadow_from_swap_cache(int type, unsigned long begin,
 				  unsigned long end);
 struct folio *swap_cache_get_folio(swp_entry_t entry,
-		struct vm_area_struct *vma, unsigned long addr);
+		struct vm_area_struct *vma, unsigned long addr,
+		void **shadowp);
 struct folio *filemap_get_incore_folio(struct address_space *mapping,
 		pgoff_t index);
 
@@ -54,8 +55,8 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_flags,
 		bool skip_if_exists);
 struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
 		struct mempolicy *mpol, pgoff_t ilx);
-struct folio *swapin_entry(swp_entry_t entry, gfp_t flag,
-			   struct vm_fault *vmf, struct folio **swapcached);
+struct folio *swapin_entry(swp_entry_t entry, gfp_t flag, struct vm_fault *vmf,
+			   struct folio **swapcached, void *shadow);
 
 static inline unsigned int folio_swap_flags(struct folio *folio)
 {
@@ -87,7 +88,7 @@ static inline struct folio *swap_cluster_readahead(swp_entry_t entry,
 }
 
 static inline struct folio *swapin_entry(swp_entry_t swp, gfp_t gfp_mask,
-			struct vm_fault *vmf, struct folio **swapcached)
+			struct vm_fault *vmf, struct folio **swapcached, void *shadow)
 {
 	return NULL;
 }
@@ -98,7 +99,7 @@ static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
 }
 
 static inline struct folio *swap_cache_get_folio(swp_entry_t entry,
-		struct vm_area_struct *vma, unsigned long addr)
+		struct vm_area_struct *vma, unsigned long addr, void **shadowp)
 {
 	return NULL;
 }
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 5e06b2e140d4..e41a137a6123 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -330,12 +330,18 @@ static inline bool swap_use_vma_readahead(void)
 * Caller must lock the swap device or hold a reference to keep it valid.
 */
 struct folio *swap_cache_get_folio(swp_entry_t entry,
-		struct vm_area_struct *vma, unsigned long addr)
+		struct vm_area_struct *vma, unsigned long addr, void **shadowp)
 {
 	struct folio *folio;
 
-	folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry));
-	if (!IS_ERR(folio)) {
+	folio = filemap_get_entry(swap_address_space(entry), swp_offset(entry));
+	if (xa_is_value(folio)) {
+		if (shadowp)
+			*shadowp = folio;
+		return NULL;
+	}
+
+	if (folio) {
 		bool vma_ra = swap_use_vma_readahead();
 		bool readahead;
 
@@ -365,8 +371,6 @@ struct folio *swap_cache_get_folio(swp_entry_t entry,
 			if (!vma || !vma_ra)
 				atomic_inc(&swapin_readahead_hits);
 		}
-	} else {
-		folio = NULL;
 	}
 
 	return folio;
@@ -866,16 +870,16 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 * @entry: swap entry of this memory
 * @gfp_mask: memory allocation flags
 * @vmf: fault information
+ * @shadow: workingset shadow corresponding to entry
 *
 * Returns the struct folio for entry and addr after the swap entry is read
 * in.
 */
 static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
-				   struct vm_fault *vmf)
+				   struct vm_fault *vmf, void *shadow)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct folio *folio;
-	void *shadow = NULL;
 
 	/* skip swapcache */
 	folio = vma_alloc_folio(gfp_mask, 0,
@@ -892,7 +896,6 @@ static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
 	}
 	mem_cgroup_swapin_uncharge_swap(entry);
 
-	shadow = get_shadow_from_swap_cache(entry);
 	if (shadow)
 		workingset_refault(folio, shadow);
 
@@ -922,7 +925,7 @@ static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
 * or skip the readahead(ie, ramdisk based swap device).
 */
 struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
-			   struct vm_fault *vmf, struct folio **swapcache)
+			   struct vm_fault *vmf, struct folio **swapcache, void *shadow)
 {
 	struct mempolicy *mpol;
 	struct folio *folio;
@@ -930,7 +933,7 @@ struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
 
 	if (data_race(swp_swap_info(entry)->flags & SWP_SYNCHRONOUS_IO) &&
 	    __swap_count(entry) == 1) {
-		folio = swapin_direct(entry, gfp_mask, vmf);
+		folio = swapin_direct(entry, gfp_mask, vmf, shadow);
 	} else {
 		mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
 		if (swap_use_vma_readahead())
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1cf7e72e19e3..aac26f5a6cec 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1865,7 +1865,7 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 	pte_unmap(pte);
 	pte = NULL;
 
-	folio = swap_cache_get_folio(entry, vma, addr);
+	folio = swap_cache_get_folio(entry, vma, addr, NULL);
 	if (!folio) {
 		struct vm_fault vmf = {
 			.vma = vma,
@@ -1875,7 +1875,7 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		};
 
 		folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
-				     &vmf, NULL);
+				     &vmf, NULL, NULL);
 	}
 	if (!folio) {
 		/*
-- 
2.43.0
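
[ The key change, condensed from the swap_cache_get_folio() hunk
  above: a shadow (xa_value) found during the cache lookup is handed
  back through @shadowp instead of being looked up a second time
  (a sketch, not standalone code): ]

	folio = filemap_get_entry(swap_address_space(entry), swp_offset(entry));
	if (xa_is_value(folio)) {
		/* a workingset shadow entry, not a real folio */
		if (shadowp)
			*shadowp = folio;	/* reused later for workingset_refault() */
		return NULL;
	}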
From nobody Wed Dec 24 03:16:50 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Hugh Dickins, Johannes Weiner,
    Matthew Wilcox, Michal Hocko, Yosry Ahmed, David Hildenbrand,
    linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 6/7] mm/swap, shmem: use unified swapin helper for shmem
Date: Tue, 30 Jan 2024 01:54:21 +0800
Message-ID: <20240129175423.1987-7-ryncsn@gmail.com>
In-Reply-To: <20240129175423.1987-1-ryncsn@gmail.com>
References: <20240129175423.1987-1-ryncsn@gmail.com>
Reply-To: Kairui Song

From: Kairui Song

Currently, shmem uses cluster readahead for all swap backends.
Cluster readahead is not a good fit for ramdisk-based devices (ZRAM),
so it's better to skip it there.
After switching to the new helper, most benchmarks showed a good result:

- Single file sequential read on ramdisk:
  perf stat --repeat 20 dd if=/tmpfs/test of=/dev/null bs=1M count=8192
  (/tmpfs/test is a zero-filled file, using brd as swap, 4G memcg limit)
  Before: 22.248 +- 0.549
  After:  22.021 +- 0.684 (-1.1%)

- shmem FIO test 1 on a Ryzen 5900HX:
  fio -name=tmpfs --numjobs=16 --directory=/tmpfs --size=960m \
    --ioengine=mmap --rw=randread --random_distribution=zipf:0.5 \
    --time_based --ramp_time=1m --runtime=5m --group_reporting
  (using brd as swap, 2G memcg limit)

  Before:
  bw (  MiB/s): min= 1167, max= 1732, per=100.00%, avg=1460.82, stdev= 4.38, samples=9536
  iops        : min=298938, max=443557, avg=373964.41, stdev=1121.27, samples=9536
  After (+3.5%):
  bw (  MiB/s): min= 1285, max= 1738, per=100.00%, avg=1512.88, stdev= 4.34, samples=9456
  iops        : min=328957, max=445105, avg=387294.21, stdev=1111.15, samples=9456

- shmem FIO test 2 on a Ryzen 5900HX:
  fio -name=tmpfs --numjobs=16 --directory=/tmpfs --size=960m \
    --ioengine=mmap --rw=randread --random_distribution=zipf:1.2 \
    --time_based --ramp_time=1m --runtime=5m --group_reporting
  (using brd as swap, 2G memcg limit)

  Before:
  bw (  MiB/s): min= 5296, max= 7112, per=100.00%, avg=6131.93, stdev=17.09, samples=9536
  iops        : min=1355934, max=1820833, avg=1569769.11, stdev=4375.93, samples=9536
  After (+3.1%):
  bw (  MiB/s): min= 5466, max= 7173, per=100.00%, avg=6324.51, stdev=16.66, samples=9521
  iops        : min=1399355, max=1836435, avg=1619068.90, stdev=4263.94, samples=9521

So cluster readahead doesn't help much even for single-file sequential
reads, and for the random stress tests the performance is better
without it.

Considering that both memory and swap devices slowly get more
fragmented over time, and that the commonly used ZRAM consumes much
more CPU than a plain ramdisk, false readahead could occur more
frequently and waste more CPU. Direct swapin is cheaper, so use the
new helper and skip readahead for SWP_SYNCHRONOUS_IO devices.
Signed-off-by: Kairui Song
---
 mm/memory.c     |  2 +-
 mm/shmem.c      | 50 +++++++++++++++++++++++++++++++----------------
 mm/swap.h       | 14 ++++----------
 mm/swap_state.c | 52 +++++++++++++++++++++++++++++++++----------------
 mm/swapfile.c   |  2 +-
 5 files changed, 74 insertions(+), 46 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 349946899f8d..51962126a79c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3866,7 +3866,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 
 	if (!folio) {
 		folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
-				     vmf, &swapcache, shadow);
+				     vmf, NULL, 0, &swapcache, shadow);
 		if (!folio) {
 			/*
 			 * Back out if somebody else faulted in this pte
diff --git a/mm/shmem.c b/mm/shmem.c
index 698a31bf7baa..d3722e25cb32 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1565,15 +1565,16 @@ static inline struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo)
 static struct mempolicy *shmem_get_pgoff_policy(struct shmem_inode_info *info,
 			pgoff_t index, unsigned int order, pgoff_t *ilx);
 
-static struct folio *shmem_swapin_cluster(swp_entry_t swap, gfp_t gfp,
-		struct shmem_inode_info *info, pgoff_t index)
+static struct folio *shmem_swapin(swp_entry_t swap, gfp_t gfp,
+		struct shmem_inode_info *info, pgoff_t index,
+		struct folio **swapcache, void *shadow)
 {
 	struct mempolicy *mpol;
 	pgoff_t ilx;
 	struct folio *folio;
 
 	mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
-	folio = swap_cluster_readahead(swap, gfp, mpol, ilx);
+	folio = swapin_entry(swap, gfp, NULL, mpol, ilx, swapcache, shadow);
 	mpol_cond_put(mpol);
 
 	return folio;
@@ -1852,8 +1853,9 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 {
 	struct address_space *mapping = inode->i_mapping;
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	struct folio *swapcache = NULL, *folio;
 	struct swap_info_struct *si;
-	struct folio *folio = NULL;
+	void *shadow = NULL;
 	swp_entry_t swap;
 	int error;
 
@@ -1873,8 +1875,10 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	}
 
 	/* Look it up and read it in.. */
-	folio = swap_cache_get_folio(swap, NULL, 0, NULL);
-	if (!folio) {
+	folio = swap_cache_get_folio(swap, NULL, 0, &shadow);
+	if (folio) {
+		swapcache = folio;
+	} else {
 		/* Or update major stats only when swapin succeeds?? */
 		if (fault_type) {
 			*fault_type |= VM_FAULT_MAJOR;
 			count_vm_event(PGMAJFAULT);
 			count_memcg_event_mm(fault_mm, PGMAJFAULT);
 		}
 		/* Here we actually start the io */
-		folio = shmem_swapin_cluster(swap, gfp, info, index);
+		folio = shmem_swapin(swap, gfp, info, index, &swapcache, shadow);
 		if (!folio) {
 			error = -ENOMEM;
 			goto failed;
@@ -1891,17 +1895,21 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 
 	/* We have to do this with folio locked to prevent races */
 	folio_lock(folio);
-	if (!folio_test_swapcache(folio) ||
-	    folio->swap.val != swap.val ||
-	    !shmem_confirm_swap(mapping, index, swap)) {
+	if (swapcache) {
+		if (!folio_test_swapcache(folio) || folio->swap.val != swap.val) {
+			error = -EEXIST;
+			goto unlock;
+		}
+		if (!folio_test_uptodate(folio)) {
+			error = -EIO;
+			goto failed;
+		}
+		folio_wait_writeback(folio);
+	}
+	if (!shmem_confirm_swap(mapping, index, swap)) {
 		error = -EEXIST;
 		goto unlock;
 	}
-	if (!folio_test_uptodate(folio)) {
-		error = -EIO;
-		goto failed;
-	}
-	folio_wait_writeback(folio);
 
 	/*
 	 * Some architectures may have to restore extra metadata to the
@@ -1909,12 +1917,19 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	 */
 	arch_swap_restore(swap, folio);
 
-	if (shmem_should_replace_folio(folio, gfp)) {
+	/* If swapcache is bypassed, the folio is newly allocated and respects gfp flags */
+	if (swapcache && shmem_should_replace_folio(folio, gfp)) {
 		error = shmem_replace_folio(&folio, gfp, info, index);
 		if (error)
 			goto failed;
 	}
 
+	/*
+	 * The expected value checking below should be enough to ensure
+	 * that only one up-to-date swapin succeeds. swap_free() is called
+	 * after this, so the entry can't be reused. As long as the mapping
+	 * still has the old entry value, it's never swapped in or modified.
+	 */
 	error = shmem_add_to_page_cache(folio, mapping, index,
 					swp_to_radix_entry(swap), gfp);
 	if (error)
 		goto failed;
@@ -1925,7 +1940,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	if (sgp == SGP_WRITE)
 		folio_mark_accessed(folio);
 
-	delete_from_swap_cache(folio);
+	if (swapcache)
+		delete_from_swap_cache(folio);
 	folio_mark_dirty(folio);
 	swap_free(swap);
 	put_swap_device(si);
diff --git a/mm/swap.h b/mm/swap.h
index ca9cb472a263..597a56c7fb02 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -53,10 +53,9 @@ struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_flags,
 		struct mempolicy *mpol, pgoff_t ilx, bool *new_page_allocated,
 		bool skip_if_exists);
-struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
-		struct mempolicy *mpol, pgoff_t ilx);
 struct folio *swapin_entry(swp_entry_t entry, gfp_t flag, struct vm_fault *vmf,
-			   struct folio **swapcached, void *shadow);
+			   struct mempolicy *mpol, pgoff_t ilx,
+			   struct folio **swapcache, void *shadow);
 
 static inline unsigned int folio_swap_flags(struct folio *folio)
 {
@@ -81,14 +80,9 @@ static inline void show_swap_cache_info(void)
 {
 }
 
-static inline struct folio *swap_cluster_readahead(swp_entry_t entry,
-		gfp_t gfp_mask, struct mempolicy *mpol, pgoff_t ilx)
-{
-	return NULL;
-}
-
 static inline struct folio *swapin_entry(swp_entry_t swp, gfp_t gfp_mask,
-		struct vm_fault *vmf, struct folio **swapcached, void *shadow)
+		struct vm_fault *vmf, struct mempolicy *mpol, pgoff_t ilx,
+		struct folio **swapcache, void *shadow)
 {
 	return NULL;
 }
diff --git a/mm/swap_state.c b/mm/swap_state.c
index e41a137a6123..20c206149be4 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -316,6 +316,18 @@ void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
 	release_pages(pages, nr);
 }
 
+static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_t entry)
+{
+	int count;
+
+	if (!data_race(si->flags & SWP_SYNCHRONOUS_IO))
+		return false;
+
+	count = __swap_count(entry);
+
+	return (count == 1 || count == SWAP_MAP_SHMEM);
+}
+
 static inline bool swap_use_vma_readahead(void)
 {
 	return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap);
@@ -635,8 +647,8 @@ static unsigned long swapin_nr_pages(unsigned long offset)
 * are used for every page of the readahead: neighbouring pages on swap
 * are fairly likely to have been swapped out from the same node.
 */
-struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
-		struct mempolicy *mpol, pgoff_t ilx)
+static struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
+		struct mempolicy *mpol, pgoff_t ilx)
 {
 	struct folio *folio;
 	unsigned long entry_offset = swp_offset(entry);
@@ -876,14 +888,13 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 * in.
  */
 static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
-			struct vm_fault *vmf, void *shadow)
+			struct mempolicy *mpol, pgoff_t ilx,
+			void *shadow)
 {
-	struct vm_area_struct *vma = vmf->vma;
 	struct folio *folio;
 
-	/* skip swapcache */
-	folio = vma_alloc_folio(gfp_mask, 0,
-				vma, vmf->address, false);
+	folio = (struct folio *)alloc_pages_mpol(gfp_mask, 0,
+				mpol, ilx, numa_node_id());
 	if (folio) {
 		__folio_set_locked(folio);
 		__folio_set_swapbacked(folio);
@@ -916,6 +927,10 @@ static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
 * @gfp_mask: memory allocation flags
 * @vmf: fault information
 * @swapcache: set to the swapcache folio if swapcache is used
+ * @mpol: NUMA memory allocation policy to be applied,
+ *        not needed if vmf is not NULL
+ * @ilx: NUMA interleave index, used only with MPOL_INTERLEAVE,
+ *       not needed if vmf is not NULL
 *
 * Returns the struct page for entry and addr, after queueing swapin.
 *
@@ -924,26 +939,29 @@ static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
 * or vma-based(ie, virtual address based on faulty address) readahead,
 * or skip the readahead(ie, ramdisk based swap device).
 */
-struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
-		struct vm_fault *vmf, struct folio **swapcache, void *shadow)
+struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask, struct vm_fault *vmf,
+		struct mempolicy *mpol, pgoff_t ilx,
+		struct folio **swapcache, void *shadow)
 {
-	struct mempolicy *mpol;
+	bool mpol_put = false;
 	struct folio *folio;
-	pgoff_t ilx;
 
-	if (data_race(swp_swap_info(entry)->flags & SWP_SYNCHRONOUS_IO) &&
-	    __swap_count(entry) == 1) {
-		folio = swapin_direct(entry, gfp_mask, vmf, shadow);
-	} else {
+	if (!mpol) {
 		mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
-		if (swap_use_vma_readahead())
+		mpol_put = true;
+	}
+	if (swap_use_no_readahead(swp_swap_info(entry), entry)) {
+		folio = swapin_direct(entry, gfp_mask, mpol, ilx, shadow);
+	} else {
+		if (vmf && swap_use_vma_readahead())
 			folio = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf);
 		else
 			folio = swap_cluster_readahead(entry, gfp_mask, mpol, ilx);
-		mpol_cond_put(mpol);
 		if (swapcache)
 			*swapcache = folio;
 	}
+	if (mpol_put)
+		mpol_cond_put(mpol);
 
 	return folio;
 }

diff --git a/mm/swapfile.c b/mm/swapfile.c
index aac26f5a6cec..7ff05aaf6925 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1875,7 +1875,7 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		};
 
 		folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
-				&vmf, NULL, NULL);
+				&vmf, NULL, 0, NULL, NULL);
 	}
 	if (!folio) {
 		/*
-- 
2.43.0
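
Taken together, the hunks above reduce swapin to a single dispatch inside swapin_entry(). The stand-alone sketch below mirrors that shape in plain C so it can be compiled and run outside the kernel; every name in it is a stand-in (the folio struct, the readahead stubs, the SWAP_MAP_SHMEM value), not the kernel's definitions.

#include <stdbool.h>
#include <stdio.h>

#define SWAP_MAP_SHMEM 0xbf	/* stand-in for the kernel's special shmem count */

struct folio { const char *via; };

static struct folio direct_folio  = { "swapin_direct (swap cache bypassed)" };
static struct folio vma_ra_folio  = { "swap_vma_readahead" };
static struct folio cluster_folio = { "swap_cluster_readahead" };

/* Mirrors swap_use_no_readahead(): synchronous device and sole owner. */
static bool use_no_readahead(bool synchronous_io, int swap_count)
{
	if (!synchronous_io)
		return false;
	return swap_count == 1 || swap_count == SWAP_MAP_SHMEM;
}

static struct folio *swapin_entry_sketch(bool synchronous_io, int swap_count,
					 bool have_vmf, bool vma_readahead,
					 struct folio **swapcache)
{
	struct folio *folio;

	if (use_no_readahead(synchronous_io, swap_count)) {
		/* Direct path: the folio never enters the swap cache. */
		folio = &direct_folio;
	} else {
		if (have_vmf && vma_readahead)
			folio = &vma_ra_folio;
		else
			folio = &cluster_folio;
		/* Only the readahead paths report a swap cache folio back. */
		if (swapcache)
			*swapcache = folio;
	}
	return folio;
}

int main(void)
{
	struct folio *swapcache = NULL;
	struct folio *folio;

	/* SWP_SYNCHRONOUS_IO device, map count 1: readahead is skipped. */
	folio = swapin_entry_sketch(true, 1, true, true, &swapcache);
	printf("%s, swapcache: %s\n", folio->via, swapcache ? "set" : "NULL");

	/* Shared entry on the same device: falls back to readahead. */
	swapcache = NULL;
	folio = swapin_entry_sketch(true, 2, true, true, &swapcache);
	printf("%s, swapcache: %s\n", folio->via, swapcache ? "set" : "NULL");
	return 0;
}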

From nobody Wed Dec 24 03:16:50 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Hugh Dickins, Johannes Weiner, Matthew Wilcox, Michal Hocko, Yosry Ahmed, David Hildenbrand, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 7/7] mm/swap: refactor swap_cache_get_folio
Date: Tue, 30 Jan 2024 01:54:22 +0800
Message-ID: <20240129175423.1987-8-ryncsn@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240129175423.1987-1-ryncsn@gmail.com>
References: <20240129175423.1987-1-ryncsn@gmail.com>
Reply-To: Kairui Song

From: Kairui Song

No functional change; rework the code layout to reduce object size and remove a redundant level of indentation.

With gcc 13.2.1:

./scripts/bloat-o-meter mm/swap_state.o.old mm/swap_state.o
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-35 (-35)
Function                                     old     new   delta
swap_cache_get_folio                         380     345     -35
Total: Before=8785, After=8750, chg -0.40%

Signed-off-by: Kairui Song
---
 mm/swap_state.c | 59 ++++++++++++++++++++++++-------------------------
 1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 20c206149be4..2f809b69b65a 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -341,9 +341,10 @@ static inline bool swap_use_vma_readahead(void)
 *
 * Caller must lock the swap device or hold a reference to keep it valid.
 */
-struct folio *swap_cache_get_folio(swp_entry_t entry,
-		struct vm_area_struct *vma, unsigned long addr, void **shadowp)
+struct folio *swap_cache_get_folio(swp_entry_t entry, struct vm_area_struct *vma,
+		unsigned long addr, void **shadowp)
 {
+	bool vma_ra, readahead;
 	struct folio *folio;
 
 	folio = filemap_get_entry(swap_address_space(entry), swp_offset(entry));
@@ -352,37 +353,35 @@ struct folio *swap_cache_get_folio(swp_entry_t entry,
 		*shadowp = folio;
 		return NULL;
 	}
+	if (!folio)
+		return NULL;
 
-	if (folio) {
-		bool vma_ra = swap_use_vma_readahead();
-		bool readahead;
+	/*
+	 * At the moment, we don't support PG_readahead for anon THP
+	 * so let's bail out rather than confusing the readahead stat.
+	 */
+	if (unlikely(folio_test_large(folio)))
+		return folio;
 
-		/*
-		 * At the moment, we don't support PG_readahead for anon THP
-		 * so let's bail out rather than confusing the readahead stat.
-		 */
-		if (unlikely(folio_test_large(folio)))
-			return folio;
-
-		readahead = folio_test_clear_readahead(folio);
-		if (vma && vma_ra) {
-			unsigned long ra_val;
-			int win, hits;
-
-			ra_val = GET_SWAP_RA_VAL(vma);
-			win = SWAP_RA_WIN(ra_val);
-			hits = SWAP_RA_HITS(ra_val);
-			if (readahead)
-				hits = min_t(int, hits + 1, SWAP_RA_HITS_MAX);
-			atomic_long_set(&vma->swap_readahead_info,
-					SWAP_RA_VAL(addr, win, hits));
-		}
+	vma_ra = swap_use_vma_readahead();
+	readahead = folio_test_clear_readahead(folio);
+	if (vma && vma_ra) {
+		unsigned long ra_val;
+		int win, hits;
+
+		ra_val = GET_SWAP_RA_VAL(vma);
+		win = SWAP_RA_WIN(ra_val);
+		hits = SWAP_RA_HITS(ra_val);
+		if (readahead)
+			hits = min_t(int, hits + 1, SWAP_RA_HITS_MAX);
+		atomic_long_set(&vma->swap_readahead_info,
+				SWAP_RA_VAL(addr, win, hits));
+	}
 
-		if (readahead) {
-			count_vm_event(SWAP_RA_HIT);
-			if (!vma || !vma_ra)
-				atomic_inc(&swapin_readahead_hits);
-		}
+	if (readahead) {
+		count_vm_event(SWAP_RA_HIT);
+		if (!vma || !vma_ra)
+			atomic_inc(&swapin_readahead_hits);
 	}
 
 	return folio;
-- 
2.43.0
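
The layout change above is a plain early-return flattening. Below is a compilable userspace illustration of the before/after shape, with all swap-cache details stubbed out; every name here is hypothetical, not the kernel's. The behaviour is identical in both variants, but the common path stays at the top indentation level, which is also what lets the compiler emit the slightly smaller function measured by bloat-o-meter above.

#include <stdbool.h>
#include <stdio.h>

struct folio { bool large; bool readahead; };

static struct folio *lookup(bool found)
{
	static struct folio f = { .large = false, .readahead = true };
	return found ? &f : NULL;
}

static void account_readahead(struct folio *folio)
{
	if (folio->readahead)
		puts("SWAP_RA_HIT");
}

/* Before: all work nested one level under "if (folio)". */
static struct folio *get_folio_nested(bool found)
{
	struct folio *folio = lookup(found);

	if (folio) {
		if (folio->large)
			return folio;
		account_readahead(folio);
	}
	return folio;
}

/* After: failure and special cases leave early; the common path
 * runs at the top level with one less level of indentation. */
static struct folio *get_folio_early_return(bool found)
{
	struct folio *folio = lookup(found);

	if (!folio)
		return NULL;
	if (folio->large)
		return folio;
	account_readahead(folio);
	return folio;
}

int main(void)
{
	get_folio_nested(true);
	get_folio_early_return(true);
	return 0;
}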