From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Hugh Dickins,
    Johannes Weiner, Matthew Wilcox, Michal Hocko, Yosry Ahmed,
    David Hildenbrand, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 1/9] mm/swapfile.c: add back some comments
Date: Wed, 3 Jan 2024 01:53:30 +0800
Message-ID: <20240102175338.62012-2-ryncsn@gmail.com>
In-Reply-To: <20240102175338.62012-1-ryncsn@gmail.com>
References: <20240102175338.62012-1-ryncsn@gmail.com>
From: Kairui Song

Some useful comments were dropped in commit b56a2d8af914 ("mm: rid
swapoff of quadratic complexity"); add them back.

Signed-off-by: Kairui Song
---
 mm/swapfile.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 3eec686484ef..f7271504aa0a 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1880,6 +1880,17 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			folio = page_folio(page);
 		}
 		if (!folio) {
+			/*
+			 * The entry could have been freed, and will not
+			 * be reused since swapoff() already disabled
+			 * allocation from here, or alloc_page() failed.
+			 *
+			 * We don't hold a lock here, so the swap entry could be
+			 * SWAP_MAP_BAD (when the cluster is discarding).
+			 * Instead of failing out, we can just skip the swap
+			 * entry because swapoff will wait for discarding
+			 * to finish anyway.
+			 */
 			swp_count = READ_ONCE(si->swap_map[offset]);
 			if (swp_count == 0 || swp_count == SWAP_MAP_BAD)
 				continue;
--
2.43.0
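As background for the comment restored above, a minimal sketch of the
swap-map states it relies on; the helper name swap_entry_skippable is
hypothetical, the constants are the real ones from include/linux/swap.h:

	/*
	 * Sketch: interpreting a racily-read swap_map value during swapoff.
	 *   0             - the entry was freed; swapoff already disabled
	 *                   allocation, so it cannot be reused.
	 *   SWAP_MAP_BAD  - the slot is unusable, e.g. its cluster is being
	 *                   discarded; swapoff waits for the discard
	 *                   elsewhere, so skipping the entry here is safe.
	 *   anything else - a live swap count; the entry must be unused.
	 */
	static bool swap_entry_skippable(unsigned char swp_count)
	{
		return swp_count == 0 || swp_count == SWAP_MAP_BAD;
	}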
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Hugh Dickins,
    Johannes Weiner, Matthew Wilcox, Michal Hocko, Yosry Ahmed,
    David Hildenbrand, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 2/9] mm/swap: move no readahead swapin code to a stand-alone helper
Date: Wed, 3 Jan 2024 01:53:31 +0800
Message-ID: <20240102175338.62012-3-ryncsn@gmail.com>
In-Reply-To: <20240102175338.62012-1-ryncsn@gmail.com>
References: <20240102175338.62012-1-ryncsn@gmail.com>

From: Kairui Song

No functional change; simply move the routine into a standalone function
so it can be reused later. The error path handling is copied from the
"out_page" label, keeping the code change minimal for easier review.

Signed-off-by: Kairui Song
---
 mm/memory.c     | 32 ++++----------------------------
 mm/swap.h       |  8 ++++++++
 mm/swap_state.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 28 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index a0a50d3754f0..0165c8cad489 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3803,7 +3803,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	swp_entry_t entry;
 	pte_t pte;
 	vm_fault_t ret = 0;
-	void *shadow = NULL;
 
 	if (!pte_unmap_same(vmf))
 		goto out;
@@ -3867,33 +3866,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!folio) {
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
 		    __swap_count(entry) == 1) {
-			/* skip swapcache */
-			folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
-						vma, vmf->address, false);
-			page = &folio->page;
-			if (folio) {
-				__folio_set_locked(folio);
-				__folio_set_swapbacked(folio);
-
-				if (mem_cgroup_swapin_charge_folio(folio,
-							vma->vm_mm, GFP_KERNEL,
-							entry)) {
-					ret = VM_FAULT_OOM;
-					goto out_page;
-				}
-				mem_cgroup_swapin_uncharge_swap(entry);
-
-				shadow = get_shadow_from_swap_cache(entry);
-				if (shadow)
-					workingset_refault(folio, shadow);
-
-				folio_add_lru(folio);
-
-				/* To provide entry to swap_read_folio() */
-				folio->swap = entry;
-				swap_read_folio(folio, true, NULL);
-				folio->private = NULL;
-			}
+			/* skip swapcache and readahead */
+			folio = swapin_direct(entry, GFP_HIGHUSER_MOVABLE, vmf);
+			if (folio)
+				page = &folio->page;
 		} else {
 			page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
 						vmf);
diff --git a/mm/swap.h b/mm/swap.h
index 758c46ca671e..83eab7b67e77 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -56,6 +56,8 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
 		struct mempolicy *mpol, pgoff_t ilx);
 struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
 			      struct vm_fault *vmf);
+struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
+			    struct vm_fault *vmf);
 
 static inline unsigned int folio_swap_flags(struct folio *folio)
 {
@@ -86,6 +88,12 @@ static inline struct folio *swap_cluster_readahead(swp_entry_t entry,
 	return NULL;
 }
 
+struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
+			    struct vm_fault *vmf)
+{
+	return NULL;
+}
+
 static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
 			struct vm_fault *vmf)
 {
diff --git a/mm/swap_state.c b/mm/swap_state.c
index e671266ad772..24cb93ed5081 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -861,6 +861,53 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 	return folio;
 }
 
+/**
+ * swapin_direct - swap in folios skipping swap cache and readahead
+ * @entry: swap entry of this memory
+ * @gfp_mask: memory allocation flags
+ * @vmf: fault information
+ *
+ * Returns the struct folio for entry and addr after the swap entry is read
+ * in.
+ */
+struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
+			    struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct folio *folio;
+	void *shadow = NULL;
+
+	/* skip swapcache */
+	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
+				vma, vmf->address, false);
+	if (folio) {
+		__folio_set_locked(folio);
+		__folio_set_swapbacked(folio);
+
+		if (mem_cgroup_swapin_charge_folio(folio,
+					vma->vm_mm, GFP_KERNEL,
+					entry)) {
+			folio_unlock(folio);
+			folio_put(folio);
+			return NULL;
+		}
+		mem_cgroup_swapin_uncharge_swap(entry);
+
+		shadow = get_shadow_from_swap_cache(entry);
+		if (shadow)
+			workingset_refault(folio, shadow);
+
+		folio_add_lru(folio);
+
+		/* To provide entry to swap_read_folio() */
+		folio->swap = entry;
+		swap_read_folio(folio, true, NULL);
+		folio->private = NULL;
+	}
+
+	return folio;
+}
+
 /**
  * swapin_readahead - swap in pages in hope we need them soon
  * @entry: swap entry of this memory
--
2.43.0
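One property worth noting about the helper introduced above: on charge
failure it unlocks and drops the folio itself, so callers only ever see
either a fully initialized, locked folio or NULL. A condensed sketch of
the caller-side handling, based on the do_swap_page() path:

	folio = swapin_direct(entry, GFP_HIGHUSER_MOVABLE, vmf);
	if (!folio) {
		/*
		 * Allocation or memcg charge failed: re-take the PTE
		 * lock and report OOM only if the PTE is unchanged,
		 * since another task may have faulted the page in.
		 */
		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
					       vmf->address, &vmf->ptl);
		if (vmf->pte && pte_same(ptep_get(vmf->pte), vmf->orig_pte))
			ret = VM_FAULT_OOM;
		goto unlock;
	}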
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Hugh Dickins,
    Johannes Weiner, Matthew Wilcox, Michal Hocko, Yosry Ahmed,
    David Hildenbrand, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 3/9] mm/swap: avoid doing extra unlock error checks for direct swapin
Date: Wed, 3 Jan 2024 01:53:32 +0800
Message-ID: <20240102175338.62012-4-ryncsn@gmail.com>
In-Reply-To: <20240102175338.62012-1-ryncsn@gmail.com>
References: <20240102175338.62012-1-ryncsn@gmail.com>

From: Kairui Song

When swapping in a page, mem_cgroup_swapin_charge_folio() is called on a
newly allocated folio. Nothing else references the folio yet, so there is
no need to set the lock bit that early. This avoids the extra unlock on
the error path.

Signed-off-by: Kairui Song
---
 mm/swap_state.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 24cb93ed5081..6130de8d5226 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -881,16 +881,15 @@ struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
 	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
 				vma, vmf->address, false);
 	if (folio) {
-		__folio_set_locked(folio);
-		__folio_set_swapbacked(folio);
-
-		if (mem_cgroup_swapin_charge_folio(folio,
-					vma->vm_mm, GFP_KERNEL,
-					entry)) {
-			folio_unlock(folio);
+		if (mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
+						   GFP_KERNEL, entry)) {
 			folio_put(folio);
 			return NULL;
 		}
+
+		__folio_set_locked(folio);
+		__folio_set_swapbacked(folio);
+
 		mem_cgroup_swapin_uncharge_swap(entry);
 
 		shadow = get_shadow_from_swap_cache(entry);
--
2.43.0
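To spell out the reordering above (charge_fails() is a hypothetical
stand-in for the mem_cgroup_swapin_charge_folio() call; illustrative
only): the freshly allocated folio is not reachable by anyone else yet,
so taking the folio lock can safely be deferred until the charge has
succeeded, and the error path shrinks to a bare folio_put():

	/* Before: lock first, so the error path must unlock again. */
	__folio_set_locked(folio);
	if (charge_fails()) {
		folio_unlock(folio);
		folio_put(folio);
		return NULL;
	}

	/* After: charge first; nothing else can see the folio yet. */
	if (charge_fails()) {
		folio_put(folio);
		return NULL;
	}
	__folio_set_locked(folio);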
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Hugh Dickins,
    Johannes Weiner, Matthew Wilcox, Michal Hocko, Yosry Ahmed,
    David Hildenbrand, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 4/9] mm/swap: always account swapped in page into current memcg
Date: Wed, 3 Jan 2024 01:53:33 +0800
Message-ID: <20240102175338.62012-5-ryncsn@gmail.com>
In-Reply-To: <20240102175338.62012-1-ryncsn@gmail.com>
References: <20240102175338.62012-1-ryncsn@gmail.com>

From: Kairui Song

Currently, mem_cgroup_swapin_charge_folio() is always called with the mm
argument as NULL, except in swapin_direct. swapin_direct is used when
swapin should skip readahead and the swap cache (SWP_SYNCHRONOUS_IO); the
other callers of mem_cgroup_swapin_charge_folio() are for swapin that
should not skip them. This could cause swapin charging to behave
differently depending on the swap device.

It currently doesn't, because the only call path of swapin_direct is the
direct anon page fault path, where mm equals current->mm; that will no
longer hold once swapin_direct is shared with other callers (e.g.
swapoff). So make swapin_direct pass NULL for mm as well. No functional
change.

Signed-off-by: Kairui Song
---
 mm/swap_state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 6130de8d5226..d39c5369da21 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -881,7 +881,7 @@ struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
 	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
 				vma, vmf->address, false);
 	if (folio) {
-		if (mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
+		if (mem_cgroup_swapin_charge_folio(folio, NULL,
 						   GFP_KERNEL, entry)) {
 			folio_put(folio);
 			return NULL;
--
2.43.0
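The resulting convention, condensed from the diff above: every swapin
path now charges with a NULL mm, so which memcg gets charged no longer
depends on the swap device or on who triggered the swapin. A sketch of
the unified call shape after this patch:

	/* vma->vm_mm is no longer consulted; page-fault swapin and
	 * other callers (e.g. a future swapoff user) charge the same. */
	if (mem_cgroup_swapin_charge_folio(folio, NULL, GFP_KERNEL, entry)) {
		folio_put(folio);
		return NULL;
	}
	mem_cgroup_swapin_uncharge_swap(entry);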
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Hugh Dickins,
    Johannes Weiner, Matthew Wilcox, Michal Hocko, Yosry Ahmed,
    David Hildenbrand, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 5/9] mm/swap: introduce swapin_entry for unified readahead policy
Date: Wed, 3 Jan 2024 01:53:34 +0800
Message-ID: <20240102175338.62012-6-ryncsn@gmail.com>
In-Reply-To: <20240102175338.62012-1-ryncsn@gmail.com>
References: <20240102175338.62012-1-ryncsn@gmail.com>

From: Kairui Song

Introduce swapin_entry, which merges swapin_readahead and swapin_direct,
making it the main entry point for swapping in pages, with a unified
swapin policy.

This commit converts swapoff to use the new helper; swapping off a 10G
ZRAM (lzo-rle) is now faster since readahead is skipped:

Before:
  time swapoff /dev/zram0
  real    0m12.337s
  user    0m0.001s
  sys     0m12.329s

After:
  time swapoff /dev/zram0
  real    0m9.728s
  user    0m0.001s
  sys     0m9.719s

Signed-off-by: Kairui Song
---
 mm/memory.c     | 21 +++++++--------------
 mm/swap.h       | 16 ++++------------
 mm/swap_state.c | 49 +++++++++++++++++++++++++++++++++----------------
 mm/swapfile.c   |  7 ++-----
 4 files changed, 46 insertions(+), 47 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 0165c8cad489..b56254a875f8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3801,6 +3801,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	rmap_t rmap_flags = RMAP_NONE;
 	bool exclusive = false;
 	swp_entry_t entry;
+	bool swapcached;
 	pte_t pte;
 	vm_fault_t ret = 0;
 
@@ -3864,21 +3865,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	swapcache = folio;
 
 	if (!folio) {
-		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
-		    __swap_count(entry) == 1) {
-			/* skip swapcache and readahead */
-			folio = swapin_direct(entry, GFP_HIGHUSER_MOVABLE, vmf);
-			if (folio)
-				page = &folio->page;
+		folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
+				     vmf, &swapcached);
+		if (folio) {
+			page = folio_file_page(folio, swp_offset(entry));
+			if (swapcached)
+				swapcache = folio;
 		} else {
-			page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
-						vmf);
-			if (page)
-				folio = page_folio(page);
-			swapcache = folio;
-		}
-
-		if (!folio) {
 			/*
 			 * Back out if somebody else faulted in this pte
 			 * while we released the pte lock.
diff --git a/mm/swap.h b/mm/swap.h
index 83eab7b67e77..502a2801f817 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -54,10 +54,8 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_flags,
 		bool skip_if_exists);
 struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
 		struct mempolicy *mpol, pgoff_t ilx);
-struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
-			      struct vm_fault *vmf);
-struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
-			    struct vm_fault *vmf);
+struct folio *swapin_entry(swp_entry_t entry, gfp_t flag,
+			   struct vm_fault *vmf, bool *swapcached);
 
 static inline unsigned int folio_swap_flags(struct folio *folio)
 {
@@ -88,14 +86,8 @@ static inline struct folio *swap_cluster_readahead(swp_entry_t entry,
 	return NULL;
 }
 
-struct folio *swapin_direct(swp_entry_t entry, gfp_t flag,
-			    struct vm_fault *vmf)
-{
-	return NULL;
-}
-
-static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
-			struct vm_fault *vmf)
+static inline struct folio *swapin_entry(swp_entry_t swp, gfp_t gfp_mask,
+			struct vm_fault *vmf, bool *swapcached)
 {
 	return NULL;
 }
diff --git a/mm/swap_state.c b/mm/swap_state.c
index d39c5369da21..66ff187aa5d3 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -316,6 +316,11 @@ void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
 	release_pages(pages, nr);
 }
 
+static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_t entry)
+{
+	return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1;
+}
+
 static inline bool swap_use_vma_readahead(void)
 {
 	return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap);
@@ -870,8 +875,8 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
  * Returns the struct folio for entry and addr after the swap entry is read
  * in.
  */
-struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
-			    struct vm_fault *vmf)
+static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
+				   struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct folio *folio;
@@ -908,33 +913,45 @@ struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
 }
 
 /**
- * swapin_readahead - swap in pages in hope we need them soon
+ * swapin_entry - swap in a page from swap entry
  * @entry: swap entry of this memory
  * @gfp_mask: memory allocation flags
  * @vmf: fault information
+ * @swapcached: pointer to a bool used as indicator if the
+ *              page is swapped in through swapcache.
  *
  * Returns the struct page for entry and addr, after queueing swapin.
 *
- * It's a main entry function for swap readahead. By the configuration,
+ * It's a main entry function for swap in. By the configuration,
 * it will read ahead blocks by cluster-based(ie, physical disk based)
- * or vma-based(ie, virtual address based on faulty address) readahead.
+ * or vma-based(ie, virtual address based on faulty address) readahead,
+ * or skip the readahead (ie, ramdisk based swap device).
 */
-struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
-			      struct vm_fault *vmf)
+struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
+			   struct vm_fault *vmf, bool *swapcached)
 {
 	struct mempolicy *mpol;
-	pgoff_t ilx;
 	struct folio *folio;
+	pgoff_t ilx;
+	bool cached;
 
-	mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
-	folio = swap_use_vma_readahead() ?
-		swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf) :
-		swap_cluster_readahead(entry, gfp_mask, mpol, ilx);
-	mpol_cond_put(mpol);
+	if (swap_use_no_readahead(swp_swap_info(entry), entry)) {
+		folio = swapin_direct(entry, gfp_mask, vmf);
+		cached = false;
+	} else {
+		mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
+		if (swap_use_vma_readahead())
+			folio = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf);
+		else
+			folio = swap_cluster_readahead(entry, gfp_mask, mpol, ilx);
+		mpol_cond_put(mpol);
+		cached = true;
+	}
 
-	if (!folio)
-		return NULL;
-	return folio_file_page(folio, swp_offset(entry));
+	if (swapcached)
+		*swapcached = cached;
+
+	return folio;
 }
 
 #ifdef CONFIG_SYSFS
diff --git a/mm/swapfile.c b/mm/swapfile.c
index f7271504aa0a..ce4e6c10dce7 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1866,7 +1866,6 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 
 	folio = swap_cache_get_folio(entry, vma, addr);
 	if (!folio) {
-		struct page *page;
 		struct vm_fault vmf = {
 			.vma = vma,
 			.address = addr,
@@ -1874,10 +1873,8 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			.pmd = pmd,
 		};
 
-		page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
-					&vmf);
-		if (page)
-			folio = page_folio(page);
+		folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
+				     &vmf, NULL);
 	}
 	if (!folio) {
 		/*
--
2.43.0
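Condensing the policy that swapin_entry now centralizes (a sketch of the
control flow from the patch above, with the readahead branches
annotated):

	if (swap_use_no_readahead(swp_swap_info(entry), entry)) {
		/* SWP_SYNCHRONOUS_IO device and a single mapping:
		 * synchronous read, no swap cache, no readahead. */
		folio = swapin_direct(entry, gfp_mask, vmf);
	} else if (swap_use_vma_readahead()) {
		/* readahead window built from nearby virtual addresses */
		folio = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf);
	} else {
		/* readahead window built from nearby on-disk slots */
		folio = swap_cluster_readahead(entry, gfp_mask, mpol, ilx);
	}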
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Hugh Dickins,
    Johannes Weiner, Matthew Wilcox, Michal Hocko, Yosry Ahmed,
    David Hildenbrand, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 6/9] mm/swap: handle swapcache lookup in swapin_entry
Date: Wed, 3 Jan 2024 01:53:35 +0800
Message-ID: <20240102175338.62012-7-ryncsn@gmail.com>
In-Reply-To: <20240102175338.62012-1-ryncsn@gmail.com>
References: <20240102175338.62012-1-ryncsn@gmail.com>

From: Kairui Song

Since all callers of swapin_entry need to check the swap cache first,
merge this common routine into swapin_entry so it can be shared and
optimized later.

Also introduce an enum to better represent the possible swap cache usage,
with some comments, to make the use of the swap cache easier to
understand.

Signed-off-by: Kairui Song
---
 mm/memory.c     | 45 ++++++++++++++++++++-------------------------
 mm/swap.h       | 20 ++++++++++++++++++--
 mm/swap_state.c | 22 ++++++++++++++--------
 mm/swapfile.c   | 21 +++++++++------------
 4 files changed, 61 insertions(+), 47 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index b56254a875f8..ab6e76c95632 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3795,13 +3795,13 @@ static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
 vm_fault_t do_swap_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct folio *swapcache, *folio = NULL;
+	struct folio *swapcache = NULL, *folio;
+	enum swap_cache_result cache_result;
 	struct page *page;
 	struct swap_info_struct *si = NULL;
 	rmap_t rmap_flags = RMAP_NONE;
 	bool exclusive = false;
 	swp_entry_t entry;
-	bool swapcached;
 	pte_t pte;
 	vm_fault_t ret = 0;
 
@@ -3859,31 +3859,26 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (unlikely(!si))
 		goto out;
 
-	folio = swap_cache_get_folio(entry, vma, vmf->address);
-	if (folio)
+	folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
+			     vmf, &cache_result);
+	if (folio) {
 		page = folio_file_page(folio, swp_offset(entry));
-	swapcache = folio;
-
-	if (!folio) {
-		folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
-				     vmf, &swapcached);
-		if (folio) {
-			page = folio_file_page(folio, swp_offset(entry));
-			if (swapcached)
-				swapcache = folio;
-		} else {
-			/*
-			 * Back out if somebody else faulted in this pte
-			 * while we released the pte lock.
-			 */
-			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-					vmf->address, &vmf->ptl);
-			if (likely(vmf->pte &&
-				   pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
-				ret = VM_FAULT_OOM;
-			goto unlock;
-		}
+		if (cache_result != SWAP_CACHE_BYPASS)
+			swapcache = folio;
+	} else {
+		/*
+		 * Back out if somebody else faulted in this pte
+		 * while we released the pte lock.
+		 */
+		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
+				vmf->address, &vmf->ptl);
+		if (likely(vmf->pte &&
+			   pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
+			ret = VM_FAULT_OOM;
+		goto unlock;
+	}
 
+	if (cache_result != SWAP_CACHE_HIT) {
 		/* Had to read the page from swap area: Major fault */
 		ret = VM_FAULT_MAJOR;
 		count_vm_event(PGMAJFAULT);
diff --git a/mm/swap.h b/mm/swap.h
index 502a2801f817..1f4cdb324bf0 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -4,6 +4,22 @@
 
 struct mempolicy;
 
+/*
+ * Caller of swapin_entry may need to know the cache lookup result:
+ *
+ * SWAP_CACHE_HIT: cache hit, cached folio is returned.
+ * SWAP_CACHE_MISS: cache miss, folio is allocated, read from swap device
+ *                  and added to swap cache, but still may return a cached
+ *                  folio if raced (check __read_swap_cache_async).
+ * SWAP_CACHE_BYPASS: cache miss, folio is newly allocated and read
+ *                    from swap device bypassing the cache.
+ */
+enum swap_cache_result {
+	SWAP_CACHE_HIT,
+	SWAP_CACHE_MISS,
+	SWAP_CACHE_BYPASS,
+};
+
 #ifdef CONFIG_SWAP
 #include <linux/blk_types.h> /* for bio_end_io_t */
 
@@ -55,7 +71,7 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_flags,
 struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
 		struct mempolicy *mpol, pgoff_t ilx);
 struct folio *swapin_entry(swp_entry_t entry, gfp_t flag,
-			   struct vm_fault *vmf, bool *swapcached);
+			   struct vm_fault *vmf, enum swap_cache_result *result);
 
 static inline unsigned int folio_swap_flags(struct folio *folio)
 {
@@ -87,7 +103,7 @@ static inline struct folio *swap_cluster_readahead(swp_entry_t entry,
 }
 
 static inline struct folio *swapin_entry(swp_entry_t swp, gfp_t gfp_mask,
-			struct vm_fault *vmf, bool *swapcached)
+			struct vm_fault *vmf, enum swap_cache_result *result)
 {
 	return NULL;
 }
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 66ff187aa5d3..f6f1e6f5d782 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -917,8 +917,7 @@ static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
  * @entry: swap entry of this memory
  * @gfp_mask: memory allocation flags
  * @vmf: fault information
- * @swapcached: pointer to a bool used as indicator if the
- *              page is swapped in through swapcache.
+ * @result: a return value to indicate swap cache usage.
  *
  * Returns the struct page for entry and addr, after queueing swapin.
 *
@@ -928,16 +927,22 @@ static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
 * or skip the readahead (ie, ramdisk based swap device).
 */
 struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
-			   struct vm_fault *vmf, bool *swapcached)
+			   struct vm_fault *vmf, enum swap_cache_result *result)
 {
+	enum swap_cache_result cache_result;
 	struct mempolicy *mpol;
 	struct folio *folio;
 	pgoff_t ilx;
-	bool cached;
+
+	folio = swap_cache_get_folio(entry, vmf->vma, vmf->address);
+	if (folio) {
+		cache_result = SWAP_CACHE_HIT;
+		goto done;
+	}
 
 	if (swap_use_no_readahead(swp_swap_info(entry), entry)) {
 		folio = swapin_direct(entry, gfp_mask, vmf);
-		cached = false;
+		cache_result = SWAP_CACHE_BYPASS;
 	} else {
 		mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
 		if (swap_use_vma_readahead())
@@ -945,11 +950,12 @@ struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
 		else
 			folio = swap_cluster_readahead(entry, gfp_mask, mpol, ilx);
 		mpol_cond_put(mpol);
-		cached = true;
+		cache_result = SWAP_CACHE_MISS;
 	}
 
-	if (swapcached)
-		*swapcached = cached;
+done:
+	if (result)
+		*result = cache_result;
 
 	return folio;
 }
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ce4e6c10dce7..5aa44de11edc 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1845,6 +1845,13 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		int ret;
 		pte_t ptent;
 
+		struct vm_fault vmf = {
+			.vma = vma,
+			.address = addr,
+			.real_address = addr,
+			.pmd = pmd,
+		};
+
 		if (!pte++) {
 			pte = pte_offset_map(pmd, addr);
 			if (!pte)
@@ -1864,18 +1871,8 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		pte_unmap(pte);
 		pte = NULL;
 
-		folio = swap_cache_get_folio(entry, vma, addr);
-		if (!folio) {
-			struct vm_fault vmf = {
-				.vma = vma,
-				.address = addr,
-				.real_address = addr,
-				.pmd = pmd,
-			};
-
-			folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
-					     &vmf, NULL);
-		}
+		folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
+				     &vmf, NULL);
 		if (!folio) {
 			/*
 			 * The entry could have been freed, and will not
--
2.43.0
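A caller-side sketch of the new result reporting, mirroring how
do_swap_page() consumes it in the patch above:

	enum swap_cache_result cache_result;

	folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE, vmf, &cache_result);
	if (folio) {
		if (cache_result != SWAP_CACHE_BYPASS)
			swapcache = folio;	/* folio is in the swap cache */
		if (cache_result != SWAP_CACHE_HIT)
			count_vm_event(PGMAJFAULT);	/* device was touched */
	}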
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Hugh Dickins,
    Johannes Weiner, Matthew Wilcox, Michal Hocko, Yosry Ahmed,
    David Hildenbrand, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 7/9] mm/swap: avoid a duplicated swap cache lookup for SWP_SYNCHRONOUS_IO
Date: Wed, 3 Jan 2024 01:53:36 +0800
Message-ID: <20240102175338.62012-8-ryncsn@gmail.com>
In-Reply-To: <20240102175338.62012-1-ryncsn@gmail.com>
References: <20240102175338.62012-1-ryncsn@gmail.com>

From: Kairui Song

When an xa_value is returned by the cache lookup, keep it to be used
later for the workingset refault check, instead of doing the lookup again
in swapin_direct.

This has the side effect of making swapoff also trigger the workingset
check, but that should be fine, since swapoff already affects the
workload in many ways.

After this commit, swapin is about 4% faster for ZRAM in a micro
benchmark that uses madvise() to swap out 10G of zero-filled data to ZRAM
and then reads it back in:

Before: 11143285 us
After:  10692644 us (+4.1%)

Signed-off-by: Kairui Song
---
 mm/shmem.c      |  2 +-
 mm/swap.h       |  3 ++-
 mm/swap_state.c | 24 +++++++++++++-----------
 3 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 928aa2304932..9da9f7a0e620 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1872,7 +1872,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	}
 
 	/* Look it up and read it in.. */
-	folio = swap_cache_get_folio(swap, NULL, 0);
+	folio = swap_cache_get_folio(swap, NULL, 0, NULL);
 	if (!folio) {
 		/* Or update major stats only when swapin succeeds?? */
 		if (fault_type) {
diff --git a/mm/swap.h b/mm/swap.h
index 1f4cdb324bf0..9180411afcfe 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -58,7 +58,8 @@ void delete_from_swap_cache(struct folio *folio);
 void clear_shadow_from_swap_cache(int type, unsigned long begin,
 				  unsigned long end);
 struct folio *swap_cache_get_folio(swp_entry_t entry,
-		struct vm_area_struct *vma, unsigned long addr);
+		struct vm_area_struct *vma, unsigned long addr,
+		void **shadowp);
 struct folio *filemap_get_incore_folio(struct address_space *mapping,
 		pgoff_t index);
 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index f6f1e6f5d782..21badd4f0fc7 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -335,12 +335,18 @@ static inline bool swap_use_vma_readahead(void)
  * Caller must lock the swap device or hold a reference to keep it valid.
  */
 struct folio *swap_cache_get_folio(swp_entry_t entry,
-		struct vm_area_struct *vma, unsigned long addr)
+		struct vm_area_struct *vma, unsigned long addr, void **shadowp)
 {
 	struct folio *folio;
 
-	folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry));
-	if (!IS_ERR(folio)) {
+	folio = filemap_get_entry(swap_address_space(entry), swp_offset(entry));
+	if (xa_is_value(folio)) {
+		if (shadowp)
+			*shadowp = folio;
+		return NULL;
+	}
+
+	if (folio) {
 		bool vma_ra = swap_use_vma_readahead();
 		bool readahead;
 
@@ -370,8 +376,6 @@ struct folio *swap_cache_get_folio(swp_entry_t entry,
 		if (!vma || !vma_ra)
 			atomic_inc(&swapin_readahead_hits);
-	} else {
-		folio = NULL;
 	}
 
 	return folio;
@@ -876,11 +880,10 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
  * in.
  */
 static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
-				   struct vm_fault *vmf)
+				   struct vm_fault *vmf, void *shadow)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct folio *folio;
-	void *shadow = NULL;
 
 	/* skip swapcache */
 	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
@@ -897,7 +900,6 @@ static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
 
 		mem_cgroup_swapin_uncharge_swap(entry);
 
-		shadow = get_shadow_from_swap_cache(entry);
 		if (shadow)
 			workingset_refault(folio, shadow);
 
@@ -931,17 +933,18 @@ struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
 {
 	enum swap_cache_result cache_result;
 	struct mempolicy *mpol;
+	void *shadow = NULL;
 	struct folio *folio;
 	pgoff_t ilx;
 
-	folio = swap_cache_get_folio(entry, vmf->vma, vmf->address);
+	folio = swap_cache_get_folio(entry, vmf->vma, vmf->address, &shadow);
 	if (folio) {
 		cache_result = SWAP_CACHE_HIT;
 		goto done;
 	}
 
 	if (swap_use_no_readahead(swp_swap_info(entry), entry)) {
-		folio = swapin_direct(entry, gfp_mask, vmf);
+		folio = swapin_direct(entry, gfp_mask, vmf, shadow);
 		cache_result = SWAP_CACHE_BYPASS;
 	} else {
 		mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
@@ -952,7 +955,6 @@ struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
 		mpol_cond_put(mpol);
 		cache_result = SWAP_CACHE_MISS;
 	}
-
 done:
 	if (result)
 		*result = cache_result;
--
2.43.0
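The shape of the optimization, condensed from the diff above: one xarray
walk now serves both the cache lookup and the workingset shadow, since
filemap_get_entry() can return either a real folio or the shadow value a
prior eviction left behind:

	void *shadow = NULL;

	folio = swap_cache_get_folio(entry, vmf->vma, vmf->address, &shadow);
	if (!folio) {
		/* Reuse the shadow from the lookup above; swapin_direct()
		 * feeds it to workingset_refault() instead of walking the
		 * xarray again via get_shadow_from_swap_cache(). */
		folio = swapin_direct(entry, gfp_mask, vmf, shadow);
	}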
CG7X1LvnV2VOs4BtnsBK3Z3TAnMLLMgOAkQSW+iKSP/wMeibEhqJnQYnxV50r9ZzY25q JN8w== X-Gm-Message-State: AOJu0Yx69j7pSImom8TkkvySoqT0f9EGD5/RPjBrqB+2Vtu+FGNtxs7R wwOY1cuH7l/wFErZ3X41MEU= X-Google-Smtp-Source: AGHT+IGr1fEmKCpFGmKewU8/hVrmqMb8AM0dTCE+QEyY4/Rq+9azV/ZjBTyHv60Nu5X7VVBSfSChoA== X-Received: by 2002:a17:903:110d:b0:1d4:cc31:71e with SMTP id n13-20020a170903110d00b001d4cc31071emr483019plh.67.1704218060519; Tue, 02 Jan 2024 09:54:20 -0800 (PST) Received: from KASONG-MB2.tencent.com ([115.171.41.9]) by smtp.gmail.com with ESMTPSA id be10-20020a170902aa0a00b001d3c3d486bfsm22151969plb.163.2024.01.02.09.54.17 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Tue, 02 Jan 2024 09:54:19 -0800 (PST) From: Kairui Song To: linux-mm@kvack.org Cc: Andrew Morton , Chris Li , "Huang, Ying" , Hugh Dickins , Johannes Weiner , Matthew Wilcox , Michal Hocko , Yosry Ahmed , David Hildenbrand , linux-kernel@vger.kernel.org, Kairui Song Subject: [PATCH v2 8/9] mm/swap: introduce a helper for swapin without vmfault Date: Wed, 3 Jan 2024 01:53:37 +0800 Message-ID: <20240102175338.62012-9-ryncsn@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240102175338.62012-1-ryncsn@gmail.com> References: <20240102175338.62012-1-ryncsn@gmail.com> Reply-To: Kairui Song Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kairui Song There are two places where swapin is not caused by direct anon page fault: - shmem swapin, invoked indirectly through shmem mapping - swapoff They used to construct a pseudo vmfault struct for swapin function. Shmem has dropped the pseudo vmfault recently in commit ddc1a5cbc05d ("mempolicy: alloc_pages_mpol() for NUMA policy without vma"). Swapoff path is still using one. Introduce a helper for them both, this help save stack usage for swapoff path, and help apply a unified swapin cache and readahead policy check. Due to missing vmfault info, the caller have to pass in mempolicy explicitly, make it different from swapin_entry and name it swapin_entry_mpol. This commit convert swapoff to use this helper, follow-up commits will convert shmem to use it too. 
Signed-off-by: Kairui Song --- mm/swap.h | 9 +++++++++ mm/swap_state.c | 40 ++++++++++++++++++++++++++++++++-------- mm/swapfile.c | 15 ++++++--------- 3 files changed, 47 insertions(+), 17 deletions(-) diff --git a/mm/swap.h b/mm/swap.h index 9180411afcfe..8f790a67b948 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -73,6 +73,9 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, g= fp_t flag, struct mempolicy *mpol, pgoff_t ilx); struct folio *swapin_entry(swp_entry_t entry, gfp_t flag, struct vm_fault *vmf, enum swap_cache_result *result); +struct folio *swapin_entry_mpol(swp_entry_t entry, gfp_t gfp_mask, + struct mempolicy *mpol, pgoff_t ilx, + enum swap_cache_result *result); =20 static inline unsigned int folio_swap_flags(struct folio *folio) { @@ -109,6 +112,12 @@ static inline struct folio *swapin_entry(swp_entry_t s= wp, gfp_t gfp_mask, return NULL; } =20 +static inline struct page *swapin_entry_mpol(swp_entry_t entry, gfp_t gfp_= mask, + struct mempolicy *mpol, pgoff_t ilx, enum swap_cache_result *result) +{ + return NULL; +} + static inline int swap_writepage(struct page *p, struct writeback_control = *wbc) { return 0; diff --git a/mm/swap_state.c b/mm/swap_state.c index 21badd4f0fc7..3edf4b63158d 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -880,14 +880,13 @@ static struct folio *swap_vma_readahead(swp_entry_t t= arg_entry, gfp_t gfp_mask, * in. */ static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask, - struct vm_fault *vmf, void *shadow) + struct mempolicy *mpol, pgoff_t ilx, + void *shadow) { - struct vm_area_struct *vma =3D vmf->vma; struct folio *folio; =20 - /* skip swapcache */ - folio =3D vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, - vma, vmf->address, false); + folio =3D (struct folio *)alloc_pages_mpol(gfp_mask, 0, + mpol, ilx, numa_node_id()); if (folio) { if (mem_cgroup_swapin_charge_folio(folio, NULL, GFP_KERNEL, entry)) { @@ -943,18 +942,18 @@ struct folio *swapin_entry(swp_entry_t entry, gfp_t g= fp_mask, goto done; } =20 + mpol =3D get_vma_policy(vmf->vma, vmf->address, 0, &ilx); if (swap_use_no_readahead(swp_swap_info(entry), entry)) { - folio =3D swapin_direct(entry, gfp_mask, vmf, shadow); + folio =3D swapin_direct(entry, gfp_mask, mpol, ilx, shadow); cache_result =3D SWAP_CACHE_BYPASS; } else { - mpol =3D get_vma_policy(vmf->vma, vmf->address, 0, &ilx); if (swap_use_vma_readahead()) folio =3D swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf); else folio =3D swap_cluster_readahead(entry, gfp_mask, mpol, ilx); - mpol_cond_put(mpol); cache_result =3D SWAP_CACHE_MISS; } + mpol_cond_put(mpol); done: if (result) *result =3D cache_result; @@ -962,6 +961,31 @@ struct folio *swapin_entry(swp_entry_t entry, gfp_t gf= p_mask, return folio; } =20 +struct folio *swapin_entry_mpol(swp_entry_t entry, gfp_t gfp_mask, + struct mempolicy *mpol, pgoff_t ilx, + enum swap_cache_result *result) +{ + enum swap_cache_result cache_result; + void *shadow =3D NULL; + struct folio *folio; + + folio =3D swap_cache_get_folio(entry, NULL, 0, &shadow); + if (folio) { + cache_result =3D SWAP_CACHE_HIT; + } else if (swap_use_no_readahead(swp_swap_info(entry), entry)) { + folio =3D swapin_direct(entry, gfp_mask, mpol, ilx, shadow); + cache_result =3D SWAP_CACHE_BYPASS; + } else { + folio =3D swap_cluster_readahead(entry, gfp_mask, mpol, ilx); + cache_result =3D SWAP_CACHE_MISS; + } + + if (result) + *result =3D cache_result; + + return folio; +} + #ifdef CONFIG_SYSFS static ssize_t vma_ra_enabled_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) diff 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 5aa44de11edc..2f77bf143af8 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1840,18 +1840,13 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 	do {
 		struct folio *folio;
 		unsigned long offset;
+		struct mempolicy *mpol;
 		unsigned char swp_count;
 		swp_entry_t entry;
+		pgoff_t ilx;
 		int ret;
 		pte_t ptent;
 
-		struct vm_fault vmf = {
-			.vma = vma,
-			.address = addr,
-			.real_address = addr,
-			.pmd = pmd,
-		};
-
 		if (!pte++) {
 			pte = pte_offset_map(pmd, addr);
 			if (!pte)
@@ -1871,8 +1866,10 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		pte_unmap(pte);
 		pte = NULL;
 
-		folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
-				     &vmf, NULL);
+		mpol = get_vma_policy(vma, addr, 0, &ilx);
+		folio = swapin_entry_mpol(entry, GFP_HIGHUSER_MOVABLE,
+					  mpol, ilx, NULL);
+		mpol_cond_put(mpol);
 		if (!folio) {
 			/*
 			 * The entry could have been freed, and will not
-- 
2.43.0
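As an aside, a minimal sketch of how a caller can act on the
swap_cache_result reported back by the helper; the accounting shown
mirrors the shmem conversion in the next patch, and the function
itself is hypothetical:

	/* Hypothetical accounting helper, modeled on the shmem conversion. */
	static void swapin_count_fault(enum swap_cache_result result)
	{
		switch (result) {
		case SWAP_CACHE_HIT:
			/* Served from the swap cache: a minor fault. */
			break;
		case SWAP_CACHE_BYPASS:	/* new folio, IO done synchronously */
		case SWAP_CACHE_MISS:	/* readahead path, real IO was started */
		default:
			count_vm_event(PGMAJFAULT);
			break;
		}
	}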
From nobody Fri Dec 26 17:19:04 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Hugh Dickins, Johannes Weiner,
 Matthew Wilcox, Michal Hocko, Yosry Ahmed, David Hildenbrand,
 linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 9/9] mm/swap, shmem: use new swapin helper to skip readahead conditionally
Date: Wed, 3 Jan 2024 01:53:38 +0800
Message-ID: <20240102175338.62012-10-ryncsn@gmail.com>
In-Reply-To: <20240102175338.62012-1-ryncsn@gmail.com>
References: <20240102175338.62012-1-ryncsn@gmail.com>

From: Kairui Song

Currently, shmem uses cluster readahead for all swap backends. Cluster
readahead is not a good fit for ramdisk-based devices (such as ZRAM)
at all. After switching to the new helper, most benchmarks show a good
result:

- Single file sequential read:
  perf stat --repeat 20 dd if=/tmpfs/test of=/dev/null bs=1M count=8192
  (/tmpfs/test is a zero-filled file, using brd as swap, 4G memcg limit)
  Before: 22.248 +- 0.549
  After:  22.021 +- 0.684 (-1.1%)

- Random read stress test:
  fio -name=tmpfs --numjobs=16 --directory=/tmpfs \
      --size=256m --ioengine=mmap --rw=randread --random_distribution=random \
      --time_based --ramp_time=1m --runtime=5m --group_reporting
  (using brd as swap, 2G memcg limit)
  Before: 1818MiB/s
  After:  1888MiB/s (+3.85%)

- Zipf biased random read stress test:
  fio -name=tmpfs --numjobs=16 --directory=/tmpfs \
      --size=256m --ioengine=mmap --rw=randread --random_distribution=zipf:1.2 \
      --time_based --ramp_time=1m --runtime=5m --group_reporting
  (using brd as swap, 2G memcg limit)
  Before: 31.1GiB/s
  After:  32.3GiB/s (+3.86%)

So cluster readahead doesn't help much even for a single sequential
read, and for the random stress tests, performance is better without
it. Considering that both memory and the swap device slowly become
more fragmented, and that the commonly used ZRAM consumes much more
CPU than a plain ramdisk, false readahead would occur more frequently
and waste more CPU. Direct swapin is cheaper, so use the new helper
and skip readahead for SWP_SYNCHRONOUS_IO devices.

Signed-off-by: Kairui Song
---
 mm/shmem.c      | 67 +++++++++++++++++++++++++------------------------
 mm/swap.h       |  9 -------
 mm/swap_state.c | 11 ++++++--
 3 files changed, 43 insertions(+), 44 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 9da9f7a0e620..3c0729fe934d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1564,20 +1564,6 @@ static inline struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo)
 static struct mempolicy *shmem_get_pgoff_policy(struct shmem_inode_info *info,
 			pgoff_t index, unsigned int order, pgoff_t *ilx);
 
-static struct folio *shmem_swapin_cluster(swp_entry_t swap, gfp_t gfp,
-		struct shmem_inode_info *info, pgoff_t index)
-{
-	struct mempolicy *mpol;
-	pgoff_t ilx;
-	struct folio *folio;
-
-	mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
-	folio = swap_cluster_readahead(swap, gfp, mpol, ilx);
-	mpol_cond_put(mpol);
-
-	return folio;
-}
-
 /*
  * Make sure huge_gfp is always more limited than limit_gfp.
  * Some of the flags set permissions, while others set limitations.
@@ -1851,9 +1837,12 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 {
 	struct address_space *mapping = inode->i_mapping;
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	enum swap_cache_result cache_result;
 	struct swap_info_struct *si;
 	struct folio *folio = NULL;
+	struct mempolicy *mpol;
 	swp_entry_t swap;
+	pgoff_t ilx;
 	int error;
 
 	VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
@@ -1871,36 +1860,40 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		return -EINVAL;
 	}
 
-	/* Look it up and read it in.. */
-	folio = swap_cache_get_folio(swap, NULL, 0, NULL);
+	mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
+	folio = swapin_entry_mpol(swap, gfp, mpol, ilx, &cache_result);
+	mpol_cond_put(mpol);
+
 	if (!folio) {
-		/* Or update major stats only when swapin succeeds?? */
+		error = -ENOMEM;
+		goto failed;
+	}
+	if (cache_result != SWAP_CACHE_HIT) {
 		if (fault_type) {
 			*fault_type |= VM_FAULT_MAJOR;
 			count_vm_event(PGMAJFAULT);
 			count_memcg_event_mm(fault_mm, PGMAJFAULT);
 		}
-		/* Here we actually start the io */
-		folio = shmem_swapin_cluster(swap, gfp, info, index);
-		if (!folio) {
-			error = -ENOMEM;
-			goto failed;
-		}
 	}
 
 	/* We have to do this with folio locked to prevent races */
 	folio_lock(folio);
-	if (!folio_test_swapcache(folio) ||
-	    folio->swap.val != swap.val ||
-	    !shmem_confirm_swap(mapping, index, swap)) {
+	if (cache_result != SWAP_CACHE_BYPASS) {
+		/* With cache bypass, folio is new allocated, sync, and not in cache */
+		if (!folio_test_swapcache(folio) || folio->swap.val != swap.val) {
+			error = -EEXIST;
+			goto unlock;
+		}
+		if (!folio_test_uptodate(folio)) {
+			error = -EIO;
+			goto failed;
+		}
+		folio_wait_writeback(folio);
+	}
+	if (!shmem_confirm_swap(mapping, index, swap)) {
 		error = -EEXIST;
 		goto unlock;
 	}
-	if (!folio_test_uptodate(folio)) {
-		error = -EIO;
-		goto failed;
-	}
-	folio_wait_writeback(folio);
 
 	/*
 	 * Some architectures may have to restore extra metadata to the
@@ -1908,12 +1901,19 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	 */
 	arch_swap_restore(swap, folio);
 
-	if (shmem_should_replace_folio(folio, gfp)) {
+	/* With cache bypass, folio is new allocated and always respect gfp flags */
+	if (cache_result != SWAP_CACHE_BYPASS && shmem_should_replace_folio(folio, gfp)) {
 		error = shmem_replace_folio(&folio, gfp, info, index);
 		if (error)
 			goto failed;
 	}
 
+	/*
+	 * The expected value checking below should be enough to ensure
+	 * only one up-to-date swapin success. swap_free() is called after
+	 * this, so the entry can't be reused. As long as the mapping still
+	 * has the old entry value, it's never swapped in or modified.
+	 */
 	error = shmem_add_to_page_cache(folio, mapping, index,
 					swp_to_radix_entry(swap), gfp);
 	if (error)
 		goto failed;
@@ -1924,7 +1924,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	if (sgp == SGP_WRITE)
 		folio_mark_accessed(folio);
 
-	delete_from_swap_cache(folio);
+	if (cache_result != SWAP_CACHE_BYPASS)
+		delete_from_swap_cache(folio);
 	folio_mark_dirty(folio);
 	swap_free(swap);
 	put_swap_device(si);
diff --git a/mm/swap.h b/mm/swap.h
index 8f790a67b948..20f4048c971c 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -57,9 +57,6 @@ void __delete_from_swap_cache(struct folio *folio,
 void delete_from_swap_cache(struct folio *folio);
 void clear_shadow_from_swap_cache(int type, unsigned long begin,
 				  unsigned long end);
-struct folio *swap_cache_get_folio(swp_entry_t entry,
-		struct vm_area_struct *vma, unsigned long addr,
-		void **shadowp);
 struct folio *filemap_get_incore_folio(struct address_space *mapping,
 		pgoff_t index);
 
@@ -123,12 +120,6 @@ static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
 	return 0;
 }
 
-static inline struct folio *swap_cache_get_folio(swp_entry_t entry,
-		struct vm_area_struct *vma, unsigned long addr)
-{
-	return NULL;
-}
-
 static inline struct folio *filemap_get_incore_folio(struct address_space *mapping,
 		pgoff_t index)
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 3edf4b63158d..10eec68475dd 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -318,7 +318,14 @@ void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
 
 static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_t entry)
 {
-	return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1;
+	int count;
+
+	if (!data_race(si->flags & SWP_SYNCHRONOUS_IO))
+		return false;
+
+	count = __swap_count(entry);
+
+	return (count == 1 || count == SWAP_MAP_SHMEM);
 }
 
 static inline bool swap_use_vma_readahead(void)
@@ -334,7 +341,7 @@ static inline bool swap_use_vma_readahead(void)
  *
  * Caller must lock the swap device or hold a reference to keep it valid.
  */
-struct folio *swap_cache_get_folio(swp_entry_t entry,
+static struct folio *swap_cache_get_folio(swp_entry_t entry,
 		struct vm_area_struct *vma, unsigned long addr,
 		void **shadowp)
 {
 	struct folio *folio;
-- 
2.43.0
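For reference, a standalone restatement of the bypass condition used
above; this illustrative helper only mirrors the logic of
swap_use_no_readahead() from this series and is not itself part of the
patches:

	/* Illustrative restatement of the bypass predicate above. */
	static inline bool swapin_should_bypass(struct swap_info_struct *si,
						swp_entry_t entry)
	{
		int count;

		/* Only synchronous, ramdisk-like backends (e.g. ZRAM, brd) qualify. */
		if (!data_race(si->flags & SWP_SYNCHRONOUS_IO))
			return false;

		/*
		 * A swap count of 1 means a single anon owner, and
		 * SWAP_MAP_SHMEM marks a shmem-owned entry; in neither case
		 * can another user be swapping the same entry in concurrently,
		 * so the swap cache and readahead can be skipped.
		 */
		count = __swap_count(entry);
		return count == 1 || count == SWAP_MAP_SHMEM;
	}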