From nobody Wed Oct 8 14:18:11 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi,
 Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org,
 Kairui Song, stable@vger.kernel.org
Subject: [PATCH v3 1/7] mm/shmem, swap: improve cached mTHP handling and fix potential hang
Date: Fri, 27 Jun 2025 14:20:14 +0800
Message-ID: <20250627062020.534-2-ryncsn@gmail.com>
In-Reply-To: <20250627062020.534-1-ryncsn@gmail.com>
References: <20250627062020.534-1-ryncsn@gmail.com>

From: Kairui Song

The current swap-in code assumes that, when a swap entry in the shmem
mapping is order 0, its cached folios (if present) must be order 0
too, which turns out to not always be correct.

The problem is that shmem_split_large_entry is called before verifying
that the folio will eventually be swapped in; one possible race is:

CPU1                               CPU2
shmem_swapin_folio
/* swap in of order > 0 swap entry S1 */
folio = swap_cache_get_folio
/* folio = NULL */
order = xa_get_order
/* order > 0 */
folio = shmem_swap_alloc_folio
/* mTHP alloc failure, folio = NULL */
<... Interrupted ...>
                                   shmem_swapin_folio
                                   /* S1 is swapped in */
                                   shmem_writeout
                                   /* S1 is swapped out, folio cached */
shmem_split_large_entry(..., S1)
/* S1 is split, but the folio covering it has order > 0 now */

Now any following swapin of S1 will hang: `xa_get_order` returns 0,
while folio lookup returns a folio with order > 0, so the
`xa_get_order(&mapping->i_pages, index) != folio_order(folio)` check
always fails, causing swap-in to keep returning -EEXIST and retrying.
This is fragile in any case.

So fix this up by allowing a larger folio to be seen in the swap
cache, and by checking that the whole shmem mapping range covered by
the swapin has the right swap value upon inserting the folio, dropping
the redundant tree walks before the insertion. This actually improves
performance, as it avoids two redundant Xarray tree walks in the hot
path. The only side effect is that in the failure path, shmem may
redundantly reallocate a few folios, causing temporary slight memory
pressure.

Worth noting: it may seem that the order and value check before
inserting would help reduce lock contention, but that is not true. The
swap cache layer ensures a raced swapin will either see a swap cache
folio or fail to do the swapin (we have the SWAP_HAS_CACHE bit even if
the swap cache is bypassed), so holding the folio lock and checking the
folio flags is already good enough to avoid the lock contention. The
chance that a folio passes the swap entry value check while the shmem
mapping slot has changed should be very low.
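To illustrate the new insertion-time rule in isolation, here is a
minimal userspace C model (a sketch under stated assumptions, not
kernel code: struct slot, range_is_expected and the flat entry array
are hypothetical stand-ins for the XArray slots and the
xas_for_each_conflict() walk in the patch). Each conflicting entry must
carry the expected swap value for its position, and the walk must cover
the whole range with no holes:

#include <stdbool.h>
#include <stdio.h>

/* One (possibly large) mapping entry: a swap value plus its order. */
struct slot {
	unsigned long val;
	int order;
};

/*
 * Model of the range check: the entries covering the folio's range must
 * hold contiguous expected swap values, each advancing the iterator by
 * the entry's size, and together cover all nr slots with no holes.
 */
static bool range_is_expected(const struct slot *entries, int nentries,
			      unsigned long expected, unsigned long nr)
{
	unsigned long iter = expected;
	int i;

	for (i = 0; i < nentries; i++) {
		if (entries[i].val != iter)	/* hole or foreign entry */
			return false;
		iter += 1UL << entries[i].order;
	}
	/* full coverage, mirroring the iter.val - nr != swap.val check */
	return iter - expected == nr;
}

int main(void)
{
	/* an order-2 entry at 0x100, then two order-0 entries: OK */
	struct slot good[] = { { 0x100, 2 }, { 0x104, 0 }, { 0x105, 0 } };
	/* a hole at 0x104: the real code must fail with -EEXIST */
	struct slot bad[]  = { { 0x100, 2 }, { 0x105, 0 } };

	printf("%d %d\n", range_is_expected(good, 3, 0x100, 6),
	       range_is_expected(bad, 2, 0x100, 6));
	return 0;
}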
Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
Signed-off-by: Kairui Song
Reviewed-by: Kemeng Shi
Reviewed-by: Baolin Wang
Tested-by: Baolin Wang
---
 mm/shmem.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 334b7b4a61a0..e3c9a1365ff4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -884,7 +884,9 @@ static int shmem_add_to_page_cache(struct folio *folio,
 				   pgoff_t index, void *expected, gfp_t gfp)
 {
 	XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
-	long nr = folio_nr_pages(folio);
+	unsigned long nr = folio_nr_pages(folio);
+	swp_entry_t iter, swap;
+	void *entry;
 
 	VM_BUG_ON_FOLIO(index != round_down(index, nr), folio);
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
@@ -896,14 +898,24 @@ static int shmem_add_to_page_cache(struct folio *folio,
 
 	gfp &= GFP_RECLAIM_MASK;
 	folio_throttle_swaprate(folio, gfp);
+	swap = iter = radix_to_swp_entry(expected);
 
 	do {
 		xas_lock_irq(&xas);
-		if (expected != xas_find_conflict(&xas)) {
-			xas_set_err(&xas, -EEXIST);
-			goto unlock;
+		xas_for_each_conflict(&xas, entry) {
+			/*
+			 * The range must either be empty, or filled with
+			 * expected swap entries. Shmem swap entries are never
+			 * partially freed without split of both entry and
+			 * folio, so there shouldn't be any holes.
+			 */
+			if (!expected || entry != swp_to_radix_entry(iter)) {
+				xas_set_err(&xas, -EEXIST);
+				goto unlock;
+			}
+			iter.val += 1 << xas_get_order(&xas);
 		}
-		if (expected && xas_find_conflict(&xas)) {
+		if (expected && iter.val - nr != swap.val) {
 			xas_set_err(&xas, -EEXIST);
 			goto unlock;
 		}
@@ -2323,7 +2335,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			error = -ENOMEM;
 			goto failed;
 		}
-	} else if (order != folio_order(folio)) {
+	} else if (order > folio_order(folio)) {
 		/*
 		 * Swap readahead may swap in order 0 folios into swapcache
 		 * asynchronously, while the shmem mapping can still stores
@@ -2348,15 +2360,15 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 
 			swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
 		}
+	} else if (order < folio_order(folio)) {
+		swap.val = round_down(swap.val, 1 << folio_order(folio));
 	}
 
 alloced:
 	/* We have to do this with folio locked to prevent races */
 	folio_lock(folio);
 	if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
-	    folio->swap.val != swap.val ||
-	    !shmem_confirm_swap(mapping, index, swap) ||
-	    xa_get_order(&mapping->i_pages, index) != folio_order(folio)) {
+	    folio->swap.val != swap.val) {
 		error = -EEXIST;
 		goto unlock;
 	}
-- 
2.50.0

From nobody Wed Oct 8 14:18:11 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi,
 Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org,
 Kairui Song, Dev Jain
Subject: [PATCH v3 2/7] mm/shmem, swap: avoid redundant Xarray lookup during swapin
Date: Fri, 27 Jun 2025 14:20:15 +0800
Message-ID: <20250627062020.534-3-ryncsn@gmail.com>
In-Reply-To: <20250627062020.534-1-ryncsn@gmail.com>
References: <20250627062020.534-1-ryncsn@gmail.com>

From: Kairui Song

Currently shmem calls xa_get_order to get the swap radix entry order,
requiring a full tree walk. This can easily be combined with the swap
entry value check (shmem_confirm_swap) to avoid the duplicated lookup,
which should improve performance.
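As a rough userspace sketch of the idea (hypothetical types, not the
kernel API): a single lookup can answer both "is the expected entry
still there?" and "what is its order?", which is what the reworked
shmem_confirm_swap() below does by returning the order or -1:

#include <stdio.h>

struct slot {
	unsigned long val;	/* swap entry value stored in the mapping */
	int order;		/* entry order, known to the same lookup */
};

/* One walk answers both questions: -1 if the entry raced away, else order. */
static int confirm_swap(const struct slot *s, unsigned long expected)
{
	return s->val == expected ? s->order : -1;
}

int main(void)
{
	struct slot s = { .val = 0x200, .order = 4 };

	printf("%d\n", confirm_swap(&s, 0x200));	/* 4: present, order 4 */
	printf("%d\n", confirm_swap(&s, 0x300));	/* -1: entry has changed */
	return 0;
}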
Signed-off-by: Kairui Song
Reviewed-by: Kemeng Shi
Reviewed-by: Dev Jain
Reviewed-by: Baolin Wang
---
 mm/shmem.c | 33 ++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index e3c9a1365ff4..033dc7a3435d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -505,15 +505,27 @@ static int shmem_replace_entry(struct address_space *mapping,
 
 /*
  * Sometimes, before we decide whether to proceed or to fail, we must check
- * that an entry was not already brought back from swap by a racing thread.
+ * that an entry was not already brought back or split by a racing thread.
  *
  * Checking folio is not enough: by the time a swapcache folio is locked, it
  * might be reused, and again be swapcache, using the same swap as before.
+ * Returns the swap entry's order if it still presents, else returns -1.
  */
-static bool shmem_confirm_swap(struct address_space *mapping,
-			       pgoff_t index, swp_entry_t swap)
+static int shmem_confirm_swap(struct address_space *mapping, pgoff_t index,
+			      swp_entry_t swap)
 {
-	return xa_load(&mapping->i_pages, index) == swp_to_radix_entry(swap);
+	XA_STATE(xas, &mapping->i_pages, index);
+	int ret = -1;
+	void *entry;
+
+	rcu_read_lock();
+	do {
+		entry = xas_load(&xas);
+		if (entry == swp_to_radix_entry(swap))
+			ret = xas_get_order(&xas);
+	} while (xas_retry(&xas, entry));
+	rcu_read_unlock();
+	return ret;
 }
 
 /*
@@ -2256,16 +2268,20 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		return -EIO;
 
 	si = get_swap_device(swap);
-	if (!si) {
-		if (!shmem_confirm_swap(mapping, index, swap))
+	order = shmem_confirm_swap(mapping, index, swap);
+	if (unlikely(!si)) {
+		if (order < 0)
 			return -EEXIST;
 		else
 			return -EINVAL;
 	}
+	if (unlikely(order < 0)) {
+		put_swap_device(si);
+		return -EEXIST;
+	}
 
 	/* Look it up and read it in.. */
 	folio = swap_cache_get_folio(swap, NULL, 0);
-	order = xa_get_order(&mapping->i_pages, index);
 	if (!folio) {
 		int nr_pages = 1 << order;
 		bool fallback_order0 = false;
@@ -2415,7 +2431,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	*foliop = folio;
 	return 0;
 failed:
-	if (!shmem_confirm_swap(mapping, index, swap))
+	if (shmem_confirm_swap(mapping, index, swap) < 0)
 		error = -EEXIST;
 	if (error == -EIO)
 		shmem_set_folio_swapin_error(inode, index, folio, swap,
@@ -2428,7 +2444,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		folio_put(folio);
 	}
 	put_swap_device(si);
-
 	return error;
 }
 
-- 
2.50.0

From nobody Wed Oct 8 14:18:11 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi,
 Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org,
 Kairui Song
Subject: [PATCH v3 3/7] mm/shmem, swap: tidy up THP swapin checks
Date: Fri, 27 Jun 2025 14:20:16 +0800
Message-ID: <20250627062020.534-4-ryncsn@gmail.com>
In-Reply-To: <20250627062020.534-1-ryncsn@gmail.com>
References: <20250627062020.534-1-ryncsn@gmail.com>

From: Kairui Song

Move all THP swapin related checks under CONFIG_TRANSPARENT_HUGEPAGE,
so they will be trimmed off by the compiler if not needed.

Also add a WARN if shmem sees an order > 0 entry when
CONFIG_TRANSPARENT_HUGEPAGE is disabled; that should never happen
unless something went very wrong.

There should be no observable feature change except the newly added
WARN.
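The IS_ENABLED() pattern this relies on can be sketched in plain
userspace C (THP_ENABLED here is a hypothetical stand-in for
IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE), and the function is a model,
not the kernel code): because the condition is a compile-time constant,
the dead branch is trimmed off, and an order > 0 entry with THP
disabled trips the warning path instead:

#include <stdio.h>

#define THP_ENABLED 0	/* stand-in for IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) */

static int check_swapin_order(int order)
{
	if (!THP_ENABLED) {
		if (order) {	/* should never happen: warn and bail out */
			fprintf(stderr, "WARN: order %d entry without THP\n", order);
			return -22;	/* -EINVAL */
		}
	} else if (order) {
		/* all mTHP-only checks live here; compiled out when !THP_ENABLED */
	}
	return 0;
}

int main(void)
{
	printf("%d\n", check_swapin_order(0));	/* 0: fine */
	printf("%d\n", check_swapin_order(4));	/* -22, after the warning */
	return 0;
}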
Signed-off-by: Kairui Song
Reviewed-by: Baolin Wang
---
 mm/shmem.c | 42 ++++++++++++++++++++----------------------
 1 file changed, 20 insertions(+), 22 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 033dc7a3435d..f85a985167c5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1980,26 +1980,39 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 		swp_entry_t entry, int order, gfp_t gfp)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	int nr_pages = 1 << order;
 	struct folio *new;
 	void *shadow;
-	int nr_pages;
 
 	/*
 	 * We have arrived here because our zones are constrained, so don't
 	 * limit chance of success with further cpuset and node constraints.
 	 */
 	gfp &= ~GFP_CONSTRAINT_MASK;
-	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && order > 0) {
-		gfp_t huge_gfp = vma_thp_gfp_mask(vma);
-
-		gfp = limit_gfp_mask(huge_gfp, gfp);
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+		if (WARN_ON_ONCE(order))
+			return ERR_PTR(-EINVAL);
+	} else if (order) {
+		/*
+		 * If uffd is active for the vma, we need per-page fault
+		 * fidelity to maintain the uffd semantics, then fallback
+		 * to swapin order-0 folio, as well as for zswap case.
+		 * Any existing sub folio in the swap cache also blocks
+		 * mTHP swapin.
+		 */
+		if ((vma && unlikely(userfaultfd_armed(vma))) ||
+		    !zswap_never_enabled() ||
+		    non_swapcache_batch(entry, nr_pages) != nr_pages) {
+			return ERR_PTR(-EINVAL);
+		} else {
+			gfp = limit_gfp_mask(vma_thp_gfp_mask(vma), gfp);
+		}
 	}
 
 	new = shmem_alloc_folio(gfp, order, info, index);
 	if (!new)
 		return ERR_PTR(-ENOMEM);
 
-	nr_pages = folio_nr_pages(new);
 	if (mem_cgroup_swapin_charge_folio(new, vma ? vma->vm_mm : NULL,
 					   gfp, entry)) {
 		folio_put(new);
@@ -2283,9 +2296,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	/* Look it up and read it in.. */
 	folio = swap_cache_get_folio(swap, NULL, 0);
 	if (!folio) {
-		int nr_pages = 1 << order;
-		bool fallback_order0 = false;
-
 		/* Or update major stats only when swapin succeeds?? */
 		if (fault_type) {
 			*fault_type |= VM_FAULT_MAJOR;
@@ -2293,20 +2303,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			count_memcg_event_mm(fault_mm, PGMAJFAULT);
 		}
 
-		/*
-		 * If uffd is active for the vma, we need per-page fault
-		 * fidelity to maintain the uffd semantics, then fallback
-		 * to swapin order-0 folio, as well as for zswap case.
-		 * Any existing sub folio in the swap cache also blocks
-		 * mTHP swapin.
-		 */
-		if (order > 0 && ((vma && unlikely(userfaultfd_armed(vma))) ||
-				  !zswap_never_enabled() ||
-				  non_swapcache_batch(swap, nr_pages) != nr_pages))
-			fallback_order0 = true;
-
 		/* Skip swapcache for synchronous device. */
-		if (!fallback_order0 && data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
+		if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
 			folio = shmem_swap_alloc_folio(inode, vma, index, swap, order, gfp);
 			if (!IS_ERR(folio)) {
 				skip_swapcache = true;
-- 
2.50.0

From nobody Wed Oct 8 14:18:11 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi,
 Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org,
 Kairui Song
Subject: [PATCH v3 4/7] mm/shmem, swap: clean up swap entry splitting
Date: Fri, 27 Jun 2025 14:20:17 +0800
Message-ID: <20250627062020.534-5-ryncsn@gmail.com>
In-Reply-To: <20250627062020.534-1-ryncsn@gmail.com>
References: <20250627062020.534-1-ryncsn@gmail.com>

From: Kairui Song

Instead of keeping different paths for splitting the entry and
recalculating the swap entry and index, do it in one place. Whenever
swapin brings in a folio smaller than the entry, split the entry; and
always recalculate the entry and index, since swapin might also read
in a folio that is larger than the entry order. This removes
duplicated code and function calls, and makes the code more robust.
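The "recalculate in one place" step amounts to rounding both the
mapping index and the swap value down to the folio's boundary, whatever
order the folio turned out to have. A small userspace model
(align_to_folio is a hypothetical helper, not a kernel function):

#include <stdio.h>

/* Realign index and swap value to the swapped-in folio's natural boundary. */
static void align_to_folio(unsigned long *index, unsigned long *swap_val,
			   int swap_order)
{
	unsigned long nr = 1UL << swap_order;

	*index = *index & ~(nr - 1);		/* round_down(index, nr) */
	*swap_val = *swap_val & ~(nr - 1);	/* round_down(swap.val, nr) */
}

int main(void)
{
	unsigned long index = 0x123, swap_val = 0x4567;

	align_to_folio(&index, &swap_val, 4);	/* a 16-page folio came back */
	printf("%#lx %#lx\n", index, swap_val);	/* 0x120 0x4560 */
	return 0;
}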
Signed-off-by: Kairui Song
---
 mm/shmem.c | 103 +++++++++++++++++++++--------------------------------
 1 file changed, 41 insertions(+), 62 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index f85a985167c5..5be9c905396e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2178,8 +2178,12 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
 	swap_free_nr(swap, nr_pages);
 }
 
-static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
-				   swp_entry_t swap, gfp_t gfp)
+/*
+ * Split an existing large swap entry. @index should point to one sub mapping
+ * slot within the entry @swap, this sub slot will be split into order 0.
+ */
+static int shmem_split_swap_entry(struct inode *inode, pgoff_t index,
+				  swp_entry_t swap, gfp_t gfp)
 {
 	struct address_space *mapping = inode->i_mapping;
 	XA_STATE_ORDER(xas, &mapping->i_pages, index, 0);
@@ -2250,7 +2254,7 @@ static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
 	if (xas_error(&xas))
 		return xas_error(&xas);
 
-	return entry_order;
+	return 0;
 }
 
 /*
@@ -2267,11 +2271,11 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	struct address_space *mapping = inode->i_mapping;
 	struct mm_struct *fault_mm = vma ? vma->vm_mm : NULL;
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	int error, nr_pages, order, swap_order;
 	struct swap_info_struct *si;
 	struct folio *folio = NULL;
 	bool skip_swapcache = false;
 	swp_entry_t swap;
-	int error, nr_pages, order, split_order;
 
 	VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
 	swap = radix_to_swp_entry(*foliop);
@@ -2321,70 +2325,43 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			goto failed;
 		}
 
-		/*
-		 * Now swap device can only swap in order 0 folio, then we
-		 * should split the large swap entry stored in the pagecache
-		 * if necessary.
-		 */
-		split_order = shmem_split_large_entry(inode, index, swap, gfp);
-		if (split_order < 0) {
-			error = split_order;
-			goto failed;
-		}
-
-		/*
-		 * If the large swap entry has already been split, it is
-		 * necessary to recalculate the new swap entry based on
-		 * the old order alignment.
-		 */
-		if (split_order > 0) {
-			pgoff_t offset = index - round_down(index, 1 << split_order);
-
-			swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
-		}
-
 		/* Here we actually start the io */
 		folio = shmem_swapin_cluster(swap, gfp, info, index);
 		if (!folio) {
 			error = -ENOMEM;
 			goto failed;
 		}
-	} else if (order > folio_order(folio)) {
-		/*
-		 * Swap readahead may swap in order 0 folios into swapcache
-		 * asynchronously, while the shmem mapping can still stores
-		 * large swap entries. In such cases, we should split the
-		 * large swap entry to prevent possible data corruption.
-		 */
-		split_order = shmem_split_large_entry(inode, index, swap, gfp);
-		if (split_order < 0) {
-			folio_put(folio);
-			folio = NULL;
-			error = split_order;
-			goto failed;
-		}
-
-		/*
-		 * If the large swap entry has already been split, it is
-		 * necessary to recalculate the new swap entry based on
-		 * the old order alignment.
-		 */
-		if (split_order > 0) {
-			pgoff_t offset = index - round_down(index, 1 << split_order);
-
-			swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
-		}
-	} else if (order < folio_order(folio)) {
-		swap.val = round_down(swap.val, 1 << folio_order(folio));
 	}
 
 alloced:
+	/*
+	 * We need to split an existing large entry if swapin brought in a
+	 * smaller folio due to various of reasons.
+	 *
+	 * And worth noting there is a special case: if there is a smaller
+	 * cached folio that covers @swap, but not @index (it only covers
+	 * first few sub entries of the large entry, but @index points to
+	 * later parts), the swap cache lookup will still see this folio,
+	 * And we need to split the large entry here. Later checks will fail,
+	 * as it can't satisfy the swap requirement, and we will retry
+	 * the swapin from beginning.
+	 */
+	swap_order = folio_order(folio);
+	if (order > swap_order) {
+		error = shmem_split_swap_entry(inode, index, swap, gfp);
+		if (error)
+			goto failed_nolock;
+	}
+
+	index = round_down(index, 1 << swap_order);
+	swap.val = round_down(swap.val, 1 << swap_order);
+
 	/* We have to do this with folio locked to prevent races */
 	folio_lock(folio);
 	if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
 	    folio->swap.val != swap.val) {
 		error = -EEXIST;
-		goto unlock;
+		goto failed_unlock;
 	}
 	if (!folio_test_uptodate(folio)) {
 		error = -EIO;
@@ -2405,8 +2382,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		goto failed;
 	}
 
-	error = shmem_add_to_page_cache(folio, mapping,
-					round_down(index, nr_pages),
+	error = shmem_add_to_page_cache(folio, mapping, index,
 					swp_to_radix_entry(swap), gfp);
 	if (error)
 		goto failed;
@@ -2417,8 +2393,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	folio_mark_accessed(folio);
 
 	if (skip_swapcache) {
+		swapcache_clear(si, folio->swap, folio_nr_pages(folio));
 		folio->swap.val = 0;
-		swapcache_clear(si, swap, nr_pages);
 	} else {
 		delete_from_swap_cache(folio);
 	}
@@ -2434,13 +2410,16 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	if (error == -EIO)
 		shmem_set_folio_swapin_error(inode, index, folio, swap,
 					     skip_swapcache);
-unlock:
-	if (skip_swapcache)
-		swapcache_clear(si, swap, folio_nr_pages(folio));
-	if (folio) {
+failed_unlock:
+	if (folio)
 		folio_unlock(folio);
-		folio_put(folio);
+failed_nolock:
+	if (skip_swapcache) {
+		swapcache_clear(si, folio->swap, folio_nr_pages(folio));
+		folio->swap.val = 0;
 	}
+	if (folio)
+		folio_put(folio);
 	put_swap_device(si);
 	return error;
 }
-- 
2.50.0

From nobody Wed Oct 8 14:18:11 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi,
 Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org,
 Kairui Song
Subject: [PATCH v3 5/7] mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO
Date: Fri, 27 Jun 2025 14:20:18 +0800
Message-ID: <20250627062020.534-6-ryncsn@gmail.com>
In-Reply-To: <20250627062020.534-1-ryncsn@gmail.com>
References: <20250627062020.534-1-ryncsn@gmail.com>

From: Kairui Song

Currently, if THP swapin fails for reasons like a partially conflicting
swap cache or ZSWAP being enabled, it falls back to cached swapin.

The swap cache has a non-trivial overhead, and readahead is not helpful
for SWP_SYNCHRONOUS_IO devices, so we should always skip the readahead
and swap cache even when the swapin falls back to order 0. So handle
the fallback logic without falling back to the cached read.

As a side effect, also slightly tweak the behavior when the WARN_ON is
triggered (the shmem mapping is corrupted, or the code is buggy): just
return -EINVAL. This should be OK, as things are already wrong beyond
recovery at that point.
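The shape of the fallback this introduces can be sketched in userspace
C (alloc_folio and swapin_direct are hypothetical stand-ins; the
allocator deliberately fails for large orders to exercise the path): a
failed high-order attempt retries the same direct path at order 0,
never bouncing to the cached-read path:

#include <stdio.h>
#include <stdlib.h>

/* Stand-in allocator: pretend the mTHP allocation always fails. */
static void *alloc_folio(int order)
{
	return order > 0 ? NULL : malloc(4096);
}

static void *swapin_direct(int order)
{
	void *folio;
retry:
	folio = alloc_folio(order);
	if (!folio) {
		if (!order)
			return NULL;	/* order 0 failed: nothing left to try */
		order = 0;		/* high order failed: retry direct, order 0 */
		goto retry;
	}
	return folio;
}

int main(void)
{
	void *folio = swapin_direct(4);

	printf("%s\n", folio ? "swapped in at order 0" : "failed");
	free(folio);
	return 0;
}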
Signed-off-by: Kairui Song
---
 mm/shmem.c | 68 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 38 insertions(+), 30 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 5be9c905396e..5f2641fd1be7 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1975,13 +1975,15 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
 	return ERR_PTR(error);
 }
 
-static struct folio *shmem_swap_alloc_folio(struct inode *inode,
+static struct folio *shmem_swapin_direct(struct inode *inode,
 		struct vm_area_struct *vma, pgoff_t index,
 		swp_entry_t entry, int order, gfp_t gfp)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	int nr_pages = 1 << order;
 	struct folio *new;
+	pgoff_t offset;
+	gfp_t swap_gfp;
 	void *shadow;
 
 	/*
@@ -1989,6 +1991,7 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 	 * limit chance of success with further cpuset and node constraints.
 	 */
 	gfp &= ~GFP_CONSTRAINT_MASK;
+	swap_gfp = gfp;
 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
 		if (WARN_ON_ONCE(order))
 			return ERR_PTR(-EINVAL);
@@ -2003,20 +2006,23 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 		if ((vma && unlikely(userfaultfd_armed(vma))) ||
 		    !zswap_never_enabled() ||
 		    non_swapcache_batch(entry, nr_pages) != nr_pages) {
-			return ERR_PTR(-EINVAL);
+			goto fallback;
 		} else {
-			gfp = limit_gfp_mask(vma_thp_gfp_mask(vma), gfp);
+			swap_gfp = limit_gfp_mask(vma_thp_gfp_mask(vma), gfp);
 		}
 	}
-
-	new = shmem_alloc_folio(gfp, order, info, index);
-	if (!new)
-		return ERR_PTR(-ENOMEM);
+retry:
+	new = shmem_alloc_folio(swap_gfp, order, info, index);
+	if (!new) {
+		new = ERR_PTR(-ENOMEM);
+		goto fallback;
+	}
 
 	if (mem_cgroup_swapin_charge_folio(new, vma ? vma->vm_mm : NULL,
-					   gfp, entry)) {
+					   swap_gfp, entry)) {
 		folio_put(new);
-		return ERR_PTR(-ENOMEM);
+		new = ERR_PTR(-ENOMEM);
+		goto fallback;
 	}
 
 	/*
@@ -2045,6 +2051,17 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 	folio_add_lru(new);
 	swap_read_folio(new, NULL);
 	return new;
+fallback:
+	/* Order 0 swapin failed, nothing to fallback to, abort */
+	if (!order)
+		return new;
+	/* High order swapin failed, fallback to order 0 and retry */
+	order = 0;
+	nr_pages = 1;
+	swap_gfp = gfp;
+	offset = index - round_down(index, nr_pages);
+	entry = swp_entry(swp_type(entry), swp_offset(entry) + offset);
+	goto retry;
 }
 
 /*
@@ -2243,7 +2260,6 @@ static int shmem_split_swap_entry(struct inode *inode, pgoff_t index,
 		cur_order = split_order;
 		split_order = xas_try_split_min_order(split_order);
 	}
-
 unlock:
 	xas_unlock_irq(&xas);
 
@@ -2306,34 +2322,26 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			count_vm_event(PGMAJFAULT);
 			count_memcg_event_mm(fault_mm, PGMAJFAULT);
 		}
-
-		/* Skip swapcache for synchronous device. */
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
-			folio = shmem_swap_alloc_folio(inode, vma, index, swap, order, gfp);
-			if (!IS_ERR(folio)) {
+			/* Direct mTHP swapin without swap cache or readahead */
+			folio = shmem_swapin_direct(inode, vma, index,
+						    swap, order, gfp);
+			if (IS_ERR(folio)) {
+				error = PTR_ERR(folio);
+				folio = NULL;
+			} else {
 				skip_swapcache = true;
-				goto alloced;
 			}
-
+		} else {
 			/*
-			 * Fallback to swapin order-0 folio unless the swap entry
-			 * already exists.
+			 * Order 0 swapin using swap cache and readahead, it
+			 * may return order > 0 folio due to raced swap cache
 			 */
-			error = PTR_ERR(folio);
-			folio = NULL;
-			if (error == -EEXIST)
-				goto failed;
+			folio = shmem_swapin_cluster(swap, gfp, info, index);
 		}
-
-		/* Here we actually start the io */
-		folio = shmem_swapin_cluster(swap, gfp, info, index);
-		if (!folio) {
-			error = -ENOMEM;
+		if (!folio)
 			goto failed;
-		}
 	}
-
-alloced:
 	/*
 	 * We need to split an existing large entry if swapin brought in a
 	 * smaller folio due to various of reasons.
-- 
2.50.0

From nobody Wed Oct 8 14:18:11 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi,
 Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org,
 Kairui Song
Subject: [PATCH v3 6/7] mm/shmem, swap: fix major fault counting
Date: Fri, 27 Jun 2025 14:20:19 +0800
Message-ID: <20250627062020.534-7-ryncsn@gmail.com>
In-Reply-To: <20250627062020.534-1-ryncsn@gmail.com>
References: <20250627062020.534-1-ryncsn@gmail.com>

From: Kairui Song

If the swapin failed, don't update the major fault count. There is a
long-existing comment suggesting we do it this way; now, with the
previous cleanups in place, we can finally fix it.
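The ordering fix is easy to model in userspace C (swapin, the counter
and the flag value here are hypothetical stand-ins for the kernel
symbols): the counter is bumped only after the swapin has actually
produced a folio:

#include <stdio.h>

static unsigned long pgmajfault;	/* stand-in for the PGMAJFAULT counter */

static int swapin(int succeeds, int *fault_type)
{
	void *folio = succeeds ? &pgmajfault : NULL;	/* fake I/O result */

	if (!folio)
		return -1;	/* failed swapin: no major fault accounted */
	if (fault_type) {
		*fault_type |= 1;	/* stand-in for VM_FAULT_MAJOR */
		pgmajfault++;		/* count only on success */
	}
	return 0;
}

int main(void)
{
	int fault_type = 0;

	swapin(0, &fault_type);	/* fails: nothing counted */
	swapin(1, &fault_type);	/* succeeds: counted once */
	printf("pgmajfault=%lu fault_type=%d\n", pgmajfault, fault_type);
	return 0;
}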
Signed-off-by: Kairui Song
Reviewed-by: Baolin Wang
---
 mm/shmem.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 5f2641fd1be7..ea9a105ded5d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2316,12 +2316,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	/* Look it up and read it in.. */
 	folio = swap_cache_get_folio(swap, NULL, 0);
 	if (!folio) {
-		/* Or update major stats only when swapin succeeds?? */
-		if (fault_type) {
-			*fault_type |= VM_FAULT_MAJOR;
-			count_vm_event(PGMAJFAULT);
-			count_memcg_event_mm(fault_mm, PGMAJFAULT);
-		}
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
 			/* Direct mTHP swapin without swap cache or readahead */
 			folio = shmem_swapin_direct(inode, vma, index,
@@ -2341,6 +2335,11 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		}
 		if (!folio)
 			goto failed;
+		if (fault_type) {
+			*fault_type |= VM_FAULT_MAJOR;
+			count_vm_event(PGMAJFAULT);
+			count_memcg_event_mm(fault_mm, PGMAJFAULT);
+		}
 	}
 	/*
 	 * We need to split an existing large entry if swapin brought in a
-- 
2.50.0

From nobody Wed Oct 8 14:18:11 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi,
 Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org,
 Kairui Song
Subject: [PATCH v3 7/7] mm/shmem, swap: avoid false positive swap cache lookup
Date: Fri, 27 Jun 2025 14:20:20 +0800
Message-ID: <20250627062020.534-8-ryncsn@gmail.com>
In-Reply-To: <20250627062020.534-1-ryncsn@gmail.com>
References: <20250627062020.534-1-ryncsn@gmail.com>

From: Kairui Song

If the shmem read request's index points to the middle of a large swap
entry, shmem swapin does the swap cache lookup using the large swap
entry's starting value (the first sub swap entry of this large entry).
This leads to a false positive lookup result if only the first few swap
entries are cached but the requested swap entry pointed to by the index
is uncached.

Currently, shmem will do a large entry split and then retry the swapin
from the beginning, which is a waste of CPU and fragile. Handle this
correctly instead.

Also add some sanity checks to help understand the code and ensure
things won't go wrong.
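The index-to-swap-value computation at the heart of the fix can be
shown as a tiny userspace C sketch (entry_swap_for_index is a
hypothetical helper): given a large entry's starting swap value and an
index pointing into its middle, derive the sub-entry's real swap value
and look that up, rather than the entry's start:

#include <stdio.h>

/* offset = index - round_down(index, 1 << order); swap = start + offset */
static unsigned long entry_swap_for_index(unsigned long entry_start,
					  unsigned long index, int order)
{
	unsigned long offset = index & ((1UL << order) - 1);

	return entry_start + offset;
}

int main(void)
{
	/* order-4 entry starting at swap value 0x800, request at index 0x12 */
	unsigned long swap = entry_swap_for_index(0x800, 0x12, 4);

	printf("%#lx\n", swap);	/* 0x802: this, not 0x800, must be looked up */
	return 0;
}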
Signed-off-by: Kairui Song
---
 mm/shmem.c | 60 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 32 insertions(+), 28 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index ea9a105ded5d..9341c51c3d10 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1977,14 +1977,19 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
 
 static struct folio *shmem_swapin_direct(struct inode *inode,
 		struct vm_area_struct *vma, pgoff_t index,
-		swp_entry_t entry, int order, gfp_t gfp)
+		swp_entry_t index_entry, swp_entry_t swap,
+		int order, gfp_t gfp)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
-	int nr_pages = 1 << order;
 	struct folio *new;
-	pgoff_t offset;
+	swp_entry_t entry;
 	gfp_t swap_gfp;
 	void *shadow;
+	int nr_pages;
+
+	/* Prefer aligned THP swapin */
+	entry.val = index_entry.val;
+	nr_pages = 1 << order;
 
 	/*
 	 * We have arrived here because our zones are constrained, so don't
@@ -2011,6 +2016,7 @@ static struct folio *shmem_swapin_direct(struct inode *inode,
 			swap_gfp = limit_gfp_mask(vma_thp_gfp_mask(vma), gfp);
 		}
 	}
+
 retry:
 	new = shmem_alloc_folio(swap_gfp, order, info, index);
 	if (!new) {
@@ -2056,11 +2062,10 @@ static struct folio *shmem_swapin_direct(struct inode *inode,
 	if (!order)
 		return new;
 	/* High order swapin failed, fallback to order 0 and retry */
-	order = 0;
-	nr_pages = 1;
+	entry.val = swap.val;
 	swap_gfp = gfp;
-	offset = index - round_down(index, nr_pages);
-	entry = swp_entry(swp_type(entry), swp_offset(entry) + offset);
+	nr_pages = 1;
+	order = 0;
 	goto retry;
 }
 
@@ -2288,20 +2293,21 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	struct mm_struct *fault_mm = vma ? vma->vm_mm : NULL;
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	int error, nr_pages, order, swap_order;
+	swp_entry_t swap, index_entry;
 	struct swap_info_struct *si;
 	struct folio *folio = NULL;
 	bool skip_swapcache = false;
-	swp_entry_t swap;
+	pgoff_t offset;
 
 	VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
-	swap = radix_to_swp_entry(*foliop);
+	index_entry = radix_to_swp_entry(*foliop);
 	*foliop = NULL;
 
-	if (is_poisoned_swp_entry(swap))
+	if (is_poisoned_swp_entry(index_entry))
 		return -EIO;
 
-	si = get_swap_device(swap);
-	order = shmem_confirm_swap(mapping, index, swap);
+	si = get_swap_device(index_entry);
+	order = shmem_confirm_swap(mapping, index, index_entry);
 	if (unlikely(!si)) {
 		if (order < 0)
 			return -EEXIST;
@@ -2313,13 +2319,15 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		return -EEXIST;
 	}
 
-	/* Look it up and read it in.. */
+	/* @index may point to the middle of a large entry, get the real swap value first */
+	offset = index - round_down(index, 1 << order);
+	swap.val = index_entry.val + offset;
 	folio = swap_cache_get_folio(swap, NULL, 0);
 	if (!folio) {
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
 			/* Direct mTHP swapin without swap cache or readahead */
 			folio = shmem_swapin_direct(inode, vma, index,
-						    swap, order, gfp);
+						    index_entry, swap, order, gfp);
 			if (IS_ERR(folio)) {
 				error = PTR_ERR(folio);
 				folio = NULL;
@@ -2341,28 +2349,25 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			count_memcg_event_mm(fault_mm, PGMAJFAULT);
 		}
 	}
+
+	swap_order = folio_order(folio);
+	nr_pages = folio_nr_pages(folio);
+	/* The swap-in should cover both @swap and @index */
+	swap.val = round_down(swap.val, nr_pages);
+	VM_WARN_ON_ONCE(swap.val > index_entry.val + offset);
+	VM_WARN_ON_ONCE(swap.val + nr_pages <= index_entry.val + offset);
+
 	/*
 	 * We need to split an existing large entry if swapin brought in a
 	 * smaller folio due to various of reasons.
-	 *
-	 * And worth noting there is a special case: if there is a smaller
-	 * cached folio that covers @swap, but not @index (it only covers
-	 * first few sub entries of the large entry, but @index points to
-	 * later parts), the swap cache lookup will still see this folio,
-	 * And we need to split the large entry here. Later checks will fail,
-	 * as it can't satisfy the swap requirement, and we will retry
-	 * the swapin from beginning.
 	 */
-	swap_order = folio_order(folio);
+	index = round_down(index, nr_pages);
 	if (order > swap_order) {
-		error = shmem_split_swap_entry(inode, index, swap, gfp);
+		error = shmem_split_swap_entry(inode, index, index_entry, gfp);
 		if (error)
 			goto failed_nolock;
 	}
 
-	index = round_down(index, 1 << swap_order);
-	swap.val = round_down(swap.val, 1 << swap_order);
-
 	/* We have to do this with folio locked to prevent races */
 	folio_lock(folio);
 	if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
@@ -2375,7 +2380,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		goto failed;
 	}
 	folio_wait_writeback(folio);
-	nr_pages = folio_nr_pages(folio);
 
 	/*
 	 * Some architectures may have to restore extra metadata to the
-- 
2.50.0
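[Editor's note: a minimal userspace sketch of the coverage check this patch adds after swap-in. The asserts mirror the two VM_WARN_ON_ONCE conditions in the hunk above; all values are illustrative and the rounding uses plain bit masking in place of the kernel's round_down(). This is a sketch under those assumptions, not kernel code.]

#include <assert.h>
#include <stdio.h>

int main(void)
{
	unsigned long index_entry = 1024;	/* head value of the large entry */
	unsigned long offset = 3;		/* sub entry the fault asked for */
	unsigned long target = index_entry + offset;	/* 1027 */

	/* Suppose swap-in returned an order-2 folio, i.e. 4 pages */
	unsigned long nr_pages = 4;
	unsigned long swap_val = target & ~(nr_pages - 1);	/* rounds to 1024 */

	/*
	 * The folio's swap range must cover the target sub entry on both
	 * ends, otherwise the swap-in cannot satisfy this request.
	 */
	assert(swap_val <= target);
	assert(target < swap_val + nr_pages);
	printf("folio covers [%lu, %lu), target %lu\n",
	       swap_val, swap_val + nr_pages, target);
	return 0;
}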