From nobody Tue Oct 7 21:33:36 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi, Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org, Kairui Song, stable@vger.kernel.org
Subject: [PATCH v4 1/9] mm/shmem, swap: improve cached mTHP handling and fix potential hung
Date: Sat, 5 Jul 2025 02:17:40 +0800
Message-ID: <20250704181748.63181-2-ryncsn@gmail.com>
In-Reply-To: <20250704181748.63181-1-ryncsn@gmail.com>
References: <20250704181748.63181-1-ryncsn@gmail.com>

From: Kairui Song

The current swap-in code assumes that, when a swap entry in the shmem
mapping is order 0, its cached folios (if present) must be order 0 too,
which turns out not to be always correct.

The problem is that shmem_split_large_entry is called before verifying
that the folio will eventually be swapped in. One possible race is:

CPU1                                 CPU2
shmem_swapin_folio
/* swap in of order > 0 swap entry S1 */
  folio = swap_cache_get_folio
  /* folio = NULL */
  order = xa_get_order
  /* order > 0 */
  folio = shmem_swap_alloc_folio
  /* mTHP alloc failure, folio = NULL */
  <... Interrupted ...>
                                     shmem_swapin_folio
                                     /* S1 is swapped in */
                                     shmem_writeout
                                     /* S1 is swapped out, folio cached */
  shmem_split_large_entry(..., S1)
  /* S1 is split, but the folio covering it has order > 0 now */

Now any following swapin of S1 will hang: `xa_get_order` returns 0,
while the folio lookup returns a folio with order > 0. The
`xa_get_order(&mapping->i_pages, index) != folio_order(folio)` check
will always mismatch, causing swap-in to keep returning -EEXIST. This
check is also fragile.

So fix this up by allowing a larger folio to be seen in the swap cache,
and by checking that the whole shmem mapping range covered by the swapin
holds the expected swap values upon inserting the folio. Also drop the
redundant tree walks before the insertion.

This actually improves performance, as it avoids two redundant Xarray
tree walks in the hot path. The only side effect is that in the failure
path, shmem may redundantly reallocate a few folios, causing temporary
slight memory pressure.

Worth noting, it may seem that the order and value check before
inserting would help reduce lock contention, but that is not true. The
swap cache layer ensures a raced swapin will either see a swap cache
folio or fail to do a swapin (we have the SWAP_HAS_CACHE bit even if the
swap cache is bypassed), so holding the folio lock and checking the
folio flag is already good enough to avoid the lock contention. The
chance that a folio passes the swap entry value check while the shmem
mapping slot has changed should be very low.
Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
Signed-off-by: Kairui Song
Reviewed-by: Kemeng Shi
Reviewed-by: Baolin Wang
Tested-by: Baolin Wang
Cc:
---
 mm/shmem.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 334b7b4a61a0..e3c9a1365ff4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -884,7 +884,9 @@ static int shmem_add_to_page_cache(struct folio *folio,
 				   pgoff_t index, void *expected, gfp_t gfp)
 {
 	XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
-	long nr = folio_nr_pages(folio);
+	unsigned long nr = folio_nr_pages(folio);
+	swp_entry_t iter, swap;
+	void *entry;
 
 	VM_BUG_ON_FOLIO(index != round_down(index, nr), folio);
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
@@ -896,14 +898,24 @@ static int shmem_add_to_page_cache(struct folio *folio,
 
 	gfp &= GFP_RECLAIM_MASK;
 	folio_throttle_swaprate(folio, gfp);
+	swap = iter = radix_to_swp_entry(expected);
 
 	do {
 		xas_lock_irq(&xas);
-		if (expected != xas_find_conflict(&xas)) {
-			xas_set_err(&xas, -EEXIST);
-			goto unlock;
+		xas_for_each_conflict(&xas, entry) {
+			/*
+			 * The range must either be empty, or filled with
+			 * expected swap entries. Shmem swap entries are never
+			 * partially freed without split of both entry and
+			 * folio, so there shouldn't be any holes.
+			 */
+			if (!expected || entry != swp_to_radix_entry(iter)) {
+				xas_set_err(&xas, -EEXIST);
+				goto unlock;
+			}
+			iter.val += 1 << xas_get_order(&xas);
 		}
-		if (expected && xas_find_conflict(&xas)) {
+		if (expected && iter.val - nr != swap.val) {
 			xas_set_err(&xas, -EEXIST);
 			goto unlock;
 		}
@@ -2323,7 +2335,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			error = -ENOMEM;
 			goto failed;
 		}
-	} else if (order != folio_order(folio)) {
+	} else if (order > folio_order(folio)) {
 		/*
 		 * Swap readahead may swap in order 0 folios into swapcache
 		 * asynchronously, while the shmem mapping can still stores
@@ -2348,15 +2360,15 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 
 			swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
 		}
+	} else if (order < folio_order(folio)) {
+		swap.val = round_down(swap.val, 1 << folio_order(folio));
 	}
 
 alloced:
 	/* We have to do this with folio locked to prevent races */
 	folio_lock(folio);
 	if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
-	    folio->swap.val != swap.val ||
-	    !shmem_confirm_swap(mapping, index, swap) ||
-	    xa_get_order(&mapping->i_pages, index) != folio_order(folio)) {
+	    folio->swap.val != swap.val) {
 		error = -EEXIST;
 		goto unlock;
 	}
-- 
2.50.0
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi, Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org, Kairui Song, Dev Jain
Subject: [PATCH v4 2/9] mm/shmem, swap: avoid redundant Xarray lookup during swapin
Date: Sat, 5 Jul 2025 02:17:41 +0800
Message-ID: <20250704181748.63181-3-ryncsn@gmail.com>
In-Reply-To: <20250704181748.63181-1-ryncsn@gmail.com>
References: <20250704181748.63181-1-ryncsn@gmail.com>

From: Kairui Song

Currently shmem calls xa_get_order to get the swap radix entry order,
requiring a full tree walk. This can easily be combined with the swap
entry value check (shmem_confirm_swap) to avoid the duplicated lookup,
which should improve performance.

Signed-off-by: Kairui Song
Reviewed-by: Kemeng Shi
Reviewed-by: Dev Jain
Reviewed-by: Baolin Wang
---
 mm/shmem.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index e3c9a1365ff4..033dc7a3435d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -505,15 +505,27 @@ static int shmem_replace_entry(struct address_space *mapping,
 
 /*
  * Sometimes, before we decide whether to proceed or to fail, we must check
- * that an entry was not already brought back from swap by a racing thread.
+ * that an entry was not already brought back or split by a racing thread.
  *
  * Checking folio is not enough: by the time a swapcache folio is locked, it
  * might be reused, and again be swapcache, using the same swap as before.
+ * Returns the swap entry's order if it is still present, else returns -1.
  */
-static bool shmem_confirm_swap(struct address_space *mapping,
-			       pgoff_t index, swp_entry_t swap)
+static int shmem_confirm_swap(struct address_space *mapping, pgoff_t index,
+			      swp_entry_t swap)
 {
-	return xa_load(&mapping->i_pages, index) == swp_to_radix_entry(swap);
+	XA_STATE(xas, &mapping->i_pages, index);
+	int ret = -1;
+	void *entry;
+
+	rcu_read_lock();
+	do {
+		entry = xas_load(&xas);
+		if (entry == swp_to_radix_entry(swap))
+			ret = xas_get_order(&xas);
+	} while (xas_retry(&xas, entry));
+	rcu_read_unlock();
+	return ret;
 }
 
 /*
@@ -2256,16 +2268,20 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		return -EIO;
 
 	si = get_swap_device(swap);
-	if (!si) {
-		if (!shmem_confirm_swap(mapping, index, swap))
+	order = shmem_confirm_swap(mapping, index, swap);
+	if (unlikely(!si)) {
+		if (order < 0)
 			return -EEXIST;
 		else
 			return -EINVAL;
 	}
+	if (unlikely(order < 0)) {
+		put_swap_device(si);
+		return -EEXIST;
+	}
 
 	/* Look it up and read it in..
	 */
 	folio = swap_cache_get_folio(swap, NULL, 0);
-	order = xa_get_order(&mapping->i_pages, index);
 	if (!folio) {
 		int nr_pages = 1 << order;
 		bool fallback_order0 = false;
@@ -2415,7 +2431,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	*foliop = folio;
 	return 0;
 failed:
-	if (!shmem_confirm_swap(mapping, index, swap))
+	if (shmem_confirm_swap(mapping, index, swap) < 0)
 		error = -EEXIST;
 	if (error == -EIO)
 		shmem_set_folio_swapin_error(inode, index, folio, swap,
@@ -2428,7 +2444,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		folio_put(folio);
 	}
 	put_swap_device(si);
-
 	return error;
 }
 
-- 
2.50.0
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi, Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v4 3/9] mm/shmem, swap: tidy up THP swapin checks
Date: Sat, 5 Jul 2025 02:17:42 +0800
Message-ID: <20250704181748.63181-4-ryncsn@gmail.com>
In-Reply-To: <20250704181748.63181-1-ryncsn@gmail.com>
References: <20250704181748.63181-1-ryncsn@gmail.com>

From: Kairui Song

Move all THP swapin related checks under CONFIG_TRANSPARENT_HUGEPAGE,
so they will be trimmed off by the compiler if not
needed. And add a WARN if shmem sees an order > 0 entry when
CONFIG_TRANSPARENT_HUGEPAGE is disabled; that should never happen unless
things have gone very wrong.

There should be no observable feature change except the newly added WARN.

Signed-off-by: Kairui Song
Reviewed-by: Baolin Wang
---
 mm/shmem.c | 39 ++++++++++++++++++---------------------
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 033dc7a3435d..e43becfa04b3 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1980,26 +1980,38 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 			    swp_entry_t entry, int order, gfp_t gfp)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	int nr_pages = 1 << order;
 	struct folio *new;
 	void *shadow;
-	int nr_pages;
 
 	/*
 	 * We have arrived here because our zones are constrained, so don't
 	 * limit chance of success with further cpuset and node constraints.
 	 */
 	gfp &= ~GFP_CONSTRAINT_MASK;
-	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && order > 0) {
-		gfp_t huge_gfp = vma_thp_gfp_mask(vma);
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+		if (WARN_ON_ONCE(order))
+			return ERR_PTR(-EINVAL);
+	} else if (order) {
+		/*
+		 * If uffd is active for the vma, we need per-page fault
+		 * fidelity to maintain the uffd semantics, then fallback
+		 * to swapin order-0 folio, as well as for zswap case.
+		 * Any existing sub folio in the swap cache also blocks
+		 * mTHP swapin.
+		 */
+		if ((vma && unlikely(userfaultfd_armed(vma))) ||
+		    !zswap_never_enabled() ||
+		    non_swapcache_batch(entry, nr_pages) != nr_pages)
+			return ERR_PTR(-EINVAL);
 
-		gfp = limit_gfp_mask(huge_gfp, gfp);
+		gfp = limit_gfp_mask(vma_thp_gfp_mask(vma), gfp);
 	}
 
 	new = shmem_alloc_folio(gfp, order, info, index);
 	if (!new)
 		return ERR_PTR(-ENOMEM);
 
-	nr_pages = folio_nr_pages(new);
 	if (mem_cgroup_swapin_charge_folio(new, vma ?
					   vma->vm_mm : NULL, gfp, entry)) {
 		folio_put(new);
@@ -2283,9 +2295,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	/* Look it up and read it in.. */
 	folio = swap_cache_get_folio(swap, NULL, 0);
 	if (!folio) {
-		int nr_pages = 1 << order;
-		bool fallback_order0 = false;
-
 		/* Or update major stats only when swapin succeeds?? */
 		if (fault_type) {
 			*fault_type |= VM_FAULT_MAJOR;
@@ -2293,20 +2302,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			count_memcg_event_mm(fault_mm, PGMAJFAULT);
 		}
 
-		/*
-		 * If uffd is active for the vma, we need per-page fault
-		 * fidelity to maintain the uffd semantics, then fallback
-		 * to swapin order-0 folio, as well as for zswap case.
-		 * Any existing sub folio in the swap cache also blocks
-		 * mTHP swapin.
-		 */
-		if (order > 0 && ((vma && unlikely(userfaultfd_armed(vma))) ||
-				  !zswap_never_enabled() ||
-				  non_swapcache_batch(swap, nr_pages) != nr_pages))
-			fallback_order0 = true;
-
 		/* Skip swapcache for synchronous device.
		 */
-		if (!fallback_order0 && data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
+		if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
 			folio = shmem_swap_alloc_folio(inode, vma, index, swap,
 						       order, gfp);
 			if (!IS_ERR(folio)) {
 				skip_swapcache = true;
-- 
2.50.0
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi, Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v4 4/9] mm/shmem, swap: tidy up swap entry splitting
Date: Sat, 5 Jul 2025 02:17:43 +0800
Message-ID: <20250704181748.63181-5-ryncsn@gmail.com>
In-Reply-To: <20250704181748.63181-1-ryncsn@gmail.com>
References: <20250704181748.63181-1-ryncsn@gmail.com>

From: Kairui Song

Instead of keeping different paths for splitting the entry before the
swap in starts, move the entry splitting after the swapin has put the
folio in the swap cache (or set the SWAP_HAS_CACHE bit). This way we
only need one place and one unified way to split the large entry.
Whenever swapin brings in a folio smaller than the shmem swap entry,
split the entry and recalculate the entry and index for verification.

This removes duplicated code and function calls, reduces LOC, and the
split is less racy as it's guarded by the swap cache now.
So it will have a lower chance of repeated faults due to a raced split.

The compiler is also able to optimize the code further. bloat-o-meter
results with GCC 14:

With DEBUG_SECTION_MISMATCH (-fno-inline-functions-called-once):
./scripts/bloat-o-meter mm/shmem.o.old mm/shmem.o
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-82 (-82)
Function                                     old     new   delta
shmem_swapin_folio                          2361    2279     -82
Total: Before=33151, After=33069, chg -0.25%

With !DEBUG_SECTION_MISMATCH:
./scripts/bloat-o-meter mm/shmem.o.old mm/shmem.o
add/remove: 0/1 grow/shrink: 1/0 up/down: 949/-750 (199)
Function                                     old     new   delta
shmem_swapin_folio                          2878    3827    +949
shmem_split_large_entry.isra                 750       -    -750
Total: Before=33086, After=33285, chg +0.60%

Since shmem_split_large_entry is only called in one place now, the
compiler will either generate more compact code, or inline it for
better performance.

Signed-off-by: Kairui Song
Reviewed-by: Baolin Wang
Tested-by: Baolin Wang
---
 mm/shmem.c | 53 +++++++++++++++++++++--------------------------------
 1 file changed, 21 insertions(+), 32 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index e43becfa04b3..217264315842 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2266,14 +2266,15 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	struct address_space *mapping = inode->i_mapping;
 	struct mm_struct *fault_mm = vma ?
vma->vm_mm : NULL; struct shmem_inode_info *info =3D SHMEM_I(inode); + swp_entry_t swap, index_entry; struct swap_info_struct *si; struct folio *folio =3D NULL; bool skip_swapcache =3D false; - swp_entry_t swap; int error, nr_pages, order, split_order; + pgoff_t offset; =20 VM_BUG_ON(!*foliop || !xa_is_value(*foliop)); - swap =3D radix_to_swp_entry(*foliop); + swap =3D index_entry =3D radix_to_swp_entry(*foliop); *foliop =3D NULL; =20 if (is_poisoned_swp_entry(swap)) @@ -2321,46 +2322,35 @@ static int shmem_swapin_folio(struct inode *inode, = pgoff_t index, } =20 /* - * Now swap device can only swap in order 0 folio, then we - * should split the large swap entry stored in the pagecache - * if necessary. - */ - split_order =3D shmem_split_large_entry(inode, index, swap, gfp); - if (split_order < 0) { - error =3D split_order; - goto failed; - } - - /* - * If the large swap entry has already been split, it is + * Now swap device can only swap in order 0 folio, it is * necessary to recalculate the new swap entry based on - * the old order alignment. + * the offset, as the swapin index might be unalgined. */ - if (split_order > 0) { - pgoff_t offset =3D index - round_down(index, 1 << split_order); - + if (order) { + offset =3D index - round_down(index, 1 << order); swap =3D swp_entry(swp_type(swap), swp_offset(swap) + offset); } =20 - /* Here we actually start the io */ folio =3D shmem_swapin_cluster(swap, gfp, info, index); if (!folio) { error =3D -ENOMEM; goto failed; } - } else if (order > folio_order(folio)) { + } +alloced: + if (order > folio_order(folio)) { /* - * Swap readahead may swap in order 0 folios into swapcache + * Swapin may get smaller folios due to various reasons: + * It may fallback to order 0 due to memory pressure or race, + * swap readahead may swap in order 0 folios into swapcache * asynchronously, while the shmem mapping can still stores * large swap entries. 
In such cases, we should split the * large swap entry to prevent possible data corruption. */ - split_order =3D shmem_split_large_entry(inode, index, swap, gfp); + split_order =3D shmem_split_large_entry(inode, index, index_entry, gfp); if (split_order < 0) { - folio_put(folio); - folio =3D NULL; error =3D split_order; - goto failed; + goto failed_nolock; } =20 /* @@ -2369,15 +2359,13 @@ static int shmem_swapin_folio(struct inode *inode, = pgoff_t index, * the old order alignment. */ if (split_order > 0) { - pgoff_t offset =3D index - round_down(index, 1 << split_order); - + offset =3D index - round_down(index, 1 << split_order); swap =3D swp_entry(swp_type(swap), swp_offset(swap) + offset); } } else if (order < folio_order(folio)) { swap.val =3D round_down(swap.val, 1 << folio_order(folio)); } =20 -alloced: /* We have to do this with folio locked to prevent races */ folio_lock(folio); if ((!skip_swapcache && !folio_test_swapcache(folio)) || @@ -2434,12 +2422,13 @@ static int shmem_swapin_folio(struct inode *inode, = pgoff_t index, shmem_set_folio_swapin_error(inode, index, folio, swap, skip_swapcache); unlock: - if (skip_swapcache) - swapcache_clear(si, swap, folio_nr_pages(folio)); - if (folio) { + if (folio) folio_unlock(folio); +failed_nolock: + if (skip_swapcache) + swapcache_clear(si, folio->swap, folio_nr_pages(folio)); + if (folio) folio_put(folio); - } put_swap_device(si); return error; } --=20 2.50.0 From nobody Tue Oct 7 21:33:36 2025 Received: from mail-qk1-f170.google.com (mail-qk1-f170.google.com [209.85.222.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6275F2DEA79 for ; Fri, 4 Jul 2025 18:18:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1751653104; cv=none; 
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi, Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v4 5/9] mm/shmem, swap: avoid false positive swap cache lookup
Date: Sat, 5 Jul 2025 02:17:44 +0800
Message-ID: <20250704181748.63181-6-ryncsn@gmail.com>
In-Reply-To: <20250704181748.63181-1-ryncsn@gmail.com>
References: <20250704181748.63181-1-ryncsn@gmail.com>

If a shmem read request's index points to the middle of a large swap
entry, shmem swapin will try the swap cache lookup using the large
swap entry's starting value (which is the first sub swap entry of this
large entry). This will lead to false positive lookup results if only
the first few swap entries are cached but the actual requested swap
entry pointed to by the index is uncached. This is not a rare event,
as swap readahead always tries to cache order 0 folios when possible.

Currently, when this occurs, shmem will split the large entry, abort
due to a mismatching folio swap value, then retry the swapin from the
beginning, which wastes CPU and adds wrong info to the readahead
statistics. This can be optimized easily by doing the lookup using the
right swap entry value.
Signed-off-by: Kairui Song
---
 mm/shmem.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 217264315842..2ab214e2771c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2274,14 +2274,15 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	pgoff_t offset;
 
 	VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
-	swap = index_entry = radix_to_swp_entry(*foliop);
+	index_entry = radix_to_swp_entry(*foliop);
+	swap = index_entry;
 	*foliop = NULL;
 
-	if (is_poisoned_swp_entry(swap))
+	if (is_poisoned_swp_entry(index_entry))
 		return -EIO;
 
-	si = get_swap_device(swap);
-	order = shmem_confirm_swap(mapping, index, swap);
+	si = get_swap_device(index_entry);
+	order = shmem_confirm_swap(mapping, index, index_entry);
 	if (unlikely(!si)) {
 		if (order < 0)
 			return -EEXIST;
@@ -2293,6 +2294,12 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		return -EEXIST;
 	}
 
+	/* index may point to the middle of a large entry, get the sub entry */
+	if (order) {
+		offset = index - round_down(index, 1 << order);
+		swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
+	}
+
 	/* Look it up and read it in.. */
 	folio = swap_cache_get_folio(swap, NULL, 0);
 	if (!folio) {
@@ -2305,8 +2312,10 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 
 		/* Skip swapcache for synchronous device. */
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
-			folio = shmem_swap_alloc_folio(inode, vma, index, swap, order, gfp);
+			folio = shmem_swap_alloc_folio(inode, vma, index,
+						       index_entry, order, gfp);
 			if (!IS_ERR(folio)) {
+				swap = index_entry;
 				skip_swapcache = true;
 				goto alloced;
 			}
@@ -2320,17 +2329,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			if (error == -EEXIST)
 				goto failed;
 		}
-
-		/*
-		 * Now swap device can only swap in order 0 folio, it is
-		 * necessary to recalculate the new swap entry based on
-		 * the offset, as the swapin index might be unaligned.
-		 */
-		if (order) {
-			offset = index - round_down(index, 1 << order);
-			swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
-		}
-
+		/* Cached swapin with readahead, only supports order 0 */
 		folio = shmem_swapin_cluster(swap, gfp, info, index);
 		if (!folio) {
 			error = -ENOMEM;
-- 
2.50.0
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi, Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v4 6/9] mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO
Date: Sat, 5 Jul 2025 02:17:45 +0800
Message-ID: <20250704181748.63181-7-ryncsn@gmail.com>
In-Reply-To: <20250704181748.63181-1-ryncsn@gmail.com>
References: <20250704181748.63181-1-ryncsn@gmail.com>
Currently, if a THP swapin fails due to reasons like a partially
conflicting swap cache or ZSWAP being enabled, it will fall back to
cached swapin.

Right now the swap cache still has a non-trivial overhead, and
readahead is not helpful for SWP_SYNCHRONOUS_IO devices, so we should
always skip the readahead and swap cache, even if the swapin falls
back to order 0. So handle the fallback logic without falling back to
the cached read.

Signed-off-by: Kairui Song
---
 mm/shmem.c | 55 +++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 36 insertions(+), 19 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 2ab214e2771c..1fe9a3eb92b1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1975,13 +1975,16 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
 	return ERR_PTR(error);
 }
 
-static struct folio *shmem_swap_alloc_folio(struct inode *inode,
+static struct folio *shmem_swapin_direct(struct inode *inode,
 		struct vm_area_struct *vma, pgoff_t index,
-		swp_entry_t entry, int order, gfp_t gfp)
+		swp_entry_t swap, swp_entry_t index_entry,
+		int order, gfp_t gfp)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	swp_entry_t entry = index_entry;
 	int nr_pages = 1 << order;
 	struct folio *new;
+	gfp_t alloc_gfp;
 	void *shadow;
 
 	/*
@@ -1989,6 +1992,7 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 	 * limit chance of success with further cpuset and node constraints.
 	 */
 	gfp &= ~GFP_CONSTRAINT_MASK;
+	alloc_gfp = gfp;
 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
 		if (WARN_ON_ONCE(order))
 			return ERR_PTR(-EINVAL);
@@ -2003,19 +2007,22 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 		if ((vma && unlikely(userfaultfd_armed(vma))) ||
 		    !zswap_never_enabled() ||
 		    non_swapcache_batch(entry, nr_pages) != nr_pages)
-			return ERR_PTR(-EINVAL);
+			goto fallback;
 
-		gfp = limit_gfp_mask(vma_thp_gfp_mask(vma), gfp);
+		alloc_gfp = limit_gfp_mask(vma_thp_gfp_mask(vma), gfp);
+	}
+retry:
+	new = shmem_alloc_folio(alloc_gfp, order, info, index);
+	if (!new) {
+		new = ERR_PTR(-ENOMEM);
+		goto fallback;
 	}
-
-	new = shmem_alloc_folio(gfp, order, info, index);
-	if (!new)
-		return ERR_PTR(-ENOMEM);
 
 	if (mem_cgroup_swapin_charge_folio(new, vma ? vma->vm_mm : NULL,
-					   gfp, entry)) {
+					   alloc_gfp, entry)) {
 		folio_put(new);
-		return ERR_PTR(-ENOMEM);
+		new = ERR_PTR(-ENOMEM);
+		goto fallback;
 	}
 
 	/*
@@ -2030,7 +2037,9 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 	 */
 	if (swapcache_prepare(entry, nr_pages)) {
 		folio_put(new);
-		return ERR_PTR(-EEXIST);
+		new = ERR_PTR(-EEXIST);
+		/* Try smaller folio to avoid cache conflict */
+		goto fallback;
 	}
 
 	__folio_set_locked(new);
@@ -2044,6 +2053,15 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 	folio_add_lru(new);
 	swap_read_folio(new, NULL);
 	return new;
+fallback:
+	/* Order 0 swapin failed, nothing to fallback to, abort */
+	if (!order)
+		return new;
+	order = 0;
+	nr_pages = 1;
+	alloc_gfp = gfp;
+	entry = swap;
+	goto retry;
 }
 
 /*
@@ -2309,25 +2327,24 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		count_vm_event(PGMAJFAULT);
 		count_memcg_event_mm(fault_mm, PGMAJFAULT);
 	}
-
 	/* Skip swapcache for synchronous device. */
 	if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
-		folio = shmem_swap_alloc_folio(inode, vma, index,
-					       index_entry, order, gfp);
+		folio = shmem_swapin_direct(inode, vma, index, swap,
+					    index_entry, order, gfp);
 		if (!IS_ERR(folio)) {
-			swap = index_entry;
+			if (folio_test_large(folio))
+				swap = index_entry;
 			skip_swapcache = true;
 			goto alloced;
 		}
 
 		/*
-		 * Fallback to swapin order-0 folio unless the swap entry
-		 * already exists.
+		 * Direct swapin handled order 0 fallback already,
+		 * if it failed, abort.
 		 */
 		error = PTR_ERR(folio);
 		folio = NULL;
-		if (error == -EEXIST)
-			goto failed;
+		goto failed;
 	}
 	/* Cached swapin with readahead, only supports order 0 */
 	folio = shmem_swapin_cluster(swap, gfp, info, index);
-- 
2.50.0
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi, Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v4 7/9] mm/shmem, swap: simplify swapin path and result handling
Date: Sat, 5 Jul 2025 02:17:46 +0800
Message-ID: <20250704181748.63181-8-ryncsn@gmail.com>
In-Reply-To: <20250704181748.63181-1-ryncsn@gmail.com>
References: <20250704181748.63181-1-ryncsn@gmail.com>

Slightly tidy up the different swapin and error handling for
SWP_SYNCHRONOUS_IO and
non-SWP_SYNCHRONOUS_IO devices. Now swapin will always use either
shmem_swapin_direct or shmem_swapin_cluster, then check the result.

Simplify the control flow and avoid a redundant goto label.

Signed-off-by: Kairui Song
---
 mm/shmem.c | 31 +++++++++++++------------------
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 1fe9a3eb92b1..782162c0c4e0 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2327,33 +2327,28 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		count_vm_event(PGMAJFAULT);
 		count_memcg_event_mm(fault_mm, PGMAJFAULT);
 	}
-	/* Skip swapcache for synchronous device. */
 	if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
+		/* Direct mTHP swapin skipping swap cache & readahead */
 		folio = shmem_swapin_direct(inode, vma, index, swap,
 					    index_entry, order, gfp);
-		if (!IS_ERR(folio)) {
+		if (IS_ERR(folio)) {
+			error = PTR_ERR(folio);
+			folio = NULL;
+			goto failed;
+		} else {
 			if (folio_test_large(folio))
 				swap = index_entry;
 			skip_swapcache = true;
-			goto alloced;
 		}
-
-		/*
-		 * Direct swapin handled order 0 fallback already,
-		 * if it failed, abort.
-		 */
-		error = PTR_ERR(folio);
-		folio = NULL;
-		goto failed;
-	}
-	/* Cached swapin with readahead, only supports order 0 */
-	folio = shmem_swapin_cluster(swap, gfp, info, index);
-	if (!folio) {
-		error = -ENOMEM;
-		goto failed;
+	} else {
+		/* Cached swapin with readahead, only supports order 0 */
+		folio = shmem_swapin_cluster(swap, gfp, info, index);
+		if (!folio) {
+			error = -ENOMEM;
+			goto failed;
+		}
 	}
 	}
-alloced:
 	if (order > folio_order(folio)) {
 		/*
 		 * Swapin may get smaller folios due to various reasons:
-- 
2.50.0
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi, Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v4 8/9] mm/shmem, swap: simplify swap entry and index calculation of large swapin
Date: Sat, 5 Jul 2025 02:17:47 +0800
Message-ID: <20250704181748.63181-9-ryncsn@gmail.com>
In-Reply-To: <20250704181748.63181-1-ryncsn@gmail.com>
References: <20250704181748.63181-1-ryncsn@gmail.com>

Since large shmem swapin has already calculated the right swap value
to be used before the swap cache lookup, simply rounding it down
against the size of the folio brought in by the swapin is enough to
get the swap value to be verified.
A folio's swap entry is always aligned to its size. Any kind of parallel split or race is fine, because the final shmem_add_to_page_cache always ensures the entries covered by the folio are all correct, so there won't be any data corruption. This shouldn't cause any increase in repeated faults either: no matter how the shmem mapping is split in parallel, as long as the mapping still contains the right entries, the swapin will succeed.

This reduces both the final object size and stack usage:

./scripts/bloat-o-meter mm/shmem.o.old mm/shmem.o
add/remove: 0/0 grow/shrink: 1/1 up/down: 5/-214 (-209)
Function                                     old     new   delta
shmem_read_mapping_page_gfp                  143     148      +5
shmem_swapin_folio                          4020    3806    -214
Total: Before=33478, After=33269, chg -0.62%

Stack usage (Before vs After):
shmem.c:2279:12:shmem_swapin_folio      280     static
shmem.c:2279:12:shmem_swapin_folio      264     static

Signed-off-by: Kairui Song
---
 mm/shmem.c | 43 +++++++++++++++++++++----------------------
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 782162c0c4e0..646b1db9501c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2267,7 +2267,7 @@ static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
 	if (xas_error(&xas))
 		return xas_error(&xas);
 
-	return entry_order;
+	return 0;
 }
 
 /*
@@ -2288,7 +2288,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	struct swap_info_struct *si;
 	struct folio *folio = NULL;
 	bool skip_swapcache = false;
-	int error, nr_pages, order, split_order;
+	int error, nr_pages, order;
 	pgoff_t offset;
 
 	VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
@@ -2336,8 +2336,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			folio = NULL;
 			goto failed;
 		} else {
-			if (folio_test_large(folio))
-				swap = index_entry;
 			skip_swapcache = true;
 		}
 	} else {
@@ -2349,6 +2347,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		}
 	}
 
+
 	if (order > folio_order(folio)) {
 		/*
 		 * Swapin may get smaller folios due to various reasons:
@@ -2358,23 +2357,25 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		 * large swap entries. In such cases, we should split the
 		 * large swap entry to prevent possible data corruption.
 		 */
-		split_order = shmem_split_large_entry(inode, index, index_entry, gfp);
-		if (split_order < 0) {
-			error = split_order;
+		error = shmem_split_large_entry(inode, index, index_entry, gfp);
+		if (error)
 			goto failed_nolock;
-		}
+	}
 
-		/*
-		 * If the large swap entry has already been split, it is
-		 * necessary to recalculate the new swap entry based on
-		 * the old order alignment.
-		 */
-		if (split_order > 0) {
-			offset = index - round_down(index, 1 << split_order);
-			swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
-		}
-	} else if (order < folio_order(folio)) {
-		swap.val = round_down(swap.val, 1 << folio_order(folio));
+	/*
+	 * If the folio is large, round down swap and index by folio size.
+	 * No matter what race occurs, the swap layer ensures we either get
+	 * a valid folio that has its swap entry aligned by size, or a
+	 * temporarily invalid one which we'll abort very soon and retry.
+	 *
+	 * shmem_add_to_page_cache ensures the whole range contains expected
+	 * entries and prevents any corruption, so any race split is fine
+	 * too, it will succeed as long as the entries are still there.
+	 */
+	nr_pages = folio_nr_pages(folio);
+	if (nr_pages > 1) {
+		swap.val = round_down(swap.val, nr_pages);
+		index = round_down(index, nr_pages);
 	}
 
 	/* We have to do this with folio locked to prevent races */
@@ -2389,7 +2390,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		goto failed;
 	}
 	folio_wait_writeback(folio);
-	nr_pages = folio_nr_pages(folio);
 
 	/*
 	 * Some architectures may have to restore extra metadata to the
@@ -2403,8 +2403,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		goto failed;
 	}
 
-	error = shmem_add_to_page_cache(folio, mapping,
-					round_down(index, nr_pages),
+	error = shmem_add_to_page_cache(folio, mapping, index,
 					swp_to_radix_entry(swap), gfp);
 	if (error)
 		goto failed;
-- 
2.50.0

From nobody Tue Oct 7 21:33:36 2025
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi, Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v4 9/9] mm/shmem, swap: fix major fault counting
Date: Sat, 5 Jul 2025 02:17:48 +0800
Message-ID: <20250704181748.63181-10-ryncsn@gmail.com>
In-Reply-To: <20250704181748.63181-1-ryncsn@gmail.com>
References: <20250704181748.63181-1-ryncsn@gmail.com>
Reply-To: Kairui Song

From: Kairui Song

If the swapin failed, don't update the major fault count.
A long-standing comment already questioned doing it the old way; now, with the previous cleanups in place, we can finally fix it.

Signed-off-by: Kairui Song
Reviewed-by: Baolin Wang
---
 mm/shmem.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 646b1db9501c..b03b5bf2df38 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2321,12 +2321,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	/* Look it up and read it in.. */
 	folio = swap_cache_get_folio(swap, NULL, 0);
 	if (!folio) {
-		/* Or update major stats only when swapin succeeds?? */
-		if (fault_type) {
-			*fault_type |= VM_FAULT_MAJOR;
-			count_vm_event(PGMAJFAULT);
-			count_memcg_event_mm(fault_mm, PGMAJFAULT);
-		}
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
 			/* Direct mTHP swapin skipping swap cache & readhaed */
 			folio = shmem_swapin_direct(inode, vma, index, swap,
@@ -2346,6 +2340,11 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 				goto failed;
 			}
 		}
+		if (fault_type) {
+			*fault_type |= VM_FAULT_MAJOR;
+			count_vm_event(PGMAJFAULT);
+			count_memcg_event_mm(fault_mm, PGMAJFAULT);
+		}
 	}
 
 	if (order > folio_order(folio)) {
-- 
2.50.0