From nobody Mon Feb 9 10:26:21 2026
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org,
 linux-mm@kvack.org, nouveau@lists.freedesktop.org, David Hildenbrand,
 Andrew Morton, Jérôme Glisse, Jonathan Corbet, Alex Shi, Yanteng Si,
 Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter,
 "Liam R. Howlett", Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
 Pasha Tatashin, Peter Xu, Alistair Popple, Jason Gunthorpe,
 stable@vger.kernel.org
Subject: [PATCH v1 01/12] mm/gup: reject FOLL_SPLIT_PMD with hugetlb VMAs
Date: Wed, 29 Jan 2025 12:53:59 +0100
Message-ID: <20250129115411.2077152-2-david@redhat.com>
In-Reply-To: <20250129115411.2077152-1-david@redhat.com>
References: <20250129115411.2077152-1-david@redhat.com>

We only have two FOLL_SPLIT_PMD users.
While uprobe refuses hugetlb early, make_device_exclusive_range() can end
up getting called on hugetlb VMAs.

Right now, this means that with a PMD-sized hugetlb page, we can end up
calling split_huge_pmd(), because pmd_trans_huge() also succeeds with
hugetlb PMDs.

For example, using a modified hmm-test selftest one can trigger:

[ 207.017134][T14945] ------------[ cut here ]------------
[ 207.018614][T14945] kernel BUG at mm/page_table_check.c:87!
[ 207.019716][T14945] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 207.021072][T14945] CPU: 3 UID: 0 PID: ...
[ 207.023036][T14945] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[ 207.024834][T14945] RIP: 0010:page_table_check_clear.part.0+0x488/0x510
[ 207.026128][T14945] Code: ...
[ 207.029965][T14945] RSP: 0018:ffffc9000cb8f348 EFLAGS: 00010293
[ 207.031139][T14945] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: ffffffff8249a0cd
[ 207.032649][T14945] RDX: ffff88811e883c80 RSI: ffffffff8249a357 RDI: ffff88811e883c80
[ 207.034183][T14945] RBP: ffff888105c0a050 R08: 0000000000000005 R09: 0000000000000000
[ 207.035688][T14945] R10: 00000000ffffffff R11: 0000000000000003 R12: 0000000000000001
[ 207.037203][T14945] R13: 0000000000000200 R14: 0000000000000001 R15: dffffc0000000000
[ 207.038711][T14945] FS: 00007f2783275740(0000) GS:ffff8881f4980000(0000) knlGS:0000000000000000
[ 207.040407][T14945] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 207.041660][T14945] CR2: 00007f2782c00000 CR3: 0000000132356000 CR4: 0000000000750ef0
[ 207.043196][T14945] PKRU: 55555554
[ 207.043880][T14945] Call Trace:
[ 207.044506][T14945]
[ 207.045086][T14945] ? __die+0x51/0x92
[ 207.045864][T14945] ? die+0x29/0x50
[ 207.046596][T14945] ? do_trap+0x250/0x320
[ 207.047430][T14945] ? do_error_trap+0xe7/0x220
[ 207.048346][T14945] ? page_table_check_clear.part.0+0x488/0x510
[ 207.049535][T14945] ? handle_invalid_op+0x34/0x40
[ 207.050494][T14945] ? page_table_check_clear.part.0+0x488/0x510
[ 207.051681][T14945] ? exc_invalid_op+0x2e/0x50
[ 207.052589][T14945] ? asm_exc_invalid_op+0x1a/0x20
[ 207.053596][T14945] ? page_table_check_clear.part.0+0x1fd/0x510
[ 207.054790][T14945] ? page_table_check_clear.part.0+0x487/0x510
[ 207.055993][T14945] ? page_table_check_clear.part.0+0x488/0x510
[ 207.057195][T14945] ? page_table_check_clear.part.0+0x487/0x510
[ 207.058384][T14945] __page_table_check_pmd_clear+0x34b/0x5a0
[ 207.059524][T14945] ? __pfx___page_table_check_pmd_clear+0x10/0x10
[ 207.060775][T14945] ? __pfx___mutex_unlock_slowpath+0x10/0x10
[ 207.061940][T14945] ? __pfx___lock_acquire+0x10/0x10
[ 207.062967][T14945] pmdp_huge_clear_flush+0x279/0x360
[ 207.064024][T14945] split_huge_pmd_locked+0x82b/0x3750
...

Before commit 9cb28da54643 ("mm/gup: handle hugetlb in the generic
follow_page_mask code"), we would have ignored the flag; instead, let's
simply refuse the combination completely in check_vma_flags(): the
caller is likely not prepared to handle any hugetlb folios.

We'll teach make_device_exclusive_range() separately to ignore any hugetlb
folios as a future-proof safety net.
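
For illustration only, and not part of this patch: a minimal userspace
sketch of the early rejection described above. The MODEL_FOLL_* flags and
struct model_vma are hypothetical stand-ins for the kernel's gup flags and
for is_vm_hugetlb_page(); the real flag values and VMA layout differ.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the kernel's real FOLL_* values differ. */
#define MODEL_FOLL_WRITE	(1u << 0)
#define MODEL_FOLL_SPLIT_PMD	(1u << 1)

/* Toy VMA carrying only the one property this check looks at. */
struct model_vma {
	bool is_hugetlb;	/* models is_vm_hugetlb_page(vma) */
};

/*
 * Models the new check: refuse FOLL_SPLIT_PMD on hugetlb VMAs up front
 * instead of letting a later PMD split run on a hugetlb PMD.
 * Returns 0 if the combination is acceptable, -EOPNOTSUPP otherwise.
 */
static int model_check_vma_flags(const struct model_vma *vma,
				 unsigned int gup_flags)
{
	if ((gup_flags & MODEL_FOLL_SPLIT_PMD) && vma->is_hugetlb)
		return -EOPNOTSUPP;
	return 0;
}
```

With these stand-ins, only the combination of a hugetlb VMA and
MODEL_FOLL_SPLIT_PMD is refused; every other combination passes this
particular check.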

Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
Cc:
Signed-off-by: David Hildenbrand
Reviewed-by: Alistair Popple
Reviewed-by: John Hubbard
---
 mm/gup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index 3883b307780e..61e751baf862 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1283,6 +1283,9 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
 		return -EOPNOTSUPP;
 
+	if ((gup_flags & FOLL_SPLIT_PMD) && is_vm_hugetlb_page(vma))
+		return -EOPNOTSUPP;
+
 	if (vma_is_secretmem(vma))
 		return -EFAULT;
 
-- 
2.48.1

From nobody Mon Feb 9 10:26:21 2026
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org,
 linux-mm@kvack.org, nouveau@lists.freedesktop.org, David Hildenbrand,
 Andrew Morton, Jérôme Glisse, Jonathan Corbet, Alex Shi, Yanteng Si,
 Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter,
 "Liam R. Howlett", Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
 Pasha Tatashin, Peter Xu, Alistair Popple, Jason Gunthorpe,
 stable@vger.kernel.org
Subject: [PATCH v1 02/12] mm/rmap: reject hugetlb folios in folio_make_device_exclusive()
Date: Wed, 29 Jan 2025 12:54:00 +0100
Message-ID: <20250129115411.2077152-3-david@redhat.com>
In-Reply-To: <20250129115411.2077152-1-david@redhat.com>
References: <20250129115411.2077152-1-david@redhat.com>

Even though FOLL_SPLIT_PMD on hugetlb now always fails with -EOPNOTSUPP,
let's add a safety net in case FOLL_SPLIT_PMD usage would ever be
reworked.

In particular, before commit 9cb28da54643 ("mm/gup: handle hugetlb in the
generic follow_page_mask code"), GUP(FOLL_SPLIT_PMD) would just have
returned a page. In particular, hugetlb folios that are not PMD-sized
would never have been prone to FOLL_SPLIT_PMD.

hugetlb folios can be anonymous, and page_make_device_exclusive_one() is
not really prepared for handling them at all. So let's spell that out.

Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Cc:
Signed-off-by: David Hildenbrand
Reviewed-by: Alistair Popple
---
 mm/rmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4ea29a7..17fbfa61f7ef 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2499,7 +2499,7 @@ static bool folio_make_device_exclusive(struct folio *folio,
 	 * Restrict to anonymous folios for now to avoid potential writeback
 	 * issues.
*/ - if (!folio_test_anon(folio)) + if (!folio_test_anon(folio) || folio_test_hugetlb(folio)) return false; =20 rmap_walk(folio, &rwc); --=20 2.48.1 From nobody Mon Feb 9 10:26:21 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C92501DDC3C for ; Wed, 29 Jan 2025 11:54:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738151669; cv=none; b=Kt/rUIoo2okYp5KZcJWgNo8WT/V+24cQCy3ank5ZTZivvVsj8aCanLTZ2V7n/Mqet9mpXZNJzSPdzF91UBZZNfFXR1AQb3ogPrksKGa/mgxv6OF7huYCXtb80kvfZp0NwNapTQ8sMAv6eMWf5g8ICtbNiwDKr+tlq28m4dm7b1I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738151669; c=relaxed/simple; bh=nB56LEt4BweD7fZ7V8P6ZzKJLqYCpn9GIE9wy+WIsGg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=kniv22I72Dj2TvEMyh9gOFUkppKreyr2UyRaN5L7e1mJyFdEIhtffPXW5g8iyILbPNBhJVSqI910R1zZVtwbmAAvELmQhOms2YB0wkVnZ6e7q0JSgmihkDYEHa6H6IltBTpPxZAp3vw4D7cc5QTd2MT306GvRwne2NrQMLflV5E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=C3uPokhG; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="C3uPokhG" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1738151667; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FbpRkzZA4TilYwHsov5kO/PS96AeznJfw9pJaA8wpMg=; b=C3uPokhGbooATzhNYjT+Xe/6yxpR28Cj01EalBy0KQ0p/Lj4QRqsZeUj6kttTM6hevu5l6 GG330j+Kfpiu5IZZJ2dnmYh52WyynMF2J35t/Bzl3aidQvx+gvxICCpJqrvEYhRx1wAQ9W kQc2oqMK9azLnoTukN+sY0hy9Avaq8o= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-688-V9nGS0hmNoyfTZaccFLJNg-1; Wed, 29 Jan 2025 06:54:25 -0500 X-MC-Unique: V9nGS0hmNoyfTZaccFLJNg-1 X-Mimecast-MFC-AGG-ID: V9nGS0hmNoyfTZaccFLJNg Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-38bee9ae3b7so4420932f8f.1 for ; Wed, 29 Jan 2025 03:54:25 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738151664; x=1738756464; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=FbpRkzZA4TilYwHsov5kO/PS96AeznJfw9pJaA8wpMg=; b=mL4KlDYDe1bGBR5uyerTJAzR1n9bZk9GgIEq65JutjXdsAhtlh1bVDQl4oUlkgRH4R LpEA+lRoCcugUU+HQin8TSz5zAaBvzDxIINe1OUtDeY/gvOQPR/GriVpZsFps/fdf7Tb qU9xfk4HeD5/+QQczPRGrqJ+VlVyxneXSkff/GBjLKlAmZ6Txc/p6s21ZSaCjckhIb9f CjLmh9fWqvkWKpv766/dwM3lgrBoQ45qRrVH6EBWSdGga83yQJ1RM+pe/5aoy0sOZlNt +0a7dFGIVE5SgRpXk7K9VgwAcHtbDBo5xGvS8tuXCB2YQt6Ub9+Lxe01qSb/S4kbwXYH /oiA== X-Gm-Message-State: AOJu0Yynn2ivJCJGHkFMO2n+rthKQUBuGIQzX/YDvC8UTO9EPfCgkUa8 q+AxfcuFVnDIu1oJm9wCly2TV6ZpiXEMHKnwSWsZ/H158YT3beo8yZKKYNar1gsNymPxLj1nig4 7KmUBHLjANvcZRvxoir1XSndaNeVnE3tjXxT7XeWPdZkSflTC40Pc+MwpkUzhk89x823nXIhkb8 07YT1tArOCtmfBt7AyowFb6k20vtKwsuC+flw/EW/G+UDe X-Gm-Gg: ASbGncuj5CE63HO19eDevO8U2Dc13Uvj+A9dz83voiv10TuNseSkZpnB8wJI1PJtDju 
7xE53b4r85laiBDpb+h2/iq3xom4z4GVfzkXJaul9GLh09FX6F9ubZl8OsElD2DPbzZ2bRDYJD4 iki8QF5uMrLJckhQAuHz3OuqTzBCdqj/u5cqtofKh70+ymwcbiriuvVoSZMqaSNvGp2Eksaovq+ NgVAiagKxRT3cFBglsq/E0M/+CeWze3VqhdReFzXk08FUge8Ut45HTLljpdNI60fdpPg9SrGJNK GdhlNq8irC5ByDon1/pNB6Ux13RsJY25IaW8LjbQNZaDBqBxhphBTo24ulbsNc/PMQ== X-Received: by 2002:a5d:64c3:0:b0:385:e5d8:2bea with SMTP id ffacd0b85a97d-38c519460aemr2346447f8f.20.1738151663899; Wed, 29 Jan 2025 03:54:23 -0800 (PST) X-Google-Smtp-Source: AGHT+IECXzKNlO/KxhDgtZSrbOQmu7i91GFToVYgJKpoIoUvo8DlvKFDTSa6/lsKfWQw3fetIFbvYQ== X-Received: by 2002:a5d:64c3:0:b0:385:e5d8:2bea with SMTP id ffacd0b85a97d-38c519460aemr2346391f8f.20.1738151663351; Wed, 29 Jan 2025 03:54:23 -0800 (PST) Received: from localhost (p200300cbc7053b0064b867195794bf13.dip0.t-ipconnect.de. [2003:cb:c705:3b00:64b8:6719:5794:bf13]) by smtp.gmail.com with UTF8SMTPSA id 5b1f17b1804b1-438dcc27130sm20111455e9.16.2025.01.29.03.54.20 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 29 Jan 2025 03:54:22 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , "Liam R. 
Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v1 03/12] mm/rmap: convert make_device_exclusive_range() to make_device_exclusive() Date: Wed, 29 Jan 2025 12:54:01 +0100 Message-ID: <20250129115411.2077152-4-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250129115411.2077152-1-david@redhat.com> References: <20250129115411.2077152-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable The single "real" user in the tree of make_device_exclusive_range() always requests making only a single address exclusive. The current implementation is hard to fix for properly supporting anonymous THP / large folios and for avoiding messing with rmap walks in weird ways. So let's always process a single address/page and return folio + page to minimize page -> folio lookups. This is a preparation for further changes. Reject any non-anonymous or hugetlb folios early, directly after GUP. Signed-off-by: David Hildenbrand Acked-by: Simona Vetter Reviewed-by: Alistair Popple --- Documentation/mm/hmm.rst | 2 +- Documentation/translations/zh_CN/mm/hmm.rst | 2 +- drivers/gpu/drm/nouveau/nouveau_svm.c | 5 +- include/linux/mmu_notifier.h | 2 +- include/linux/rmap.h | 5 +- lib/test_hmm.c | 45 +++++------ mm/rmap.c | 90 +++++++++++---------- 7 files changed, 75 insertions(+), 76 deletions(-) diff --git a/Documentation/mm/hmm.rst b/Documentation/mm/hmm.rst index f6d53c37a2ca..7d61b7a8b65b 100644 --- a/Documentation/mm/hmm.rst +++ b/Documentation/mm/hmm.rst @@ -400,7 +400,7 @@ Exclusive access memory Some devices have features such as atomic PTE bits that can be used to imp= lement atomic access to system memory. 
To support atomic operations to a shared v= irtual memory page such a device needs access to that page which is exclusive of = any -userspace access from the CPU. The ``make_device_exclusive_range()`` funct= ion +userspace access from the CPU. The ``make_device_exclusive()`` function can be used to make a memory range inaccessible from userspace. =20 This replaces all mappings for pages in the given range with special swap diff --git a/Documentation/translations/zh_CN/mm/hmm.rst b/Documentation/tr= anslations/zh_CN/mm/hmm.rst index 0669f947d0bc..22c210f4e94f 100644 --- a/Documentation/translations/zh_CN/mm/hmm.rst +++ b/Documentation/translations/zh_CN/mm/hmm.rst @@ -326,7 +326,7 @@ devm_memunmap_pages() =E5=92=8C devm_release_mem_region= () =E5=BD=93=E8=B5=84=E6=BA=90=E5=8F=AF=E4=BB=A5=E7=BB=91=E5=AE=9A=E5=88=B0= ``s =20 =E4=B8=80=E4=BA=9B=E8=AE=BE=E5=A4=87=E5=85=B7=E6=9C=89=E8=AF=B8=E5=A6=82= =E5=8E=9F=E5=AD=90PTE=E4=BD=8D=E7=9A=84=E5=8A=9F=E8=83=BD=EF=BC=8C=E5=8F=AF= =E4=BB=A5=E7=94=A8=E6=9D=A5=E5=AE=9E=E7=8E=B0=E5=AF=B9=E7=B3=BB=E7=BB=9F=E5= =86=85=E5=AD=98=E7=9A=84=E5=8E=9F=E5=AD=90=E8=AE=BF=E9=97=AE=E3=80=82=E4=B8= =BA=E4=BA=86=E6=94=AF=E6=8C=81=E5=AF=B9=E4=B8=80 =E4=B8=AA=E5=85=B1=E4=BA=AB=E7=9A=84=E8=99=9A=E6=8B=9F=E5=86=85=E5=AD=98= =E9=A1=B5=E7=9A=84=E5=8E=9F=E5=AD=90=E6=93=8D=E4=BD=9C=EF=BC=8C=E8=BF=99=E6= =A0=B7=E7=9A=84=E8=AE=BE=E5=A4=87=E9=9C=80=E8=A6=81=E5=AF=B9=E8=AF=A5=E9=A1= =B5=E7=9A=84=E8=AE=BF=E9=97=AE=E6=98=AF=E6=8E=92=E4=BB=96=E7=9A=84=EF=BC=8C= =E8=80=8C=E4=B8=8D=E6=98=AF=E6=9D=A5=E8=87=AACPU -=E7=9A=84=E4=BB=BB=E4=BD=95=E7=94=A8=E6=88=B7=E7=A9=BA=E9=97=B4=E8=AE=BF= =E9=97=AE=E3=80=82 ``make_device_exclusive_range()`` =E5=87=BD=E6=95=B0=E5= =8F=AF=E4=BB=A5=E7=94=A8=E6=9D=A5=E4=BD=BF=E4=B8=80 +=E7=9A=84=E4=BB=BB=E4=BD=95=E7=94=A8=E6=88=B7=E7=A9=BA=E9=97=B4=E8=AE=BF= =E9=97=AE=E3=80=82 ``make_device_exclusive()`` =E5=87=BD=E6=95=B0=E5=8F=AF= =E4=BB=A5=E7=94=A8=E6=9D=A5=E4=BD=BF=E4=B8=80 
=E4=B8=AA=E5=86=85=E5=AD=98=E8=8C=83=E5=9B=B4=E4=B8=8D=E8=83=BD=E4=BB=8E= =E7=94=A8=E6=88=B7=E7=A9=BA=E9=97=B4=E8=AE=BF=E9=97=AE=E3=80=82 =20 =E8=BF=99=E5=B0=86=E7=94=A8=E7=89=B9=E6=AE=8A=E7=9A=84=E4=BA=A4=E6=8D=A2= =E6=9D=A1=E7=9B=AE=E6=9B=BF=E6=8D=A2=E7=BB=99=E5=AE=9A=E8=8C=83=E5=9B=B4=E5= =86=85=E7=9A=84=E6=89=80=E6=9C=89=E9=A1=B5=E7=9A=84=E6=98=A0=E5=B0=84=E3=80= =82=E4=BB=BB=E4=BD=95=E8=AF=95=E5=9B=BE=E8=AE=BF=E9=97=AE=E4=BA=A4=E6=8D=A2= =E6=9D=A1=E7=9B=AE=E7=9A=84=E8=A1=8C=E4=B8=BA=E9=83=BD=E4=BC=9A diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouvea= u/nouveau_svm.c index b4da82ddbb6b..39e3740980bb 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.c +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c @@ -609,10 +609,9 @@ static int nouveau_atomic_range_fault(struct nouveau_s= vmm *svmm, =20 notifier_seq =3D mmu_interval_read_begin(¬ifier->notifier); mmap_read_lock(mm); - ret =3D make_device_exclusive_range(mm, start, start + PAGE_SIZE, - &page, drm->dev); + page =3D make_device_exclusive(mm, start, drm->dev, &folio); mmap_read_unlock(mm); - if (ret <=3D 0 || !page) { + if (IS_ERR(page)) { ret =3D -EINVAL; goto out; } diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index e2dd57ca368b..d4e714661826 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -46,7 +46,7 @@ struct mmu_interval_notifier; * @MMU_NOTIFY_EXCLUSIVE: to signal a device driver that the device will no * longer have exclusive access to the page. When sent during creation of = an * exclusive range the owner will be initialised to the value provided by = the - * caller of make_device_exclusive_range(), otherwise the owner will be NU= LL. + * caller of make_device_exclusive(), otherwise the owner will be NULL. 
  */
 enum mmu_notifier_event {
 	MMU_NOTIFY_UNMAP = 0,
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 683a04088f3f..86425d42c1a9 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -663,9 +663,8 @@ int folio_referenced(struct folio *, int is_locked,
 void try_to_migrate(struct folio *folio, enum ttu_flags flags);
 void try_to_unmap(struct folio *, enum ttu_flags flags);
 
-int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
-				unsigned long end, struct page **pages,
-				void *arg);
+struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
+		void *owner, struct folio **foliop);
 
 /* Avoid racy checks */
 #define PVMW_SYNC		(1 << 0)
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 056f2e411d7b..9e1b07a227a3 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -780,10 +780,8 @@ static int dmirror_exclusive(struct dmirror *dmirror,
 	unsigned long start, end, addr;
 	unsigned long size = cmd->npages << PAGE_SHIFT;
 	struct mm_struct *mm = dmirror->notifier.mm;
-	struct page *pages[64];
 	struct dmirror_bounce bounce;
-	unsigned long next;
-	int ret;
+	int ret = 0;
 
 	start = cmd->addr;
 	end = start + size;
@@ -795,36 +793,31 @@ static int dmirror_exclusive(struct dmirror *dmirror,
 		return -EINVAL;
 
 	mmap_read_lock(mm);
-	for (addr = start; addr < end; addr = next) {
-		unsigned long mapped = 0;
-		int i;
-
-		next = min(end, addr + (ARRAY_SIZE(pages) << PAGE_SHIFT));
+	for (addr = start; addr < end; addr += PAGE_SIZE) {
+		struct folio *folio;
+		struct page *page;
 
-		ret = make_device_exclusive_range(mm, addr, next, pages, NULL);
-		/*
-		 * Do dmirror_atomic_map() iff all pages are marked for
-		 * exclusive access to avoid accessing uninitialized
-		 * fields of pages.
-		 */
-		if (ret == (next - addr) >> PAGE_SHIFT)
-			mapped = dmirror_atomic_map(addr, next, pages, dmirror);
-		for (i = 0; i < ret; i++) {
-			if (pages[i]) {
-				unlock_page(pages[i]);
-				put_page(pages[i]);
-			}
+		page = make_device_exclusive(mm, addr, NULL, &folio);
+		if (IS_ERR(page)) {
+			ret = PTR_ERR(page);
+			break;
 		}
 
-		if (addr + (mapped << PAGE_SHIFT) < next) {
-			mmap_read_unlock(mm);
-			mmput(mm);
-			return -EBUSY;
-		}
+		ret = dmirror_atomic_map(addr, addr + PAGE_SIZE, &page, dmirror);
+		if (!ret)
+			ret = -EBUSY;
+		folio_unlock(folio);
+		folio_put(folio);
+
+		if (ret)
+			break;
 	}
 	mmap_read_unlock(mm);
 	mmput(mm);
 
+	if (ret)
+		return -EBUSY;
+
 	/* Return the migrated data for verification. */
 	ret = dmirror_bounce_init(&bounce, start, size);
 	if (ret)
diff --git a/mm/rmap.c b/mm/rmap.c
index 17fbfa61f7ef..676df4fba5b0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2495,70 +2495,78 @@ static bool folio_make_device_exclusive(struct folio *folio,
 		.arg = &args,
 	};
 
-	/*
-	 * Restrict to anonymous folios for now to avoid potential writeback
-	 * issues.
-	 */
-	if (!folio_test_anon(folio) || folio_test_hugetlb(folio))
-		return false;
-
 	rmap_walk(folio, &rwc);
 
 	return args.valid && !folio_mapcount(folio);
 }
 
 /**
- * make_device_exclusive_range() - Mark a range for exclusive use by a device
+ * make_device_exclusive() - Mark an address for exclusive use by a device
  * @mm: mm_struct of associated target process
- * @start: start of the region to mark for exclusive device access
- * @end: end address of region
- * @pages: returns the pages which were successfully marked for exclusive access
+ * @addr: the virtual address to mark for exclusive device access
  * @owner: passed to MMU_NOTIFY_EXCLUSIVE range notifier to allow filtering
+ * @foliop: folio pointer will be stored here on success.
+ *
+ * This function looks up the page mapped at the given address, grabs a
+ * folio reference, locks the folio and replaces the PTE with special
+ * device-exclusive non-swap entry, preventing userspace CPU access. The
+ * function will return with the folio locked and referenced.
  *
- * Returns: number of pages found in the range by GUP. A page is marked for
- * exclusive access only if the page pointer is non-NULL.
+ * On fault these special device-exclusive entries are replaced with the
+ * original PTE under folio lock, after calling MMU notifiers.
  *
- * This function finds ptes mapping page(s) to the given address range, locks
- * them and replaces mappings with special swap entries preventing userspace CPU
- * access. On fault these entries are replaced with the original mapping after
- * calling MMU notifiers.
+ * Only anonymous non-hugetlb folios are supported and the VMA must have
+ * write permissions such that we can fault in the anonymous page writable
+ * in order to mark it exclusive. The caller must hold the mmap_lock in read
+ * mode.
  *
  * A driver using this to program access from a device must use a mmu notifier
  * critical section to hold a device specific lock during programming. Once
  * programming is complete it should drop the page lock and reference after
  * which point CPU access to the page will revoke the exclusive access.
+ *
+ * Returns: pointer to mapped page on success, otherwise a negative error.
  */
-int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
-				unsigned long end, struct page **pages,
-				void *owner)
+struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
+		void *owner, struct folio **foliop)
 {
-	long npages = (end - start) >> PAGE_SHIFT;
-	long i;
+	struct folio *folio;
+	struct page *page;
+	long npages;
+
+	mmap_assert_locked(mm);
 
-	npages = get_user_pages_remote(mm, start, npages,
+	/*
+	 * Fault in the page writable and try to lock it; note that if the
+	 * address would already be marked for exclusive use by the device,
+	 * the GUP call would undo that first by triggering a fault.
+	 */
+	npages = get_user_pages_remote(mm, addr, 1,
 				       FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
-				       pages, NULL);
-	if (npages < 0)
-		return npages;
-
-	for (i = 0; i < npages; i++, start += PAGE_SIZE) {
-		struct folio *folio = page_folio(pages[i]);
-		if (PageTail(pages[i]) || !folio_trylock(folio)) {
-			folio_put(folio);
-			pages[i] = NULL;
-			continue;
-		}
+				       &page, NULL);
+	if (npages != 1)
+		return ERR_PTR(npages);
+	folio = page_folio(page);
 
-		if (!folio_make_device_exclusive(folio, mm, start, owner)) {
-			folio_unlock(folio);
-			folio_put(folio);
-			pages[i] = NULL;
-		}
+	if (!folio_test_anon(folio) || folio_test_hugetlb(folio)) {
+		folio_put(folio);
+		return ERR_PTR(-EOPNOTSUPP);
+	}
+
+	if (!folio_trylock(folio)) {
+		folio_put(folio);
+		return ERR_PTR(-EBUSY);
 	}
 
-	return npages;
+	if (!folio_make_device_exclusive(folio, mm, addr, owner)) {
+		folio_unlock(folio);
+		folio_put(folio);
+		return ERR_PTR(-EBUSY);
+	}
+	*foliop = folio;
+	return page;
 }
-EXPORT_SYMBOL_GPL(make_device_exclusive_range);
+EXPORT_SYMBOL_GPL(make_device_exclusive);
 #endif
 
 void __put_anon_vma(struct anon_vma *anon_vma)
-- 
2.48.1

From nobody Mon Feb 9 10:26:21 2026
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-mm@kvack.org, nouveau@lists.freedesktop.org,
	David Hildenbrand, Andrew Morton, Jérôme Glisse, Jonathan Corbet,
	Alex Shi, Yanteng Si, Karol Herbst, Lyude Paul, Danilo Krummrich,
	David Airlie, Simona Vetter, "Liam R. Howlett", Lorenzo Stoakes,
	Vlastimil Babka, Jann Horn, Pasha Tatashin, Peter Xu,
	Alistair Popple, Jason Gunthorpe
Subject: [PATCH v1 04/12] mm/rmap: implement make_device_exclusive() using folio_walk instead of rmap walk
Date: Wed, 29 Jan 2025 12:54:02 +0100
Message-ID: <20250129115411.2077152-5-david@redhat.com>
In-Reply-To: <20250129115411.2077152-1-david@redhat.com>
References: <20250129115411.2077152-1-david@redhat.com>

We require a writable PTE and only support anonymous folios: we can only
have exactly one PTE pointing at that page, which we can just look up
using a folio walk, avoiding the rmap walk and the anon VMA lock.

So let's stop doing an rmap walk and perform a folio walk instead, so we
can easily just modify a single PTE and avoid relying on rmap/mapcounts.

We now effectively work on a single PTE instead of multiple PTEs of
a large folio, allowing for conversion of individual PTEs from
non-exclusive to device-exclusive -- note that the other way always
worked on single PTEs.

We can drop the MMU_NOTIFY_EXCLUSIVE MMU notifier call and document why
that is not required: GUP will already take care of the
MMU_NOTIFY_EXCLUSIVE call if required (there is already a
device-exclusive entry) when not finding a present PTE and having to
trigger a fault and ending up in remove_device_exclusive_entry().

Note that the PTE is always writable, and we can always create a
writable-device-exclusive entry.

With this change, device-exclusive is fully compatible with THPs /
large folios. We still require PMD-sized THPs to get PTE-mapped, and
supporting PMD-mapped THP (without the PTE-remapping) is a different
endeavour that might not be worth it at this point.

This gets rid of the "folio_mapcount()" usage and lets us fix ordinary
rmap walks (migration/swapout) next. Spell out that messing with the
mapcount is wrong and must be fixed.
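
As a rough userspace illustration of the re-validation described above,
the conversion may only proceed when the second walk finds exactly what
GUP returned: the same folio and page, mapped at PTE level by a writable
PTE. This is a sketch, not kernel code: struct walk_result, enum
walk_level and check_second_walk() are hypothetical stand-ins for
struct folio_walk and the checks done in make_device_exclusive().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's folio-walk state; not kernel API. */
enum walk_level { WALK_LEVEL_PTE, WALK_LEVEL_PMD };

struct walk_result {
	const void *folio;	/* folio found by the walk, NULL if nothing is mapped */
	const void *page;	/* exact page found by the walk */
	enum walk_level level;	/* page table level of the mapping */
	bool writable;		/* whether the mapping PTE is writable */
};

/*
 * Return 0 if the walk found the expected folio and page, PTE-mapped and
 * writable; return -EBUSY otherwise so that the caller can retry.
 */
static int check_second_walk(const struct walk_result *w,
			     const void *folio, const void *page)
{
	if (w->folio != folio || w->page != page ||
	    w->level != WALK_LEVEL_PTE || !w->writable)
		return -EBUSY;
	return 0;
}
```

Any mismatch is reported as -EBUSY rather than handled in place, which
mirrors the "trigger GUP again to fix it up" approach taken by the patch.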

Signed-off-by: David Hildenbrand
---
 mm/rmap.c | 188 ++++++++++++++++--------------------------------------
 1 file changed, 55 insertions(+), 133 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 676df4fba5b0..49ffac6d27f8 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2375,131 +2375,6 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
 }
 
 #ifdef CONFIG_DEVICE_PRIVATE
-struct make_exclusive_args {
-	struct mm_struct *mm;
-	unsigned long address;
-	void *owner;
-	bool valid;
-};
-
-static bool page_make_device_exclusive_one(struct folio *folio,
-		struct vm_area_struct *vma, unsigned long address, void *priv)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
-	struct make_exclusive_args *args = priv;
-	pte_t pteval;
-	struct page *subpage;
-	bool ret = true;
-	struct mmu_notifier_range range;
-	swp_entry_t entry;
-	pte_t swp_pte;
-	pte_t ptent;
-
-	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
-				      vma->vm_mm, address, min(vma->vm_end,
-				      address + folio_size(folio)),
-				      args->owner);
-	mmu_notifier_invalidate_range_start(&range);
-
-	while (page_vma_mapped_walk(&pvmw)) {
-		/* Unexpected PMD-mapped THP? */
-		VM_BUG_ON_FOLIO(!pvmw.pte, folio);
-
-		ptent = ptep_get(pvmw.pte);
-		if (!pte_present(ptent)) {
-			ret = false;
-			page_vma_mapped_walk_done(&pvmw);
-			break;
-		}
-
-		subpage = folio_page(folio,
-				pte_pfn(ptent) - folio_pfn(folio));
-		address = pvmw.address;
-
-		/* Nuke the page table entry. */
-		flush_cache_page(vma, address, pte_pfn(ptent));
-		pteval = ptep_clear_flush(vma, address, pvmw.pte);
-
-		/* Set the dirty flag on the folio now the pte is gone. */
-		if (pte_dirty(pteval))
-			folio_mark_dirty(folio);
-
-		/*
-		 * Check that our target page is still mapped at the expected
-		 * address.
-		 */
-		if (args->mm == mm && args->address == address &&
-		    pte_write(pteval))
-			args->valid = true;
-
-		/*
-		 * Store the pfn of the page in a special migration
-		 * pte. do_swap_page() will wait until the migration
-		 * pte is removed and then restart fault handling.
-		 */
-		if (pte_write(pteval))
-			entry = make_writable_device_exclusive_entry(
-							page_to_pfn(subpage));
-		else
-			entry = make_readable_device_exclusive_entry(
-							page_to_pfn(subpage));
-		swp_pte = swp_entry_to_pte(entry);
-		if (pte_soft_dirty(pteval))
-			swp_pte = pte_swp_mksoft_dirty(swp_pte);
-		if (pte_uffd_wp(pteval))
-			swp_pte = pte_swp_mkuffd_wp(swp_pte);
-
-		set_pte_at(mm, address, pvmw.pte, swp_pte);
-
-		/*
-		 * There is a reference on the page for the swap entry which has
-		 * been removed, so shouldn't take another.
-		 */
-		folio_remove_rmap_pte(folio, subpage, vma);
-	}
-
-	mmu_notifier_invalidate_range_end(&range);
-
-	return ret;
-}
-
-/**
- * folio_make_device_exclusive - Mark the folio exclusively owned by a device.
- * @folio: The folio to replace page table entries for.
- * @mm: The mm_struct where the folio is expected to be mapped.
- * @address: Address where the folio is expected to be mapped.
- * @owner: passed to MMU_NOTIFY_EXCLUSIVE range notifier callbacks
- *
- * Tries to remove all the page table entries which are mapping this
- * folio and replace them with special device exclusive swap entries to
- * grant a device exclusive access to the folio.
- *
- * Context: Caller must hold the folio lock.
- * Return: false if the page is still mapped, or if it could not be unmapped
- * from the expected address. Otherwise returns true (success).
- */
-static bool folio_make_device_exclusive(struct folio *folio,
-		struct mm_struct *mm, unsigned long address, void *owner)
-{
-	struct make_exclusive_args args = {
-		.mm = mm,
-		.address = address,
-		.owner = owner,
-		.valid = false,
-	};
-	struct rmap_walk_control rwc = {
-		.rmap_one = page_make_device_exclusive_one,
-		.done = folio_not_mapped,
-		.anon_lock = folio_lock_anon_vma_read,
-		.arg = &args,
-	};
-
-	rmap_walk(folio, &rwc);
-
-	return args.valid && !folio_mapcount(folio);
-}
-
 /**
  * make_device_exclusive() - Mark an address for exclusive use by a device
  * @mm: mm_struct of associated target process
@@ -2530,9 +2405,12 @@ static bool folio_make_device_exclusive(struct folio *folio,
 struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
 		void *owner, struct folio **foliop)
 {
-	struct folio *folio;
+	struct folio *folio, *fw_folio;
+	struct vm_area_struct *vma;
+	struct folio_walk fw;
 	struct page *page;
-	long npages;
+	swp_entry_t entry;
+	pte_t swp_pte;
 
 	mmap_assert_locked(mm);
 
@@ -2540,12 +2418,16 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
 	 * Fault in the page writable and try to lock it; note that if the
 	 * address would already be marked for exclusive use by the device,
 	 * the GUP call would undo that first by triggering a fault.
+	 *
+	 * If any other device would already map this page exclusively, the
+	 * fault will trigger a conversion to an ordinary
+	 * (non-device-exclusive) PTE and issue a MMU_NOTIFY_EXCLUSIVE.
 	 */
-	npages = get_user_pages_remote(mm, addr, 1,
-				       FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
-				       &page, NULL);
-	if (npages != 1)
-		return ERR_PTR(npages);
+	page = get_user_page_vma_remote(mm, addr,
+					FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
+					&vma);
+	if (IS_ERR(page))
+		return page;
 	folio = page_folio(page);
 
 	if (!folio_test_anon(folio) || folio_test_hugetlb(folio)) {
@@ -2558,11 +2440,51 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
 		return ERR_PTR(-EBUSY);
 	}
 
-	if (!folio_make_device_exclusive(folio, mm, addr, owner)) {
+	/*
+	 * Let's do a second walk and make sure we still find the same page
+	 * mapped writable. If we don't find what we expect, we will trigger
+	 * GUP again to fix it up. Note that a page of an anonymous folio can
+	 * only be mapped writable using exactly one page table mapping
+	 * ("exclusive"), so there cannot be other mappings.
+	 */
+	fw_folio = folio_walk_start(&fw, vma, addr, 0);
+	if (fw_folio != folio || fw.page != page ||
+	    fw.level != FW_LEVEL_PTE || !pte_write(fw.pte)) {
+		if (fw_folio)
+			folio_walk_end(&fw, vma);
 		folio_unlock(folio);
 		folio_put(folio);
 		return ERR_PTR(-EBUSY);
 	}
+
+	/* Nuke the page table entry so we get the uptodate dirty bit. */
+	flush_cache_page(vma, addr, page_to_pfn(page));
+	fw.pte = ptep_clear_flush(vma, addr, fw.ptep);
+
+	/* Set the dirty flag on the folio now the pte is gone. */
+	if (pte_dirty(fw.pte))
+		folio_mark_dirty(folio);
+
+	/*
+	 * Store the pfn of the page in a special device-exclusive non-swap pte.
+	 * do_swap_page() will trigger the conversion back while holding the
+	 * folio lock.
+	 */
+	entry = make_writable_device_exclusive_entry(page_to_pfn(page));
+	swp_pte = swp_entry_to_pte(entry);
+	if (pte_soft_dirty(fw.pte))
+		swp_pte = pte_swp_mksoft_dirty(swp_pte);
+	/* The pte is writable, uffd-wp does not apply. */
+	set_pte_at(mm, addr, fw.ptep, swp_pte);
+
+	/*
+	 * TODO: The device-exclusive non-swap PTE holds a folio reference but
+	 * does not count as a mapping (mapcount), which is wrong and must be
+	 * fixed, otherwise RMAP walks don't behave as expected.
+	 */
+	folio_remove_rmap_pte(folio, page, vma);
+
+	folio_walk_end(&fw, vma);
 	*foliop = folio;
 	return page;
 }
-- 
2.48.1

From nobody Mon Feb 9 10:26:21 2026
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-mm@kvack.org, nouveau@lists.freedesktop.org,
	David Hildenbrand, Andrew Morton, Jérôme Glisse, Jonathan Corbet,
	Alex Shi, Yanteng Si, Karol Herbst, Lyude Paul, Danilo Krummrich,
	David Airlie, Simona Vetter, "Liam R. Howlett", Lorenzo Stoakes,
	Vlastimil Babka, Jann Horn, Pasha Tatashin, Peter Xu,
	Alistair Popple, Jason Gunthorpe
Subject: [PATCH v1 05/12] mm/memory: detect writability in restore_exclusive_pte() through can_change_pte_writable()
Date: Wed, 29 Jan 2025 12:54:03 +0100
Message-ID: <20250129115411.2077152-6-david@redhat.com>
In-Reply-To: <20250129115411.2077152-1-david@redhat.com>
References: <20250129115411.2077152-1-david@redhat.com>

Let's do it just like mprotect write-upgrade or during NUMA-hinting
faults on PROT_NONE PTEs: detect if the PTE can be writable by using
can_change_pte_writable().

Only set the PTE dirty if the folio is dirty: we might not necessarily
have had a write access, and setting the PTE writable doesn't require
setting the PTE dirty.

With this change in place, there is no need to have separate readable and
writable device-exclusive entry types, and we'll merge them separately in
the next patch.

Note that, during fork(), we first convert the device-exclusive entries
back to ordinary PTEs, and we only ever allow conversion of writable
PTEs to device-exclusive -- only mprotect can currently change them to
readable-device-exclusive. Consequently, we always expect
PageAnonExclusive(page)==true and can_change_pte_writable()==true,
unless we are dealing with soft-dirty tracking or uffd-wp. But reusing
can_change_pte_writable() for now is cleaner.
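
To make the resulting decision easier to follow, here is a minimal
userspace model of it (a sketch, not kernel code). struct pte_model and
restore_pte_model() are hypothetical names, and the three boolean inputs
stand in for the VM_WRITE check, the result of can_change_pte_writable()
and folio_test_dirty() respectively.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two PTE bits of interest; not a kernel type. */
struct pte_model {
	bool writable;
	bool dirty;
};

/*
 * Model of the restore decision: the PTE becomes writable only if the VMA
 * allows writes and the writability check passes; it is marked dirty only
 * on that path, and only if the folio is already dirty.
 */
static struct pte_model restore_pte_model(bool vma_has_vm_write,
					  bool can_change_writable,
					  bool folio_dirty)
{
	struct pte_model pte = { .writable = false, .dirty = false };

	if (vma_has_vm_write && can_change_writable) {
		if (folio_dirty)
			pte.dirty = true;
		pte.writable = true;
	}
	return pte;
}
```

In this model a clean folio yields a writable but clean PTE, matching
the statement above that writability does not imply dirtiness.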

Signed-off-by: David Hildenbrand
---
 mm/memory.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 03efeeef895a..db38d6ae4e74 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -725,18 +725,21 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
 	struct folio *folio = page_folio(page);
 	pte_t orig_pte;
 	pte_t pte;
-	swp_entry_t entry;
 
 	orig_pte = ptep_get(ptep);
 	pte = pte_mkold(mk_pte(page, READ_ONCE(vma->vm_page_prot)));
 	if (pte_swp_soft_dirty(orig_pte))
 		pte = pte_mksoft_dirty(pte);
 
-	entry = pte_to_swp_entry(orig_pte);
 	if (pte_swp_uffd_wp(orig_pte))
 		pte = pte_mkuffd_wp(pte);
-	else if (is_writable_device_exclusive_entry(entry))
-		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
+
+	if ((vma->vm_flags & VM_WRITE) &&
+	    can_change_pte_writable(vma, address, pte)) {
+		if (folio_test_dirty(folio))
+			pte = pte_mkdirty(pte);
+		pte = pte_mkwrite(pte, vma);
+	}
 
 	VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) &&
 					   PageAnonExclusive(page)), folio);
-- 
2.48.1

From nobody Mon Feb 9 10:26:21 2026
[2003:cb:c705:3b00:64b8:6719:5794:bf13]) by smtp.gmail.com with UTF8SMTPSA id 5b1f17b1804b1-438dcc24e73sm20192895e9.16.2025.01.29.03.54.28 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 29 Jan 2025 03:54:30 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v1 06/12] mm: use single SWP_DEVICE_EXCLUSIVE entry type Date: Wed, 29 Jan 2025 12:54:04 +0100 Message-ID: <20250129115411.2077152-7-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250129115411.2077152-1-david@redhat.com> References: <20250129115411.2077152-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" There is no need for the distinction anymore; let's merge the readable and writable device-exclusive entries into a single device-exclusive entry type. Signed-off-by: David Hildenbrand Acked-by: Simona Vetter Reviewed-by: Alistair Popple --- include/linux/swap.h | 7 +++---- include/linux/swapops.h | 27 ++++----------------------- mm/mprotect.c | 8 -------- mm/page_table_check.c | 5 ++--- mm/rmap.c | 2 +- 5 files changed, 10 insertions(+), 39 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 91b30701274e..9a48e79a0a52 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -74,14 +74,13 @@ static inline int current_is_kswapd(void) * to a special SWP_DEVICE_{READ|WRITE} entry. 
* * When a page is mapped by the device for exclusive access we set the CPU= page - * table entries to special SWP_DEVICE_EXCLUSIVE_* entries. + * table entries to a special SWP_DEVICE_EXCLUSIVE entry. */ #ifdef CONFIG_DEVICE_PRIVATE -#define SWP_DEVICE_NUM 4 +#define SWP_DEVICE_NUM 3 #define SWP_DEVICE_WRITE (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM) #define SWP_DEVICE_READ (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM+= 1) -#define SWP_DEVICE_EXCLUSIVE_WRITE (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIG= RATION_NUM+2) -#define SWP_DEVICE_EXCLUSIVE_READ (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGR= ATION_NUM+3) +#define SWP_DEVICE_EXCLUSIVE (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION= _NUM+2) #else #define SWP_DEVICE_NUM 0 #endif diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 96f26e29fefe..64ea151a7ae3 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -186,26 +186,16 @@ static inline bool is_writable_device_private_entry(s= wp_entry_t entry) return unlikely(swp_type(entry) =3D=3D SWP_DEVICE_WRITE); } =20 -static inline swp_entry_t make_readable_device_exclusive_entry(pgoff_t off= set) +static inline swp_entry_t make_device_exclusive_entry(pgoff_t offset) { - return swp_entry(SWP_DEVICE_EXCLUSIVE_READ, offset); -} - -static inline swp_entry_t make_writable_device_exclusive_entry(pgoff_t off= set) -{ - return swp_entry(SWP_DEVICE_EXCLUSIVE_WRITE, offset); + return swp_entry(SWP_DEVICE_EXCLUSIVE, offset); } =20 static inline bool is_device_exclusive_entry(swp_entry_t entry) { - return swp_type(entry) =3D=3D SWP_DEVICE_EXCLUSIVE_READ || - swp_type(entry) =3D=3D SWP_DEVICE_EXCLUSIVE_WRITE; + return swp_type(entry) =3D=3D SWP_DEVICE_EXCLUSIVE; } =20 -static inline bool is_writable_device_exclusive_entry(swp_entry_t entry) -{ - return unlikely(swp_type(entry) =3D=3D SWP_DEVICE_EXCLUSIVE_WRITE); -} #else /* CONFIG_DEVICE_PRIVATE */ static inline swp_entry_t make_readable_device_private_entry(pgoff_t offse= t) { @@ -227,12 
+217,7 @@ static inline bool is_writable_device_private_entry(sw= p_entry_t entry) return false; } =20 -static inline swp_entry_t make_readable_device_exclusive_entry(pgoff_t off= set) -{ - return swp_entry(0, 0); -} - -static inline swp_entry_t make_writable_device_exclusive_entry(pgoff_t off= set) +static inline swp_entry_t make_device_exclusive_entry(pgoff_t offset) { return swp_entry(0, 0); } @@ -242,10 +227,6 @@ static inline bool is_device_exclusive_entry(swp_entry= _t entry) return false; } =20 -static inline bool is_writable_device_exclusive_entry(swp_entry_t entry) -{ - return false; -} #endif /* CONFIG_DEVICE_PRIVATE */ =20 #ifdef CONFIG_MIGRATION diff --git a/mm/mprotect.c b/mm/mprotect.c index 516b1d847e2c..9cb6ab7c4048 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -225,14 +225,6 @@ static long change_pte_range(struct mmu_gather *tlb, newpte =3D swp_entry_to_pte(entry); if (pte_swp_uffd_wp(oldpte)) newpte =3D pte_swp_mkuffd_wp(newpte); - } else if (is_writable_device_exclusive_entry(entry)) { - entry =3D make_readable_device_exclusive_entry( - swp_offset(entry)); - newpte =3D swp_entry_to_pte(entry); - if (pte_swp_soft_dirty(oldpte)) - newpte =3D pte_swp_mksoft_dirty(newpte); - if (pte_swp_uffd_wp(oldpte)) - newpte =3D pte_swp_mkuffd_wp(newpte); } else if (is_pte_marker_entry(entry)) { /* * Ignore error swap entries unconditionally, diff --git a/mm/page_table_check.c b/mm/page_table_check.c index 509c6ef8de40..c2b3600429a0 100644 --- a/mm/page_table_check.c +++ b/mm/page_table_check.c @@ -196,9 +196,8 @@ EXPORT_SYMBOL(__page_table_check_pud_clear); /* Whether the swap entry cached writable information */ static inline bool swap_cached_writable(swp_entry_t entry) { - return is_writable_device_exclusive_entry(entry) || - is_writable_device_private_entry(entry) || - is_writable_migration_entry(entry); + return is_writable_device_private_entry(entry) || + is_writable_migration_entry(entry); } =20 static inline void page_table_check_pte_flags(pte_t pte) 
diff --git a/mm/rmap.c b/mm/rmap.c index 49ffac6d27f8..65d9bbea16d0 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2470,7 +2470,7 @@ struct page *make_device_exclusive(struct mm_struct *= mm, unsigned long addr, * do_swap_page() will trigger the conversion back while holding the * folio lock. */ - entry =3D make_writable_device_exclusive_entry(page_to_pfn(page)); + entry =3D make_device_exclusive_entry(page_to_pfn(page)); swp_pte =3D swp_entry_to_pte(entry); if (pte_soft_dirty(fw.pte)) swp_pte =3D pte_swp_mksoft_dirty(swp_pte); --=20 2.48.1 From nobody Mon Feb 9 10:26:21 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5AAA31DED60 for ; Wed, 29 Jan 2025 11:54:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738151678; cv=none; b=Ln9OPIKx2nZoskxpzHM39F8nnjfUCEU1pyK6LbLWsM74xA1hhItAqWBduLnfz6r4+ofKh66rURxKjGqB3I7lpFszRFpaDa/rZHgK8noEUaFABvfnOWl0/KKmhMzP604SdHFFYyjksXiMwT4NHlAfHOZaL7YP6s6+6rzTzi0s/S8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738151678; c=relaxed/simple; bh=QeZqoAVJcsJt2vivLq7R8JTlFiXhijsLw+UaLook7wY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=DBdmCt7EIvnE16cSgm+FGpR8aG6Wq6QLPKSKoNXRHdGJVDwINMg+JlO3pqwezDrt4Z6nXA+2xQU7w9/Jwvu1S+S/uam+OC+zuwlp+9SUoN3BgHufwaA1XMQxad4SUY+8n2mUlBQZwwpkZSgFKtsApZo6JPp4dct9YWkht/ugJvM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=FaU4Y73n; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass 
(p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="FaU4Y73n" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1738151676; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SXa0V5qonctJf7EFSiCmTQng5dHGkyNA7f/zw6SOSBk=; b=FaU4Y73nHwhJUv5oxG/hEg25zcWBirtuEctnpAEhaR/X8ZZ+PaIr1rYYP6gN9ZByTnvNK2 CuUWKYnLAcEKBSN/A86y4WRAujAN77NkOKngKYv/lQECBv+NkHaYSWn1B6L0D/vLQTDi8T eChfjz8L1xTj8saP1xo3oEf/+ESsyng= Received: from mail-wm1-f69.google.com (mail-wm1-f69.google.com [209.85.128.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-306-Rrb8tZxgMN2B2nmaMv-acA-1; Wed, 29 Jan 2025 06:54:35 -0500 X-MC-Unique: Rrb8tZxgMN2B2nmaMv-acA-1 X-Mimecast-MFC-AGG-ID: Rrb8tZxgMN2B2nmaMv-acA Received: by mail-wm1-f69.google.com with SMTP id 5b1f17b1804b1-43646b453bcso34325855e9.3 for ; Wed, 29 Jan 2025 03:54:35 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738151674; x=1738756474; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=SXa0V5qonctJf7EFSiCmTQng5dHGkyNA7f/zw6SOSBk=; b=DYezzbFjQJEawVFgxzgFAS1/AGFX8UYg29bd8vNJYayOrEkThd6BcgS7H+H9LZInXl lVTiqU2C7IsloiRkN68K0mUIMX0OyEs1JHPDRAh34b+dXxIyE/7145HIm7ZgPS7qSU8r XHS/LcHxAg/66/Ml2nWzlleoJ1vcQJixwUk8KIV+/MVN3IXn2cjGV6nw7r4OhjmIv9Z5 uRwGQWwQ5dqIL6ju6QVfFoSqoaphqDpvHCr0MjaowYUxHq4F98mmU1QXjNAJ3eT7/Gg7 LpzYr6bQ6p1uVnaN6YP7xGHxeuOPeWJMxHgyn9jEwz/AC4qXANIKIVMFwwWwwJhZytsT oMcg== X-Gm-Message-State: 
AOJu0Yy5MIE7HM3WwLQ3OwY63g+0HlldcGgLkS5qnuf3R56vTFftvxHC 0YcZ0QkEjoJnXIV2qWNGqKV8baSZ8EpSvIthei9wNxqeu/ntlzGMQAqsGtfuWpDivxEttd7MDJI acQL9zLCoPC64lBTqunvZO95LOP+RZG01PrLNBAZgqIdcXGfIfZ0UhYZVnNmmHKiRVx8KyNtSSi pbZrNtfKyZvEXQe7Y2TxviZyeVgFd/pVgIbrRmKGApMNvv X-Gm-Gg: ASbGncvSTBMcNxcJpXXPcpXCUzqmrDawg9qvOvu/zu+HKB7in+t8czcqr+e9W5QCy59 7fsBvI/LzZMh4WXWeJHI91n6o10+X6fCK9CH9AA7mC2NVOo5MmYzZA7kuBOOUwq+c+f7evr3cP4 /iQATTA0dvkTv+JcX+jB8m9Oi4+vt0ZmqDJGhYY1mVFvBvnqQUoOtiY9T1QdrRX0J65fjoUDXy8 +WeHTJiaxdCF9MVm9Gf/ZJP5wxi62FleL/fAUxC2ucaWra+qwLZj8l1KcDFlWhZr+PVnw0lTChA vvcKd+Cqx2coh9bNs6n4+c8StUl/hUE3uZkWKCMaga7Amjdi2SgG+73MkspnieiUQA== X-Received: by 2002:a5d:61cd:0:b0:385:f7ef:a57f with SMTP id ffacd0b85a97d-38c519744d5mr2113642f8f.27.1738151674255; Wed, 29 Jan 2025 03:54:34 -0800 (PST) X-Google-Smtp-Source: AGHT+IGYVCfHEVHB5xz9x9tgAenDwJY8DuXR9neg40xsYB0gvwDn7frPxCs2ZGd0+jvQzBFPN5gg9A== X-Received: by 2002:a5d:61cd:0:b0:385:f7ef:a57f with SMTP id ffacd0b85a97d-38c519744d5mr2113590f8f.27.1738151673837; Wed, 29 Jan 2025 03:54:33 -0800 (PST) Received: from localhost (p200300cbc7053b0064b867195794bf13.dip0.t-ipconnect.de. [2003:cb:c705:3b00:64b8:6719:5794:bf13]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-38c2a1bb040sm16943248f8f.67.2025.01.29.03.54.31 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 29 Jan 2025 03:54:32 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , "Liam R. 
Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v1 07/12] mm/page_vma_mapped: device-exclusive entries are not migration entries Date: Wed, 29 Jan 2025 12:54:05 +0100 Message-ID: <20250129115411.2077152-8-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250129115411.2077152-1-david@redhat.com> References: <20250129115411.2077152-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" It's unclear why they would be considered migration entries; they are not. Likely we'll never really trigger that case in practice, because migration (including folio split) of a folio that has device-exclusive entries is never started, as we would detect "additional references": device-exclusive entries adjust the mapcount, but not the refcount. 
Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand Reviewed-by: Alistair Popple --- mm/page_vma_mapped.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 81839a9e74f1..32679be22d30 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -111,8 +111,7 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw) return false; entry =3D pte_to_swp_entry(ptent); =20 - if (!is_migration_entry(entry) && - !is_device_exclusive_entry(entry)) + if (!is_migration_entry(entry)) return false; =20 pfn =3D swp_offset_pfn(entry); --=20 2.48.1 From nobody Mon Feb 9 10:26:21 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 42FB01DC994 for ; Wed, 29 Jan 2025 11:54:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738151681; cv=none; b=HLvWTuJJCcu4Mj31U1HxjCHkgBH4K7jZDY8lJnNxNtr1oQ50wb7oJ3Zi2J3KkBq+xST0HARMut0zsZJ64LCJJiGz91zH6SKL26Tk7rxtz+DIP2rUuTNPjzWvmXOt+rL2U+j4SUdSemWf3X26w5mdxQzNi9CmjB4mjxYzFEWjYFI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738151681; c=relaxed/simple; bh=zoty/tDG38rtCLGJyP6VjP2nShgp7/XdAgyaitu3lVI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=NstqMsRONtB5cI1bFE7HEMqvjIeY3lw9Fi3tZBsztUXsWETynIO9k+a7Zfk9YmDkKH/SxKILYaMT1kogYtPLmHkTHB9VJevpqCyFWK0/fLEWN+pGSqnZWlOd1XRg/ARMB6hlvjY6FHZxqqP88kgViHxmLT5/0Lv6giZLNB1xQiU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com 
header.b=ITBRxl0s; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ITBRxl0s" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1738151679; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PlRguNVTZsTlNEYBD/J+r00pj+4jYSoE9sS4zp2ee/k=; b=ITBRxl0sR8HSx/Tp0knZ5bVlvPqhrx5qqfyvT3X8P8q6IAAi2/9EyLwTISKmN4Cgjr48L0 oWUvkrQt8c+7xKe2X1T/TVd+qgQ5+15B7NLweIGJLaW6fPTsBagz5D7LZCtlJaljWytn+x IKqw6hUAib2cWioD5vpraUtB8KktlQ8= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-151-KI-vRjnPPZyXN8rizwY8NA-1; Wed, 29 Jan 2025 06:54:38 -0500 X-MC-Unique: KI-vRjnPPZyXN8rizwY8NA-1 X-Mimecast-MFC-AGG-ID: KI-vRjnPPZyXN8rizwY8NA Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-385ded5e92aso2765220f8f.3 for ; Wed, 29 Jan 2025 03:54:37 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738151677; x=1738756477; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=PlRguNVTZsTlNEYBD/J+r00pj+4jYSoE9sS4zp2ee/k=; b=eZFMrVgISbCkIiWsqCwnjkQo+BXyrEtq7e2lILE1TiNoN4kLVEf5ujbLHAbLSR+H5Y 7HD931suOWYkqNfcITMs1i3oxejxLiQ1WBaY8U7XXDh41ByQccS+9mQjeczk7DEWxjV+ rCO7sjEWlsRNaEI5PUxfVTzVPUd0Up5pj11Ddc6oQ2glt37vcEnA+YoI24SiPEW/6OHl yaF1EEbXLGRi3a2K6vN0k5GeMkdPdhMeJfu+lbEVLBMeX0nNr65DE6cibj9sbdhLH4MB 
jc3fJ1iz7F3S11CQ6YSaGOWTtP2nUUEs3IwsU/3aa+gEyHXsXUzW9e00HFKI48HD+W56 D9eA== X-Gm-Message-State: AOJu0Yy2E6xx623WLsXAmRBE891a1ZFpOmqWPmeypUFWBmNFNdYlqa2z 2btrNFGbv6YoNYDXifubbRHnEH2f/bQs/GCsJZoYULOgQzt3Ufu5Ua/axMHktyMBdh3jaDT2BI3 5yDi13kuSf7I7Ska81TM9FQorBMhQPzEKxCijXwQqNSpzpoITvx3uCsWwzdEEDk/BGVdb9Dhr/Q 0q4F7SQva09UCQ52rvXxIT/8a2jiQRJQAIftH+VpxTHBo1 X-Gm-Gg: ASbGnctMFoqp6H/+ImNlGzVZRFHUlbGRZ3wPlXgMKbcOXpJXvKs0m3hxHRihmbt9KYJ muGagjJqbaqzmYbILu2d94PNsu4rSo2on8he2ixe0TsKtHhAWCW+6hhEJZPgTnzIwuur+SHVHrk vvR5fWERgrFu1czWVPmvzsUVlvG/2iiO1OSIMnRlLbXhDiLlvzqa488Cso1LE3LgPP2LVVyd9L9 RJGNJzWpMzZ1Sg9abeviKXNwEuyOGZ7OBb5Kxois5RDHUxCdJbEIyzctrwtYwQEQPmI80ICvPUN jZLP6PsL7QLuYnRNI4ADW2wNZ+j2hPem25m1IeLO++Wdx59XXId1p3ZpVat/rvAavA== X-Received: by 2002:a5d:50c2:0:b0:38b:d7d2:12f2 with SMTP id ffacd0b85a97d-38c520bf925mr1637968f8f.54.1738151676945; Wed, 29 Jan 2025 03:54:36 -0800 (PST) X-Google-Smtp-Source: AGHT+IE238aJbXQykMLxOmkA/GlxflTBLrEd6FkSw7fKwOzieBzddAgHKUTAMn1ykR+OZNV3l083wA== X-Received: by 2002:a5d:50c2:0:b0:38b:d7d2:12f2 with SMTP id ffacd0b85a97d-38c520bf925mr1637927f8f.54.1738151676497; Wed, 29 Jan 2025 03:54:36 -0800 (PST) Received: from localhost (p200300cbc7053b0064b867195794bf13.dip0.t-ipconnect.de. [2003:cb:c705:3b00:64b8:6719:5794:bf13]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-38c2a17d7a7sm17234978f8f.32.2025.01.29.03.54.34 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 29 Jan 2025 03:54:36 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , "Liam R. 
Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v1 08/12] mm/rmap: handle device-exclusive entries correctly in try_to_unmap_one() Date: Wed, 29 Jan 2025 12:54:06 +0100 Message-ID: <20250129115411.2077152-9-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250129115411.2077152-1-david@redhat.com> References: <20250129115411.2077152-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). try_to_unmap_one() is not prepared for that, so teach it about these non-present nonswap PTEs. Before that, could we also have triggered this case with device-private entries? Unlikely. Note that we could currently only run into this case with device-exclusive entries on THPs. For order-0 folios, we still adjust the mapcount on conversion to device-exclusive, making the rmap walk abort early (folio_mapcount() =3D=3D 0 and breaking swapout). We'll fix that next, now that try_to_unmap_one() can handle it. Further note that try_to_unmap() calls MMU notifiers and holds the folio lock, so any device-exclusive users should be properly prepared for this device-exclusive PTE to "vanish". 
Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand --- mm/rmap.c | 53 ++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 40 insertions(+), 13 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 65d9bbea16d0..12900f367a2a 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1648,9 +1648,9 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, { struct mm_struct *mm =3D vma->vm_mm; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); + bool anon_exclusive, ret =3D true; pte_t pteval; struct page *subpage; - bool anon_exclusive, ret =3D true; struct mmu_notifier_range range; enum ttu_flags flags =3D (enum ttu_flags)(long)arg; unsigned long pfn; @@ -1722,7 +1722,19 @@ static bool try_to_unmap_one(struct folio *folio, st= ruct vm_area_struct *vma, /* Unexpected PMD-mapped THP? */ VM_BUG_ON_FOLIO(!pvmw.pte, folio); =20 - pfn =3D pte_pfn(ptep_get(pvmw.pte)); + /* + * We can end up here with selected non-swap entries that + * actually map pages similar to PROT_NONE; see + * page_vma_mapped_walk()->check_pte(). + */ + pteval =3D ptep_get(pvmw.pte); + if (likely(pte_present(pteval))) { + pfn =3D pte_pfn(pteval); + } else { + pfn =3D swp_offset_pfn(pte_to_swp_entry(pteval)); + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); + } + subpage =3D folio_page(folio, pfn - folio_pfn(folio)); address =3D pvmw.address; anon_exclusive =3D folio_test_anon(folio) && @@ -1778,7 +1790,9 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, hugetlb_vma_unlock_write(vma); } pteval =3D huge_ptep_clear_flush(vma, address, pvmw.pte); - } else { + if (pte_dirty(pteval)) + folio_mark_dirty(folio); + } else if (likely(pte_present(pteval))) { flush_cache_page(vma, address, pfn); /* Nuke the page table entry. 
*/ if (should_defer_flush(mm, flags)) { @@ -1796,6 +1810,10 @@ static bool try_to_unmap_one(struct folio *folio, st= ruct vm_area_struct *vma, } else { pteval =3D ptep_clear_flush(vma, address, pvmw.pte); } + if (pte_dirty(pteval)) + folio_mark_dirty(folio); + } else { + pte_clear(mm, address, pvmw.pte); } =20 /* @@ -1805,10 +1823,6 @@ static bool try_to_unmap_one(struct folio *folio, st= ruct vm_area_struct *vma, */ pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval); =20 - /* Set the dirty flag on the folio now the pte is gone. */ - if (pte_dirty(pteval)) - folio_mark_dirty(folio); - /* Update high watermark before we lower rss */ update_hiwater_rss(mm); =20 @@ -1822,8 +1836,8 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, dec_mm_counter(mm, mm_counter(folio)); set_pte_at(mm, address, pvmw.pte, pteval); } - - } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) { + } else if (likely(pte_present(pteval)) && pte_unused(pteval) && + !userfaultfd_armed(vma)) { /* * The guest indicated that the page content is of no * interest anymore. Simply discard the pte, vmscan @@ -1902,6 +1916,12 @@ static bool try_to_unmap_one(struct folio *folio, st= ruct vm_area_struct *vma, set_pte_at(mm, address, pvmw.pte, pteval); goto walk_abort; } + + /* + * arch_unmap_one() is expected to be a NOP on + * architectures where we could have non-swp entries + * here, so we'll not check/care. 
+ */ if (arch_unmap_one(mm, vma, address, pteval) < 0) { swap_free(entry); set_pte_at(mm, address, pvmw.pte, pteval); @@ -1926,10 +1946,17 @@ static bool try_to_unmap_one(struct folio *folio, s= truct vm_area_struct *vma, swp_pte =3D swp_entry_to_pte(entry); if (anon_exclusive) swp_pte =3D pte_swp_mkexclusive(swp_pte); - if (pte_soft_dirty(pteval)) - swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pteval)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + if (likely(pte_present(pteval))) { + if (pte_soft_dirty(pteval)) + swp_pte =3D pte_swp_mksoft_dirty(swp_pte); + if (pte_uffd_wp(pteval)) + swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + } else { + if (pte_swp_soft_dirty(pteval)) + swp_pte =3D pte_swp_mksoft_dirty(swp_pte); + if (pte_swp_uffd_wp(pteval)) + swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + } set_pte_at(mm, address, pvmw.pte, swp_pte); } else { /* --=20 2.48.1 From nobody Mon Feb 9 10:26:21 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2E9021DF741 for ; Wed, 29 Jan 2025 11:54:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738151685; cv=none; b=RvKAmk2oH4gMOm1mtc03PWr5kcSEPgNhfJEacw4TunklVhamEfN562dVyjQCMZ0tsEJWTcLF7zWuDmgrfRia9GQivfnIxvw4mgdcS5REPbGwo2SDQT3EZWP4j12K+n+LIJ59xu9AlJK1TsQT3jPQkmSnhWJiebIcQUS2tVu9dSE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738151685; c=relaxed/simple; bh=6kVwOBl9xLzEExlS8FoIh4QwM1O96E9UpXBajCBA64U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=imfXPEDRrQG/obfxp2PRtQmK2ZdhWLPiO7WY/tn8MVzyxLR10TNA3rll3VtZq5wCgofO+v+NUkGy0R/Bosy1Ekj/Q4xZvd8D6muxujH3KYrfKBsPZCe/PS6i9OY46GDsEtPchAhZHS63ArWw/cgTsJP6qt03bh1sSy1j4gWIy5c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=iwLYDqfx; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="iwLYDqfx" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1738151682; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dBI2mWOVOmwqsdVlja7jNAEkV5b5lmKCa22lnGaqVYs=; b=iwLYDqfxo6xR2DH811l6bMixTrTnUM9LY4OzZeTIEeiJfE2EWUD/uqGRtKE5dzDpHhN3dt pqyyuMZHaznHnStT//nwEGnDRYuFVTAq/ppZa+ef10WErInKDZiz0EG4bGINFrw+8Hn0M6 K4JtZgbSG8x1331azZDmhjuwis9D4tc= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-412-CWQOEdTAOr-QArRPQrwjOA-1; Wed, 29 Jan 2025 06:54:41 -0500 X-MC-Unique: CWQOEdTAOr-QArRPQrwjOA-1 X-Mimecast-MFC-AGG-ID: CWQOEdTAOr-QArRPQrwjOA Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-38c24ac3706so5219013f8f.0 for ; Wed, 29 Jan 2025 03:54:40 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738151680; x=1738756480; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=dBI2mWOVOmwqsdVlja7jNAEkV5b5lmKCa22lnGaqVYs=; b=mUA+z4/usJCj3xSNlbxtDPJvCyHUj0K33vaPZT2I5ESqc6G0dxQ8WaB7FCgyBqeNzd mbrq0ve148lUQo9QQj6cNqR28GFnztWSibXjeordJI5kAX33uMDxzm71b9FQC6zhJEre 5tbO0zm00lPga+dnDsm8qVcScotdqxT2wn8w0Ukx0q8pUBlwpAmNtRUqEBdy20jLlHXC IHXCLNzL23NUePMekw2fpkJs+A7Hsa1avrUbwETCLHRJMrGnOVvzriQOHXJEXaxPxl22 ZhwBX6WYikeX7ox0QbJ+F9/BEBXsT/MQuss+v9LC5roLJCHfjtQyKzW05d7AMvz8bCY6 aErA== X-Gm-Message-State: AOJu0YxQEp5cfIzfBhdiIy76zxmNar9F1KjLndwqOK2MVL78b26P9b0M 7Fh7+MXFqYhRpYaQGQQIV/sfMsr/pJfJ9TkBgCl+bj9UWlFnI1u0o1UJz0X33ojSvBTq+cBJDEv o1MHOZwv2AbAK8BGsIZzDghrJq2clGNlYN1qF8OI1ApDMv8jEXzmr2uQHYBdxxk/MaP8VtYTKY9 ijDBIFLJ0MiHo+s9++LVmXLUFP/RWvFBza/PjDyQxeMyWF X-Gm-Gg: ASbGncsJoLtwGJP5nyeYYb/eZv+GFDTIxWORcixeZGAfQImH1XyVOfAUZpryTBcDma+ BNxN0/rp/8q+wjL0v0D9PnOze8tTVSu+Z+XnlCMhvMC5YEEjOSVNUDZcD0/Nx6BXawaE3rtSLwY kDgeUEG3e0GBwRl4D73Y2iCU9sz7F9eAFjfszk3gRAV2y7M8drr42pR0q8sV1hLkXhe1HAnoT3H kaTIbHqJzwHsY3W+OwC9HpwXsu9bnZc+GjqOHhZ5aGnAgDwdkQmuSHaViDfrDDLWMBcDg1AulAh 0ax9VlXQMtCwRyIpBvvCZJlcSBQRuF84UiFKG/jZ4LPE1ZwJx+giM8eareI5zc6oYg== X-Received: by 2002:a5d:47c9:0:b0:38c:3eab:2e17 with SMTP id ffacd0b85a97d-38c5194dae9mr2038641f8f.2.1738151679700; Wed, 29 Jan 2025 03:54:39 -0800 (PST) X-Google-Smtp-Source: AGHT+IEuFMqyDWaa/JVieaul0deYFiTz0a9oHzheZm+vFN4IrpXYFm3MUhtZq6TOTjv8DbS0i6Ouqg== X-Received: by 2002:a5d:47c9:0:b0:38c:3eab:2e17 with SMTP id ffacd0b85a97d-38c5194dae9mr2038593f8f.2.1738151679034; Wed, 29 Jan 2025 03:54:39 -0800 (PST) Received: from localhost (p200300cbc7053b0064b867195794bf13.dip0.t-ipconnect.de. 
[2003:cb:c705:3b00:64b8:6719:5794:bf13]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-38c2a1764d3sm17086479f8f.19.2025.01.29.03.54.37 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 29 Jan 2025 03:54:38 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v1 09/12] mm/rmap: handle device-exclusive entries correctly in try_to_migrate_one() Date: Wed, 29 Jan 2025 12:54:07 +0100 Message-ID: <20250129115411.2077152-10-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250129115411.2077152-1-david@redhat.com> References: <20250129115411.2077152-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). try_to_migrate_one() is not prepared for that, so teach it about these non-present nonswap PTEs. We already handle device-private entries by specializing on the folio, so we can reshuffle that code to make it work on the non-present nonswap PTEs instead. Get rid of most folio_is_device_private() handling, except when handling HWPoison. It's unclear what the right thing to do here is. Note that we could currently only run into this case with device-exclusive entries on THPs; but as we have a refcount vs. mapcount imbalance, folio splitting etc. 
will just bail out early and not even try migrating. For order-0 folios, we still adjust the mapcount on conversion to device-exclusive, making the rmap walk abort early (folio_mapcount() =3D=3D 0 and breaking swapout). We'll fix that next, now that try_to_migrate_one() can handle it. Further note that try_to_migrate() calls MMU notifiers and holds the folio lock, so any device-exclusive users should be properly prepared for this device-exclusive PTE to "vanish". Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand --- mm/rmap.c | 125 ++++++++++++++++++++++-------------------------------- 1 file changed, 51 insertions(+), 74 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 12900f367a2a..903a78e60781 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2040,9 +2040,9 @@ static bool try_to_migrate_one(struct folio *folio, s= truct vm_area_struct *vma, { struct mm_struct *mm =3D vma->vm_mm; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); + bool anon_exclusive, writable, ret =3D true; pte_t pteval; struct page *subpage; - bool anon_exclusive, ret =3D true; struct mmu_notifier_range range; enum ttu_flags flags =3D (enum ttu_flags)(long)arg; unsigned long pfn; @@ -2109,24 +2109,20 @@ static bool try_to_migrate_one(struct folio *folio,= struct vm_area_struct *vma, /* Unexpected PMD-mapped THP? */ VM_BUG_ON_FOLIO(!pvmw.pte, folio); =20 - pfn =3D pte_pfn(ptep_get(pvmw.pte)); - - if (folio_is_zone_device(folio)) { - /* - * Our PTE is a non-present device exclusive entry and - * calculating the subpage as for the common case would - * result in an invalid pointer. - * - * Since only PAGE_SIZE pages can currently be - * migrated, just set it to page. This will need to be - * changed when hugepage migrations to device private - * memory are supported. 
- */ - VM_BUG_ON_FOLIO(folio_nr_pages(folio) > 1, folio); - subpage =3D &folio->page; + /* + * We can end up here with selected non-swap entries that + * actually map pages similar to PROT_NONE; see + * page_vma_mapped_walk()->check_pte(). + */ + pteval =3D ptep_get(pvmw.pte); + if (likely(pte_present(pteval))) { + pfn =3D pte_pfn(pteval); } else { - subpage =3D folio_page(folio, pfn - folio_pfn(folio)); + pfn =3D swp_offset_pfn(pte_to_swp_entry(pteval)); + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); } + + subpage =3D folio_page(folio, pfn - folio_pfn(folio)); address =3D pvmw.address; anon_exclusive =3D folio_test_anon(folio) && PageAnonExclusive(subpage); @@ -2182,7 +2178,10 @@ static bool try_to_migrate_one(struct folio *folio, = struct vm_area_struct *vma, } /* Nuke the hugetlb page table entry */ pteval =3D huge_ptep_clear_flush(vma, address, pvmw.pte); - } else { + if (pte_dirty(pteval)) + folio_mark_dirty(folio); + writable =3D pte_write(pteval); + } else if (likely(pte_present(pteval))) { flush_cache_page(vma, address, pfn); /* Nuke the page table entry. */ if (should_defer_flush(mm, flags)) { @@ -2200,54 +2199,21 @@ static bool try_to_migrate_one(struct folio *folio,= struct vm_area_struct *vma, } else { pteval =3D ptep_clear_flush(vma, address, pvmw.pte); } + if (pte_dirty(pteval)) + folio_mark_dirty(folio); + writable =3D pte_write(pteval); + } else { + pte_clear(mm, address, pvmw.pte); + writable =3D is_writable_device_private_entry(pte_to_swp_entry(pteval)); } =20 - /* Set the dirty flag on the folio now the pte is gone. 
*/ - if (pte_dirty(pteval)) - folio_mark_dirty(folio); + VM_WARN_ON_FOLIO(writable && folio_test_anon(folio) && + !anon_exclusive, folio); =20 /* Update high watermark before we lower rss */ update_hiwater_rss(mm); =20 - if (folio_is_device_private(folio)) { - unsigned long pfn =3D folio_pfn(folio); - swp_entry_t entry; - pte_t swp_pte; - - if (anon_exclusive) - WARN_ON_ONCE(folio_try_share_anon_rmap_pte(folio, - subpage)); - - /* - * Store the pfn of the page in a special migration - * pte. do_swap_page() will wait until the migration - * pte is removed and then restart fault handling. - */ - entry =3D pte_to_swp_entry(pteval); - if (is_writable_device_private_entry(entry)) - entry =3D make_writable_migration_entry(pfn); - else if (anon_exclusive) - entry =3D make_readable_exclusive_migration_entry(pfn); - else - entry =3D make_readable_migration_entry(pfn); - swp_pte =3D swp_entry_to_pte(entry); - - /* - * pteval maps a zone device page and is therefore - * a swap pte. - */ - if (pte_swp_soft_dirty(pteval)) - swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_swp_uffd_wp(pteval)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); - set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte); - trace_set_migration_pte(pvmw.address, pte_val(swp_pte), - folio_order(folio)); - /* - * No need to invalidate here it will synchronize on - * against the special swap migration pte. 
- */ - } else if (PageHWPoison(subpage)) { + if (PageHWPoison(subpage) && !folio_is_device_private(folio)) { pteval =3D swp_entry_to_pte(make_hwpoison_entry(subpage)); if (folio_test_hugetlb(folio)) { hugetlb_count_sub(folio_nr_pages(folio), mm); @@ -2257,8 +2223,8 @@ static bool try_to_migrate_one(struct folio *folio, s= truct vm_area_struct *vma, dec_mm_counter(mm, mm_counter(folio)); set_pte_at(mm, address, pvmw.pte, pteval); } - - } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) { + } else if (likely(pte_present(pteval)) && pte_unused(pteval) && + !userfaultfd_armed(vma)) { /* * The guest indicated that the page content is of no * interest anymore. Simply discard the pte, vmscan @@ -2274,6 +2240,11 @@ static bool try_to_migrate_one(struct folio *folio, = struct vm_area_struct *vma, swp_entry_t entry; pte_t swp_pte; =20 + /* + * arch_unmap_one() is expected to be a NOP on + * architectures where we could have non-swp entries + * here. + */ if (arch_unmap_one(mm, vma, address, pteval) < 0) { if (folio_test_hugetlb(folio)) set_huge_pte_at(mm, address, pvmw.pte, @@ -2284,8 +2255,6 @@ static bool try_to_migrate_one(struct folio *folio, s= truct vm_area_struct *vma, page_vma_mapped_walk_done(&pvmw); break; } - VM_BUG_ON_PAGE(pte_write(pteval) && folio_test_anon(folio) && - !anon_exclusive, subpage); =20 /* See folio_try_share_anon_rmap_pte(): clear PTE first. */ if (folio_test_hugetlb(folio)) { @@ -2310,7 +2279,7 @@ static bool try_to_migrate_one(struct folio *folio, s= truct vm_area_struct *vma, * pte. do_swap_page() will wait until the migration * pte is removed and then restart fault handling. 
*/ - if (pte_write(pteval)) + if (writable) entry =3D make_writable_migration_entry( page_to_pfn(subpage)); else if (anon_exclusive) @@ -2319,15 +2288,23 @@ static bool try_to_migrate_one(struct folio *folio,= struct vm_area_struct *vma, else entry =3D make_readable_migration_entry( page_to_pfn(subpage)); - if (pte_young(pteval)) - entry =3D make_migration_entry_young(entry); - if (pte_dirty(pteval)) - entry =3D make_migration_entry_dirty(entry); - swp_pte =3D swp_entry_to_pte(entry); - if (pte_soft_dirty(pteval)) - swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pteval)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + if (likely(pte_present(pteval))) { + if (pte_young(pteval)) + entry =3D make_migration_entry_young(entry); + if (pte_dirty(pteval)) + entry =3D make_migration_entry_dirty(entry); + swp_pte =3D swp_entry_to_pte(entry); + if (pte_soft_dirty(pteval)) + swp_pte =3D pte_swp_mksoft_dirty(swp_pte); + if (pte_uffd_wp(pteval)) + swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + } else { + swp_pte =3D swp_entry_to_pte(entry); + if (pte_swp_soft_dirty(pteval)) + swp_pte =3D pte_swp_mksoft_dirty(swp_pte); + if (pte_swp_uffd_wp(pteval)) + swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + } if (folio_test_hugetlb(folio)) set_huge_pte_at(mm, address, pvmw.pte, swp_pte, hsz); --=20 2.48.1 From nobody Mon Feb 9 10:26:21 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CC38F1DF960 for ; Wed, 29 Jan 2025 11:54:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738151687; cv=none; 
b=GgsS6LhDOsYl9xXasjXFicr4H1mrq1t26ovFefrj9oUxtSTdFbsspW28D84E3kC7nhCwCBJcciZhI5hLqbYG6IKI0Wdvw1jnltBc7Gm192OXMM8G7X5Asya4RG1u4ut+CA/+ZuHBnhOr1EY4ZLCB+GiuV7/8Cz1XDoho9eqrRrk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738151687; c=relaxed/simple; bh=3RX1C3dSti0Y2JUqODIhfRDdkayq20GNG0HByNlQRYA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=SJAHnfQhLFARo7NnNPcZaBbjqYPgeXalb9YKLcV0kGGRSlhQ51EJw2ajTHIIX1XZf3YkJhxNh7/tNTzePbAOHbZuxhlzOTonOWug6L66z2rh9czviWHArOc03lRG4MjVXMiQ5nHcFjoyUwDl1eXJ1TZZJ0YBCa3K8yHjQBsKdfo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Gb0AFx00; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Gb0AFx00" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1738151684; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mNM6Zw+nv5CKX0FXtzXMADzAxRUJZ/HteqC2IEBjnh8=; b=Gb0AFx00hvqDSKO8mnHFf8JR1IO47yryW3Y5vQB6yJQF1oUKXgDx/6nMAjpD5m8bDyj9MG Wc3REuPcyRZgydF/RYu7nrJGiqgw5MeRHyKW7wwvr9JbJQSKQL3FeiUPcRf5qT4GAk20SQ fskMEhA1hBB2TY/nifET64ZyLILd7fA= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-460-EylURdCNOhaYwUtRVe1Kbw-1; Wed, 29 Jan 2025 06:54:43 -0500 X-MC-Unique: 
EylURdCNOhaYwUtRVe1Kbw-1 X-Mimecast-MFC-AGG-ID: EylURdCNOhaYwUtRVe1Kbw Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-38a684a0971so3067292f8f.2 for ; Wed, 29 Jan 2025 03:54:43 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738151682; x=1738756482; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=mNM6Zw+nv5CKX0FXtzXMADzAxRUJZ/HteqC2IEBjnh8=; b=DpP8fgacKRPknF65SGafirUe17hhirwqYh8B/qDzLu68Wl4rg8p81fDgTHZsjXPkFN TD75zAbwE+PP9AC4B5mkxSzLr2h/utC7O2JclJexe8MDbz9n1nGWpTHdfTRux6RzkW5m WKyvWf2Qo6zT4sv20m6bKtUbI5J78rpPSkeksYNO/URb8eWJ3JLqsdiqwqa1KkBxsFbX 8Ev7sItXrXqqibP/0PnUZHQMFEY65Sah6Jz521D7Jb/QMK0nDZig66J+P/CXTR/yT5pC uJhqg7LwZdp2V5IaUtDjgaXKuc9YV9Wue68wk3XBYVP06EXneIFhR1XtmuWsWpQrnnav xSiw== X-Gm-Message-State: AOJu0YyPQxgYQZNO5H3c+8SawhLRqv7mpXPmzj6LX5aQGTsguxEpQDhy 2/UjFoW0cTk6XLvhGzSCMYHujJq4tXCXhth3NrNIyZxiB/cEDOC396Iizn10mHApy9vK7WSwO1Q 7j44SNfBJhfTHM8HVP+mLlXlwQIa5s+fwrMwiQDaiWjwW3fNSqF5xU6/Y/7zWrVEYCMDH3tHDNX bDWzo7IwHQ3FPpTAxBjRClAauRUy0wku1DnI2J45ShmfaV X-Gm-Gg: ASbGncutl7rrFs0yK9jCymfy5KrHghocwYbq9UPvhpUY0B8lMXh5Icu2JphKBfUttBl Oum+32Spr0/11mmnvoKFy3pZWLDH/6x0SsyAY+uDVJL2cwBpZEbdRg0nPgzPY36Ur7g4RqUIt8b a06Yn6vAha+O1KLHOPrXJMRxjIR7MOAcJFtAcbaafqs7zL7ESj2NSoA+Cnm7E5I0qv6ncGOEeEn WP9xoEuvlAcVMSwbj26NMKUqrpKACfv8vFwGEy71ghWU/+kGlBGQGuSR6XAVfismzR4pmZgGvSG 37cyZBJTu1whb9272pE+6cGR6mNNLVeZox0Xi3dRdulWH/zr2KMwx1CzVgcE93fkkw== X-Received: by 2002:a05:6000:4013:b0:385:f631:612 with SMTP id ffacd0b85a97d-38c5195f2e5mr2415004f8f.17.1738151682181; Wed, 29 Jan 2025 03:54:42 -0800 (PST) X-Google-Smtp-Source: AGHT+IEfVb9Gb5/TQQdvQH0d8Y86brLumLHRijGlRqbPqwAUyE2LkrZ873cAyk0+z1AQr8yjahg13A== X-Received: by 2002:a05:6000:4013:b0:385:f631:612 with SMTP id ffacd0b85a97d-38c5195f2e5mr2414952f8f.17.1738151681703; Wed, 29 Jan 2025 03:54:41 -0800 (PST) Received: from localhost 
(p200300cbc7053b0064b867195794bf13.dip0.t-ipconnect.de. [2003:cb:c705:3b00:64b8:6719:5794:bf13]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-38c2a1c4212sm16316119f8f.87.2025.01.29.03.54.39 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 29 Jan 2025 03:54:41 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v1 10/12] mm/rmap: handle device-exclusive entries correctly in folio_referenced_one() Date: Wed, 29 Jan 2025 12:54:08 +0100 Message-ID: <20250129115411.2077152-11-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250129115411.2077152-1-david@redhat.com> References: <20250129115411.2077152-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). folio_referenced_one() is not prepared for that, so teach it about these non-present nonswap PTEs. We'll likely never hit that path with device-private entries, but we could with device-exclusive ones. It's not really clear what to do: the device could be accessing this PTE, but we don't have that information in the PTE. Likely MMU notifiers should be taking care of that, and we can just assume "not referenced by the CPU". 
Note that we could currently only run into this case with device-exclusive entries on THPs. For order-0 folios, we still adjust the mapcount on conversion to device-exclusive, making the rmap walk abort early (folio_mapcount() =3D=3D 0). We'll fix that next, now that folio_referenced_one() can handle it. Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand --- mm/rmap.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 903a78e60781..77b063e9aec4 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -899,8 +899,14 @@ static bool folio_referenced_one(struct folio *folio, if (lru_gen_look_around(&pvmw)) referenced++; } else if (pvmw.pte) { - if (ptep_clear_flush_young_notify(vma, address, - pvmw.pte)) + /* + * We can end up here with selected non-swap entries + * that actually map pages similar to PROT_NONE; see + * page_vma_mapped_walk()->check_pte(). From a CPU + * perspective, these PTEs are old. + */ + if (pte_present(ptep_get(pvmw.pte)) && + ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) { if (pmdp_clear_flush_young_notify(vma, address, --=20 2.48.1 From nobody Mon Feb 9 10:26:21 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B19BD1DF99D for ; Wed, 29 Jan 2025 11:54:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738151690; cv=none; b=XOExQOCe1XKe6dtSEtpZk0WV7ia4Sm0ttZSFZOc1Ue52F2sdnsXwyT0ccsaUh6ZPp7g5DlsK/GZSlklI+UEM+DU5S2iyGfZdNFFaMpsnBwyn/aMXuod37gpjiyKvZ6p53Ph2XucAIzBaPZtpteYFYVpYRtJ37pZNQHFBk2875B0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; 
s=arc-20240116; t=1738151690; c=relaxed/simple; bh=+HDKfekOU+adc/V3u+TwozxiLolT/68LibYtSwb3K4A=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LI5fsKk2XNi8fgoQ85HJLRZm9mEkWULp9ljfXLvwn0EkKwZ8AeH3df3aRziZir4Lf4viMdY/7pF+B0JNzWzMmgGj5K23ZfYkdeZ2gX6iC1ISYE5IOesVV1g8GSs+rvM6cFXB3z1MIm/vHF/ogMtZJCs7uKSb9mA86G2aFaKkp+Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=av7fo/7Z; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="av7fo/7Z" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1738151687; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1V8bFv4UpnBG412679q6Wat0SbxTmvFr7JjVIwnpoX4=; b=av7fo/7ZiKXCT15FVAfYlmZ4Q+Rx4IVINVNEpv3pPxakAPGt2A76IYkVb/+MOYhYmFiFyB UvNoYrso3Riev6MHDi79qNmgKdCxWjWaKywnXBsx2ZNavEXNyn+BqG+BV2SMXbKSy9l0ng f3NT3uZRj5Om4OPzlfxWOHg8G6805Ds= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-60-QrcIRMuPNDCJlHIvXGaVEA-1; Wed, 29 Jan 2025 06:54:46 -0500 X-MC-Unique: QrcIRMuPNDCJlHIvXGaVEA-1 X-Mimecast-MFC-AGG-ID: QrcIRMuPNDCJlHIvXGaVEA Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-3860bc1d4f1so3994201f8f.2 for ; Wed, 29 Jan 2025 03:54:46 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738151685; x=1738756485; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1V8bFv4UpnBG412679q6Wat0SbxTmvFr7JjVIwnpoX4=; b=qE8CoylStFVvp2NQRDBhfdBj+YQeXz1AYwJmQdoezYvRxdzih5z4RcGebjgFOo+vFR c45N8XTxM1C23zI30OJsn9XKW6v/7C5bDb21ZJ/hulw+eCwtefDzy+qjJ9iC+jU0mPV9 jKgADCE3amlFU2ynjKauZ3Ew5NAfp4Ky/qn8IUHeGYVEYR/ecRWz/PSHMt7wKZbE1/CE uKyX5k+Ud/XPR7hXETnpozxXN2x3wAHwP9eEUMxYZZSXdKhl8wXjZtLJowOGXqdlOmsE AtJ2yxUOh1BQGDHura0hFU8elecmwOJxRxM2Rd2uAiEACUTNrgFDlyi/ttyrk/5VHIr/ IAJw== X-Gm-Message-State: AOJu0YynQa07ZrXyZRjOsQxQMkOMnWR4JN+uKHjqgnyv0rs+QQBaqN4T 0QyEftGN3sCxENTHQu4vWxc74l2rrKV7JYiJAn4w1AUVfG1TKTycGq1KNsPPBbNcQZ8q5zAdCAj iPA6FBgtmvjxNGO8+4HlJe5SLxcdFgVwPGUyQpUwyhb2Ko0bUCGTnQeJ1FfAjrztsodQf6OlV2/ x7IP6GMlqc8Nqed0dkgtSLW2a23yyY7Ptvv2pqEfr0J7Kn X-Gm-Gg: ASbGncuooMp8410xqTpqJU5A8a93vjVVy85xb+QoB+7AkgUtIw7jz1/XHIE3gpVhO0a C/7+x97xuQyf8AY3HsIk3Lz/KD/a8Yx5M6t5EmW/ZPfWxwEAfvUfwfphvzFeSK61GZiAyJpRvPI QX47sjKLoLquN1llTuC0a7ZfT+beOZzyo1csZIuurygoLhA2Ce6nEApf2kA0HHFrA4wBfRhBDXU oszmj+4kuRno9F7CJ6OvDSQN32/hYgtjPio3XpqDm1Pt/8ySKaEH4d9dG0nOaFEzgn2O8LBIMvr IE8pN3b2YnC7zj4nqXiUiklLmJJxRfApJK/bHjTNaBweaXr4X5c1t+aWku7zmScHpA== X-Received: by 2002:a05:6000:1f88:b0:385:e176:4420 with SMTP id ffacd0b85a97d-38c5194da70mr2305450f8f.10.1738151685620; Wed, 29 Jan 2025 03:54:45 -0800 (PST) X-Google-Smtp-Source: AGHT+IHUbIgz0SPBde6y+6m4GBK9jxdSxFJzhj9pSmwm9jjlsojNxFlCvTBxohQUGPQAzMIjP8ET+w== X-Received: by 2002:a05:6000:1f88:b0:385:e176:4420 with SMTP id ffacd0b85a97d-38c5194da70mr2305401f8f.10.1738151685052; Wed, 29 Jan 2025 03:54:45 -0800 (PST) Received: from localhost (p200300cbc7053b0064b867195794bf13.dip0.t-ipconnect.de. 
[2003:cb:c705:3b00:64b8:6719:5794:bf13]) by smtp.gmail.com with UTF8SMTPSA id 5b1f17b1804b1-438dcc2ef08sm20681625e9.22.2025.01.29.03.54.42 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 29 Jan 2025 03:54:43 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v1 11/12] mm/rmap: handle device-exclusive entries correctly in page_vma_mkclean_one() Date: Wed, 29 Jan 2025 12:54:09 +0100 Message-ID: <20250129115411.2077152-12-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250129115411.2077152-1-david@redhat.com> References: <20250129115411.2077152-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). page_vma_mkclean_one() is not prepared for that, so teach it about these non-present nonswap PTEs. We'll likely never hit that path with device-private entries, but we could with device-exclusive ones. It's not really clear what to do: the device could be accessing this PTE, but we don't have that information in the PTE. Likely MMU notifiers should be taking care of that, and we can just assume "not writable and not dirty from CPU perspective". Note that we could currently only run into this case with device-exclusive entries on THPs. 
We still adjust the mapcount on conversion to device-exclusive, making the rmap walk abort early (folio_mapcount() =3D=3D 0) for order-0 folios. We'll fix that next, now that page_vma_mkclean_one() can handle it. Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand --- mm/rmap.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/mm/rmap.c b/mm/rmap.c index 77b063e9aec4..9e2002d97d6f 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1050,6 +1050,14 @@ static int page_vma_mkclean_one(struct page_vma_mapp= ed_walk *pvmw) pte_t *pte =3D pvmw->pte; pte_t entry =3D ptep_get(pte); =20 + /* + * We can end up here with selected non-swap entries + * that actually map pages similar to PROT_NONE; see + * page_vma_mapped_walk()->check_pte(). From a CPU + * perspective, these PTEs are clean and not writable. + */ + if (!pte_present(entry)) + continue; if (!pte_dirty(entry) && !pte_write(entry)) continue; =20 --=20 2.48.1 From nobody Mon Feb 9 10:26:21 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 20AFE1DFE14 for ; Wed, 29 Jan 2025 11:54:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738151693; cv=none; b=fT7XuFglssNUf8dsJsxQqH94xAjm6N7vjnrPOxIKw5cY+Kixr90eRUpjXomempUw9Da6ikyC3Dcc8hUS4pzHM5g+5oIX4hKwN9Py2HGSsCqfLGtNzTCnygLrw41Vo4HfZ8atenfbDtE8V33iiW2TYQj4dFT+a7whnVss0UEipdE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738151693; c=relaxed/simple; bh=Y0+EdNoTZ7KKt/QbFsaS3wPTCnj2OeBbOL4l6HpHEm0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=oyNrneyp49jUyU2rtcTsDXW/3FZg3i6SzPOJi1vF4FJJXuQU0r3ETRzE4z6yHwHL+1g/RIFS/mHx78b8zN61aQPaO2qc0qJFljaaQ3qmpYhcNtbKZGOgZqgMKgEiUdSb9xCMD042V4Baw+uXxxDjxA1ftd9Z+xu0ExXV8Qjt5sM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=XsN8R1by; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="XsN8R1by" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1738151691; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8FdUGdC2w/N5TeOcJ3tL/h1ogRWBFVvHy4t3b3m6isA=; b=XsN8R1bylwEDAh3+TloxaWwwdbAcysYbloaY9Y0+R4Qd0KQbC7gnnccErNiT2aPSX7+d3h RgATNATUgBPG7HbYoOpuJjlFKY8KwKTvCjK+Drf8FwF5/tKTyrS7zTFCA74XS6jdCCp2W+ DmOWL+KiLw/Y1y+GZnLsEyTtsUSyDrk= Received: from mail-wr1-f70.google.com (mail-wr1-f70.google.com [209.85.221.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-290-ybOxRQowP5SBwQ2QcdWytw-1; Wed, 29 Jan 2025 06:54:50 -0500 X-MC-Unique: ybOxRQowP5SBwQ2QcdWytw-1 X-Mimecast-MFC-AGG-ID: ybOxRQowP5SBwQ2QcdWytw Received: by mail-wr1-f70.google.com with SMTP id ffacd0b85a97d-385ded5e92aso2765289f8f.3 for ; Wed, 29 Jan 2025 03:54:49 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738151689; x=1738756489; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8FdUGdC2w/N5TeOcJ3tL/h1ogRWBFVvHy4t3b3m6isA=; b=ippF6QHVdR6zFeG0GizBISEYqFpmIVehIN4qh7vNadFagP/sssekMOrHuf6B6x9tPl jBMR8hVAJ6sJbTNyyVlXjsoed6u44/2finpfJrrjF67TokPLuS15eOKOiJ/K4YaxnD/e /qnTC97J1zU0nkMt00EmfN06hiWBKrFl+0KX1ipdv+vKLV4FE14O6XJYU4uaeU9250bH eDg8bg57AqCxrnwHMLyZIlFaptfv8pgGYUxBv7INyZDhhUo5pktk1R123U2mB3ve1Tkp HweeNAKYERTfLn7Jq1oyQwx8dRT6k6b7lE3IK+4LvKpj90cljZCKO5NvNRutDWEptrIk Ofiw== X-Gm-Message-State: AOJu0YzJMnXfoUbu4dQUfGLmxWwxPd38VsQKz1GQDOMorsKzT5KrGVeL 0YcLQWKC6VIBx///f5c+g9WnmStIQD6/78NofKumdIa974uBfaq0bhlvdRg4jAlVB8UJqDM9FP4 bq9kdE4Z5YomW7utAri6dFkoTbDk2PIUDDTSHZQGzwan+55XwFP8I6Wd4NGIfUQq0p1gRdX9ayw BVm0DHciWnsXIMCRY3fujQ+JfL60OAr/uSvHVAYTTSLE/F X-Gm-Gg: ASbGnctuCDph5j/IIwtt28bSeqLhy0E66qq+2Vv1vaVoHZywY5VDHSD26754uS6YqXK NCk9v7cC1IqLS7V/tJsnc1qlPoJWf8db2C9eHDWufMvnCh/fqgv2HhhVDAKrKJQM3+yLfibfJxf dAUNedALNNthUcMgbuH8gBJ88zG2vAwT3RTR1hVgqMdQk7izXs3dVD2dNbBBvFyUjuwPkdPfoSJ zSHu0jNrRVTouTkJ1pJ1dQyZlbqPX658fwRRL+k6xFtrivr+yYBO2Ejd/iAXr9St2qtK5PYkyF1 GDhEzikfZ+bRDeMJMWjbvXHmZbzTcEx21Mi5CtLkELeh5L/icdJ0jv1n7oMtzyrrGQ== X-Received: by 2002:a5d:5384:0:b0:38a:8ed1:c5c7 with SMTP id ffacd0b85a97d-38c520bdb45mr1846677f8f.46.1738151688884; Wed, 29 Jan 2025 03:54:48 -0800 (PST) X-Google-Smtp-Source: AGHT+IEQObdLJ4S/sxdMO/dY9vI6fd5p4fQOJgmwgX26yitgXbCNTkIYH7aPDZgCkfY36quLJ2o2xA== X-Received: by 2002:a5d:5384:0:b0:38a:8ed1:c5c7 with SMTP id ffacd0b85a97d-38c520bdb45mr1846636f8f.46.1738151688502; Wed, 29 Jan 2025 03:54:48 -0800 (PST) Received: from localhost (p200300cbc7053b0064b867195794bf13.dip0.t-ipconnect.de. 
[2003:cb:c705:3b00:64b8:6719:5794:bf13]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-38c2a1c4006sm17020800f8f.94.2025.01.29.03.54.45 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 29 Jan 2025 03:54:47 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v1 12/12] mm/rmap: keep mapcount untouched for device-exclusive entries Date: Wed, 29 Jan 2025 12:54:10 +0100 Message-ID: <20250129115411.2077152-13-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250129115411.2077152-1-david@redhat.com> References: <20250129115411.2077152-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Now that conversion to device-exclusive no longer performs an rmap walk and the main page_vma_mapped_walk() users have been taught to properly handle nonswap entries, let's treat device-exclusive entries just as if they were present, similar to how we already handle device-private entries. This fixes swapout/migration of folios with device-exclusive entries. Likely there are still some page_vma_mapped_walk() callers that are not fully prepared for these entries, and where we simply want to refuse !pte_present() entries. They have to be fixed independently; the ones in mm/rmap.c are prepared.
Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand --- mm/memory.c | 17 +---------------- mm/rmap.c | 7 ------- 2 files changed, 1 insertion(+), 23 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index db38d6ae4e74..cd689cd8a7c8 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -743,20 +743,6 @@ static void restore_exclusive_pte(struct vm_area_struc= t *vma, =20 VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) && PageAnonExclusive(page)), folio); - - /* - * No need to take a page reference as one was already - * created when the swap entry was made. - */ - if (folio_test_anon(folio)) - folio_add_anon_rmap_pte(folio, page, vma, address, RMAP_NONE); - else - /* - * Currently device exclusive access only supports anonymous - * memory so the entry shouldn't point to a filebacked page. - */ - WARN_ON_ONCE(1); - set_pte_at(vma->vm_mm, address, ptep, pte); =20 /* @@ -1628,8 +1614,7 @@ static inline int zap_nonpresent_ptes(struct mmu_gath= er *tlb, */ WARN_ON_ONCE(!vma_is_anonymous(vma)); rss[mm_counter(folio)]--; - if (is_device_private_entry(entry)) - folio_remove_rmap_pte(folio, page, vma); + folio_remove_rmap_pte(folio, page, vma); folio_put(folio); } else if (!non_swap_entry(entry)) { /* Genuine swap entries, hence a private anon pages */ diff --git a/mm/rmap.c b/mm/rmap.c index 9e2002d97d6f..4acc9f6d743a 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2495,13 +2495,6 @@ struct page *make_device_exclusive(struct mm_struct = *mm, unsigned long addr, /* The pte is writable, uffd-wp does not apply. */ set_pte_at(mm, addr, fw.ptep, swp_pte); =20 - /* - * TODO: The device-exclusive non-swap PTE holds a folio reference but - * does not count as a mapping (mapcount), which is wrong and must be - * fixed, otherwise RMAP walks don't behave as expected. - */ - folio_remove_rmap_pte(folio, page, vma); - folio_walk_end(&fw, vma); *foliop =3D folio; return page; --=20 2.48.1
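As a reading aid for patches 10/12 and 11/12, the !pte_present() guards they add can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct model_pte, model_pte_referenced() and model_pte_mkclean() are hypothetical names invented for illustration and are not kernel APIs. The model only captures the rule from the commit messages that a non-present entry (such as a device-exclusive one) is treated as old, clean and not writable from the CPU's perspective.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a PTE; this is NOT the kernel's pte_t. */
struct model_pte {
	bool present;	/* false models a device-exclusive style non-swap entry */
	bool young;
	bool dirty;
	bool writable;
};

/*
 * Models the guard added to folio_referenced_one(): a non-present
 * entry is "old" from a CPU perspective, so it never counts as
 * referenced and is left untouched.
 */
static bool model_pte_referenced(struct model_pte *pte)
{
	if (!pte->present)
		return false;
	if (!pte->young)
		return false;
	pte->young = false;	/* stands in for clearing the accessed bit */
	return true;
}

/*
 * Models the guard added to page_vma_mkclean_one(): a non-present
 * entry is "clean and not writable" from a CPU perspective, so there
 * is nothing to clean. Returns true only if the entry was changed.
 */
static bool model_pte_mkclean(struct model_pte *pte)
{
	if (!pte->present)
		return false;
	if (!pte->dirty && !pte->writable)
		return false;
	pte->dirty = false;
	pte->writable = false;
	return true;
}
```

In this model, an entry with present == false is skipped by both helpers regardless of what its other bits say, which mirrors why the real walkers must check pte_present() before calling pte_young(), pte_dirty() or pte_write() on the value.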