From nobody Tue Feb 10 15:43:48 2026
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org,
 linux-mm@kvack.org, nouveau@lists.freedesktop.org,
 linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 damon@lists.linux.dev, David Hildenbrand, Andrew Morton, Jérôme Glisse,
 Jonathan Corbet, Alex Shi, Yanteng Si, Karol Herbst, Lyude Paul,
 Danilo Krummrich, David Airlie, Simona Vetter, Masami Hiramatsu,
 Oleg Nesterov, Peter Zijlstra, SeongJae Park, "Liam R. Howlett",
 Lorenzo Stoakes, Vlastimil Babka, Jann Horn, Pasha Tatashin, Peter Xu,
 Alistair Popple, Jason Gunthorpe, John Hubbard, stable@vger.kernel.org
Subject: [PATCH v2 01/17] mm/gup: reject FOLL_SPLIT_PMD with hugetlb VMAs
Date: Mon, 10 Feb 2025 20:37:43 +0100
Message-ID: <20250210193801.781278-2-david@redhat.com>
In-Reply-To: <20250210193801.781278-1-david@redhat.com>
References: <20250210193801.781278-1-david@redhat.com>

We only have two FOLL_SPLIT_PMD users. While uprobe refuses hugetlb
early, make_device_exclusive_range() can end up getting called on
hugetlb VMAs.

Right now, this means that with a PMD-sized hugetlb page, we can end
up calling split_huge_pmd(), because pmd_trans_huge() also succeeds
with hugetlb PMDs.

For example, using a modified hmm-test selftest one can trigger:

[  207.017134][T14945] ------------[ cut here ]------------
[  207.018614][T14945] kernel BUG at mm/page_table_check.c:87!
[  207.019716][T14945] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[  207.021072][T14945] CPU: 3 UID: 0 PID: ...
[  207.023036][T14945] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[  207.024834][T14945] RIP: 0010:page_table_check_clear.part.0+0x488/0x510
[  207.026128][T14945] Code: ...
[  207.029965][T14945] RSP: 0018:ffffc9000cb8f348 EFLAGS: 00010293
[  207.031139][T14945] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: ffffffff8249a0cd
[  207.032649][T14945] RDX: ffff88811e883c80 RSI: ffffffff8249a357 RDI: ffff88811e883c80
[  207.034183][T14945] RBP: ffff888105c0a050 R08: 0000000000000005 R09: 0000000000000000
[  207.035688][T14945] R10: 00000000ffffffff R11: 0000000000000003 R12: 0000000000000001
[  207.037203][T14945] R13: 0000000000000200 R14: 0000000000000001 R15: dffffc0000000000
[  207.038711][T14945] FS:  00007f2783275740(0000) GS:ffff8881f4980000(0000) knlGS:0000000000000000
[  207.040407][T14945] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  207.041660][T14945] CR2: 00007f2782c00000 CR3: 0000000132356000 CR4: 0000000000750ef0
[  207.043196][T14945] PKRU: 55555554
[  207.043880][T14945] Call Trace:
[  207.044506][T14945]  <TASK>
[  207.045086][T14945]  ? __die+0x51/0x92
[  207.045864][T14945]  ? die+0x29/0x50
[  207.046596][T14945]  ? do_trap+0x250/0x320
[  207.047430][T14945]  ? do_error_trap+0xe7/0x220
[  207.048346][T14945]  ? page_table_check_clear.part.0+0x488/0x510
[  207.049535][T14945]  ? handle_invalid_op+0x34/0x40
[  207.050494][T14945]  ? page_table_check_clear.part.0+0x488/0x510
[  207.051681][T14945]  ? exc_invalid_op+0x2e/0x50
[  207.052589][T14945]  ? asm_exc_invalid_op+0x1a/0x20
[  207.053596][T14945]  ? page_table_check_clear.part.0+0x1fd/0x510
[  207.054790][T14945]  ? page_table_check_clear.part.0+0x487/0x510
[  207.055993][T14945]  ? page_table_check_clear.part.0+0x488/0x510
[  207.057195][T14945]  ? page_table_check_clear.part.0+0x487/0x510
[  207.058384][T14945]  __page_table_check_pmd_clear+0x34b/0x5a0
[  207.059524][T14945]  ? __pfx___page_table_check_pmd_clear+0x10/0x10
[  207.060775][T14945]  ? __pfx___mutex_unlock_slowpath+0x10/0x10
[  207.061940][T14945]  ? __pfx___lock_acquire+0x10/0x10
[  207.062967][T14945]  pmdp_huge_clear_flush+0x279/0x360
[  207.064024][T14945]  split_huge_pmd_locked+0x82b/0x3750
...

Before commit 9cb28da54643 ("mm/gup: handle hugetlb in the generic
follow_page_mask code"), we would have ignored the flag; instead, let's
simply refuse the combination completely in check_vma_flags(): the
caller is likely not prepared to handle any hugetlb folios.

We'll teach make_device_exclusive_range() separately to ignore any hugetlb
folios as a future-proof safety net.

Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
Reviewed-by: John Hubbard
Reviewed-by: Alistair Popple
Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand
Tested-by: Alistair Popple
---
 mm/gup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index 3883b307780ea..61e751baf862c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1283,6 +1283,9 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
 		return -EOPNOTSUPP;
 
+	if ((gup_flags & FOLL_SPLIT_PMD) && is_vm_hugetlb_page(vma))
+		return -EOPNOTSUPP;
+
 	if (vma_is_secretmem(vma))
 		return -EFAULT;
 
-- 
2.48.1

From nobody Tue Feb 10 15:43:48 2026
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org,
 linux-mm@kvack.org, nouveau@lists.freedesktop.org,
 linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 damon@lists.linux.dev, David Hildenbrand, Andrew Morton, Jérôme Glisse,
 Jonathan Corbet, Alex Shi, Yanteng Si, Karol Herbst, Lyude Paul,
 Danilo Krummrich, David Airlie, Simona Vetter, Masami Hiramatsu,
 Oleg Nesterov, Peter Zijlstra, SeongJae Park, "Liam R. Howlett",
 Lorenzo Stoakes, Vlastimil Babka, Jann Horn, Pasha Tatashin, Peter Xu,
 Alistair Popple, Jason Gunthorpe, stable@vger.kernel.org
Subject: [PATCH v2 02/17] mm/rmap: reject hugetlb folios in folio_make_device_exclusive()
Date: Mon, 10 Feb 2025 20:37:44 +0100
Message-ID: <20250210193801.781278-3-david@redhat.com>
In-Reply-To: <20250210193801.781278-1-david@redhat.com>
References: <20250210193801.781278-1-david@redhat.com>

Even though FOLL_SPLIT_PMD on hugetlb now always fails with -EOPNOTSUPP,
let's add a safety net in case FOLL_SPLIT_PMD usage would ever be reworked.

In particular, before commit 9cb28da54643 ("mm/gup: handle hugetlb in the
generic follow_page_mask code"), GUP(FOLL_SPLIT_PMD) would just have
returned a page. In particular, hugetlb folios that are not PMD-sized
would never have been prone to FOLL_SPLIT_PMD.

hugetlb folios can be anonymous, and page_make_device_exclusive_one() is
not really prepared for handling them at all. So let's spell that out.

Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Reviewed-by: Alistair Popple
Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand
Tested-by: Alistair Popple
---
 mm/rmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4ea29a7e..17fbfa61f7efb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2499,7 +2499,7 @@ static bool folio_make_device_exclusive(struct folio *folio,
 	 * Restrict to anonymous folios for now to avoid potential writeback
 	 * issues.
 	 */
-	if (!folio_test_anon(folio))
+	if (!folio_test_anon(folio) || folio_test_hugetlb(folio))
 		return false;
 
 	rmap_walk(folio, &rwc);
-- 
2.48.1

From nobody Tue Feb 10 15:43:48 2026
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org,
 linux-mm@kvack.org, nouveau@lists.freedesktop.org,
 linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 damon@lists.linux.dev, David Hildenbrand, Andrew Morton, Jérôme Glisse,
 Jonathan Corbet, Alex Shi, Yanteng Si, Karol Herbst, Lyude Paul,
 Danilo Krummrich, David Airlie, Simona Vetter, Masami Hiramatsu,
 Oleg Nesterov, Peter Zijlstra, SeongJae Park, "Liam R. Howlett",
 Lorenzo Stoakes, Vlastimil Babka, Jann Horn, Pasha Tatashin, Peter Xu,
 Alistair Popple, Jason Gunthorpe, Simona Vetter
Subject: [PATCH v2 03/17] mm/rmap: convert make_device_exclusive_range() to make_device_exclusive()
Date: Mon, 10 Feb 2025 20:37:45 +0100
Message-ID: <20250210193801.781278-4-david@redhat.com>
In-Reply-To: <20250210193801.781278-1-david@redhat.com>
References: <20250210193801.781278-1-david@redhat.com>

The single "real" user in the tree of make_device_exclusive_range() always
requests making only a single address exclusive. The current implementation
is hard to fix for properly supporting anonymous THP / large folios and
for avoiding messing with rmap walks in weird ways.

So let's always process a single address/page and return folio + page to
minimize page -> folio lookups. This is a preparation for further
changes.

Reject any non-anonymous or hugetlb folios early, directly after GUP.

While at it, extend the documentation of make_device_exclusive() to
clarify some things.

Acked-by: Simona Vetter
Reviewed-by: Alistair Popple
Signed-off-by: David Hildenbrand
Tested-by: Alistair Popple
---
 Documentation/mm/hmm.rst                    |   2 +-
 Documentation/translations/zh_CN/mm/hmm.rst |   2 +-
 drivers/gpu/drm/nouveau/nouveau_svm.c       |   5 +-
 include/linux/mmu_notifier.h                |   2 +-
 include/linux/rmap.h                        |   5 +-
 lib/test_hmm.c                              |  41 +++-----
 mm/rmap.c                                   | 103 ++++++++++++--------
 7 files changed, 83 insertions(+), 77 deletions(-)

diff --git a/Documentation/mm/hmm.rst b/Documentation/mm/hmm.rst
index f6d53c37a2ca8..7d61b7a8b65b7 100644
--- a/Documentation/mm/hmm.rst
+++ b/Documentation/mm/hmm.rst
@@ -400,7 +400,7 @@ Exclusive access memory
 Some devices have features such as atomic PTE bits that can be used to implement
 atomic access to system memory. To support atomic operations to a shared virtual
 memory page such a device needs access to that page which is exclusive of any
-userspace access from the CPU. The ``make_device_exclusive_range()`` function
+userspace access from the CPU. The ``make_device_exclusive()`` function
 can be used to make a memory range inaccessible from userspace.
=20 This replaces all mappings for pages in the given range with special swap diff --git a/Documentation/translations/zh_CN/mm/hmm.rst b/Documentation/tr= anslations/zh_CN/mm/hmm.rst index 0669f947d0bc9..22c210f4e94f3 100644 --- a/Documentation/translations/zh_CN/mm/hmm.rst +++ b/Documentation/translations/zh_CN/mm/hmm.rst @@ -326,7 +326,7 @@ devm_memunmap_pages() =E5=92=8C devm_release_mem_region= () =E5=BD=93=E8=B5=84=E6=BA=90=E5=8F=AF=E4=BB=A5=E7=BB=91=E5=AE=9A=E5=88=B0= ``s =20 =E4=B8=80=E4=BA=9B=E8=AE=BE=E5=A4=87=E5=85=B7=E6=9C=89=E8=AF=B8=E5=A6=82= =E5=8E=9F=E5=AD=90PTE=E4=BD=8D=E7=9A=84=E5=8A=9F=E8=83=BD=EF=BC=8C=E5=8F=AF= =E4=BB=A5=E7=94=A8=E6=9D=A5=E5=AE=9E=E7=8E=B0=E5=AF=B9=E7=B3=BB=E7=BB=9F=E5= =86=85=E5=AD=98=E7=9A=84=E5=8E=9F=E5=AD=90=E8=AE=BF=E9=97=AE=E3=80=82=E4=B8= =BA=E4=BA=86=E6=94=AF=E6=8C=81=E5=AF=B9=E4=B8=80 =E4=B8=AA=E5=85=B1=E4=BA=AB=E7=9A=84=E8=99=9A=E6=8B=9F=E5=86=85=E5=AD=98= =E9=A1=B5=E7=9A=84=E5=8E=9F=E5=AD=90=E6=93=8D=E4=BD=9C=EF=BC=8C=E8=BF=99=E6= =A0=B7=E7=9A=84=E8=AE=BE=E5=A4=87=E9=9C=80=E8=A6=81=E5=AF=B9=E8=AF=A5=E9=A1= =B5=E7=9A=84=E8=AE=BF=E9=97=AE=E6=98=AF=E6=8E=92=E4=BB=96=E7=9A=84=EF=BC=8C= =E8=80=8C=E4=B8=8D=E6=98=AF=E6=9D=A5=E8=87=AACPU -=E7=9A=84=E4=BB=BB=E4=BD=95=E7=94=A8=E6=88=B7=E7=A9=BA=E9=97=B4=E8=AE=BF= =E9=97=AE=E3=80=82 ``make_device_exclusive_range()`` =E5=87=BD=E6=95=B0=E5= =8F=AF=E4=BB=A5=E7=94=A8=E6=9D=A5=E4=BD=BF=E4=B8=80 +=E7=9A=84=E4=BB=BB=E4=BD=95=E7=94=A8=E6=88=B7=E7=A9=BA=E9=97=B4=E8=AE=BF= =E9=97=AE=E3=80=82 ``make_device_exclusive()`` =E5=87=BD=E6=95=B0=E5=8F=AF= =E4=BB=A5=E7=94=A8=E6=9D=A5=E4=BD=BF=E4=B8=80 =E4=B8=AA=E5=86=85=E5=AD=98=E8=8C=83=E5=9B=B4=E4=B8=8D=E8=83=BD=E4=BB=8E= =E7=94=A8=E6=88=B7=E7=A9=BA=E9=97=B4=E8=AE=BF=E9=97=AE=E3=80=82 =20 =E8=BF=99=E5=B0=86=E7=94=A8=E7=89=B9=E6=AE=8A=E7=9A=84=E4=BA=A4=E6=8D=A2= =E6=9D=A1=E7=9B=AE=E6=9B=BF=E6=8D=A2=E7=BB=99=E5=AE=9A=E8=8C=83=E5=9B=B4=E5= =86=85=E7=9A=84=E6=89=80=E6=9C=89=E9=A1=B5=E7=9A=84=E6=98=A0=E5=B0=84=E3=80= 
=82=E4=BB=BB=E4=BD=95=E8=AF=95=E5=9B=BE=E8=AE=BF=E9=97=AE=E4=BA=A4=E6=8D=A2= =E6=9D=A1=E7=9B=AE=E7=9A=84=E8=A1=8C=E4=B8=BA=E9=83=BD=E4=BC=9A diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouvea= u/nouveau_svm.c index b4da82ddbb6b2..39e3740980bb7 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.c +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c @@ -609,10 +609,9 @@ static int nouveau_atomic_range_fault(struct nouveau_s= vmm *svmm, =20 notifier_seq =3D mmu_interval_read_begin(¬ifier->notifier); mmap_read_lock(mm); - ret =3D make_device_exclusive_range(mm, start, start + PAGE_SIZE, - &page, drm->dev); + page =3D make_device_exclusive(mm, start, drm->dev, &folio); mmap_read_unlock(mm); - if (ret <=3D 0 || !page) { + if (IS_ERR(page)) { ret =3D -EINVAL; goto out; } diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index e2dd57ca368b0..d4e7146618262 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -46,7 +46,7 @@ struct mmu_interval_notifier; * @MMU_NOTIFY_EXCLUSIVE: to signal a device driver that the device will no * longer have exclusive access to the page. When sent during creation of = an * exclusive range the owner will be initialised to the value provided by = the - * caller of make_device_exclusive_range(), otherwise the owner will be NU= LL. + * caller of make_device_exclusive(), otherwise the owner will be NULL. 
  */
 enum mmu_notifier_event {
 	MMU_NOTIFY_UNMAP = 0,
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 683a04088f3f2..86425d42c1a90 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -663,9 +663,8 @@ int folio_referenced(struct folio *, int is_locked,
 void try_to_migrate(struct folio *folio, enum ttu_flags flags);
 void try_to_unmap(struct folio *, enum ttu_flags flags);
 
-int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
-				unsigned long end, struct page **pages,
-				void *arg);
+struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
+		void *owner, struct folio **foliop);
 
 /* Avoid racy checks */
 #define PVMW_SYNC		(1 << 0)
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 056f2e411d7b4..e4afca8d18802 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -780,10 +780,8 @@ static int dmirror_exclusive(struct dmirror *dmirror,
 	unsigned long start, end, addr;
 	unsigned long size = cmd->npages << PAGE_SHIFT;
 	struct mm_struct *mm = dmirror->notifier.mm;
-	struct page *pages[64];
 	struct dmirror_bounce bounce;
-	unsigned long next;
-	int ret;
+	int ret = 0;
 
 	start = cmd->addr;
 	end = start + size;
@@ -795,36 +793,27 @@ static int dmirror_exclusive(struct dmirror *dmirror,
 		return -EINVAL;
 
 	mmap_read_lock(mm);
-	for (addr = start; addr < end; addr = next) {
-		unsigned long mapped = 0;
-		int i;
-
-		next = min(end, addr + (ARRAY_SIZE(pages) << PAGE_SHIFT));
+	for (addr = start; !ret && addr < end; addr += PAGE_SIZE) {
+		struct folio *folio;
+		struct page *page;
 
-		ret = make_device_exclusive_range(mm, addr, next, pages, NULL);
-		/*
-		 * Do dmirror_atomic_map() iff all pages are marked for
-		 * exclusive access to avoid accessing uninitialized
-		 * fields of pages.
-		 */
-		if (ret == (next - addr) >> PAGE_SHIFT)
-			mapped = dmirror_atomic_map(addr, next, pages, dmirror);
-		for (i = 0; i < ret; i++) {
-			if (pages[i]) {
-				unlock_page(pages[i]);
-				put_page(pages[i]);
-			}
+		page = make_device_exclusive(mm, addr, NULL, &folio);
+		if (IS_ERR(page)) {
+			ret = PTR_ERR(page);
+			break;
 		}
 
-		if (addr + (mapped << PAGE_SHIFT) < next) {
-			mmap_read_unlock(mm);
-			mmput(mm);
-			return -EBUSY;
-		}
+		ret = dmirror_atomic_map(addr, addr + PAGE_SIZE, &page, dmirror);
+		ret = ret == 1 ? 0 : -EBUSY;
+		folio_unlock(folio);
+		folio_put(folio);
 	}
 	mmap_read_unlock(mm);
 	mmput(mm);
 
+	if (ret)
+		return ret;
+
 	/* Return the migrated data for verification. */
 	ret = dmirror_bounce_init(&bounce, start, size);
 	if (ret)
diff --git a/mm/rmap.c b/mm/rmap.c
index 17fbfa61f7efb..7ccf850565d33 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2495,70 +2495,89 @@ static bool folio_make_device_exclusive(struct folio *folio,
 		.arg = &args,
 	};
 
-	/*
-	 * Restrict to anonymous folios for now to avoid potential writeback
-	 * issues.
-	 */
-	if (!folio_test_anon(folio) || folio_test_hugetlb(folio))
-		return false;
-
 	rmap_walk(folio, &rwc);
 
 	return args.valid && !folio_mapcount(folio);
 }
 
 /**
- * make_device_exclusive_range() - Mark a range for exclusive use by a device
+ * make_device_exclusive() - Mark a page for exclusive use by a device
  * @mm: mm_struct of associated target process
- * @start: start of the region to mark for exclusive device access
- * @end: end address of region
- * @pages: returns the pages which were successfully marked for exclusive access
+ * @addr: the virtual address to mark for exclusive device access
  * @owner: passed to MMU_NOTIFY_EXCLUSIVE range notifier to allow filtering
+ * @foliop: folio pointer will be stored here on success.
+ * + * This function looks up the page mapped at the given address, grabs a + * folio reference, locks the folio and replaces the PTE with special + * device-exclusive PFN swap entry, preventing access through the process + * page tables. The function will return with the folio locked and referen= ced. * - * Returns: number of pages found in the range by GUP. A page is marked for - * exclusive access only if the page pointer is non-NULL. + * On fault, the device-exclusive entries are replaced with the original P= TE + * under folio lock, after calling MMU notifiers. * - * This function finds ptes mapping page(s) to the given address range, lo= cks - * them and replaces mappings with special swap entries preventing userspa= ce CPU - * access. On fault these entries are replaced with the original mapping a= fter - * calling MMU notifiers. + * Only anonymous non-hugetlb folios are supported and the VMA must have + * write permissions such that we can fault in the anonymous page writable + * in order to mark it exclusive. The caller must hold the mmap_lock in re= ad + * mode. * * A driver using this to program access from a device must use a mmu noti= fier * critical section to hold a device specific lock during programming. Once - * programming is complete it should drop the page lock and reference after + * programming is complete it should drop the folio lock and reference aft= er * which point CPU access to the page will revoke the exclusive access. + * + * Notes: + * #. This function always operates on individual PTEs mapping individual + * pages. PMD-sized THPs are first remapped to be mapped by PTEs befo= re + * the conversion happens on a single PTE corresponding to @addr. + * #. While concurrent access through the process page tables is prevent= ed, + * concurrent access through other page references (e.g., earlier GUP + * invocation) is not handled and not supported. + * #. device-exclusive entries are considered "clean" and "old" by core-= mm. 
+ *      Device drivers must update the folio state when informed by MMU
+ *      notifiers.
+ *
+ * Returns: pointer to mapped page on success, otherwise a negative error.
  */
-int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
-				unsigned long end, struct page **pages,
-				void *owner)
+struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
+		void *owner, struct folio **foliop)
 {
-	long npages = (end - start) >> PAGE_SHIFT;
-	long i;
+	struct folio *folio;
+	struct page *page;
+	long npages;
+
+	mmap_assert_locked(mm);
 
-	npages = get_user_pages_remote(mm, start, npages,
+	/*
+	 * Fault in the page writable and try to lock it; note that if the
+	 * address would already be marked for exclusive use by a device,
+	 * the GUP call would undo that first by triggering a fault.
+	 */
+	npages = get_user_pages_remote(mm, addr, 1,
 				       FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
-				       pages, NULL);
-	if (npages < 0)
-		return npages;
-
-	for (i = 0; i < npages; i++, start += PAGE_SIZE) {
-		struct folio *folio = page_folio(pages[i]);
-		if (PageTail(pages[i]) || !folio_trylock(folio)) {
-			folio_put(folio);
-			pages[i] = NULL;
-			continue;
-		}
+				       &page, NULL);
+	if (npages != 1)
+		return ERR_PTR(npages);
+	folio = page_folio(page);
 
-		if (!folio_make_device_exclusive(folio, mm, start, owner)) {
-			folio_unlock(folio);
-			folio_put(folio);
-			pages[i] = NULL;
-		}
+	if (!folio_test_anon(folio) || folio_test_hugetlb(folio)) {
+		folio_put(folio);
+		return ERR_PTR(-EOPNOTSUPP);
+	}
+
+	if (!folio_trylock(folio)) {
+		folio_put(folio);
+		return ERR_PTR(-EBUSY);
 	}
 
-	return npages;
+	if (!folio_make_device_exclusive(folio, mm, addr, owner)) {
+		folio_unlock(folio);
+		folio_put(folio);
+		return ERR_PTR(-EBUSY);
+	}
+	*foliop = folio;
+	return page;
 }
-EXPORT_SYMBOL_GPL(make_device_exclusive_range);
+EXPORT_SYMBOL_GPL(make_device_exclusive);
 #endif
 
 void __put_anon_vma(struct anon_vma *anon_vma)
-- 
2.48.1

From nobody Tue Feb 10 15:43:48 2026
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-mm@kvack.org, nouveau@lists.freedesktop.org,
	linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	damon@lists.linux.dev, David Hildenbrand, Andrew Morton,
	Jérôme Glisse, Jonathan Corbet, Alex Shi, Yanteng Si, Karol Herbst,
	Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter,
	Masami Hiramatsu, Oleg Nesterov, Peter Zijlstra, SeongJae Park,
	"Liam R. Howlett", Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
	Pasha Tatashin, Peter Xu, Alistair Popple, Jason Gunthorpe
Subject: [PATCH v2 04/17] mm/rmap: implement make_device_exclusive() using folio_walk instead of rmap walk
Date: Mon, 10 Feb 2025 20:37:46 +0100
Message-ID: <20250210193801.781278-5-david@redhat.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250210193801.781278-1-david@redhat.com>
References: <20250210193801.781278-1-david@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

We require a writable PTE and only support anonymous folios: we can only
have exactly one PTE pointing at that page, which we can just look up
using a folio walk, avoiding the rmap walk and the anon VMA lock.

So let's stop doing an rmap walk and perform a folio walk instead, so we
can easily just modify a single PTE and avoid relying on rmap/mapcounts.

We now effectively work on a single PTE instead of multiple PTEs of
a large folio, allowing for conversion of individual PTEs from
non-exclusive to device-exclusive -- note that the opposite direction
always works on single PTEs: restore_exclusive_pte().

With this change, device-exclusive handling is fully compatible with THPs /
large folios. We still require PMD-sized THPs to get PTE-mapped, and
supporting PMD-mapped THP (without the PTE-remapping) is a different
endeavour that might not be worth it at this point: it might even have
negative side-effects [1].

This gets rid of the "folio_mapcount()" usage and lets us fix ordinary
rmap walks (migration/swapout) next. Spell out that messing with the
mapcount is wrong and must be fixed.

[1] https://lkml.kernel.org/r/Z5tI-cOSyzdLjoe_@phenom.ffwll.local

Signed-off-by: David Hildenbrand
Tested-by: Alistair Popple
---
 mm/rmap.c | 200 ++++++++++++++++++------------------------------------
 1 file changed, 67 insertions(+), 133 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 7ccf850565d33..0cd2a2d3de00d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2375,131 +2375,6 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
 }
 
 #ifdef CONFIG_DEVICE_PRIVATE
-struct make_exclusive_args {
-	struct mm_struct *mm;
-	unsigned long address;
-	void *owner;
-	bool valid;
-};
-
-static bool page_make_device_exclusive_one(struct folio *folio,
-		struct vm_area_struct *vma, unsigned long address, void *priv)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
-	struct make_exclusive_args *args = priv;
-	pte_t pteval;
-	struct page *subpage;
-	bool ret = true;
-	struct mmu_notifier_range range;
-	swp_entry_t entry;
-	pte_t swp_pte;
-	pte_t ptent;
-
-	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
-				      vma->vm_mm, address, min(vma->vm_end,
-				      address + folio_size(folio)),
-				      args->owner);
-	mmu_notifier_invalidate_range_start(&range);
-
-	while (page_vma_mapped_walk(&pvmw)) {
-		/* Unexpected PMD-mapped THP? */
-		VM_BUG_ON_FOLIO(!pvmw.pte, folio);
-
-		ptent = ptep_get(pvmw.pte);
-		if (!pte_present(ptent)) {
-			ret = false;
-			page_vma_mapped_walk_done(&pvmw);
-			break;
-		}
-
-		subpage = folio_page(folio,
-				pte_pfn(ptent) - folio_pfn(folio));
-		address = pvmw.address;
-
-		/* Nuke the page table entry. */
-		flush_cache_page(vma, address, pte_pfn(ptent));
-		pteval = ptep_clear_flush(vma, address, pvmw.pte);
-
-		/* Set the dirty flag on the folio now the pte is gone. */
-		if (pte_dirty(pteval))
-			folio_mark_dirty(folio);
-
-		/*
-		 * Check that our target page is still mapped at the expected
- */ - if (args->mm =3D=3D mm && args->address =3D=3D address && - pte_write(pteval)) - args->valid =3D true; - - /* - * Store the pfn of the page in a special migration - * pte. do_swap_page() will wait until the migration - * pte is removed and then restart fault handling. - */ - if (pte_write(pteval)) - entry =3D make_writable_device_exclusive_entry( - page_to_pfn(subpage)); - else - entry =3D make_readable_device_exclusive_entry( - page_to_pfn(subpage)); - swp_pte =3D swp_entry_to_pte(entry); - if (pte_soft_dirty(pteval)) - swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pteval)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); - - set_pte_at(mm, address, pvmw.pte, swp_pte); - - /* - * There is a reference on the page for the swap entry which has - * been removed, so shouldn't take another. - */ - folio_remove_rmap_pte(folio, subpage, vma); - } - - mmu_notifier_invalidate_range_end(&range); - - return ret; -} - -/** - * folio_make_device_exclusive - Mark the folio exclusively owned by a dev= ice. - * @folio: The folio to replace page table entries for. - * @mm: The mm_struct where the folio is expected to be mapped. - * @address: Address where the folio is expected to be mapped. - * @owner: passed to MMU_NOTIFY_EXCLUSIVE range notifier callbacks - * - * Tries to remove all the page table entries which are mapping this - * folio and replace them with special device exclusive swap entries to - * grant a device exclusive access to the folio. - * - * Context: Caller must hold the folio lock. - * Return: false if the page is still mapped, or if it could not be unmapp= ed - * from the expected address. Otherwise returns true (success). 
- */
-static bool folio_make_device_exclusive(struct folio *folio,
-		struct mm_struct *mm, unsigned long address, void *owner)
-{
-	struct make_exclusive_args args = {
-		.mm = mm,
-		.address = address,
-		.owner = owner,
-		.valid = false,
-	};
-	struct rmap_walk_control rwc = {
-		.rmap_one = page_make_device_exclusive_one,
-		.done = folio_not_mapped,
-		.anon_lock = folio_lock_anon_vma_read,
-		.arg = &args,
-	};
-
-	rmap_walk(folio, &rwc);
-
-	return args.valid && !folio_mapcount(folio);
-}
-
 /**
  * make_device_exclusive() - Mark a page for exclusive use by a device
  * @mm: mm_struct of associated target process
@@ -2541,22 +2416,31 @@ static bool folio_make_device_exclusive(struct folio *folio,
 struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
 		void *owner, struct folio **foliop)
 {
-	struct folio *folio;
+	struct mmu_notifier_range range;
+	struct folio *folio, *fw_folio;
+	struct vm_area_struct *vma;
+	struct folio_walk fw;
 	struct page *page;
-	long npages;
+	swp_entry_t entry;
+	pte_t swp_pte;
 
 	mmap_assert_locked(mm);
+	addr = PAGE_ALIGN_DOWN(addr);
 
 	/*
 	 * Fault in the page writable and try to lock it; note that if the
 	 * address would already be marked for exclusive use by a device,
 	 * the GUP call would undo that first by triggering a fault.
+	 *
+	 * If any other device would already map this page exclusively, the
+	 * fault will trigger a conversion to an ordinary
+	 * (non-device-exclusive) PTE and issue a MMU_NOTIFY_EXCLUSIVE.
 	 */
-	npages = get_user_pages_remote(mm, addr, 1,
-				       FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
-				       &page, NULL);
-	if (npages != 1)
-		return ERR_PTR(npages);
+	page = get_user_page_vma_remote(mm, addr,
+					FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
+					&vma);
+	if (IS_ERR(page))
+		return page;
 	folio = page_folio(page);
 
 	if (!folio_test_anon(folio) || folio_test_hugetlb(folio)) {
@@ -2569,11 +2453,61 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
 		return ERR_PTR(-EBUSY);
 	}
 
-	if (!folio_make_device_exclusive(folio, mm, addr, owner)) {
+	/*
+	 * Inform secondary MMUs that we are going to convert this PTE to
+	 * device-exclusive, such that they unmap it now. Note that the
+	 * caller must filter this event out to prevent livelocks.
+	 */
+	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
+				      mm, addr, addr + PAGE_SIZE, owner);
+	mmu_notifier_invalidate_range_start(&range);
+
+	/*
+	 * Let's do a second walk and make sure we still find the same page
+	 * mapped writable. Note that any page of an anonymous folio can
+	 * only be mapped writable using exactly one PTE ("exclusive"), so
+	 * there cannot be other mappings.
+	 */
+	fw_folio = folio_walk_start(&fw, vma, addr, 0);
+	if (fw_folio != folio || fw.page != page ||
+	    fw.level != FW_LEVEL_PTE || !pte_write(fw.pte)) {
+		if (fw_folio)
+			folio_walk_end(&fw, vma);
+		mmu_notifier_invalidate_range_end(&range);
 		folio_unlock(folio);
 		folio_put(folio);
 		return ERR_PTR(-EBUSY);
 	}
+
+	/* Nuke the page table entry so we get the uptodate dirty bit. */
+	flush_cache_page(vma, addr, page_to_pfn(page));
+	fw.pte = ptep_clear_flush(vma, addr, fw.ptep);
+
+	/* Set the dirty flag on the folio now the PTE is gone. */
+	if (pte_dirty(fw.pte))
+		folio_mark_dirty(folio);
+
+	/*
+	 * Store the pfn of the page in a special device-exclusive PFN swap PTE.
+	 * do_swap_page() will trigger the conversion back while holding the
+	 * folio lock.
+	 */
+	entry = make_writable_device_exclusive_entry(page_to_pfn(page));
+	swp_pte = swp_entry_to_pte(entry);
+	if (pte_soft_dirty(fw.pte))
+		swp_pte = pte_swp_mksoft_dirty(swp_pte);
+	/* The pte is writable, uffd-wp does not apply. */
+	set_pte_at(mm, addr, fw.ptep, swp_pte);
+
+	/*
+	 * TODO: The device-exclusive PFN swap PTE holds a folio reference but
+	 * does not count as a mapping (mapcount), which is wrong and must be
+	 * fixed, otherwise RMAP walks don't behave as expected.
+	 */
+	folio_remove_rmap_pte(folio, page, vma);
+
+	folio_walk_end(&fw, vma);
+	mmu_notifier_invalidate_range_end(&range);
 	*foliop = folio;
 	return page;
 }
-- 
2.48.1

From nobody Tue Feb 10 15:43:48 2026
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-mm@kvack.org, nouveau@lists.freedesktop.org,
	linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	damon@lists.linux.dev, David Hildenbrand, Andrew Morton,
	Jérôme Glisse, Jonathan Corbet, Alex Shi, Yanteng Si, Karol Herbst,
	Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter,
	Masami Hiramatsu, Oleg Nesterov, Peter Zijlstra, SeongJae Park,
	"Liam R. Howlett", Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
	Pasha Tatashin, Peter Xu, Alistair Popple, Jason Gunthorpe
Subject: [PATCH v2 05/17] mm/memory: detect writability in restore_exclusive_pte() through can_change_pte_writable()
Date: Mon, 10 Feb 2025 20:37:47 +0100
Message-ID: <20250210193801.781278-6-david@redhat.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250210193801.781278-1-david@redhat.com>
References: <20250210193801.781278-1-david@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Let's do it just like mprotect write-upgrade or during NUMA-hinting faults
on PROT_NONE PTEs: detect if the PTE can be writable by using
can_change_pte_writable().

Only set the PTE dirty if the folio is dirty: we might not necessarily have
write access, and setting the PTE writable doesn't require setting the
PTE dirty. From a CPU perspective, these entries are clean. So only set
the PTE dirty if the folio is dirty.
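
The decision described above can be sketched as a small user-space model.
This is only an illustration under stated assumptions: struct pte_model,
restore_model() and its boolean inputs are hypothetical stand-ins and not
kernel types. In the kernel, the corresponding checks are
(vma->vm_flags & VM_WRITE), can_change_pte_writable() and
folio_test_dirty().

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for the relevant bits of a restored PTE. */
struct pte_model {
	bool write;
	bool dirty;
};

/*
 * Model of the restore decision: the PTE becomes writable only if the
 * VMA allows writes and the (modelled) can_change_pte_writable() check
 * passes. It becomes dirty only if it is made writable and the folio is
 * already dirty.
 */
static struct pte_model restore_model(bool vma_write, bool can_change_writable,
				      bool folio_dirty)
{
	struct pte_model pte = { .write = false, .dirty = false };

	if (vma_write && can_change_writable) {
		if (folio_dirty)
			pte.dirty = true;
		pte.write = true;
	}
	return pte;
}
```

Note that in this model a clean folio yields a writable but clean PTE,
which is the case the old pte_mkdirty() + maybe_mkwrite() combination
could not express.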

With this change in place, there is no need to have separate readable and
writable device-exclusive entry types, and we'll merge them next
separately.

Note that, during fork(), we first convert the device-exclusive entries
back to ordinary PTEs, and we only ever allow conversion of writable PTEs
to device-exclusive -- only mprotect can currently change them to
readable-device-exclusive. Consequently, we always expect
PageAnonExclusive(page)==true and can_change_pte_writable()==true, unless
we are dealing with soft-dirty tracking or uffd-wp. But reusing
can_change_pte_writable() for now is cleaner.

Signed-off-by: David Hildenbrand
Tested-by: Alistair Popple
---
 mm/memory.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 539c0f7c6d545..ba33ba3b7ea17 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -723,18 +723,21 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
 	struct folio *folio = page_folio(page);
 	pte_t orig_pte;
 	pte_t pte;
-	swp_entry_t entry;
 
 	orig_pte = ptep_get(ptep);
 	pte = pte_mkold(mk_pte(page, READ_ONCE(vma->vm_page_prot)));
 	if (pte_swp_soft_dirty(orig_pte))
 		pte = pte_mksoft_dirty(pte);
 
-	entry = pte_to_swp_entry(orig_pte);
 	if (pte_swp_uffd_wp(orig_pte))
 		pte = pte_mkuffd_wp(pte);
-	else if (is_writable_device_exclusive_entry(entry))
-		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
+
+	if ((vma->vm_flags & VM_WRITE) &&
+	    can_change_pte_writable(vma, address, pte)) {
+		if (folio_test_dirty(folio))
+			pte = pte_mkdirty(pte);
+		pte = pte_mkwrite(pte, vma);
+	}
 
 	VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) &&
 					   PageAnonExclusive(page)), folio);
-- 
2.48.1

From nobody Tue Feb 10 15:43:48 2026
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-mm@kvack.org, nouveau@lists.freedesktop.org,
	linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	damon@lists.linux.dev, David Hildenbrand, Andrew Morton,
	Jérôme Glisse, Jonathan Corbet, Alex Shi, Yanteng Si, Karol Herbst,
	Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter,
	Masami Hiramatsu, Oleg Nesterov, Peter Zijlstra, SeongJae Park,
	"Liam R. Howlett", Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
	Pasha Tatashin, Peter Xu, Alistair Popple, Jason Gunthorpe,
	Simona Vetter
Subject: [PATCH v2 06/17] mm: use single SWP_DEVICE_EXCLUSIVE entry type
Date: Mon, 10 Feb 2025 20:37:48 +0100
Message-ID: <20250210193801.781278-7-david@redhat.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250210193801.781278-1-david@redhat.com>
References: <20250210193801.781278-1-david@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

There is no need for the distinction anymore; let's merge the readable and
writable device-exclusive entries into a single device-exclusive entry
type.
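
What the merge means for the type check can be sketched in user space.
The numeric type values and the struct swp_model layout below are
hypothetical and chosen only for illustration; the real definitions live
in include/linux/swap.h and include/linux/swapops.h.

```c
#include <stdbool.h>

/* Hypothetical type numbers; the kernel derives them from MAX_SWAPFILES etc. */
enum {
	MODEL_DEVICE_WRITE = 0,
	MODEL_DEVICE_READ = 1,
	MODEL_DEVICE_EXCLUSIVE = 2,
};

/* Hypothetical stand-in for swp_entry_t: a type plus an offset (here a PFN). */
struct swp_model {
	unsigned int type;
	unsigned long offset;
};

static struct swp_model make_device_exclusive_model(unsigned long pfn)
{
	struct swp_model e = { .type = MODEL_DEVICE_EXCLUSIVE, .offset = pfn };

	return e;
}

/* With a single entry type, the check is one comparison. */
static bool is_device_exclusive_model(struct swp_model e)
{
	return e.type == MODEL_DEVICE_EXCLUSIVE;
}
```

Writability is no longer encoded in the entry type in this model; it is
decided when the entry is converted back to an ordinary PTE.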

Acked-by: Simona Vetter
Reviewed-by: Alistair Popple
Signed-off-by: David Hildenbrand
Tested-by: Alistair Popple
---
 include/linux/swap.h    |  7 +++----
 include/linux/swapops.h | 27 ++++-----------------------
 mm/mprotect.c           |  8 --------
 mm/page_table_check.c   |  5 ++---
 mm/rmap.c               |  2 +-
 5 files changed, 10 insertions(+), 39 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index b13b72645db33..26b1d8cc5b0e7 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -74,14 +74,13 @@ static inline int current_is_kswapd(void)
  * to a special SWP_DEVICE_{READ|WRITE} entry.
  *
  * When a page is mapped by the device for exclusive access we set the CPU page
- * table entries to special SWP_DEVICE_EXCLUSIVE_* entries.
+ * table entries to a special SWP_DEVICE_EXCLUSIVE entry.
  */
 #ifdef CONFIG_DEVICE_PRIVATE
-#define SWP_DEVICE_NUM 4
+#define SWP_DEVICE_NUM 3
 #define SWP_DEVICE_WRITE (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM)
 #define SWP_DEVICE_READ (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM+1)
-#define SWP_DEVICE_EXCLUSIVE_WRITE (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM+2)
-#define SWP_DEVICE_EXCLUSIVE_READ (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM+3)
+#define SWP_DEVICE_EXCLUSIVE (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM+2)
 #else
 #define SWP_DEVICE_NUM 0
 #endif
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 96f26e29fefed..64ea151a7ae39 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -186,26 +186,16 @@ static inline bool is_writable_device_private_entry(swp_entry_t entry)
 	return unlikely(swp_type(entry) == SWP_DEVICE_WRITE);
 }
 
-static inline swp_entry_t make_readable_device_exclusive_entry(pgoff_t offset)
+static inline swp_entry_t make_device_exclusive_entry(pgoff_t offset)
 {
-	return swp_entry(SWP_DEVICE_EXCLUSIVE_READ, offset);
-}
-
-static inline swp_entry_t make_writable_device_exclusive_entry(pgoff_t offset)
-{
-	return swp_entry(SWP_DEVICE_EXCLUSIVE_WRITE, offset);
+	return swp_entry(SWP_DEVICE_EXCLUSIVE, offset);
 }
 
 static inline bool is_device_exclusive_entry(swp_entry_t entry)
 {
-	return swp_type(entry) == SWP_DEVICE_EXCLUSIVE_READ ||
-		swp_type(entry) == SWP_DEVICE_EXCLUSIVE_WRITE;
+	return swp_type(entry) == SWP_DEVICE_EXCLUSIVE;
 }
 
-static inline bool is_writable_device_exclusive_entry(swp_entry_t entry)
-{
-	return unlikely(swp_type(entry) == SWP_DEVICE_EXCLUSIVE_WRITE);
-}
 #else /* CONFIG_DEVICE_PRIVATE */
 static inline swp_entry_t make_readable_device_private_entry(pgoff_t offset)
 {
@@ -227,12 +217,7 @@ static inline bool is_writable_device_private_entry(swp_entry_t entry)
 	return false;
 }
 
-static inline swp_entry_t make_readable_device_exclusive_entry(pgoff_t offset)
-{
-	return swp_entry(0, 0);
-}
-
-static inline swp_entry_t make_writable_device_exclusive_entry(pgoff_t offset)
+static inline swp_entry_t make_device_exclusive_entry(pgoff_t offset)
 {
 	return swp_entry(0, 0);
 }
@@ -242,10 +227,6 @@ static inline bool is_device_exclusive_entry(swp_entry_t entry)
 	return false;
 }
 
-static inline bool is_writable_device_exclusive_entry(swp_entry_t entry)
-{
-	return false;
-}
 #endif /* CONFIG_DEVICE_PRIVATE */
 
 #ifdef CONFIG_MIGRATION
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 516b1d847e2cd..9cb6ab7c40480 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -225,14 +225,6 @@ static long change_pte_range(struct mmu_gather *tlb,
 				newpte = swp_entry_to_pte(entry);
 				if (pte_swp_uffd_wp(oldpte))
 					newpte = pte_swp_mkuffd_wp(newpte);
-			} else if (is_writable_device_exclusive_entry(entry)) {
-				entry = make_readable_device_exclusive_entry(
-							swp_offset(entry));
-				newpte = swp_entry_to_pte(entry);
-				if (pte_swp_soft_dirty(oldpte))
-					newpte = pte_swp_mksoft_dirty(newpte);
-				if (pte_swp_uffd_wp(oldpte))
-					newpte = pte_swp_mkuffd_wp(newpte);
 			} else if (is_pte_marker_entry(entry)) {
 				/*
 				 * Ignore error swap entries unconditionally,
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 509c6ef8de400..c2b3600429a0c 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -196,9 +196,8 @@ EXPORT_SYMBOL(__page_table_check_pud_clear);
 /* Whether the swap entry cached writable information */
 static inline bool swap_cached_writable(swp_entry_t entry)
 {
-	return is_writable_device_exclusive_entry(entry) ||
-	    is_writable_device_private_entry(entry) ||
-	    is_writable_migration_entry(entry);
+	return is_writable_device_private_entry(entry) ||
+	       is_writable_migration_entry(entry);
 }
 
 static inline void page_table_check_pte_flags(pte_t pte)
diff --git a/mm/rmap.c b/mm/rmap.c
index 0cd2a2d3de00d..1129ed132af94 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2492,7 +2492,7 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
 	 * do_swap_page() will trigger the conversion back while holding the
 	 * folio lock.
 	 */
-	entry = make_writable_device_exclusive_entry(page_to_pfn(page));
+	entry = make_device_exclusive_entry(page_to_pfn(page));
 	swp_pte = swp_entry_to_pte(entry);
 	if (pte_soft_dirty(fw.pte))
 		swp_pte = pte_swp_mksoft_dirty(swp_pte);
-- 
2.48.1

From nobody Tue Feb 10 15:43:48 2026
bh=WTAKrIObiH0+XnAmevo4XgmL855z4Fu5Qvoz089fAGM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=d5GL2HSdvylFHUu7BoczIoKbbctHc3hZhC3BuDuNaMFDG50WskpRiCKWBtgNifz6WWtvs2+WH24vfnjG3C5iBM3S+lNhVs10DJK944lUztUWpksu9sFrjonAoJlMPTeNN76+H3joy1cXjc5WMMlNoaZNtrFrWIvcGeOXuoqXd48= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=fuR/tZua; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="fuR/tZua" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216314; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NPITbiV5yRtNZIanqm2v+SKlb5hdEoVu/zyhpD+AgeU=; b=fuR/tZuabeAN3RHCueQYmMmKXGcP+aIsicl1j4W0aOTKGy9nuOtaXTjpOJWY9SIHPqDlLx 2HJuS4Ydqm7cRqlYbQoy6VQxslXHDmx2ahdbN5EyWTKP7dSlayTL4TaN7pvglPRGBevX8t NAMIoKbNC+g8PZkEc2sn4FtDYkhbVFc= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-665-jmeZZmuKNzSGpvmLdn_7ZQ-1; Mon, 10 Feb 2025 14:38:32 -0500 X-MC-Unique: jmeZZmuKNzSGpvmLdn_7ZQ-1 X-Mimecast-MFC-AGG-ID: jmeZZmuKNzSGpvmLdn_7ZQ Received: by mail-wm1-f70.google.com with SMTP id 5b1f17b1804b1-43933b8d9b1so12560335e9.3 for ; Mon, 10 Feb 2025 11:38:32 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216311; 
x=1739821111; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=NPITbiV5yRtNZIanqm2v+SKlb5hdEoVu/zyhpD+AgeU=; b=A88EukFMn0YHfOhEuHqZfB8qnLuQopeLh8WzFZMvoF3TAOR0FrlAHQ2TnyjIOGqp8X AKOcEhr/yUYLG7lintJ6AggPOAjc3T7iOMz43P7HQHl1xyRcjxKUGUhVcjDvL6Ek/Quo gQtM6iM8ZydJN/RHMnESMC3q2/jApWIOMzVVIkL35+94CP1vBctvvOXsFdNrSySMDxy/ mIydu5fn/lNcdlyOu9MGALOE/QJm+BD+OUbU2dKNGi5v1kNTguMR8BWpUGEfMGGj2oP0 Fyv/zqvoCOXmmKLhBmqFDef8iy/mKqy3EEB97xJCO0DMkTXKmG4taNh0S+PiyzkMlPCc 2CrQ== X-Gm-Message-State: AOJu0YwvNOF4yNQDtOyd3SA2X51xj2AXUkc4gHUYhD1rlCWGlPgiR50Q u4Jstn8/OuCWW/Q/fh2+jDoKo0Vw1iMTfNFU9dpKsOPS3cLecwIKoyaxHNPbBZG/GS8ePzPv+Ek TWDncy0iAMvPE1sazfdaJg6A6iOdF4yTb9xBxx2POOawg97Gp7tiIzzh5Ejal7fkgiDmIcZn0UD aIw5iWd5bkz7iWJ0uFiHellExiiIx4r5rcB5s5xXVupgVM X-Gm-Gg: ASbGncsuwzD877uGE6z3k9SwLkKYOwUczt2a7gVEtJwHbUkB256kzPcHTx9RrziwQo0 Vm5B0ko0iWACIVYRjq3EosPPY6vgc2gG8MhSGFDXD3JkqSrNwRYRkMbj669CHF26yrOfvPszKmD eP2JYb08AkuoRF2HD3Z8BsRbNLx+M5rKqHc9HKsjvgrpNTQF2QWALqJA1bP+0MIbp7hnk0QUNtE NUElIXHdPB29POKjimJX+vLsqKklbnj85hYVdRnIc2Nf02OLuFuT8Q9HgxT3pSg7CB/e9FZbr6I CAchePmdOqsL9T+VlkUDVlcAf7pC5QQEslboordTqe6JAYdojxko7KALo+oMs6IVVQ== X-Received: by 2002:a05:600c:500d:b0:439:3dc0:29b6 with SMTP id 5b1f17b1804b1-4393dc02be1mr57220465e9.2.1739216311520; Mon, 10 Feb 2025 11:38:31 -0800 (PST) X-Google-Smtp-Source: AGHT+IHbUQzL3Yq1p3VwDR+lxH773I/PQZNBDcgTpUgE2nEJHxULXhio/DNj6yNxLtRtBjWIFSwTIg== X-Received: by 2002:a05:600c:500d:b0:439:3dc0:29b6 with SMTP id 5b1f17b1804b1-4393dc02be1mr57220015e9.2.1739216311167; Mon, 10 Feb 2025 11:38:31 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. 
[2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id 5b1f17b1804b1-4390d933523sm192523445e9.1.2025.02.10.11.38.27 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:38:29 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 07/17] mm/page_vma_mapped: device-exclusive entries are not migration entries Date: Mon, 10 Feb 2025 20:37:49 +0100 Message-ID: <20250210193801.781278-8-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" It's unclear why they would be considered migration entries; they are not. Likely we'll never really trigger that case in practice, because migration (including folio split) of a folio that has device-exclusive entries is never started, as we would detect "additional references": device-exclusive entries adjust the mapcount, but not the refcount. 
Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Reviewed-by: Alistair Popple Signed-off-by: David Hildenbrand Tested-by: Alistair Popple --- mm/page_vma_mapped.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 81839a9e74f16..32679be22d30c 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -111,8 +111,7 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw) return false; entry =3D pte_to_swp_entry(ptent); =20 - if (!is_migration_entry(entry) && - !is_device_exclusive_entry(entry)) + if (!is_migration_entry(entry)) return false; =20 pfn =3D swp_offset_pfn(entry); --=20 2.48.1 From nobody Tue Feb 10 15:43:48 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8024A25A34D for ; Mon, 10 Feb 2025 19:38:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216321; cv=none; b=PXQtRdfvDL6V8flmj06WZTC4oBUTZAeTKLyPIEiMh4qJHLCYy5o/zruy78eieYxyHj3VTZ/gLEJ/sTdw8dByyYpBWSMufFqyTHGD8likyyeO3G4fD0Mp8ghvuXpuWFuFsEe7syp4chFaMaRtbNz1MfgIUEb4hIRiihnK4ycc7MU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216321; c=relaxed/simple; bh=76d5pEBVKf4Yijczbwj7KuBibjsq0aAlESoqsTf2wbw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=PLENwHrzThErYz5aNEiqVtM8Ip/CM/ASMLL5tMMLzuiOE9T7zbGevgzYMtkmMBgVRo9P5DHQ0T5CbH68duXl5+yvFytDkErRCx6wjSSbMhkhSrUYUThLOS9UzSJ/PVTmy9kpeV+I/sobGPmyoZD/bBcdwVcLgsjUnYKyEXRy3UA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com 
header.i=@redhat.com header.b=es0bOGsN; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="es0bOGsN" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216318; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8sC+CKPivo9nJ2nwAVN+E4SAgEKJPEiCVQrD/mzwcqM=; b=es0bOGsN+emNWNv0iKtdB+/awn1NRoHB7YrfChy+rLlYA5DTzw/IxgRvietxiTkm8QeMQJ PAAYFe9CnfSBFi2IOFCsoMSOVm2xD9R1eH2egtajc9Z9ybOlUE+b/drXUagETlqCbzaihq 1k1rfy1IcAwFoqUrQWAAeKi5VLJGMC4= Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-339-Xjc4f11yOXe_gRn-d1PJUA-1; Mon, 10 Feb 2025 14:38:37 -0500 X-MC-Unique: Xjc4f11yOXe_gRn-d1PJUA-1 X-Mimecast-MFC-AGG-ID: Xjc4f11yOXe_gRn-d1PJUA Received: by mail-wm1-f71.google.com with SMTP id 5b1f17b1804b1-4392fc6bceaso13133485e9.2 for ; Mon, 10 Feb 2025 11:38:36 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216316; x=1739821116; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8sC+CKPivo9nJ2nwAVN+E4SAgEKJPEiCVQrD/mzwcqM=; b=vxFtFf+OHNqNG1k2lYO0yHmCR8JfwuLtCMJ7MB8byh3DzmWMOXLoOO5T7CcbJhmSSD 50ZY850iYdIP3q21oBm5TV1j8zqJmGeU680ZuzdiwsyFlmCPjTAEiaX0nCj4EckyF4bm 1ERdxSNLGbFeUQYbOFQ1zl8QS1qZLG6LsxM7wig0KDUpPoeX/eHBOm5c2L1nJG4v0bcr 
jUBKhE6gkNIhmaVnZAK29XOwq65L0D2qWXQw/gaGlrkb5YQS4XMqlAzvPq3jDfX3tvrL zgyC50LgR6c3T1PXXTZnfpSk7KrqjPnruCTus1VutXgY+v2ppQbl9EDXIysh0/1UwkHG iDqw== X-Gm-Message-State: AOJu0YzPzphyIRaxPag6sdKuNyw+nKyJhdPEE4Gk/3PZ3ZlmfhhGRPOz EqEazLwgFZygGJ1ryMVlekk2HfASp4iCXWG7NwbdgO6i7ZkXrdwsjXJbVZRTrVm/gDIJd5erhpa WNCaWf/qOI0XQbcAWXgdV18nkPi2gcyf9zEJzDYnwIIOnHSQeYl1S6dN3CaGAuDVuuFY/Uj7bA5 qi5IIQLT31wwCVJdbMw6qNr6qUkNPstud3Ht6gX1SeFmQE X-Gm-Gg: ASbGnctPsfNrOxuQHUhUl4Mx8AE6cCPOK/PuItCheqjfctTEpvS6p/ElygWNMkUBwyM Da0TdL4Rp1lxEZgCOn2RQrGI3ufwGTy2Hb4jQVj+dNGxmKnBwbKliklf44zasWsqZhW049eTsxM ftkB3lC6ckcnhsrR0B8vnpjEqJwgm/H45/ed/4rCnkzRcARHolKJmQk/BbFRpZ0yjcwQRUxYc2l 0U8h6+vZ/dBar866OgbrYn3zEVa4ftvebgBIbKoitLWLiE8hw2dozk0KpN75qPJXRatPKBv//ST J7cpZ8uaCuhqrV/CjORXUtPtj54/pKfeC1lAB556aMS+NleMPGx/p9q4SS/nRQdKYQ== X-Received: by 2002:a05:600c:4e91:b0:439:4637:9d9 with SMTP id 5b1f17b1804b1-43946370d97mr43287695e9.12.1739216315807; Mon, 10 Feb 2025 11:38:35 -0800 (PST) X-Google-Smtp-Source: AGHT+IF6B7q2dhfsb9dj8LY+8M1CJU6DFUQTZK7m2VERaPDmly8AB+qIiZRItOcPeyw5uy4LKosd2g== X-Received: by 2002:a05:600c:4e91:b0:439:4637:9d9 with SMTP id 5b1f17b1804b1-43946370d97mr43287075e9.12.1739216315147; Mon, 10 Feb 2025 11:38:35 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. 
[2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id 5b1f17b1804b1-4390d94d802sm195260345e9.12.2025.02.10.11.38.31 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:38:33 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 08/17] kernel/events/uprobes: handle device-exclusive entries correctly in __replace_page() Date: Mon, 10 Feb 2025 20:37:50 +0100 Message-ID: <20250210193801.781278-9-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). __replace_page() is not prepared for that, so teach it about these PFN swap PTEs. Note that device-private entries are so far not applicable on that path, because GUP would never have returned such folios (conversion to device-private happens by page migration, not in-place conversion of the PTE). 
There is a race between GUP and us locking the folio to look it up using page_vma_mapped_walk(), so this is likely a fix (unless something else could prevent that race, but it doesn't look like it). pte_pfn() on something that is not a present pte could give us garbage, and we'd wrongly mess up the mapcount because it was already adjusted by calling folio_remove_rmap_pte() when making the entry device-exclusive. Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand Tested-by: Alistair Popple --- kernel/events/uprobes.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 2ca797cbe465f..cd6105b100325 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -173,6 +173,7 @@ static int __replace_page(struct vm_area_struct *vma, u= nsigned long addr, DEFINE_FOLIO_VMA_WALK(pvmw, old_folio, vma, addr, 0); int err; struct mmu_notifier_range range; + pte_t pte; =20 mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, addr, addr + PAGE_SIZE); @@ -192,6 +193,16 @@ static int __replace_page(struct vm_area_struct *vma, = unsigned long addr, if (!page_vma_mapped_walk(&pvmw)) goto unlock; VM_BUG_ON_PAGE(addr !=3D pvmw.address, old_page); + pte =3D ptep_get(pvmw.pte); + + /* + * Handle PFN swap PTES, such as device-exclusive ones, that actually + * map pages: simply trigger GUP again to fix it up.
+ */ + if (unlikely(!pte_present(pte))) { + page_vma_mapped_walk_done(&pvmw); + goto unlock; + } =20 if (new_page) { folio_get(new_folio); @@ -206,7 +217,7 @@ static int __replace_page(struct vm_area_struct *vma, u= nsigned long addr, inc_mm_counter(mm, MM_ANONPAGES); } =20 - flush_cache_page(vma, addr, pte_pfn(ptep_get(pvmw.pte))); + flush_cache_page(vma, addr, pte_pfn(pte)); ptep_clear_flush(vma, addr, pvmw.pte); if (new_page) set_pte_at(mm, addr, pvmw.pte, --=20 2.48.1 From nobody Tue Feb 10 15:43:48 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5EF4125A35B for ; Mon, 10 Feb 2025 19:38:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216324; cv=none; b=lmWxSb3jEALE8BEXvfvS5wzA0SthOKqYu77G585mahURwerXed6R4Z+hksAighA2gQVrfYoaES3UzJx8FmDBON8YPUxNlfwK0VzY1S67JtvNxKGJ2wLhZIWVLEekypQf0pBS9bFg5HNnwLLEaNA37TFjpb8YdQy56nirvr9Ou80= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216324; c=relaxed/simple; bh=XECEsYRumwDXs38xfqw/DSW1eeDdoyEKYX/zmrQ3FPo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=U3djR/ugXjov3RS6MQQFrvd7oQy6hVD3bEjzYAeLp9bnaaqXkf+y1p2xzfuzZS8LCrGZHJy0RDhiFdko6rLSPrG87Z0eNzSbQ+s1PCcsDgUE1/350NBygQHxF4hzo5NikfDU+6SkwWz0Zj+jyl3n3Qjd1KCZXCXFecYwGX0Gn5g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=MIOy/ALU; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: 
smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="MIOy/ALU" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216321; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=b7jLfJ9vH0eUU1vYEXZ3dUW5UEIvFQqjIYltl+lFPPU=; b=MIOy/ALUKYGAj9ZOQIO+yvIS6yeCU1a3sY3yvsaIHW4s7LuhFwTMnVjgBSKJ+2T0wLcWzj 2q5XNuLqUyzpMH1vhsNNHzuShdC8jxni/+Ev5spAUgoFjlfAzp/s6xlQK2RrINS7mGYYgH T0KOmCxPpn2z7mu7n4BL4kfGmtzgM/4= Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-471-mM7RNcD2OV6i7Z9OGGEWCg-1; Mon, 10 Feb 2025 14:38:39 -0500 X-MC-Unique: mM7RNcD2OV6i7Z9OGGEWCg-1 X-Mimecast-MFC-AGG-ID: mM7RNcD2OV6i7Z9OGGEWCg Received: by mail-wm1-f71.google.com with SMTP id 5b1f17b1804b1-4393e89e910so9545675e9.0 for ; Mon, 10 Feb 2025 11:38:39 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216319; x=1739821119; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=b7jLfJ9vH0eUU1vYEXZ3dUW5UEIvFQqjIYltl+lFPPU=; b=lVHzzxnG2ThadftVfuBavAX+ANitA+nIfm6rwCmqXhBvXUWfsRLF/JFQgF4Z4g7O8E U2abLgjqmrbSy3m9cVBvdcaoNg06b8Rrdc6ABHgEFOmZB7GLXlORMQY3El7+FQ9qgHRe Ff3SpZivb51HParZNXbwtJEMVjsA71kEFuTax4H6GecZM3/h6PRyVuaJOicGDMvABMrH PSvFJqbbboOg071SCztLaxYnhj1zRe5OoGsE8jIF3vVerOGas3unxjHaDHoNHKWO30BB oK9Uteyeo4Ignx/VkJW6eGSpkohNzj2y5dv0Vg3O4hMnmsf5hsU3F5CagdxT9O3O6hbW j2JA== X-Gm-Message-State: AOJu0YwLzfb7d0xVMfPeGOs7RHvwSlGHMjpQlklclb7XmTJZp5s7j/k0 
0Yr5F3HCxpgVsEcTAcdREt2jXM9BuiphP8UPGQ2rBK/AO7Ikyxj1SoZRj3Cznc3KtAo0FEST3lN VcDs0i02+tSqDIyI+g6J7xDLEeVy5W3mVJ3M4SFUUigQTDokEnDjvKHIVAzj9E129D6SO52DjkH 294Oic2wD7nEJxEIVceg3FaWKUrq9s9RwC6W7Jh+m9r3so X-Gm-Gg: ASbGncuOXemW/t6fuNCFw2FhZkIxePt59MmdxNArd6QK1h32fju1AGoMJo2VEzt7LT3 IdBkVd2zv+H2SXmxc0hw/pmnkZQ67KQ/FQ1WeNe4RAcbqmvTsQ2qj00p/FBBjaaaCzgYtkndj3P Bh0PODDIT8+PF7OWrVwAnYeuqNphCuC0FKM0S64DaKZKiNiX8IiSB3+tyKCFLCHl7QNdrpphO/r zjI3Sy1T1bDzj0mO4EeTpeSAsdjU9YMF8mXLDoEHJ93ODHEPAklA7X4qcQXI8Qgwxjg8qbs25M2 PDX6fBl3aq3IjS8lDFeJa9w/5CncCfW0cJa0M0qV5xsKFH5+SvCw0L2NaV6THUmSgw== X-Received: by 2002:a05:600c:4f90:b0:434:a7e7:a1ca with SMTP id 5b1f17b1804b1-439249b04f8mr116077695e9.20.1739216318675; Mon, 10 Feb 2025 11:38:38 -0800 (PST) X-Google-Smtp-Source: AGHT+IHdlczuWpIJN55C/ZdLw7mis7jMK/FKd714PEZJ/2b6DQtULEf1bpuI2yab2McnQRseLHdLgg== X-Received: by 2002:a05:600c:4f90:b0:434:a7e7:a1ca with SMTP id 5b1f17b1804b1-439249b04f8mr116077285e9.20.1739216318299; Mon, 10 Feb 2025 11:38:38 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. [2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id 5b1f17b1804b1-4391da96502sm158809495e9.1.2025.02.10.11.38.36 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:38:37 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. 
Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 09/17] mm/ksm: handle device-exclusive entries correctly in write_protect_page() Date: Mon, 10 Feb 2025 20:37:51 +0100 Message-ID: <20250210193801.781278-10-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). write_protect_page() is not prepared for that, so teach it about these PFN swap PTEs. Note that device-private entries are so far not applicable on that path, because GUP would never have returned such folios (conversion to device-private happens by page migration, not in-place conversion of the PTE). There is a race between performing the folio_walk (which fails on non-present PTEs) and locking the folio to look it up using page_vma_mapped_walk() again, so this is likely a fix (unless something else could prevent that race, but it doesn't look like it). In the future it could be handled if ever required; for now, just give up and ignore them like folio_walk would.
Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand Tested-by: Alistair Popple --- mm/ksm.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/mm/ksm.c b/mm/ksm.c index 8be2b144fefd6..8583fb91ef136 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -1270,8 +1270,15 @@ static int write_protect_page(struct vm_area_struct = *vma, struct folio *folio, if (WARN_ONCE(!pvmw.pte, "Unexpected PMD mapping?")) goto out_unlock; =20 - anon_exclusive =3D PageAnonExclusive(&folio->page); entry =3D ptep_get(pvmw.pte); + /* + * Handle PFN swap PTEs, such as device-exclusive ones, that actually + * map pages: give up just like the next folio_walk would. + */ + if (unlikely(!pte_present(entry))) + goto out_unlock; + + anon_exclusive =3D PageAnonExclusive(&folio->page); if (pte_write(entry) || pte_dirty(entry) || anon_exclusive || mm_tlb_flush_pending(mm)) { swapped =3D folio_test_swapcache(folio); --=20 2.48.1 From nobody Tue Feb 10 15:43:48 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7F98624C67A for ; Mon, 10 Feb 2025 19:38:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216328; cv=none; b=rL4Rvc9QbvMjI/oivRhWqtzcTphQNkzjZQ5p7Go8QeC03aoBvk5kopyv2OPorvccCYrKRpniPdbcYvHgOFlCk4DhowEi+vstwdw5xQsFjh8xTLoGPwPojrjdkaq6J4hr4bymONQXbwagn4+PvOQmBcUFCmwiHIOYG2K6EGZlkXc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216328; c=relaxed/simple; bh=wTEfnlnRBIWds4A4VWjgD0zaJLq1iJwNdx86V2VqPGE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=ndf4nNnMS6+/yFwU4MxyPz9/nJNavedm3newkHGZktK+4ze5wXgi8kMSnstDWIOoPvi+gpfJAFmKHjMNweM8sznjSroeUXbpXPIPVzhhujhZbPulfHKR9mvEG+O0cgHm81fQhyxl927WuysavP6IcGiszfizVBqHkckWL8emL8M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=EajNs30u; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="EajNs30u" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216325; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LvU1uLGkpTPZRA0QaV3EggyJAUMnys9Fj6An1hzKeYY=; b=EajNs30uCuyuqdLBFHLccm8ccVuWBorm7HHt/kBITUWjF+A5v/dPNasFG/eX8AsyVcZqXT WDkWWSElUMwmZyrkn0I7vf4PBqflaXMxkBlH80PzqIRHkOuIF8KzsPNUl7wJw/c4JlOX1e uZvCPBVMoohpDHFh2IF7CrgQixPXlIw= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-169-F1wUWNSTPeCjKhzgE8iXZg-1; Mon, 10 Feb 2025 14:38:44 -0500 X-MC-Unique: F1wUWNSTPeCjKhzgE8iXZg-1 X-Mimecast-MFC-AGG-ID: F1wUWNSTPeCjKhzgE8iXZg Received: by mail-wm1-f70.google.com with SMTP id 5b1f17b1804b1-4393e89e910so9546255e9.0 for ; Mon, 10 Feb 2025 11:38:43 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216322; x=1739821122; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=LvU1uLGkpTPZRA0QaV3EggyJAUMnys9Fj6An1hzKeYY=; b=XfaZx1xeX1Jij5OxPd69geGmJEcfs68PV9HFNVkzRnYW1bCTBxlR/G6ug9MZJ1mZ4H ZuuMPN8im9Nbqz0D0z6tqed6RjxPEeOl78lK/Vmv3uWFQA5Pfbp7wfrOHTyx/f2Gffr5 Is0GA5h/R3XT/zLRWxJkPrjSi1ANsHsKyhGaLg7M8cN6JcTdC2NjYx5IX/bPnN9QNuo7 D9hco/ESNQDlga/iBfbbuUXJLqYFqMGIZjIW2Z50bD4nIgdY0MObUP/nCthW+OA5e5M0 jrdWD0K0OFJWifzNXIH9NBUeuiqbqnWLkKQE/5v06EUo+EQQH6jMmj9iMtOL717jUiHI P0jg== X-Gm-Message-State: AOJu0Yy+h8DIX/HJZhWULcS1tTfAFhuz5BK44snKd4dMXmXAFJoS+wr8 ETy1KATQSc0O4D2Kpm7Jl8+lqLU9bR6qSRdd0Sro0xKnwJ4ZZfPm6iYHCwJssTPzyRgWNsMPs2+ r3eMlQguXdQuMu2xM0z9qpSlkFJRn6sB4EoSCEl9Q+cnHbj1iMhI7Uzz5T4B1iqFLGJl+lEFnuL fPkLlNVSRFJi68Bafp1PsiEGnIk79krX31vrGnHuAbzoTu X-Gm-Gg: ASbGncv3jUFn4flOf/R231uk3u3iLkMKoG2rKEi2Eb1V6vdKLOc3oSWZ7Vw7j+XdQ7z AKYhwBER/nqoWHgdO63HYMr9/g3snSmMXG/UGWQr1rO+7hLU4Ib9zNuLPF2vnztelm9v82hKaCN BL85Rl9iXMRSnzFTq1DVYAbvi+TpUxlkFRWW+idGCMfl7kLWdFinTU+OTQaObRJnE/Prrzlqnf8 yvRCYrvpWPUt/0b4CJU/FYEtyTvVuDzRZohMZ9DO+0MlXy0HuJoZU71gg1ZSumUyog/OSptA8Yc O/qyK/eIfJwxJaLPy7H6hZeuPtEwKuQxI4PYRPNYdgWqdBPbjy82oPHXTtD0m3uHdA== X-Received: by 2002:a05:600c:1e0e:b0:431:5e3c:2ff0 with SMTP id 5b1f17b1804b1-439249889a8mr117196325e9.8.1739216322579; Mon, 10 Feb 2025 11:38:42 -0800 (PST) X-Google-Smtp-Source: AGHT+IGjSKW3zWL6vzVGiNBIPirrt7XpGCC7ElTlo8fIz4HKSkBODD3ovt81FL+SvLTQ6DCkqg1UrQ== X-Received: by 2002:a05:600c:1e0e:b0:431:5e3c:2ff0 with SMTP id 5b1f17b1804b1-439249889a8mr117195855e9.8.1739216321998; Mon, 10 Feb 2025 11:38:41 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. 
[2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-38ddaf333c5sm5084761f8f.36.2025.02.10.11.38.39 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:38:40 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 10/17] mm/rmap: handle device-exclusive entries correctly in try_to_unmap_one() Date: Mon, 10 Feb 2025 20:37:52 +0100 Message-ID: <20250210193801.781278-11-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). try_to_unmap_one() is not prepared for that, so teach it about these PFN swap PTEs. Note that device-private entries are so far not applicable on that path, as we expect ZONE_DEVICE pages so far only in migration code when it comes to the RMAP. Note that we could currently only run into this case with device-exclusive entries on THPs. 
We still adjust the mapcount on conversion to device-exclusive; this makes the rmap walk abort early for small folios, because we'll always have !folio_mapped() with a single device-exclusive entry. We'll adjust the mapcount logic once all page_vma_mapped_walk() users can properly handle device-exclusive entries. Further note that try_to_unmap() calls MMU notifiers and holds the folio lock, so any device-exclusive users should be properly prepared for a device-exclusive PTE to "vanish". Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand Tested-by: Alistair Popple --- mm/rmap.c | 52 +++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 39 insertions(+), 13 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 1129ed132af94..47142a656ae51 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1648,9 +1648,9 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, { struct mm_struct *mm =3D vma->vm_mm; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); + bool anon_exclusive, ret =3D true; pte_t pteval; struct page *subpage; - bool anon_exclusive, ret =3D true; struct mmu_notifier_range range; enum ttu_flags flags =3D (enum ttu_flags)(long)arg; unsigned long pfn; @@ -1722,7 +1722,18 @@ static bool try_to_unmap_one(struct folio *folio, st= ruct vm_area_struct *vma, /* Unexpected PMD-mapped THP? */ VM_BUG_ON_FOLIO(!pvmw.pte, folio); =20 - pfn =3D pte_pfn(ptep_get(pvmw.pte)); + /* + * Handle PFN swap PTEs, such as device-exclusive ones, that + * actually map pages. 
+ */ + pteval =3D ptep_get(pvmw.pte); + if (likely(pte_present(pteval))) { + pfn =3D pte_pfn(pteval); + } else { + pfn =3D swp_offset_pfn(pte_to_swp_entry(pteval)); + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); + } + subpage =3D folio_page(folio, pfn - folio_pfn(folio)); address =3D pvmw.address; anon_exclusive =3D folio_test_anon(folio) && @@ -1778,7 +1789,9 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, hugetlb_vma_unlock_write(vma); } pteval =3D huge_ptep_clear_flush(vma, address, pvmw.pte); - } else { + if (pte_dirty(pteval)) + folio_mark_dirty(folio); + } else if (likely(pte_present(pteval))) { flush_cache_page(vma, address, pfn); /* Nuke the page table entry. */ if (should_defer_flush(mm, flags)) { @@ -1796,6 +1809,10 @@ static bool try_to_unmap_one(struct folio *folio, st= ruct vm_area_struct *vma, } else { pteval =3D ptep_clear_flush(vma, address, pvmw.pte); } + if (pte_dirty(pteval)) + folio_mark_dirty(folio); + } else { + pte_clear(mm, address, pvmw.pte); } =20 /* @@ -1805,10 +1822,6 @@ static bool try_to_unmap_one(struct folio *folio, st= ruct vm_area_struct *vma, */ pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval); =20 - /* Set the dirty flag on the folio now the pte is gone. */ - if (pte_dirty(pteval)) - folio_mark_dirty(folio); - /* Update high watermark before we lower rss */ update_hiwater_rss(mm); =20 @@ -1822,8 +1835,8 @@ static bool try_to_unmap_one(struct folio *folio, str= uct vm_area_struct *vma, dec_mm_counter(mm, mm_counter(folio)); set_pte_at(mm, address, pvmw.pte, pteval); } - - } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) { + } else if (likely(pte_present(pteval)) && pte_unused(pteval) && + !userfaultfd_armed(vma)) { /* * The guest indicated that the page content is of no * interest anymore. 
Simply discard the pte, vmscan @@ -1902,6 +1915,12 @@ static bool try_to_unmap_one(struct folio *folio, st= ruct vm_area_struct *vma, set_pte_at(mm, address, pvmw.pte, pteval); goto walk_abort; } + + /* + * arch_unmap_one() is expected to be a NOP on + * architectures where we could have PFN swap PTEs, + * so we'll not check/care. + */ if (arch_unmap_one(mm, vma, address, pteval) < 0) { swap_free(entry); set_pte_at(mm, address, pvmw.pte, pteval); @@ -1926,10 +1945,17 @@ static bool try_to_unmap_one(struct folio *folio, s= truct vm_area_struct *vma, swp_pte =3D swp_entry_to_pte(entry); if (anon_exclusive) swp_pte =3D pte_swp_mkexclusive(swp_pte); - if (pte_soft_dirty(pteval)) - swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pteval)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + if (likely(pte_present(pteval))) { + if (pte_soft_dirty(pteval)) + swp_pte =3D pte_swp_mksoft_dirty(swp_pte); + if (pte_uffd_wp(pteval)) + swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + } else { + if (pte_swp_soft_dirty(pteval)) + swp_pte =3D pte_swp_mksoft_dirty(swp_pte); + if (pte_swp_uffd_wp(pteval)) + swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + } set_pte_at(mm, address, pvmw.pte, swp_pte); } else { /* --=20 2.48.1 From nobody Tue Feb 10 15:43:48 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0841A24E4B6 for ; Mon, 10 Feb 2025 19:38:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216336; cv=none; b=J2CPawee6PZ4bCysMq1Q0Uc19fzxvCEFQZzKA8yY3re6yP198I6/rnLUxRNvzHEMS/P6DVcil7IdC9R/R0YXHaZY4kogx3eKY4Yxr0a+BW1zlNEk5in4jxzoHnGVPdDDIsURXbRvJDGe/jJDVCrvutbHmIbfwvL+Ycgf5vCCrVo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; 
s=arc-20240116; t=1739216336; c=relaxed/simple; bh=up872fM6yJvkziP+4dI1MrGG1wgvrmLWB2nPuwT2cw8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=PS3SoQz9sxCvoU4GtOqKIt4bs+yT68d05KHNXdZG5W2OpT0H3Iy28Irw0uMD0HoZywBxdIQizmScizcgxkXhJSISYBnf1P8J7ugufOKdodTiPeAc2hQMbMfF8/hYMkZzeq3aXlLhaKoTpQj+gbJleLXnrwl3DJTGF9+1aEhauDc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=CIFXIGHy; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="CIFXIGHy" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216334; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QxHe6mgzr5LViG+7+kvBnycnKkPt1kZTw9hZ/c9dl+o=; b=CIFXIGHylK6ogc6F2MmNk/RpK2Py8u4IomRlUmgUp4N7AVLJF8UiAjL1dYlumXULGsMzET tvX4xAVD/T3GuRf8xPX5q4E2WeWC8OIF6uU2ASe4bB9PM6I9d4laCmbRfhuGENCT0Y6eMt J+tVWGth27WAUA3cQXlAyzQht3zr6fQ= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-118-osWTMBZYOlC95vi6FvpddA-1; Mon, 10 Feb 2025 14:38:47 -0500 X-MC-Unique: osWTMBZYOlC95vi6FvpddA-1 X-Mimecast-MFC-AGG-ID: osWTMBZYOlC95vi6FvpddA Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-38de0201875so611325f8f.0 for ; Mon, 10 Feb 2025 11:38:47 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216326; x=1739821126; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=QxHe6mgzr5LViG+7+kvBnycnKkPt1kZTw9hZ/c9dl+o=; b=Tv2Kz3Xan3NC/UUEoo0oZmwolqZCcotkAX6SE/YIG05RaAXPIDYIrFx1qduRjEfycO 0nqqdcvu/R0irGyf4d25B3PdzhKxeyayTSOsV23/oT75Ol7Ap4MPZxxxAPUUdGU3hCru wKmtyBPx2BqMrB+Uqf7TovhZNN4VWrEpwcmJHar8uCJXSteazsWPxjAXpbxPSO1SqEVu rwx7LPhR7DflkoUPBp/vS6QeV6h5z9w7pmQWqKGhmiB7cg0QlwH/MouGuNL+vvXkSUCQ lVdN/GH846jEVHiIkE0L1lQVY+p30cUlg70/byuiUSQGU8rmGcJpyru531nGp8qHTF/x JiRg== X-Gm-Message-State: AOJu0Yxt+uoTFjW0ghdZyCnGRCzs3/4eddAgH7zuIXlhxcZDjppdg9X8 cVzqkq+XpIpqUDp8b+gXxIN6XtvL36nJYrexGNs+QRZgS2iUkRsesINaOBzCsN9y5mmNSZjb/HQ Tk+BiczoZyjANXCrj2BIy9B254Xv7pm432nmQfKdf/foaoEMO+AtQo23RZKtmQuo9wT8djHgfJf whdil3FWOcJPIv892RuvXbyUzHGi9xbgN9/nOWFe/mmxk6 X-Gm-Gg: ASbGncu26XrnNds+swLQCuYB5NZesNhQxgkafHeORICpd/gwSZFZzJVWodpvadX1G6I zWGbaYYkaDph/JOGYtvqifigVYKvy6ZmIjvAj35sXO21oGwsBCZgZDob8w2I+sLKD6KyGlYgfE4 5oeXEwRzh9OZ4JNXwIqrFJjF9VQjqmwIPzA436omQ7vv1e5FcTXq8a9zi0XT87HmMzvp6ZQikU0 9AatFj8I03TMm1oraQ1WYvV2MBMq/8b+F3hSr6zn4/3/TEah6RlymTDVmOHlAd4Zu10tQXrR3eL QA0k1WvRmPe9YKd0ffIufZdDMJdPu5Og5EsOjRzpV30gSaAN5IeyGZY9qjvhvxhiHA== X-Received: by 2002:a05:6000:2a6:b0:38d:dc4d:3473 with SMTP id ffacd0b85a97d-38ddc4d34b0mr6018891f8f.51.1739216326555; Mon, 10 Feb 2025 11:38:46 -0800 (PST) X-Google-Smtp-Source: AGHT+IEI9wcsziW5nBYTCF/f9iYx8hWmPInDi/J4NMO1Qty0GFsip5PSFKOaGK1UrWp2GzbjhG1/jA== X-Received: by 2002:a05:6000:2a6:b0:38d:dc4d:3473 with SMTP id ffacd0b85a97d-38ddc4d34b0mr6018834f8f.51.1739216325915; Mon, 10 Feb 2025 11:38:45 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. 
[2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-38dd295200asm7894656f8f.44.2025.02.10.11.38.42 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:38:44 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 11/17] mm/rmap: handle device-exclusive entries correctly in try_to_migrate_one() Date: Mon, 10 Feb 2025 20:37:53 +0100 Message-ID: <20250210193801.781278-12-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). try_to_migrate_one() is not prepared for that, so teach it about these PFN swap PTEs. We already handle device-private entries by specializing on the folio, so we can reshuffle that code to make it work on the PFN swap PTEs instead. Get rid of the folio_is_device_private() handling. 
Note that we never currently expect device-private folios with HWPoison flag set at that point, so add a warning in case that ever changes and we can figure out what the right thing to do is. Note that we could currently only run into this case with device-exclusive entries on THPs. We still adjust the mapcount on conversion to device-exclusive; this makes the rmap walk abort early for small folios, because we'll always have !folio_mapped() with a single device-exclusive entry. We'll adjust the mapcount logic once all page_vma_mapped_walk() users can properly handle device-exclusive entries. Further note that try_to_migrate() calls MMU notifiers and holds the folio lock, so any device-exclusive users should be properly prepared for a device-exclusive PTE to "vanish". Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand Tested-by: Alistair Popple --- mm/rmap.c | 124 ++++++++++++++++++++++-------------------------------- 1 file changed, 51 insertions(+), 73 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 47142a656ae51..7c471c3ea64c4 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2039,9 +2039,9 @@ static bool try_to_migrate_one(struct folio *folio, s= truct vm_area_struct *vma, { struct mm_struct *mm =3D vma->vm_mm; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); + bool anon_exclusive, writable, ret =3D true; pte_t pteval; struct page *subpage; - bool anon_exclusive, ret =3D true; struct mmu_notifier_range range; enum ttu_flags flags =3D (enum ttu_flags)(long)arg; unsigned long pfn; @@ -2108,24 +2108,19 @@ static bool try_to_migrate_one(struct folio *folio,= struct vm_area_struct *vma, /* Unexpected PMD-mapped THP? */ VM_BUG_ON_FOLIO(!pvmw.pte, folio); =20 - pfn =3D pte_pfn(ptep_get(pvmw.pte)); - - if (folio_is_zone_device(folio)) { - /* - * Our PTE is a non-present device exclusive entry and - * calculating the subpage as for the common case would - * result in an invalid pointer. 
- * - * Since only PAGE_SIZE pages can currently be - * migrated, just set it to page. This will need to be - * changed when hugepage migrations to device private - * memory are supported. - */ - VM_BUG_ON_FOLIO(folio_nr_pages(folio) > 1, folio); - subpage =3D &folio->page; + /* + * Handle PFN swap PTEs, such as device-exclusive ones, that + * actually map pages. + */ + pteval =3D ptep_get(pvmw.pte); + if (likely(pte_present(pteval))) { + pfn =3D pte_pfn(pteval); } else { - subpage =3D folio_page(folio, pfn - folio_pfn(folio)); + pfn =3D swp_offset_pfn(pte_to_swp_entry(pteval)); + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio); } + + subpage =3D folio_page(folio, pfn - folio_pfn(folio)); address =3D pvmw.address; anon_exclusive =3D folio_test_anon(folio) && PageAnonExclusive(subpage); @@ -2181,7 +2176,10 @@ static bool try_to_migrate_one(struct folio *folio, = struct vm_area_struct *vma, } /* Nuke the hugetlb page table entry */ pteval =3D huge_ptep_clear_flush(vma, address, pvmw.pte); - } else { + if (pte_dirty(pteval)) + folio_mark_dirty(folio); + writable =3D pte_write(pteval); + } else if (likely(pte_present(pteval))) { flush_cache_page(vma, address, pfn); /* Nuke the page table entry. */ if (should_defer_flush(mm, flags)) { @@ -2199,54 +2197,23 @@ static bool try_to_migrate_one(struct folio *folio,= struct vm_area_struct *vma, } else { pteval =3D ptep_clear_flush(vma, address, pvmw.pte); } + if (pte_dirty(pteval)) + folio_mark_dirty(folio); + writable =3D pte_write(pteval); + } else { + pte_clear(mm, address, pvmw.pte); + writable =3D is_writable_device_private_entry(pte_to_swp_entry(pteval)); } =20 - /* Set the dirty flag on the folio now the pte is gone. 
*/ - if (pte_dirty(pteval)) - folio_mark_dirty(folio); + VM_WARN_ON_FOLIO(writable && folio_test_anon(folio) && + !anon_exclusive, folio); =20 /* Update high watermark before we lower rss */ update_hiwater_rss(mm); =20 - if (folio_is_device_private(folio)) { - unsigned long pfn =3D folio_pfn(folio); - swp_entry_t entry; - pte_t swp_pte; - - if (anon_exclusive) - WARN_ON_ONCE(folio_try_share_anon_rmap_pte(folio, - subpage)); + if (PageHWPoison(subpage)) { + VM_WARN_ON_FOLIO(folio_is_device_private(folio), folio); =20 - /* - * Store the pfn of the page in a special migration - * pte. do_swap_page() will wait until the migration - * pte is removed and then restart fault handling. - */ - entry =3D pte_to_swp_entry(pteval); - if (is_writable_device_private_entry(entry)) - entry =3D make_writable_migration_entry(pfn); - else if (anon_exclusive) - entry =3D make_readable_exclusive_migration_entry(pfn); - else - entry =3D make_readable_migration_entry(pfn); - swp_pte =3D swp_entry_to_pte(entry); - - /* - * pteval maps a zone device page and is therefore - * a swap pte. - */ - if (pte_swp_soft_dirty(pteval)) - swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_swp_uffd_wp(pteval)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); - set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte); - trace_set_migration_pte(pvmw.address, pte_val(swp_pte), - folio_order(folio)); - /* - * No need to invalidate here it will synchronize on - * against the special swap migration pte. 
- */ - } else if (PageHWPoison(subpage)) { pteval =3D swp_entry_to_pte(make_hwpoison_entry(subpage)); if (folio_test_hugetlb(folio)) { hugetlb_count_sub(folio_nr_pages(folio), mm); @@ -2256,8 +2223,8 @@ static bool try_to_migrate_one(struct folio *folio, s= truct vm_area_struct *vma, dec_mm_counter(mm, mm_counter(folio)); set_pte_at(mm, address, pvmw.pte, pteval); } - - } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) { + } else if (likely(pte_present(pteval)) && pte_unused(pteval) && + !userfaultfd_armed(vma)) { /* * The guest indicated that the page content is of no * interest anymore. Simply discard the pte, vmscan @@ -2273,6 +2240,11 @@ static bool try_to_migrate_one(struct folio *folio, = struct vm_area_struct *vma, swp_entry_t entry; pte_t swp_pte; =20 + /* + * arch_unmap_one() is expected to be a NOP on + * architectures where we could have PFN swap PTEs, + * so we'll not check/care. + */ if (arch_unmap_one(mm, vma, address, pteval) < 0) { if (folio_test_hugetlb(folio)) set_huge_pte_at(mm, address, pvmw.pte, @@ -2283,8 +2255,6 @@ static bool try_to_migrate_one(struct folio *folio, s= truct vm_area_struct *vma, page_vma_mapped_walk_done(&pvmw); break; } - VM_BUG_ON_PAGE(pte_write(pteval) && folio_test_anon(folio) && - !anon_exclusive, subpage); =20 /* See folio_try_share_anon_rmap_pte(): clear PTE first. */ if (folio_test_hugetlb(folio)) { @@ -2309,7 +2279,7 @@ static bool try_to_migrate_one(struct folio *folio, s= truct vm_area_struct *vma, * pte. do_swap_page() will wait until the migration * pte is removed and then restart fault handling. 
*/ - if (pte_write(pteval)) + if (writable) entry =3D make_writable_migration_entry( page_to_pfn(subpage)); else if (anon_exclusive) @@ -2318,15 +2288,23 @@ static bool try_to_migrate_one(struct folio *folio,= struct vm_area_struct *vma, else entry =3D make_readable_migration_entry( page_to_pfn(subpage)); - if (pte_young(pteval)) - entry =3D make_migration_entry_young(entry); - if (pte_dirty(pteval)) - entry =3D make_migration_entry_dirty(entry); - swp_pte =3D swp_entry_to_pte(entry); - if (pte_soft_dirty(pteval)) - swp_pte =3D pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pteval)) - swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + if (likely(pte_present(pteval))) { + if (pte_young(pteval)) + entry =3D make_migration_entry_young(entry); + if (pte_dirty(pteval)) + entry =3D make_migration_entry_dirty(entry); + swp_pte =3D swp_entry_to_pte(entry); + if (pte_soft_dirty(pteval)) + swp_pte =3D pte_swp_mksoft_dirty(swp_pte); + if (pte_uffd_wp(pteval)) + swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + } else { + swp_pte =3D swp_entry_to_pte(entry); + if (pte_swp_soft_dirty(pteval)) + swp_pte =3D pte_swp_mksoft_dirty(swp_pte); + if (pte_swp_uffd_wp(pteval)) + swp_pte =3D pte_swp_mkuffd_wp(swp_pte); + } if (folio_test_hugetlb(folio)) set_huge_pte_at(mm, address, pvmw.pte, swp_pte, hsz); --=20 2.48.1 From nobody Tue Feb 10 15:43:48 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9736D24E4BB for ; Mon, 10 Feb 2025 19:38:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216335; cv=none; 
b=nedWWT0EVXL0g8vHTzALeCyk8B93H9Kcj4ZkP/o8rCONIUTaMzxvJz+HKBHBam4gQ9ZWex/ixnlCbFO2fxNT12RTqB5ZHtgvuw9pubXFtE0ZKl/xgxDS3a9RIGg0deVISBZ04f4pAqvA1gj1c5gOn12I2w5LLHSWS4JPG99uvwI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216335; c=relaxed/simple; bh=cMJXMPJ9fLW/woS9dzf0yuP4P9GXVUKQkO8wkukVnbg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XP1sl8pYMmG1DLwQpz2u3JyHj9NsndR3woTN9FZ8JfdHLwNu5h6aYP+NWGLw8q0pLpDVhYwVV4OAlQDFaj4l2LdiQ4MgZHPXyxYUk8mBnMfuLkzIqZ/aepB+GTZstgQHbYGa/6jYJWmiL+sE8qOOBJi6tqOlVKHm2xSSy+suYW0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=IV3E+oh5; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="IV3E+oh5" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216332; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LTEWxPe8B8Wum8eXVDX5hUwjNBhNNdMWJNli9ItwISM=; b=IV3E+oh5JFt2EQeY8n1ETUHLKzoML4cSdzYheRXm99ljPTM3eaV2TS7O7rz2UGMOkIkLuC Jpn/fN7+qDXZEsnW/eiee3KnlJOefY05Ool/52g5M+Xw1pzUa5/nvd2kRQewP6nc3dkPKd i3efRgAdNc0hBSIbIjzm71lxD3K3yoc= Received: from mail-wm1-f69.google.com (mail-wm1-f69.google.com [209.85.128.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-688-hpo330YmM5qFutuTJ5YWgw-1; Mon, 10 Feb 2025 14:38:51 -0500 X-MC-Unique: 
hpo330YmM5qFutuTJ5YWgw-1 X-Mimecast-MFC-AGG-ID: hpo330YmM5qFutuTJ5YWgw Received: by mail-wm1-f69.google.com with SMTP id 5b1f17b1804b1-4393786618bso12899735e9.1 for ; Mon, 10 Feb 2025 11:38:51 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216330; x=1739821130; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=LTEWxPe8B8Wum8eXVDX5hUwjNBhNNdMWJNli9ItwISM=; b=bGhFBwOkP9HBahocM6dlFeL8LlRYywfFW4DyCYRwa1YWYdgRbI60OL42M80CcQ4Bt+ 87KPu/3ooK1Rfxfspf7W40kW193U3cr6sdCh/83fCZ6Q3P8fagjkM4eEG9Hkcs/msko8 PBnNmDuQWO8tJMyxXJNSFv18VncBc9TRi8xOhdrvd29vA0zm0cSY82z+B1zbval5BN7Y MiErTpgkjTI4Jpi05Pe8grxvzYEagj9Jm4yZ/oahjRYyyLuzRZJjU6Hl0VT6lH2SorVZ HNqe3wXj6yoIIQ2rxmuO7KHnzxxSSlHTtfzi8OOLKpxYwIPV6gKEgjIkElT70RXjYHbb B0ZA== X-Gm-Message-State: AOJu0YxjunMTuytsgUyEUBCrenayete2qdV3+ob/R6EJjaL2O35lDBOa RfUvVVqb03C6RNSWpqMYxpydV9io5pKTLRvKVrDVYUvXuCn+tCeACv0HBHlUQEiFkQ1f/YCJncF rr7JyRGCPnfXRAu8Uq8/WSQzdLwbapiScCkskdPa0VDTtIjgm+BRCcf7mjEjIcQ2ppUJviBCwFC In42kOjYAxvUArQzaYMDA/5hlLTPiVw7ZKHSBUikJB2xw4 X-Gm-Gg: ASbGncsh/k7mpOqeBclChl716gzTq7y9n+T+RejpO+yRAwNqA16eKsBbGfIh2tszVV9 crq1WQRe3ttD5b5ei6MTAJxbF4mMP3W+b9BR+QOH4VHGhyy2Emqb0dtRck2J0p4RCSf+myPT8vM uuz8wndY4hAE63zYr/V2fQKdyEW/dETFNCNb6poenmLq1N2dohOxclfADQJ+ZGrZSTf1gvHMLhg irJH9wfFK8ucSKUjHkia5COeb0gSrhcaVdVUfOPyfGYE4hJ820RqhRisqvpzxEiiPAaJlX9uj9N OaS9E3PeZJyQbi5Cbf3geTBs+XjYl3qVZpF3RXaAIuMEq7WrNVn8+w7fd5QRwn2wHw== X-Received: by 2002:a05:600c:4f89:b0:439:4bb0:aba0 with SMTP id 5b1f17b1804b1-4394bb0adb6mr17902925e9.8.1739216330003; Mon, 10 Feb 2025 11:38:50 -0800 (PST) X-Google-Smtp-Source: AGHT+IH0Un4mhmMUf8CyrAHOSxNb3M+hrifrlRxr2GMjHtCC/u6kqhVG2aMU260gZj/4uoW1cKICNg== X-Received: by 2002:a05:600c:4f89:b0:439:4bb0:aba0 with SMTP id 5b1f17b1804b1-4394bb0adb6mr17902495e9.8.1739216329619; Mon, 10 Feb 2025 11:38:49 -0800 (PST) Received: from localhost 
(p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. [2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id 5b1f17b1804b1-43947bdc5c4sm26951255e9.23.2025.02.10.11.38.46 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:38:48 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 12/17] mm/rmap: handle device-exclusive entries correctly in page_vma_mkclean_one() Date: Mon, 10 Feb 2025 20:37:54 +0100 Message-ID: <20250210193801.781278-13-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). page_vma_mkclean_one() is not prepared for that, so teach it about these PFN swap PTEs. Note that device-private entries are so far not applicable on that path, as we expect ZONE_DEVICE pages so far only in migration code when it comes to the RMAP. Note that we could currently only run into this case with device-exclusive entries on THPs. 
We still adjust the mapcount on conversion to device-exclusive; this makes the rmap walk abort early for small folios, because we'll always have !folio_mapped() with a single device-exclusive entry. We'll adjust the mapcount logic once all page_vma_mapped_walk() users can properly handle device-exclusive entries. Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand Tested-by: Alistair Popple --- mm/rmap.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/mm/rmap.c b/mm/rmap.c index 7c471c3ea64c4..7b737f0f68fb5 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1044,6 +1044,14 @@ static int page_vma_mkclean_one(struct page_vma_mapp= ed_walk *pvmw) pte_t *pte =3D pvmw->pte; pte_t entry =3D ptep_get(pte); =20 + /* + * PFN swap PTEs, such as device-exclusive ones, that + * actually map pages are clean and not writable from a + * CPU perspective. The MMU notifier takes care of any + * device aspects. + */ + if (!pte_present(entry)) + continue; if (!pte_dirty(entry) && !pte_write(entry)) continue; =20 --=20 2.48.1 From nobody Tue Feb 10 15:43:48 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A517924E4DB for ; Mon, 10 Feb 2025 19:38:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216339; cv=none; b=PFUjwnfM5fTL6HYZrUX9foeMueLmFzuO2OxPoyxgJvkd6uZsLWkBrGFd9CYeGKJVz/vaCSvLtarr6SSFC0upTNG+41q0wuzQwWnz+VWYtjuimdNmsK7Edsj8ObHnVhjo4eWRWr/Ir8QENVoUGKaQ+P7BrSd2W8P8Vriy6QD7pcc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216339; c=relaxed/simple; bh=OGZL6ZsQM7FmXrtZCcJrdgAs8xmw1WSnlW2HAWblnM0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: 
MIME-Version; b=fvV1qHcK7v5BuRGunpfXvx8sp6Ck0IZcpWAgN5RLD6Ywn2Op337/Fgivf7pkcClRDq0KP0vcmVsCVvj2hXGrUoD3xW0eHLa/HD8L5+a8Gi4uJK8lMCK5Vr8dfEkOwAjpDqkxgI7oLSD2Eei99DhgVe5d7x6g+nBW0oVNZXscvsU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=XdOo9gMS; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="XdOo9gMS" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216336; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mwHnciP+OvnJG9alUSG9/jEypyVF3ZPTXQLV9dVcfyg=; b=XdOo9gMSZSqnTHaKJ0tOLousfIs0hjXtmEzkgMEz2eSxUVfAzvpHxj4meMF2bBPtdpXstL hWX61uB23xzGHmHcdP/app0H72bz+KGiTWMlWwzroOcPFCfEjTyISGJcNgRyLAVS+3QSOw rndLTExp5bS29vrEFs24ZbrWSe2MNvc= Received: from mail-wr1-f70.google.com (mail-wr1-f70.google.com [209.85.221.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-304-YZpOHto2PNWn-ea9JQ-wAg-1; Mon, 10 Feb 2025 14:38:55 -0500 X-MC-Unique: YZpOHto2PNWn-ea9JQ-wAg-1 X-Mimecast-MFC-AGG-ID: YZpOHto2PNWn-ea9JQ-wAg Received: by mail-wr1-f70.google.com with SMTP id ffacd0b85a97d-38dc6aad9f8so1835254f8f.1 for ; Mon, 10 Feb 2025 11:38:54 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216334; x=1739821134; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=mwHnciP+OvnJG9alUSG9/jEypyVF3ZPTXQLV9dVcfyg=; b=Yy9l3XEpYXDwHy9F7V+149wQKFaSL2gqDSIojeAMPpk+jNIRSZ0J/1qrJ5YcvHkAN8 +fb6hNdkphpeBNn4j2olfSd67TzvsqVYTL1Pmqfhp7okKVxaTwHPh7Rt4LPFKuiajPKz IFIAClWPr99XBEZJeYr6aJskXTCJIPWUQ7lH3kMmICffIC4TatTsZkigbSRHl32IHQ6G SPM4zSu8oa6UC7LV2PIT8r4XZonVOeoVtl38Bn8/Cm0zi+et1vOjM2RP+AWQjZHrQd59 yoffV+EAhifbKa4vMKrllvfnNuIBYMkRrgPYWY4Q1FMzTiwxo9E49ssB1U7/s+sZo3h1 wQzw== X-Gm-Message-State: AOJu0YywfVWP+iSTUIDvVJOHixE+wkyMM8Y40lVZ2HyD+YNtde51jOIe 3Ah5/Vw9CrL7EWLCixWIcANDmreSftK6NhmsvOaW7DzI3f/LCicbll3m3+c8O3hoz3UzXypTIlq B7DlrKhJHRc+9W2ixPCsgN4SEojEvF2FAK89DiMrZ+h73N/gq9+HqaiBtLjo+RnNd7yvjWrzB+9 JoHehbywuDuWTNupn74cF1U5VHSv3HtjSE7oWOQezPC72s X-Gm-Gg: ASbGncvvnSUDjzrpkp7YnUBZskJLxCsgxp++DNYG7KYVZCzp2F+napg4tyRiCsWbZnH CXmDPLcaPSVwH0pn2HIVSD3fKYsHR+HZSsjvtrc6Mc3dUR+mKoIGlvpvr+bby6x5+r59G9l6E0j /VCk6rOjsOZObLZSryk2CSxKciWSnN+mh9Xs52l0si0JPvfJe9XAXO+Rew2KEahWWLkIimeZ6YJ WVePbiMbetQNg/sVvfCm41f0DSaIYe11t9K3nfQvWVFne9BgEpim3gBfgHGx8S74et5sO6t+1U/ 1NMvFYU2Yoc+i2vhFmuea4sXpOt0JbwArXeZ6UyJKVxrmSyfN8jIPRV2SAu8r9NdHA== X-Received: by 2002:a5d:5f42:0:b0:38d:df15:2770 with SMTP id ffacd0b85a97d-38de432d90fmr568628f8f.0.1739216333970; Mon, 10 Feb 2025 11:38:53 -0800 (PST) X-Google-Smtp-Source: AGHT+IEQ8jI3Pr0PhgIrX5XFi88nBpwoaMc1TLFqexlyUt/iDFsCN9k6Hzlrlovdfu0/+3mYproMTQ== X-Received: by 2002:a5d:5f42:0:b0:38d:df15:2770 with SMTP id ffacd0b85a97d-38de432d90fmr568579f8f.0.1739216333460; Mon, 10 Feb 2025 11:38:53 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. 
[2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id 5b1f17b1804b1-4390db11200sm187831345e9.38.2025.02.10.11.38.50 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:38:52 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 13/17] mm/page_idle: handle device-exclusive entries correctly in page_idle_clear_pte_refs_one() Date: Mon, 10 Feb 2025 20:37:55 +0100 Message-ID: <20250210193801.781278-14-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). page_idle_clear_pte_refs_one() is not prepared for that, so let's teach it what to do with these PFN swap PTEs. Note that device-private entries are so far not applicable on that path, as page_idle_get_folio() filters out non-LRU folios. Should we just skip PFN swap PTEs completely? Possible, but it seems straightforward to just handle them correctly.
Note that we could currently only run into this case with device-exclusive entries on THPs. We still adjust the mapcount on conversion to device-exclusive; this makes the rmap walk abort early for small folios, because we'll always have !folio_mapped() with a single device-exclusive entry. We'll adjust the mapcount logic once all page_vma_mapped_walk() users can properly handle device-exclusive entries. Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand Reviewed-by: SeongJae Park Tested-by: Alistair Popple --- mm/page_idle.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/mm/page_idle.c b/mm/page_idle.c index 947c7c7a37289..408aaf29a3ea6 100644 --- a/mm/page_idle.c +++ b/mm/page_idle.c @@ -62,9 +62,14 @@ static bool page_idle_clear_pte_refs_one(struct folio *f= olio, /* * For PTE-mapped THP, one sub page is referenced, * the whole THP is referenced. + * + * PFN swap PTEs, such as device-exclusive ones, that + * actually map pages are "old" from a CPU perspective. + * The MMU notifier takes care of any device aspects. 
*/ - if (ptep_clear_young_notify(vma, addr, pvmw.pte)) - referenced =3D true; + if (likely(pte_present(ptep_get(pvmw.pte)))) + referenced |=3D ptep_test_and_clear_young(vma, addr, pvmw.pte); + referenced |=3D mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_= SIZE); } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) { if (pmdp_clear_young_notify(vma, addr, pvmw.pmd)) referenced =3D true; --=20 2.48.1 From nobody Tue Feb 10 15:43:48 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B1B72505AF for ; Mon, 10 Feb 2025 19:39:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216343; cv=none; b=Q+o/O0lEF6LatX9UGVPSZA1BWjVMlzN4MVByJ24hT2xsL7wKPvL4NSul7nSWAAueYypnKCRtcoUzy9CXvxuPyFPesp5swsR6NtcwCB1Fzz0EJ9txp2e096hG/oG2fBUeLbNNCZGyr56E/FzYWJTcHnuw2Rur1TtHxpA3GZ9I6DY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216343; c=relaxed/simple; bh=xdb/Bd8nabay5U0SQXbchR0hBzNIzuX4jtY3vE4Olv8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=A4jluThhJNpnNOzWMbWtbJDb/apcCxFFSit1RUd5pWoubex7eaV9S2V0kjMDFqXHfOo9lLRHhvtsawvSV/j4xe0hqU9AJOGz6qMYmCAzGLVIHBEnnuKCG01RQht5ABzde+fim/V4DgXkHTEb9Fscm7dvaQCsAc5v4RVAt4w6TiA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=g67cTE5k; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="g67cTE5k" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216340; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oE9Ja/aC61CHup4qeT96Odiagg40Q7TPEFXRUvHwXRo=; b=g67cTE5kem53Qfb56KG6/b812KoghfArDZpyBshBrSShtsRrNbMvTvQxkz48P05eyEUwK5 YzsgqYdpC115+QJKyo2DNHaBkA++ut+VMfmYHEBnZcpAwmvTP4cQqanPLSGAI73KZTf+EY MECnnyuNWFQLlBbZidKnT+hpbQNqx3I= Received: from mail-wr1-f69.google.com (mail-wr1-f69.google.com [209.85.221.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-441-8qCqwfNkPue_nCB3L13D_g-1; Mon, 10 Feb 2025 14:38:59 -0500 X-MC-Unique: 8qCqwfNkPue_nCB3L13D_g-1 X-Mimecast-MFC-AGG-ID: 8qCqwfNkPue_nCB3L13D_g Received: by mail-wr1-f69.google.com with SMTP id ffacd0b85a97d-38dc6aad9f8so1835283f8f.1 for ; Mon, 10 Feb 2025 11:38:59 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216338; x=1739821138; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=oE9Ja/aC61CHup4qeT96Odiagg40Q7TPEFXRUvHwXRo=; b=U5MRkzf9W5UnuO0hG543GDhgoYnmO/WJYtzNa3c/VUMl/GDW9+ek5oF9t2EP9Jf7OK gnx2C7IXxmabU79SVym786i5E2ydFiLrsUnaTO1guaUfBg9ob8XkWrdiNN9bvn44XlRi IJj/m7+zNL/EWeaiYZyGoU46dO3/mC3zjlSS6vaDosVo4V4kOad3qOXp5r/XPLGI4ZKD FGix77GsGQCcgnBLxg2vSwSzpF4IUVVLryJ7aKTFmzkvNndHZwc7IjilujAai9rHzgAo 1I42tIKMmpjNgrlJQjgSQjYwIC8FdFga+L1PTFBM6C0f8+tHcaH/rY+rv7KR4OTW8fv3 VC4Q== X-Gm-Message-State: AOJu0YxEkpLsdamTbN3FsE9INbtoliELXK8dOEXm+g3IwtjOu6LZtq8p yGcvRiu6EMZ+/feMQsT33KUvuIoBIce+tA3gH1rgYoFs3Kpz/a8/yfPyAaLanNtsOaO2O45P77y 
gguFwJuF3en7W92a04fXNNiTEw+NKvvShkNE7Vd/3sJRUXrmeCwE6Lk+g7ti54brmzej6ena3Tp ++ElSaLzeL74aAghY8y2ONpJTucokAz97417ev6yPNRjx9 X-Gm-Gg: ASbGncuWX9pAMxMyFojPMcteQLz7eb7b6A/vHZ8Mgtx7TS4BItoL7LkOU3ErW7XVYAN BjRvEFg3f+qVovMI9njk4WZ2+bxV6lM4Zbq6FF3spLhfWeP1YH+WL0qPyt7lvzpJ2il/BPO6165 vzCC/4HcDAazg53U2Yc/h3KvNUdrTa60Y0kMoDq3KTcsPEsLI/oAgg/IBRti6Scc+fva88+6Mo3 15wxr9DJm6yo2g9HyVS1dDWFe7zNbFxO6UUS54ImBDJxpKrUj/Yq8DT/HgAVCe/TXFvMhwTghB1 Y/0W+nFVOdu1mhYQdCsXVsFf30DpnIxT/+Tq2zZ4PDCG9B7aIUy/V+ep5hi7mKkYTA== X-Received: by 2002:a05:6000:1887:b0:38b:f4e6:21aa with SMTP id ffacd0b85a97d-38de439b7e5mr512566f8f.5.1739216338067; Mon, 10 Feb 2025 11:38:58 -0800 (PST) X-Google-Smtp-Source: AGHT+IF/uzgNmJSm7r4vj5m1RC+0Hk4d4M8qsuHzJI2p/6gaMOev6b/snbfbww+U3ctbsFbws8/HTA== X-Received: by 2002:a05:6000:1887:b0:38b:f4e6:21aa with SMTP id ffacd0b85a97d-38de439b7e5mr512516f8f.5.1739216337518; Mon, 10 Feb 2025 11:38:57 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. [2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-38dcc9bd251sm9816921f8f.9.2025.02.10.11.38.54 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:38:56 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. 
Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 14/17] mm/damon: handle device-exclusive entries correctly in damon_folio_young_one() Date: Mon, 10 Feb 2025 20:37:56 +0100 Message-ID: <20250210193801.781278-15-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). damon_folio_young_one() is not prepared for that, so teach it about these PFN swap PTEs. Note that device-private entries are so far not applicable on that path, as we expect ZONE_DEVICE pages so far only in migration code when it comes to the RMAP. The impact is rather small: we'd be calling pte_young() on a non-present PTE, which has no well-defined semantics. Note that we could currently only run into this case with device-exclusive entries on THPs. We still adjust the mapcount on conversion to device-exclusive; this makes the rmap walk abort early for small folios, because we'll always have !folio_mapped() with a single device-exclusive entry. We'll adjust the mapcount logic once all page_vma_mapped_walk() users can properly handle device-exclusive entries. 
Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand Reviewed-by: SeongJae Park Tested-by: Alistair Popple --- mm/damon/paddr.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c index 0f9ae14f884dd..10d75f9ceeafb 100644 --- a/mm/damon/paddr.c +++ b/mm/damon/paddr.c @@ -92,12 +92,20 @@ static bool damon_folio_young_one(struct folio *folio, { bool *accessed =3D arg; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, addr, 0); + pte_t pte; =20 *accessed =3D false; while (page_vma_mapped_walk(&pvmw)) { addr =3D pvmw.address; if (pvmw.pte) { - *accessed =3D pte_young(ptep_get(pvmw.pte)) || + pte =3D ptep_get(pvmw.pte); + + /* + * PFN swap PTEs, such as device-exclusive ones, that + * actually map pages are "old" from a CPU perspective. + * The MMU notifier takes care of any device aspects. + */ + *accessed =3D (pte_present(pte) && pte_young(pte)) || !folio_test_idle(folio) || mmu_notifier_test_young(vma->vm_mm, addr); } else { --=20 2.48.1 From nobody Tue Feb 10 15:43:48 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B9C482505DC for ; Mon, 10 Feb 2025 19:39:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216347; cv=none; b=U5XUfyJI2DiCM2qiCpl4kDy8mLV+ymmtVrxKMBxy22JH5pRTgGgRVeVvNyq2mqZHd0QIFHNVK7dKbtqf9R9WVKePCtwPk8FT5AUoZ8wwZVK3qON+rvk4z7F4l9F7gbO40mMINPNfXZKqfB6Kf4OegB+GJDxTc/4HFWGN04bP2Zw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216347; c=relaxed/simple; bh=eeTQic3U3Pz3ShuuvH4rmh3qdHr2tcndU+mn29E8LNI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=BiJnbeRQZzA+uPVoJ4eAJvvBq6VF4qlR4fHdjYzUC1r5ZTmcwEVZizTJHwJD/K0jOsQVLkPGF92nFwJHg3C30r4/eoi/hb/tJBwq1Q33qaNM888IxAVE7O0c3JWjtgTDPt/0qdi1WBH/t+zs6mQlsX8u9ERB0xts+93PztfwOEU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=GIf9EIBn; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="GIf9EIBn" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216344; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oNpxFEEEyxoH6lwF3K+DQE9+aA/s7jzY02gjXCRiBPg=; b=GIf9EIBn800qN8VHy/2XREBmN6R5F0adch97q/LfMYsUscssVKg/l3JjSI6U+i/GbpwRES rIhLWnGtc310tcF1GdqaNnsWkeGLJ/zonzpPLva9jrv5Ewy2ckvxNAXf2wH430W5PNaHvS 7ag4kxrkQJt81nwFebuCflIHl4volCc= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-324-pcfXIJzPMnaBcKq21PO25Q-1; Mon, 10 Feb 2025 14:39:03 -0500 X-MC-Unique: pcfXIJzPMnaBcKq21PO25Q-1 X-Mimecast-MFC-AGG-ID: pcfXIJzPMnaBcKq21PO25Q Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-38ddee8329eso708600f8f.1 for ; Mon, 10 Feb 2025 11:39:03 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216342; x=1739821142; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=oNpxFEEEyxoH6lwF3K+DQE9+aA/s7jzY02gjXCRiBPg=; b=PivDaOEY9Y/uaiwsbex23V0awlRJDhIMIHgLLUFDiyOkev71BpRNH/YLDqcnDdCJ3Q VIkIGmqYy8VDM3Kii8xeE5El78KNKybi0SQD04B0rrNfhuVzQqmfTx8sbF3hri7n7TeQ gLOQ/Sph83szSldHVcSX6l3zrXbv/fu3aN2pQtGLEQRbyC+pgB5VQULVcL8C84/4Sq8t N/bXJCpcu+haYe8/A5s06bR4MIkUh6nE6aao4i8k6QhR6yfOdTp9scucMmtt6f59K6hc xd36A5hJNWeAKpoGL7VS8MpnhGsvU6T+z86bzWKZZN0hMUiWkc69v31dBQggOAG9fz++ 6MrQ== X-Gm-Message-State: AOJu0Ywle9SuviiTY29Jby/FL0bjXlr4qrjsK+zCdZp1paIUMW2Y3g82 PtN4JTFVvzRYk085BtcO6jUhrwSGBAur5y8TkW8ApFdj0xP8dvfCB7Qt6EoqHIXLRroNCa8GNhG i3dL2Y3RETUtApyE5GaWVPI5peYCI7LhoUMbjBj1deN8IU13k+J1AM3K636SbRb5m6LA7NG1uzt dnh8xIBM+tNU/r0fQatizDkClT1+lPkoA2BYxDOpkBiM/Z X-Gm-Gg: ASbGncsbwuw8P69zDR0HNRCsTSU3bpWT0yl6rr68/DoZhz99IkN9iNzKud4pKU7wxce xMvs3NFTa3lhYVnuc7UpxsTlzBEb+VH8Iv4J9FNV0+nLHZ1FRNCJCJG12WW0sfvW9QrITD6Y5Qi 2cQWmskV1C2lSGZ8AHxhzHzBIvc2gfr1omD6dHtsTiKTPhc+eb8cSgTZ6b4l+X5Kkn9U7AM5ehs nhtke9cmorQvWB65Ga1kfAusYNO7HNfO8c/8yHD0PR2HCu+rYQ6hkSKqFAtQrObnb4qlAa27kUh IbT0PJcw8tB4pJDd618ydQiKG1KPE82G66ZB9DhcIpryDrXjwsLx8cP2dfpLxBTayg== X-Received: by 2002:a05:6000:1813:b0:38a:418e:21c7 with SMTP id ffacd0b85a97d-38dc935246fmr8277038f8f.53.1739216342203; Mon, 10 Feb 2025 11:39:02 -0800 (PST) X-Google-Smtp-Source: AGHT+IHTiiTPkwAyfuLx0qL+LO6PapdaXuVNjUwBeGg/Z0ah/0RffyIckymQJ3LkKa1NUzxInLX6yQ== X-Received: by 2002:a05:6000:1813:b0:38a:418e:21c7 with SMTP id ffacd0b85a97d-38dc935246fmr8276996f8f.53.1739216341643; Mon, 10 Feb 2025 11:39:01 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. 
[2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-38dc4d00645sm11916376f8f.66.2025.02.10.11.38.58 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:39:00 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 15/17] mm/damon: handle device-exclusive entries correctly in damon_folio_mkold_one() Date: Mon, 10 Feb 2025 20:37:57 +0100 Message-ID: <20250210193801.781278-16-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). damon_folio_mkold_one() is not prepared for that and calls damon_ptep_mkold() with PFN swap PTEs. Teach damon_ptep_mkold() to deal with these PFN swap PTEs. Note that device-private entries are so far not applicable on that path, as damon_get_folio() filters out non-LRU folios. Should we just skip PFN swap PTEs completely? Possible, but it seems straightforward to just handle them correctly. 
Note that we could currently only run into this case with device-exclusive entries on THPs. We still adjust the mapcount on conversion to device-exclusive; this makes the rmap walk abort early for small folios, because we'll always have !folio_mapped() with a single device-exclusive entry. We'll adjust the mapcount logic once all page_vma_mapped_walk() users can properly handle device-exclusive entries. Signed-off-by: David Hildenbrand Reviewed-by: SeongJae Park Tested-by: Alistair Popple --- mm/damon/ops-common.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c index d25d99cb5f2bb..86a50e8fbc806 100644 --- a/mm/damon/ops-common.c +++ b/mm/damon/ops-common.c @@ -9,6 +9,8 @@ #include #include #include +#include +#include =20 #include "ops-common.h" =20 @@ -39,12 +41,29 @@ struct folio *damon_get_folio(unsigned long pfn) =20 void damon_ptep_mkold(pte_t *pte, struct vm_area_struct *vma, unsigned lon= g addr) { - struct folio *folio =3D damon_get_folio(pte_pfn(ptep_get(pte))); + pte_t pteval =3D ptep_get(pte); + struct folio *folio; + bool young =3D false; + unsigned long pfn; + + if (likely(pte_present(pteval))) + pfn =3D pte_pfn(pteval); + else + pfn =3D swp_offset_pfn(pte_to_swp_entry(pteval)); =20 + folio =3D damon_get_folio(pfn); if (!folio) return; =20 - if (ptep_clear_young_notify(vma, addr, pte)) + /* + * PFN swap PTEs, such as device-exclusive ones, that actually map pages + * are "old" from a CPU perspective. The MMU notifier takes care of any + * device aspects. 
+ */ + if (likely(pte_present(pteval))) + young |=3D ptep_test_and_clear_young(vma, addr, pte); + young |=3D mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_SIZE); + if (young) folio_set_young(folio); =20 folio_set_idle(folio); --=20 2.48.1 From nobody Tue Feb 10 15:43:48 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D502E2512C6 for ; Mon, 10 Feb 2025 19:39:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216351; cv=none; b=t/g7JXsSR/qvMma4xOWku+0tr1L0MQIV7joVxPYEtTpJGGmLNRyt3Lskc6WhzFjpJSqPsOtYMuSA5IqKUdDKeHDJFglr/Z3DaGkEkOFZQLrYrKIbyj8AfAN3Fupjwvb7UjkVKXERGX8dcXHTAyGXFixf3RMXyWJdqIDfbSj/0Fg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216351; c=relaxed/simple; bh=zopD3ByQcw+T0PaGnkSFpwWr8v6nq8ox8511K+fI4zo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fGczwM2lAY3gROPB7paVfhn9jPpGbOxxETjU7bowWHCWnDpBCKJnziNK/cqxc7UHaWnP10AwT7/l58aT8w5QVcI+h2QR/nwA9e7wUR8bzrw8T8NHxpgXKZGG6nBblBVZFI8ALKthfP8djBlW8gbvGxAal1myHK94v3DHzBiT26A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=AkzMp1jI; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="AkzMp1jI" DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216348; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=e5kPSWbMiVGXXHUJW6xa869lKm+zAgfCoOcPS4ARFgc=; b=AkzMp1jI7qn5h1a7seo60CqwI66adK7Z6hUVgsDL9qRGK+DqdX6oPJCaX7ytQrlorWzCsM p+iI3V7W8CloQY2d+n6vCnV21XqT3YCXdNRiGpBJ05ick2VLJY/bLMBPUS7c2LDv/ZhPXH sF4Z8q/x48yytONju5AMQz0IJhbJNTQ= Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-654-OaXxgawbOZqX6kmffHm2UA-1; Mon, 10 Feb 2025 14:39:07 -0500 X-MC-Unique: OaXxgawbOZqX6kmffHm2UA-1 X-Mimecast-MFC-AGG-ID: OaXxgawbOZqX6kmffHm2UA Received: by mail-wm1-f71.google.com with SMTP id 5b1f17b1804b1-43933a8ff58so13555145e9.2 for ; Mon, 10 Feb 2025 11:39:06 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216346; x=1739821146; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=e5kPSWbMiVGXXHUJW6xa869lKm+zAgfCoOcPS4ARFgc=; b=mu9FmdBlBs1TYTPUg9UqLtm3SXLtZsj2PfaO2802eVhzbqT+Bti40Mm8vGFZ/m3hhj 9TtKpkPxbxb178PtfhNr1H1kvEt20PiEx5EwcVFceg63SLKw7FUcHz0XFl+QJt2ItsC8 Pijph1qROlNSwMWeSBAOLeZdSjE4lArp0SYEjoYwHCEiX4+ih6/UjRU/tUVWtUXUY88e /c4CuvIiTM9pw4RdJ8vDWi46E4bNIOIVGDFwEAdZHzRXJtEyOW3sNs7z5eQePj30upQh brZUkmMSfGlhNzzeqyucalu7pUMNX2w0ridgGMaVx85n1/oNpRHtNTciW78mGxtRYPo7 oZ0g== X-Gm-Message-State: AOJu0YysTQz7aeyCxfnCQGhWueX7rwLKqm/Q4ZYh5Pum5H73RTEsQDFy mIghxgZph7GO7F2Ppew9JxHm91EyZnoyzvPJImdK2TT+neVjDExL+rwOBel5d4bato0h2aHm36f CI0ayBt3wcBPG185WMnMRbkZt0OANpI0EsnembzXAOkFbrKSpI5fyN9ucFAmCOiQmjqqbdcMrhB fj+ieKO3mZicsHIrlh9kgC5KEk0u/PdepyLeX7zUvsECx1 X-Gm-Gg: 
ASbGncvY8JuEJ50FJCQ+NMIcV2jhEsKK2H2QykUru5pZvduZVOGRsom8i2/l135WMfX Xj/xVEe/rUrsyOWHMUATZ4yM/PfnT9s7CkOXH8K4D1jH6ARCLulT62kaqhNJ6jX//+qtwM7YYiF 1hFJI7wMwwqAQTBE8SSwSFcntEFolUVgLNkoFFbnDHYgZ8G7tut/iEvEtU6jcVpTzgYZIHJ+NiI 2bKS5seiUawts3wRhfyZrxwwZY3rNmI6CZqGvieoU+5nzEIEkAdMpfNwjs4r9AQkHtM4QDFS2vL ot97Oc0CIf2B4WEV8xrlF075nf+MS1EXfHBhuk5RSdNj2ZYe+l/BPtNRR5oc1xFO+Q== X-Received: by 2002:a05:600c:1913:b0:434:faa9:5266 with SMTP id 5b1f17b1804b1-43924991f73mr122649055e9.13.1739216345825; Mon, 10 Feb 2025 11:39:05 -0800 (PST) X-Google-Smtp-Source: AGHT+IE0N2Q/3/pgsBEDUurQcQIsnMqE2C8UdYDG2bl1ZHVr+k7cZL7qtDol6R4B+8CggSnILo8JzA== X-Received: by 2002:a05:600c:1913:b0:434:faa9:5266 with SMTP id 5b1f17b1804b1-43924991f73mr122648595e9.13.1739216345384; Mon, 10 Feb 2025 11:39:05 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. [2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-38dd9c48173sm5308677f8f.37.2025.02.10.11.39.02 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:39:04 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. 
Howlett" , Lorenzo Stoakes , Vlastimil Babka , Jann Horn , Pasha Tatashin , Peter Xu , Alistair Popple , Jason Gunthorpe Subject: [PATCH v2 16/17] mm/rmap: keep mapcount untouched for device-exclusive entries Date: Mon, 10 Feb 2025 20:37:58 +0100 Message-ID: <20250210193801.781278-17-david@redhat.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210193801.781278-1-david@redhat.com> References: <20250210193801.781278-1-david@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Now that conversion to device-exclusive no longer performs an rmap walk and all page_vma_mapped_walk() users were taught to properly handle device-exclusive entries, let's treat device-exclusive entries just as if they were present, similar to how we handle device-private entries already. This fixes swapout/migration/split/hwpoison of folios with device-exclusive entries. We only had to take care of page_vma_mapped_walk() users, because these traditionally assume pte_present(). Other page table walkers already have to handle !pte_present(), and some of them might simply skip such entries (e.g., MADV_PAGEOUT) if they are not specialized on them. This change doesn't modify the latter. Note that while folios with device-exclusive PTEs can now get migrated, khugepaged will not collapse a THP if there is a device-exclusive PTE. Doing so might also not be desired if the device frequently performs atomics to the same page. Similarly, KSM will never merge order-0 folios that are device-exclusive. 
Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand Tested-by: Alistair Popple --- mm/memory.c | 17 +---------------- mm/rmap.c | 7 ------- 2 files changed, 1 insertion(+), 23 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index ba33ba3b7ea17..e9f54065b117f 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -741,20 +741,6 @@ static void restore_exclusive_pte(struct vm_area_struc= t *vma, =20 VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) && PageAnonExclusive(page)), folio); - - /* - * No need to take a page reference as one was already - * created when the swap entry was made. - */ - if (folio_test_anon(folio)) - folio_add_anon_rmap_pte(folio, page, vma, address, RMAP_NONE); - else - /* - * Currently device exclusive access only supports anonymous - * memory so the entry shouldn't point to a filebacked page. - */ - WARN_ON_ONCE(1); - set_pte_at(vma->vm_mm, address, ptep, pte); =20 /* @@ -1626,8 +1612,7 @@ static inline int zap_nonpresent_ptes(struct mmu_gath= er *tlb, */ WARN_ON_ONCE(!vma_is_anonymous(vma)); rss[mm_counter(folio)]--; - if (is_device_private_entry(entry)) - folio_remove_rmap_pte(folio, page, vma); + folio_remove_rmap_pte(folio, page, vma); folio_put(folio); } else if (!non_swap_entry(entry)) { /* Genuine swap entries, hence a private anon pages */ diff --git a/mm/rmap.c b/mm/rmap.c index 7b737f0f68fb5..e2a543f639ce3 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2511,13 +2511,6 @@ struct page *make_device_exclusive(struct mm_struct = *mm, unsigned long addr, /* The pte is writable, uffd-wp does not apply. */ set_pte_at(mm, addr, fw.ptep, swp_pte); =20 - /* - * TODO: The device-exclusive PFN swap PTE holds a folio reference but - * does not count as a mapping (mapcount), which is wrong and must be - * fixed, otherwise RMAP walks don't behave as expected. 
- */ - folio_remove_rmap_pte(folio, page, vma); - folio_walk_end(&fw, vma); mmu_notifier_invalidate_range_end(&range); *foliop =3D folio; --=20 2.48.1 From nobody Tue Feb 10 15:43:48 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5F6FA2512E9 for ; Mon, 10 Feb 2025 19:39:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216355; cv=none; b=MuTUTTcok/6t175BuP5uiGD/TTa1+82Jt9t4xIe8kxxeumNV6DYz3kJoR9S19x20ITHFXDbxZPRH2Zwf7LecJivPQbntJ37m2plnvYHSCJWwLdNDXGgDvuGoRd4i/Wd+WtSvLD6aPif2z5v1X0BYo+yIT1joFM2KHZHrLZgt61E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739216355; c=relaxed/simple; bh=PA01cEf2/9mIQZ9O32CRa6PGoRajWbPJGABkde9iNzs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=l3241qjTzNjtpsufXbdb23tBsFVEzu1AdOBlBzXUl0QgyP+10Afeky2i0oi4Jmp6gXzupc2DkGbdt94UgLbMJvC7W4lBnwsa7o0bmTu6psKF7J/nCR5lalffoIqvrRpUx9EC8FZxjVL/UKuZUpIRAZyKIt7S6+CAO7cQngkP3k4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=bC8Kx+Xd; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="bC8Kx+Xd" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1739216352; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gxIo4/a9ZeK4nnisZvQU6AU2gxtpzJDN86Rrv62j7Tk=; b=bC8Kx+XdEnbyqg22ao4ynArdTUfP4C95gMF6sY7ngBaO6SHdSjyRIRyMyemnUamAeilVBS 4zHwwbebRd6GE2bDug+ISNAhwHnYwMHWiB2ijDyaTNPNRyBfGIw8lXZ59azG8f/u2dJ78L KP/iuzb9VpwliRmYWNb9fHQ8uwCrHeY= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-551-JCfOj35yOeyfgkFP4uUamA-1; Mon, 10 Feb 2025 14:39:10 -0500 X-MC-Unique: JCfOj35yOeyfgkFP4uUamA-1 X-Mimecast-MFC-AGG-ID: JCfOj35yOeyfgkFP4uUamA Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-38dc56ef418so1693465f8f.1 for ; Mon, 10 Feb 2025 11:39:10 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739216349; x=1739821149; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=gxIo4/a9ZeK4nnisZvQU6AU2gxtpzJDN86Rrv62j7Tk=; b=VB3NlEAfsGQAW/+x67y7soGP7Q0wI8imJ8ytDOPxzR5BMptpkSdZy4tgy9d1MWNiPD 1gC5bHiz5o3RYiIKlaHHwBITOqkOE/O2JsFmy1NpcOM03u9Nm3ubinKaucEC1rXnQ7D3 M42fIOwJ4fufQxZrewmo4KLveOYb7BSjTHuz/nLQNosBwNt+4nGDKv61AQzSFExEytYw i5GQX4rKQ3d5+IftHmvJlAmQtbi3mUXQwOl0CVPsaGKaKwLcTP4GoXrzGLxXv8gyzgbQ KYvBxdwyZTFliTNbuqG7XgTSCSdxaE/+g53KFlpmCyOWkwTROhwkvCX1E+0o/BPUHZkk R/Yg== X-Gm-Message-State: AOJu0YzVwNGT/nlvbMF2r7b5GCtyZiFegsC3wTjh9Iu9KDcpdCeRNts1 JubMA+6OIqT0Df1RLyGLI7mdCNpIUvLplEHxM7quJFMb7Rkqnr86kSnedcRX++KOsaIkE71HpRz dxuK86r+oR8BjGY9lnkSmUgd5gY2rx6KlqNLJqCTpDTWw+tMyvDuvN22W7sE9HTZ4hoMUzV5qHu 0WHTkxgDZ+mBP/lwlt9DH1minCoHU+Rz8JYjGGCFyF1wqE X-Gm-Gg: ASbGncvG4BilXSGC+IX2mpiC8u7Z0veFV02Qxapd/z08qs+bEHTSKGDuMfEX4nK9K2m 
FGKDN4ekcAGlIpbvs/pteQ+0fSoiGyfRPMgHN98GpvKM8l/lqRD91Efp2k9S6FMro9p2F4EDtGu 0d2bSrmCziS72XPou+bJDxzD/Vj7iV+uNUDaQO+RIXaNGV8zzVxa+A3T1FIeKmn4fwGv0MflKzw ltiimmHGsPZsehqm6p35CX8FIeQfESCu2KQVxzeyai6VFDY5DO2yrPaQNuRiRiZk9GBWxkeGGcx opK+6EZAn48jv16liCusD/KQm/8TKsIl33rIg6rEAXmTYkIdAZKcEywWEUstKRyIag== X-Received: by 2002:a05:6000:1448:b0:38d:a879:4778 with SMTP id ffacd0b85a97d-38dc9343f89mr13325616f8f.33.1739216349570; Mon, 10 Feb 2025 11:39:09 -0800 (PST) X-Google-Smtp-Source: AGHT+IFOm3YA+KmUhyhlND1SLmjpOJWvwKHSp1MsMH3YdrdBRb69cVHv165Q1M/eFn8bTRexW9yPlw== X-Received: by 2002:a05:6000:1448:b0:38d:a879:4778 with SMTP id ffacd0b85a97d-38dc9343f89mr13325571f8f.33.1739216349113; Mon, 10 Feb 2025 11:39:09 -0800 (PST) Received: from localhost (p200300cbc734b80012c465cd348aaee6.dip0.t-ipconnect.de. [2003:cb:c734:b800:12c4:65cd:348a:aee6]) by smtp.gmail.com with UTF8SMTPSA id ffacd0b85a97d-38dca0b4237sm10326047f8f.85.2025.02.10.11.39.06 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 10 Feb 2025 11:39:07 -0800 (PST) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org, nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand , Andrew Morton , =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Jonathan Corbet , Alex Shi , Yanteng Si , Karol Herbst , Lyude Paul , Danilo Krummrich , David Airlie , Simona Vetter , Masami Hiramatsu , Oleg Nesterov , Peter Zijlstra , SeongJae Park , "Liam R. 
 Howlett", Lorenzo Stoakes, Vlastimil Babka, Jann Horn, Pasha Tatashin,
 Peter Xu, Alistair Popple, Jason Gunthorpe
Subject: [PATCH v2 17/17] mm/rmap: avoid -EBUSY from make_device_exclusive()
Date: Mon, 10 Feb 2025 20:37:59 +0100
Message-ID: <20250210193801.781278-18-david@redhat.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250210193801.781278-1-david@redhat.com>
References: <20250210193801.781278-1-david@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Failing to obtain the folio lock, for example because the folio is
concurrently getting migrated or swapped out, can easily make the callers
fail: for example, the hmm selftest can sometimes be observed to fail
because of this. Instead of forcing the caller to retry, let's simply
retry in this to-be-expected case.

Similarly, avoid spurious failures simply because we raced with someone
(e.g., swapout) modifying the page table such that our folio_walk fails.

Simply unconditionally lock the folio, and retry GUP if our folio_walk
fails. Note that the folio_walk repeatedly failing is not something we
expect.

Note that we might want to avoid grabbing the folio lock at some point; for
now, keep that as is and only unconditionally lock the folio.

With this change, the hmm selftests don't fail simply because the folio is
already locked. While this fixes the selftests in some cases, it's likely
not something that deserves a "Fixes:".

Signed-off-by: David Hildenbrand
Tested-by: Alistair Popple
---
 mm/rmap.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index e2a543f639ce3..0f760b93fc0a2 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2435,6 +2435,7 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
 	struct page *page;
 	swp_entry_t entry;
 	pte_t swp_pte;
+	int ret;
 
 	mmap_assert_locked(mm);
 	addr = PAGE_ALIGN_DOWN(addr);
@@ -2448,6 +2449,7 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
 	 * fault will trigger a conversion to an ordinary
 	 * (non-device-exclusive) PTE and issue a MMU_NOTIFY_EXCLUSIVE.
 	 */
+retry:
 	page = get_user_page_vma_remote(mm, addr,
 					FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
 					&vma);
@@ -2460,9 +2462,10 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
 		return ERR_PTR(-EOPNOTSUPP);
 	}
 
-	if (!folio_trylock(folio)) {
+	ret = folio_lock_killable(folio);
+	if (ret) {
 		folio_put(folio);
-		return ERR_PTR(-EBUSY);
+		return ERR_PTR(ret);
 	}
 
 	/*
@@ -2488,7 +2491,7 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
 		mmu_notifier_invalidate_range_end(&range);
 		folio_unlock(folio);
 		folio_put(folio);
-		return ERR_PTR(-EBUSY);
+		goto retry;
 	}
 
 	/* Nuke the page table entry so we get the uptodate dirty bit. */
-- 
2.48.1