From nobody Sat Feb 7 23:14:22 2026
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Nicholas Piggin, David Hildenbrand, Matthew Wilcox, Andrew Morton, James Houghton, Huang Ying, "Aneesh Kumar K . V", peterx@redhat.com, Vlastimil Babka, Rick P Edgecombe, Hugh Dickins, Borislav Petkov, Christophe Leroy, Michael Ellerman, Rik van Riel, Dan Williams, Mel Gorman, x86@kernel.org, Ingo Molnar, linuxppc-dev@lists.ozlabs.org, Dave Hansen, Dave Jiang, Oscar Salvador, Thomas Gleixner
Subject: [PATCH v5 1/7] mm/dax: Dump start address in fault handler
Date: Mon, 12 Aug 2024 14:12:19 -0400
Message-ID: <20240812181225.1360970-2-peterx@redhat.com>
In-Reply-To: <20240812181225.1360970-1-peterx@redhat.com>
References: <20240812181225.1360970-1-peterx@redhat.com>

Currently the dax fault handler dumps the vma range when dynamic debugging
is enabled.  That's mostly not useful.  Dump the (aligned) address instead,
along with the order info.
Acked-by: David Hildenbrand
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 drivers/dax/device.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 2051e4f73c8a..9c1a729cd77e 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -235,9 +235,9 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf, unsigned int order)
 	int id;
 	struct dev_dax *dev_dax = filp->private_data;
 
-	dev_dbg(&dev_dax->dev, "%s: %s (%#lx - %#lx) order:%d\n", current->comm,
-		(vmf->flags & FAULT_FLAG_WRITE) ? "write" : "read",
-		vmf->vma->vm_start, vmf->vma->vm_end, order);
+	dev_dbg(&dev_dax->dev, "%s: op=%s addr=%#lx order=%d\n", current->comm,
+		(vmf->flags & FAULT_FLAG_WRITE) ? "write" : "read",
+		vmf->address & ~((1UL << (order + PAGE_SHIFT)) - 1), order);
 
 	id = dax_read_lock();
 	if (order == 0)
-- 
2.45.0
From nobody Sat Feb 7 23:14:22 2026
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Nicholas Piggin, David Hildenbrand, Matthew Wilcox, Andrew Morton, James Houghton, Huang Ying, "Aneesh Kumar K . V", peterx@redhat.com, Vlastimil Babka, Rick P Edgecombe, Hugh Dickins, Borislav Petkov, Christophe Leroy, Michael Ellerman, Rik van Riel, Dan Williams, Mel Gorman, x86@kernel.org, Ingo Molnar, linuxppc-dev@lists.ozlabs.org, Dave Hansen, Dave Jiang, Oscar Salvador, Thomas Gleixner, kvm@vger.kernel.org, Sean Christopherson, Paolo Bonzini, David Rientjes
Subject: [PATCH v5 2/7] mm/mprotect: Push mmu notifier to PUDs
Date: Mon, 12 Aug 2024 14:12:20 -0400
Message-ID: <20240812181225.1360970-3-peterx@redhat.com>
In-Reply-To: <20240812181225.1360970-1-peterx@redhat.com>
References: <20240812181225.1360970-1-peterx@redhat.com>

mprotect() invokes mmu notifiers at the PMD level.  It has been like that
since the 2014 commit a5338093bfb4 ("mm: move mmu notifier call from
change_protection to change_pmd_range").  At that time, the issue was that
NUMA balancing could be applied to a huge range of VM memory even if
nothing was populated.  The notification can be avoided in that case if no
valid pmd is detected, where "valid" covers either a THP or a PTE pgtable
page.

Now, to pave the way for PUD handling, that isn't enough.  We need to
generate mmu notifications properly on PUD entries too.  mprotect() is
currently broken on PUDs (e.g., one can already trigger a kernel error
with dax 1G mappings), and this is the start of fixing it.

To fix that, this patch pushes such notifications up to the PUD layer.
There is a risk of regressing the problem Rik wanted to resolve back then,
but I don't think it will really happen, and I still chose this solution
for a few reasons:

1) Consider a large VM that should definitely contain more than GBs of
   memory: it is highly likely that the PUDs are also none, in which case
   there will be no regression.
2) KVM has evolved a lot over the years to get rid of rmap walks, which
   might have been the major cause of the previous soft lockups.  At least
   the TDP MMU already got rid of rmap as long as it is not nested (which
   should be the major use case, IIUC); the TDP MMU pgtable walker will
   then simply see an empty VM pgtable (e.g. EPT on x86), so invalidating
   a fully empty region should in most cases be pretty fast now compared
   to 2014.

3) KVM now has explicit code paths to give way to mmu notifiers exactly
   like this one, e.g. commit d02c357e5bfa ("KVM: x86/mmu: Retry fault
   before acquiring mmu_lock if mapping is changing").  That also avoids
   contentions that may contribute to a soft lockup.

4) Sticking with the PMD layer simply doesn't work when a PUD is there...
   We need one way or another to fix PUD mappings on mprotect().  Pushing
   the notifier to the PUD level should be the safest approach for now;
   e.g., there is no sign yet of huge P4Ds coming on any known arch.

Cc: kvm@vger.kernel.org
Cc: Sean Christopherson
Cc: Paolo Bonzini
Cc: David Rientjes
Cc: Rik van Riel
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/mprotect.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 37cf8d249405..d423080e6509 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -363,9 +363,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 	unsigned long next;
 	long pages = 0;
 	unsigned long nr_huge_updates = 0;
-	struct mmu_notifier_range range;
-
-	range.start = 0;
 
 	pmd = pmd_offset(pud, addr);
 	do {
@@ -383,14 +380,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 		if (pmd_none(*pmd))
 			goto next;
 
-		/* invoke the mmu notifier if the pmd is populated */
-		if (!range.start) {
-			mmu_notifier_range_init(&range,
-						MMU_NOTIFY_PROTECTION_VMA, 0,
-						vma->vm_mm, addr, end);
-			mmu_notifier_invalidate_range_start(&range);
-		}
-
 		_pmd = pmdp_get_lockless(pmd);
 		if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd) || pmd_devmap(_pmd)) {
 			if ((next - addr != HPAGE_PMD_SIZE) ||
@@ -431,9 +420,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 		cond_resched();
 	} while (pmd++, addr = next, addr != end);
 
-	if (range.start)
-		mmu_notifier_invalidate_range_end(&range);
-
 	if (nr_huge_updates)
 		count_vm_numa_events(NUMA_HUGE_PTE_UPDATES, nr_huge_updates);
 	return pages;
@@ -443,22 +429,36 @@ static inline long change_pud_range(struct mmu_gather *tlb,
 		struct vm_area_struct *vma, p4d_t *p4d, unsigned long addr,
 		unsigned long end, pgprot_t newprot, unsigned long cp_flags)
 {
+	struct mmu_notifier_range range;
 	pud_t *pud;
 	unsigned long next;
 	long pages = 0, ret;
 
+	range.start = 0;
+
 	pud = pud_offset(p4d, addr);
 	do {
 		next = pud_addr_end(addr, end);
 		ret = change_prepare(vma, pud, pmd, addr, cp_flags);
-		if (ret)
-			return ret;
+		if (ret) {
+			pages = ret;
+			break;
+		}
 		if (pud_none_or_clear_bad(pud))
 			continue;
+		if (!range.start) {
+			mmu_notifier_range_init(&range,
+						MMU_NOTIFY_PROTECTION_VMA, 0,
+						vma->vm_mm, addr, end);
+			mmu_notifier_invalidate_range_start(&range);
+		}
 		pages += change_pmd_range(tlb, vma, pud, addr, next, newprot,
 					  cp_flags);
 	} while (pud++, addr = next, addr != end);
 
+	if (range.start)
+		mmu_notifier_invalidate_range_end(&range);
+
	return pages;
 }
-- 
2.45.0
From nobody Sat Feb 7 23:14:22 2026
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Nicholas Piggin, David Hildenbrand, Matthew Wilcox, Andrew Morton, James Houghton, Huang Ying, "Aneesh Kumar K . V", peterx@redhat.com, Vlastimil Babka, Rick P Edgecombe, Hugh Dickins, Borislav Petkov, Christophe Leroy, Michael Ellerman, Rik van Riel, Dan Williams, Mel Gorman, x86@kernel.org, Ingo Molnar, linuxppc-dev@lists.ozlabs.org, Dave Hansen, Dave Jiang, Oscar Salvador, Thomas Gleixner
Subject: [PATCH v5 3/7] mm/powerpc: Add missing pud helpers
Date: Mon, 12 Aug 2024 14:12:21 -0400
Message-ID: <20240812181225.1360970-4-peterx@redhat.com>
In-Reply-To: <20240812181225.1360970-1-peterx@redhat.com>
References: <20240812181225.1360970-1-peterx@redhat.com>

Some new helpers will be needed for pud entry updates soon.  Introduce
them by referencing the existing pmd ones.  Namely:

- pudp_invalidate(): invalidates a huge pud before a split happens, so
  that the invalidated pud entry makes sure no race can happen (either
  with software, like a concurrent zap, or hardware, like an a/d bit
  loss).

- pud_modify(): applies a new pgprot to an existing huge pud mapping.

For more information on why we need these two helpers, please refer to the
corresponding pmd helpers in the mprotect() code path.
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Christophe Leroy
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Aneesh Kumar K.V
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  3 +++
 arch/powerpc/mm/book3s64/pgtable.c           | 20 ++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 519b1743a0f4..5da92ba68a45 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1124,6 +1124,7 @@ extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
 extern pud_t pfn_pud(unsigned long pfn, pgprot_t pgprot);
 extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot);
 extern pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot);
+extern pud_t pud_modify(pud_t pud, pgprot_t newprot);
 extern void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 		       pmd_t *pmdp, pmd_t pmd);
 extern void set_pud_at(struct mm_struct *mm, unsigned long addr,
@@ -1384,6 +1385,8 @@ static inline pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm,
 #define __HAVE_ARCH_PMDP_INVALIDATE
 extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			     pmd_t *pmdp);
+extern pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
+			     pud_t *pudp);
 
 #define pmd_move_must_withdraw pmd_move_must_withdraw
 struct spinlock;
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index f4d8d3c40e5c..5a4a75369043 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -176,6 +176,17 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 	return __pmd(old_pmd);
 }
 
+pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
+		      pud_t *pudp)
+{
+	unsigned long old_pud;
+
+	VM_WARN_ON_ONCE(!pud_present(*pudp));
+	old_pud = pud_hugepage_update(vma->vm_mm, address, pudp, _PAGE_PRESENT, _PAGE_INVALID);
+	flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE);
+	return __pud(old_pud);
+}
+
 pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma,
 				   unsigned long addr, pmd_t *pmdp, int full)
 {
@@ -259,6 +270,15 @@ pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 	pmdv &= _HPAGE_CHG_MASK;
 	return pmd_set_protbits(__pmd(pmdv), newprot);
 }
+
+pud_t pud_modify(pud_t pud, pgprot_t newprot)
+{
+	unsigned long pudv;
+
+	pudv = pud_val(pud);
+	pudv &= _HPAGE_CHG_MASK;
+	return pud_set_protbits(__pud(pudv), newprot);
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 /* For use by kexec, called with MMU off */
-- 
2.45.0
From nobody Sat Feb 7 23:14:22 2026
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Nicholas Piggin, David Hildenbrand, Matthew Wilcox, Andrew Morton, James Houghton, Huang Ying, "Aneesh Kumar K . V", peterx@redhat.com, Vlastimil Babka, Rick P Edgecombe, Hugh Dickins, Borislav Petkov, Christophe Leroy, Michael Ellerman, Rik van Riel, Dan Williams, Mel Gorman, x86@kernel.org, Ingo Molnar, linuxppc-dev@lists.ozlabs.org, Dave Hansen, Dave Jiang, Oscar Salvador, Thomas Gleixner
Subject: [PATCH v5 4/7] mm/x86: Make pud_leaf() only care about PSE bit
Date: Mon, 12 Aug 2024 14:12:22 -0400
Message-ID: <20240812181225.1360970-5-peterx@redhat.com>
In-Reply-To: <20240812181225.1360970-1-peterx@redhat.com>
References: <20240812181225.1360970-1-peterx@redhat.com>

When working on mprotect() for 1G dax entries, I hit a "zap bad pud" error
when zapping a huge pud that carries PROT_NONE permission.

The problem is that x86's pud_leaf() requires both the PRESENT and PSE
bits to be set before reporting a pud entry as a leaf.  That doesn't look
right: it does not follow the pXd_leaf() definition we have stuck with so
far, in which PROT_NONE entries should still be reported as leaves.

To fix it, change x86's pud_leaf() implementation to check only the PSE
bit, irrespective of whether the PRESENT bit is set.
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: x86@kernel.org
Acked-by: Dave Hansen
Reviewed-by: David Hildenbrand
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/include/asm/pgtable.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e39311a89bf4..a2a3bd4c1bda 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1078,8 +1078,7 @@ static inline pmd_t *pud_pgtable(pud_t pud)
 #define pud_leaf pud_leaf
 static inline bool pud_leaf(pud_t pud)
 {
-	return (pud_val(pud) & (_PAGE_PSE | _PAGE_PRESENT)) ==
-		(_PAGE_PSE | _PAGE_PRESENT);
+	return pud_val(pud) & _PAGE_PSE;
 }
 
 static inline int pud_bad(pud_t pud)
-- 
2.45.0
spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=BfKe42/b; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="BfKe42/b" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1723486362; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vL+nA7AT6HH2fptU3fDtubaziCDql6jx+j3l3b3Xtg8=; b=BfKe42/bRHXNqVfrNHN/VS3QIpxmD86GOpGOXws/q/xmVthKvSHIPvubxCCd5lWASCfbxG RZS9QWi8gQaopCoav/0NNaq8Rlc7w1zAr72LCvd/UPCdU0787tJIP3z1u39I4MMuHpZ2H0 qXLpaNaXq/YeLb6UPSWaB1g2UJhr2hI= Received: from mail-qk1-f199.google.com (mail-qk1-f199.google.com [209.85.222.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-679-YACrwIVCO_yUtGVWGy__IQ-1; Mon, 12 Aug 2024 14:12:41 -0400 X-MC-Unique: YACrwIVCO_yUtGVWGy__IQ-1 Received: by mail-qk1-f199.google.com with SMTP id af79cd13be357-7a1e41eda5aso13487885a.3 for ; Mon, 12 Aug 2024 11:12:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723486360; x=1724091160; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=vL+nA7AT6HH2fptU3fDtubaziCDql6jx+j3l3b3Xtg8=; b=oaZ1sBkePbQn09V2wwMe7pUZ6XogtvI20j/8zn2Yx/Q7OSW5U+zF0d19FJ+ojwVfHO 27VMwtO4L1yLAMQdmt7Z47yx2CZcgIlx7srnlEdr+B0urc/PjhhHj3L6GNVyeo7unwup 4j9Pio9wsV6ZyPpKk1rjaSTJPJ53FInZMt94gURrEBQJxzGCRERdKlUB+ZNsK3cFeXS9 
bdpnRJx3qQc+rm1/GqMzlxSQa4ymhRvw43owsSGUfqbOZoOKiAAjuvQflAgNNEltfOe3 fHCuXRF79VNa2Q55d7fpgNwSCkKUryr3jQ7m/Stqoei0sg6Cc0EnOBC1fwTcShcssdGM UKQA== X-Gm-Message-State: AOJu0YxceXqVBAY9NK7QtGufwbXC6fnZAyxuGxUe5NhsLlTWIvEhnTfj B1fX/3FNYsIIIi8V5m36bAzBSFQ7D8niO0NhmzlvxR/afSG8OsAr1Y26J0ZmnrVoDTmeiQdEr1r gqs3N2eio8Ee/MYclK9buLXDGEXpqQbymKuvBRVsCldUc8glSRL77jW946pjY+z+VcGnri98xxg pyAHuc/FXQtOXAqMJ1Lw3BoDQ2kPc25ecv2TAAOlDdMR8= X-Received: by 2002:a05:620a:d95:b0:7a1:5683:b04b with SMTP id af79cd13be357-7a4e1625378mr68906285a.9.1723486360265; Mon, 12 Aug 2024 11:12:40 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHermK5CePcyhPQZfo7QtCxftIfNR9zVAKZznJ8AtnLMjyZZZpq0Zdfjnp4iDQgBs3zunmzLQ== X-Received: by 2002:a05:620a:d95:b0:7a1:5683:b04b with SMTP id af79cd13be357-7a4e1625378mr68901885a.9.1723486359781; Mon, 12 Aug 2024 11:12:39 -0700 (PDT) Received: from x1n.redhat.com (pool-99-254-121-117.cpe.net.cable.rogers.com. [99.254.121.117]) by smtp.gmail.com with ESMTPSA id af79cd13be357-7a4c7dee013sm268663985a.84.2024.08.12.11.12.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 12 Aug 2024 11:12:39 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: "Kirill A . Shutemov" , Nicholas Piggin , David Hildenbrand , Matthew Wilcox , Andrew Morton , James Houghton , Huang Ying , "Aneesh Kumar K . 
V" , peterx@redhat.com, Vlastimil Babka , Rick P Edgecombe , Hugh Dickins , Borislav Petkov , Christophe Leroy , Michael Ellerman , Rik van Riel , Dan Williams , Mel Gorman , x86@kernel.org, Ingo Molnar , linuxppc-dev@lists.ozlabs.org, Dave Hansen , Dave Jiang , Oscar Salvador , Thomas Gleixner Subject: [PATCH v5 5/7] mm/x86: Implement arch_check_zapped_pud() Date: Mon, 12 Aug 2024 14:12:23 -0400 Message-ID: <20240812181225.1360970-6-peterx@redhat.com> X-Mailer: git-send-email 2.45.0 In-Reply-To: <20240812181225.1360970-1-peterx@redhat.com> References: <20240812181225.1360970-1-peterx@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Introduce arch_check_zapped_pud() to sanity check shadow stack on PUD zaps. It has the same logic as the PMD helper. One thing to mention is, it might be a good idea to use page_table_check in the future for trapping wrong setups of shadow stack pgtable entries [1]. That is left for the future as a separate effort. 
[1] https://lore.kernel.org/all/59d518698f664e07c036a5098833d7b56b953305.camel@intel.com

Cc: "Edgecombe, Rick P"
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: x86@kernel.org
Acked-by: David Hildenbrand
Signed-off-by: Peter Xu
---
 arch/x86/include/asm/pgtable.h | 10 ++++++++++
 arch/x86/mm/pgtable.c          |  6 ++++++
 include/linux/pgtable.h        |  6 ++++++
 mm/huge_memory.c               |  4 +++-
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a2a3bd4c1bda..fdb8ac9e7030 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -174,6 +174,13 @@ static inline int pud_young(pud_t pud)
 	return pud_flags(pud) & _PAGE_ACCESSED;
 }
 
+static inline bool pud_shstk(pud_t pud)
+{
+	return cpu_feature_enabled(X86_FEATURE_SHSTK) &&
+	       (pud_flags(pud) & (_PAGE_RW | _PAGE_DIRTY | _PAGE_PSE)) ==
+	       (_PAGE_DIRTY | _PAGE_PSE);
+}
+
 static inline int pte_write(pte_t pte)
 {
 	/*
@@ -1667,6 +1674,9 @@ void arch_check_zapped_pte(struct vm_area_struct *vma, pte_t pte);
 #define arch_check_zapped_pmd arch_check_zapped_pmd
 void arch_check_zapped_pmd(struct vm_area_struct *vma, pmd_t pmd);
 
+#define arch_check_zapped_pud arch_check_zapped_pud
+void arch_check_zapped_pud(struct vm_area_struct *vma, pud_t pud);
+
 #ifdef CONFIG_XEN_PV
 #define arch_has_hw_nonleaf_pmd_young arch_has_hw_nonleaf_pmd_young
 static inline bool arch_has_hw_nonleaf_pmd_young(void)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index f5931499c2d6..36e7139a61d9 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -926,3 +926,9 @@ void arch_check_zapped_pmd(struct vm_area_struct *vma, pmd_t pmd)
 	VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) &&
 			pmd_shstk(pmd));
 }
+
+void arch_check_zapped_pud(struct vm_area_struct *vma, pud_t pud)
+{
+	/* See note in arch_check_zapped_pte() */
+	VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) && pud_shstk(pud));
+}
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 2a6a3cccfc36..780f3b439d98 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -447,6 +447,12 @@ static inline void arch_check_zapped_pmd(struct vm_area_struct *vma,
 }
 #endif
 
+#ifndef arch_check_zapped_pud
+static inline void arch_check_zapped_pud(struct vm_area_struct *vma, pud_t pud)
+{
+}
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 				       unsigned long address,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0024266dea0a..81c5da0708ed 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2293,12 +2293,14 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 pud_t *pud, unsigned long addr)
 {
 	spinlock_t *ptl;
+	pud_t orig_pud;
 
 	ptl = __pud_trans_huge_lock(pud, vma);
 	if (!ptl)
 		return 0;
 
-	pudp_huge_get_and_clear_full(vma, addr, pud, tlb->fullmm);
+	orig_pud = pudp_huge_get_and_clear_full(vma, addr, pud, tlb->fullmm);
+	arch_check_zapped_pud(vma, orig_pud);
 	tlb_remove_pud_tlb_entry(tlb, pud, addr);
 	if (vma_is_special_huge(vma)) {
 		spin_unlock(ptl);
-- 
2.45.0

From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Nicholas Piggin, David Hildenbrand, Matthew Wilcox, Andrew Morton, James Houghton, Huang Ying, "Aneesh Kumar K . 
V" , peterx@redhat.com, Vlastimil Babka , Rick P Edgecombe , Hugh Dickins , Borislav Petkov , Christophe Leroy , Michael Ellerman , Rik van Riel , Dan Williams , Mel Gorman , x86@kernel.org, Ingo Molnar , linuxppc-dev@lists.ozlabs.org, Dave Hansen , Dave Jiang , Oscar Salvador , Thomas Gleixner Subject: [PATCH v5 6/7] mm/x86: Add missing pud helpers Date: Mon, 12 Aug 2024 14:12:24 -0400 Message-ID: <20240812181225.1360970-7-peterx@redhat.com> X-Mailer: git-send-email 2.45.0 In-Reply-To: <20240812181225.1360970-1-peterx@redhat.com> References: <20240812181225.1360970-1-peterx@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Some new helpers will be needed for pud entry updates soon. Introduce these helpers by referencing the pmd ones. Namely: - pudp_invalidate(): this helper invalidates a huge pud before a split happens, so that the invalidated pud entry will make sure no race will happen (either with software, like a concurrent zap, or hardware, like a/d bit lost). - pud_modify(): this helper applies a new pgprot to an existing huge pud mapping. For more information on why we need these two helpers, please refer to the corresponding pmd helpers in the mprotect() code path. When at it, simplify the pud_modify()/pmd_modify() comments on shadow stack pgtable entries to reference pte_modify() to avoid duplicating the whole paragraph three times. 
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: x86@kernel.org
Signed-off-by: Peter Xu
---
 arch/x86/include/asm/pgtable.h | 57 +++++++++++++++++++++++++++++-----
 arch/x86/mm/pgtable.c          | 12 +++++++
 2 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index fdb8ac9e7030..8d12bfad6a1d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -787,6 +787,12 @@ static inline pmd_t pmd_mkinvalid(pmd_t pmd)
 		      __pgprot(pmd_flags(pmd) & ~(_PAGE_PRESENT|_PAGE_PROTNONE)));
 }
 
+static inline pud_t pud_mkinvalid(pud_t pud)
+{
+	return pfn_pud(pud_pfn(pud),
+		       __pgprot(pud_flags(pud) & ~(_PAGE_PRESENT|_PAGE_PROTNONE)));
+}
+
 static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
@@ -834,14 +840,8 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 	pmd_result = __pmd(val);
 
 	/*
-	 * To avoid creating Write=0,Dirty=1 PMDs, pte_modify() needs to avoid:
-	 * 1. Marking Write=0 PMDs Dirty=1
-	 * 2. Marking Dirty=1 PMDs Write=0
-	 *
-	 * The first case cannot happen because the _PAGE_CHG_MASK will filter
-	 * out any Dirty bit passed in newprot.  Handle the second case by
-	 * going through the mksaveddirty exercise.  Only do this if the old
-	 * value was Write=1 to avoid doing this on Shadow Stack PTEs.
+	 * Avoid creating shadow stack PMD by accident.  See comment in
+	 * pte_modify().
 	 */
 	if (oldval & _PAGE_RW)
 		pmd_result = pmd_mksaveddirty(pmd_result);
@@ -851,6 +851,29 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 	return pmd_result;
 }
 
+static inline pud_t pud_modify(pud_t pud, pgprot_t newprot)
+{
+	pudval_t val = pud_val(pud), oldval = val;
+	pud_t pud_result;
+
+	val &= _HPAGE_CHG_MASK;
+	val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK;
+	val = flip_protnone_guard(oldval, val, PHYSICAL_PUD_PAGE_MASK);
+
+	pud_result = __pud(val);
+
+	/*
+	 * Avoid creating shadow stack PUD by accident.  See comment in
+	 * pte_modify().
+	 */
+	if (oldval & _PAGE_RW)
+		pud_result = pud_mksaveddirty(pud_result);
+	else
+		pud_result = pud_clear_saveddirty(pud_result);
+
+	return pud_result;
+}
+
 /*
  * mprotect needs to preserve PAT and encryption bits when updating
  * vm_page_prot
@@ -1389,10 +1412,28 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 }
 #endif
 
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+static inline pud_t pudp_establish(struct vm_area_struct *vma,
+		unsigned long address, pud_t *pudp, pud_t pud)
+{
+	page_table_check_pud_set(vma->vm_mm, pudp, pud);
+	if (IS_ENABLED(CONFIG_SMP)) {
+		return xchg(pudp, pud);
+	} else {
+		pud_t old = *pudp;
+		WRITE_ONCE(*pudp, pud);
+		return old;
+	}
+}
+#endif
+
 #define __HAVE_ARCH_PMDP_INVALIDATE_AD
 extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
 				unsigned long address, pmd_t *pmdp);
 
+pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
+		      pud_t *pudp);
+
 /*
  * Page table pages are page-aligned.  The lower half of the top
  * level is used for userspace and the top half for the kernel.
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 36e7139a61d9..5745a354a241 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -641,6 +641,18 @@ pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
 }
 #endif
 
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
+	defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
+pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
+		      pud_t *pudp)
+{
+	VM_WARN_ON_ONCE(!pud_present(*pudp));
+	pud_t old = pudp_establish(vma, address, pudp, pud_mkinvalid(*pudp));
+	flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE);
+	return old;
+}
+#endif
+
 /**
  * reserve_top_address - reserves a hole in the top of kernel address space
  * @reserve - size of hole to reserve
-- 
2.45.0

From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Nicholas Piggin, David Hildenbrand, Matthew Wilcox, Andrew Morton, James Houghton, Huang Ying, "Aneesh Kumar K . 
V" , peterx@redhat.com, Vlastimil Babka , Rick P Edgecombe , Hugh Dickins , Borislav Petkov , Christophe Leroy , Michael Ellerman , Rik van Riel , Dan Williams , Mel Gorman , x86@kernel.org, Ingo Molnar , linuxppc-dev@lists.ozlabs.org, Dave Hansen , Dave Jiang , Oscar Salvador , Thomas Gleixner Subject: [PATCH v5 7/7] mm/mprotect: fix dax pud handlings Date: Mon, 12 Aug 2024 14:12:25 -0400 Message-ID: <20240812181225.1360970-8-peterx@redhat.com> X-Mailer: git-send-email 2.45.0 In-Reply-To: <20240812181225.1360970-1-peterx@redhat.com> References: <20240812181225.1360970-1-peterx@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This is only relevant to the two archs that support PUD dax, aka, x86_64 and ppc64. PUD THPs do not yet exist elsewhere, and hugetlb PUDs do not count in this case. DAX have had PUD mappings for years, but change protection path never worked. When the path is triggered in any form (a simple test program would be: call mprotect() on a 1G dev_dax mapping), the kernel will report "bad pud". This patch should fix that. The new change_huge_pud() tries to keep everything simple. For example, it doesn't optimize write bit as that will need even more PUD helpers. It's not too bad anyway to have one more write fault in the worst case once for 1G range; may be a bigger thing for each PAGE_SIZE, though. Neither does it support userfault-wp bits, as there isn't such PUD mappings that is supported; file mappings always need a split there. The same to TLB shootdown: the pmd path (which was for x86 only) has the trick of using _ad() version of pmdp_invalidate*() which can avoid one redundant TLB, but let's also leave that for later. Again, the larger the mapping, the smaller of such effect. 
There's some difference in handling "retry" for change_huge_pud() (where
it can return 0): it isn't like change_huge_pmd(), as the pmd version is
safe with all conditions handled in change_pte_range() later, thanks to
Hugh's new pte_offset_map_lock(). In short, change_pte_range() is simply
smarter. Because of that, change_pud_range() needs a proper retry if it
races with something else while a huge PUD changed from under us.

The last thing to mention: the PUD path currently ignores the huge pte
numa counter (NUMA_HUGE_PTE_UPDATES), not only because DAX is not
applicable to NUMA, but also because it is ambiguous on its own how to
account puds in this case. In one earlier version of this patchset I
proposed to remove the counter, as it doesn't even look right to do the
accounting as of now [1], but a further discussion [2] suggested we can
leave that for later, as it doesn't block this series if we choose to
ignore the counter. That's what this patch does, by ignoring it.

While at it, touch up the comment in pgtable_split_needed() to make it
generic to either pmd or pud file THPs.

[1] https://lore.kernel.org/all/20240715192142.3241557-3-peterx@redhat.com/
[2] https://lore.kernel.org/r/added2d0-b8be-4108-82ca-1367a388d0b1@redhat.com

Cc: Dan Williams
Cc: Matthew Wilcox
Cc: Dave Jiang
Cc: Hugh Dickins
Cc: Kirill A. Shutemov
Cc: Vlastimil Babka
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: Michael Ellerman
Cc: Aneesh Kumar K.V
Cc: Oscar Salvador
Cc: x86@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Fixes: 27af67f35631 ("powerpc/book3s64/mm: enable transparent pud hugepage")
Signed-off-by: Peter Xu
---
 include/linux/huge_mm.h | 24 +++++++++++++++++++
 mm/huge_memory.c        | 52 +++++++++++++++++++++++++++++++++++++++++
 mm/mprotect.c           | 39 ++++++++++++++++++++++++-------
 3 files changed, 107 insertions(+), 8 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index ce44caa40eed..6370026689e0 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -342,6 +342,17 @@ void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address,
 void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
 		      unsigned long address);
 
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		    pud_t *pudp, unsigned long addr, pgprot_t newprot,
+		    unsigned long cp_flags);
+#else
+static inline int
+change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		pud_t *pudp, unsigned long addr, pgprot_t newprot,
+		unsigned long cp_flags) { return 0; }
+#endif
+
 #define split_huge_pud(__vma, __pud, __address)				\
 	do {								\
 		pud_t *____pud = (__pud);				\
@@ -585,6 +596,19 @@ static inline int next_order(unsigned long *orders, int prev)
 {
 	return 0;
 }
+
+static inline void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
+				    unsigned long address)
+{
+}
+
+static inline int change_huge_pud(struct mmu_gather *tlb,
+				  struct vm_area_struct *vma, pud_t *pudp,
+				  unsigned long addr, pgprot_t newprot,
+				  unsigned long cp_flags)
+{
+	return 0;
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 static inline int split_folio_to_list_to_order(struct folio *folio,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 81c5da0708ed..0aafd26d7a53 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2114,6 +2114,53 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	return ret;
 }
 
+/*
+ * Returns:
+ *
+ * - 0: if pud leaf changed from under us
+ * - 1: if pud can be skipped
+ * - HPAGE_PUD_NR: if pud was successfully processed
+ */
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		    pud_t *pudp, unsigned long addr, pgprot_t newprot,
+		    unsigned long cp_flags)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pud_t oldpud, entry;
+	spinlock_t *ptl;
+
+	tlb_change_page_size(tlb, HPAGE_PUD_SIZE);
+
+	/* NUMA balancing doesn't apply to dax */
+	if (cp_flags & MM_CP_PROT_NUMA)
+		return 1;
+
+	/*
+	 * Huge entries on userfault-wp only works with anonymous, while we
+	 * don't have anonymous PUDs yet.
+	 */
+	if (WARN_ON_ONCE(cp_flags & MM_CP_UFFD_WP_ALL))
+		return 1;
+
+	ptl = __pud_trans_huge_lock(pudp, vma);
+	if (!ptl)
+		return 0;
+
+	/*
+	 * Can't clear PUD or it can race with concurrent zapping.  See
+	 * change_huge_pmd().
+	 */
+	oldpud = pudp_invalidate(vma, addr, pudp);
+	entry = pud_modify(oldpud, newprot);
+	set_pud_at(mm, addr, pudp, entry);
+	tlb_flush_pud_range(tlb, addr, HPAGE_PUD_SIZE);
+
+	spin_unlock(ptl);
+	return HPAGE_PUD_NR;
+}
+#endif
+
 #ifdef CONFIG_USERFAULTFD
 /*
  * The PT lock for src_pmd and dst_vma/src_vma (for reading) are locked by
@@ -2344,6 +2391,11 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
 	spin_unlock(ptl);
 	mmu_notifier_invalidate_range_end(&range);
 }
+#else
+void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
+		      unsigned long address)
+{
+}
 #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
 
 static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
diff --git a/mm/mprotect.c b/mm/mprotect.c
index d423080e6509..446f8e5f10d9 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -302,8 +302,9 @@ pgtable_split_needed(struct vm_area_struct *vma, unsigned long cp_flags)
 {
 	/*
 	 * pte markers only resides in pte level, if we need pte markers,
-	 * we need to split.  We cannot wr-protect shmem thp because file
-	 * thp is handled differently when split by erasing the pmd so far.
+	 * we need to split.  For example, we cannot wr-protect a file thp
+	 * (e.g. 2M shmem) because file thp is handled differently when
+	 * split by erasing the pmd so far.
 	 */
 	return (cp_flags & MM_CP_UFFD_WP) && !vma_is_anonymous(vma);
 }
@@ -430,31 +431,53 @@ static inline long change_pud_range(struct mmu_gather *tlb,
 		unsigned long end, pgprot_t newprot, unsigned long cp_flags)
 {
 	struct mmu_notifier_range range;
-	pud_t *pud;
+	pud_t *pudp, pud;
 	unsigned long next;
 	long pages = 0, ret;
 
 	range.start = 0;
 
-	pud = pud_offset(p4d, addr);
+	pudp = pud_offset(p4d, addr);
 	do {
+again:
 		next = pud_addr_end(addr, end);
-		ret = change_prepare(vma, pud, pmd, addr, cp_flags);
+		ret = change_prepare(vma, pudp, pmd, addr, cp_flags);
 		if (ret) {
 			pages = ret;
 			break;
 		}
-		if (pud_none_or_clear_bad(pud))
+
+		pud = READ_ONCE(*pudp);
+		if (pud_none(pud))
 			continue;
+
 		if (!range.start) {
 			mmu_notifier_range_init(&range,
 						MMU_NOTIFY_PROTECTION_VMA, 0,
 						vma->vm_mm, addr, end);
 			mmu_notifier_invalidate_range_start(&range);
 		}
-		pages += change_pmd_range(tlb, vma, pud, addr, next, newprot,
+
+		if (pud_leaf(pud)) {
+			if ((next - addr != PUD_SIZE) ||
+			    pgtable_split_needed(vma, cp_flags)) {
+				__split_huge_pud(vma, pudp, addr);
+				goto again;
+			} else {
+				ret = change_huge_pud(tlb, vma, pudp,
+						      addr, newprot, cp_flags);
+				if (ret == 0)
+					goto again;
+				/* huge pud was handled */
+				if (ret == HPAGE_PUD_NR)
+					pages += HPAGE_PUD_NR;
+				continue;
+			}
+		}
+
+		pages += change_pmd_range(tlb, vma, pudp, addr, next, newprot,
					  cp_flags);
-	} while (pud++, addr = next, addr != end);
+	} while (pudp++, addr = next, addr != end);
 
 	if (range.start)
 		mmu_notifier_invalidate_range_end(&range);
-- 
2.45.0