From nobody Tue Dec 16 19:40:02 2025
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: x86@kernel.org, Borislav Petkov, Dave Jiang, "Kirill A . Shutemov",
 Ingo Molnar, Oscar Salvador, peterx@redhat.com, Matthew Wilcox,
 Vlastimil Babka, Dan Williams, Andrew Morton, Hugh Dickins,
 Michael Ellerman, Dave Hansen, Thomas Gleixner,
 linuxppc-dev@lists.ozlabs.org, Christophe Leroy, Rik van Riel,
 Mel Gorman, "Aneesh Kumar K . V", Nicholas Piggin, Huang Ying
Subject: [PATCH 1/7] mm/dax: Dump start address in fault handler
Date: Fri, 21 Jun 2024 10:24:58 -0400
Message-ID: <20240621142504.1940209-2-peterx@redhat.com>
In-Reply-To: <20240621142504.1940209-1-peterx@redhat.com>
References: <20240621142504.1940209-1-peterx@redhat.com>

Currently the dax fault handler dumps the vma range when dynamic debugging
is enabled. That's mostly not useful. Dump the (aligned) fault address
instead, together with the order info.

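For illustration only (not part of the patch): a minimal user-space sketch
of the alignment expression used for the dumped address. EXAMPLE_PAGE_SHIFT
and fault_start() are hypothetical names, and 4 KiB base pages are assumed;
the kernel's PAGE_SHIFT depends on the architecture.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: 4 KiB base pages (the real PAGE_SHIFT is per-arch). */
#define EXAMPLE_PAGE_SHIFT 12

/*
 * Hypothetical helper mirroring the expression in the debug message:
 * round addr down to the start of the naturally aligned block of
 * 2^order base pages.
 */
unsigned long fault_start(unsigned long addr, unsigned int order)
{
	/* Guard against an undefined shift wider than unsigned long. */
	assert(order + EXAMPLE_PAGE_SHIFT < sizeof(unsigned long) * CHAR_BIT);

	return addr & ~((1UL << (order + EXAMPLE_PAGE_SHIFT)) - 1);
}
```

Under these assumptions, a fault at 0x76543abc is reported as 0x76543000
for order 0, 0x76400000 for order 9 (2 MiB aligned) and 0x40000000 for
order 18 (1 GiB aligned).
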
Signed-off-by: Peter Xu
---
 drivers/dax/device.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index eb61598247a9..714174844ca5 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -235,9 +235,9 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf, unsigned int order)
 	int id;
 	struct dev_dax *dev_dax = filp->private_data;
 
-	dev_dbg(&dev_dax->dev, "%s: %s (%#lx - %#lx) order:%d\n", current->comm,
-			(vmf->flags & FAULT_FLAG_WRITE) ? "write" : "read",
-			vmf->vma->vm_start, vmf->vma->vm_end, order);
+	dev_dbg(&dev_dax->dev, "%s: op=%s addr=%#lx order=%d\n", current->comm,
+		(vmf->flags & FAULT_FLAG_WRITE) ? "write" : "read",
+		vmf->address & ~((1UL << (order + PAGE_SHIFT)) - 1), order);
 
 	id = dax_read_lock();
 	if (order == 0)
-- 
2.45.0

From nobody Tue Dec 16 19:40:02 2025
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: x86@kernel.org, Borislav Petkov, Dave Jiang, "Kirill A . Shutemov",
 Ingo Molnar, Oscar Salvador, peterx@redhat.com, Matthew Wilcox,
 Vlastimil Babka, Dan Williams, Andrew Morton, Hugh Dickins,
 Michael Ellerman, Dave Hansen, Thomas Gleixner,
 linuxppc-dev@lists.ozlabs.org, Christophe Leroy, Rik van Riel,
 Mel Gorman, "Aneesh Kumar K . V", Nicholas Piggin, Huang Ying,
 Alex Thorlton
Subject: [PATCH 2/7] mm/mprotect: Remove NUMA_HUGE_PTE_UPDATES
Date: Fri, 21 Jun 2024 10:24:59 -0400
Message-ID: <20240621142504.1940209-3-peterx@redhat.com>
In-Reply-To: <20240621142504.1940209-1-peterx@redhat.com>
References: <20240621142504.1940209-1-peterx@redhat.com>

In 2013, commit 72403b4a0fbd ("mm: numa: return the number of base pages
altered by protection changes") introduced the "numa_huge_pte_updates"
vmstat entry, trying to capture how many huge ptes (in reality, PMD THPs at
that time) are marked by NUMA balancing.

This patch proposes to remove it, for two reasons.

Firstly, the name is misleading. Nowadays there is more than one way to
have a "huge pte", and that's also the major goal of this patch, which
paves the way for PUD handling in the change protection code paths.

PUDs are coming not only for dax (which has already arrived, yet is
broken..), but also for pfnmaps and hugetlb pages. The name will simply
stop making sense once PUDs get involved in the mprotect() world. Bumping
the counter for both PMDs and PUDs would not be reasonable either. In
short, the current accounting won't be right when PUDs come; the scheme was
only suitable at a point in time when PUDs weren't even possible.

Secondly, the accounting was simply not right from the start, because it
is also affected by call sites other than NUMA balancing. mprotect() is
one, and userfaultfd-wp also leverages the change protection path to modify
pgtables. To be correct it would need to check the caller, but it never
did; at least mprotect() was already there in 2013.

It gives me the impression that nobody is seriously using this field, and
it's also impossible to use it seriously.

We may want to do it right if any NUMA developers would like it to exist,
but we should do that with all of the above resolved, both in considering
PUDs and in getting the accounting correct. That can be done on top when
there's a real need for it.

Cc: Huang Ying
Cc: Mel Gorman
Cc: Alex Thorlton
Cc: Rik van Riel
Signed-off-by: Peter Xu
---
 include/linux/vm_event_item.h | 1 -
 mm/mprotect.c                 | 8 +-------
 mm/vmstat.c                   | 1 -
 3 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 73fa5fbf33a3..a76b75d6a8f4 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -59,7 +59,6 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		OOM_KILL,
 #ifdef CONFIG_NUMA_BALANCING
 		NUMA_PTE_UPDATES,
-		NUMA_HUGE_PTE_UPDATES,
 		NUMA_HINT_FAULTS,
 		NUMA_HINT_FAULTS_LOCAL,
 		NUMA_PAGE_MIGRATE,
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 222ab434da54..21172272695e 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -363,7 +363,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 	pmd_t *pmd;
 	unsigned long next;
 	long pages = 0;
-	unsigned long nr_huge_updates = 0;
 	struct mmu_notifier_range range;
 
 	range.start = 0;
@@ -411,11 +410,8 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 				ret = change_huge_pmd(tlb, vma, pmd,
 						addr, newprot, cp_flags);
 				if (ret) {
-					if (ret == HPAGE_PMD_NR) {
+					if (ret == HPAGE_PMD_NR)
 						pages += HPAGE_PMD_NR;
-						nr_huge_updates++;
-					}
-
 					/* huge pmd was handled */
 					goto next;
 				}
@@ -435,8 +431,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 	if (range.start)
 		mmu_notifier_invalidate_range_end(&range);
 
-	if (nr_huge_updates)
-		count_vm_numa_events(NUMA_HUGE_PTE_UPDATES, nr_huge_updates);
 	return pages;
 }
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 6e3347789eb2..be774893f69e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1313,7 +1313,6 @@ const char * const vmstat_text[] = {
 
 #ifdef CONFIG_NUMA_BALANCING
 	"numa_pte_updates",
-	"numa_huge_pte_updates",
 	"numa_hint_faults",
 	"numa_hint_faults_local",
 	"numa_pages_migrated",
-- 
2.45.0

From nobody Tue Dec 16 19:40:02 2025
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: x86@kernel.org, Borislav Petkov, Dave Jiang, "Kirill A . Shutemov",
 Ingo Molnar, Oscar Salvador, peterx@redhat.com, Matthew Wilcox,
 Vlastimil Babka, Dan Williams, Andrew Morton, Hugh Dickins,
 Michael Ellerman, Dave Hansen, Thomas Gleixner,
 linuxppc-dev@lists.ozlabs.org, Christophe Leroy, Rik van Riel,
 Mel Gorman, "Aneesh Kumar K . V", Nicholas Piggin, Huang Ying,
 kvm@vger.kernel.org, Sean Christopherson, Paolo Bonzini, David Rientjes
Subject: [PATCH 3/7] mm/mprotect: Push mmu notifier to PUDs
Date: Fri, 21 Jun 2024 10:25:00 -0400
Message-ID: <20240621142504.1940209-4-peterx@redhat.com>
In-Reply-To: <20240621142504.1940209-1-peterx@redhat.com>
References: <20240621142504.1940209-1-peterx@redhat.com>

mprotect() issues mmu notifiers at the PMD level. That has been the case
since 2014, with commit a5338093bfb4 ("mm: move mmu notifier call from
change_protection to change_pmd_range").

At that time, the issue was that NUMA balancing could be applied to a huge
range of VM memory even if nothing was populated. The notification can be
avoided in this case if no valid pmd is detected, where a valid pmd is
either a THP or a PTE pgtable page.

Now, to pave the way for PUD handling, this isn't enough. We need to
generate mmu notifications properly on PUD entries too. mprotect() is
currently broken on PUDs (e.g., one can already easily trigger a kernel
error with dax 1G mappings); this is the start of fixing it.

To fix that, this patch proposes to push such notifications to the PUD
layer.

There is a risk of regressing the problem Rik wanted to resolve before, but
I think it shouldn't really happen, and I still chose this solution for a
few reasons:

  1) Consider a large VM that definitely contains more than a few GBs of
  memory: it's highly likely that the PUDs are also none. In this case
  there will be no regression.

  2) KVM has evolved a lot over the years to get rid of rmap walks, which
  might have been the major cause of the previous soft-lockup. At least
  the TDP MMU has already got rid of rmap as long as it is not nested
  (which should be the major use case, IIUC). The TDP MMU pgtable walker
  will then simply see an empty VM pgtable (e.g. EPT on x86), so
  invalidating a fully empty region should in most cases be pretty fast
  now, compared to 2014.

  3) KVM now has explicit code paths that even give way to mmu notifiers
  just like this one, e.g. in commit d02c357e5bfa ("KVM: x86/mmu: Retry
  fault before acquiring mmu_lock if mapping is changing"). That also
  avoids contention that may contribute to a soft-lockup.

  4) Sticking with the PMD layer simply doesn't work when PUDs are
  there... We need one way or another to fix PUD mappings on mprotect().

Pushing it to the PUD level should be the safest approach as of now, e.g.
there's no sign yet of huge P4Ds coming on any known arch.

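For illustration only (not kernel code): a minimal user-space sketch of the
notifier placement described above, where the range start is issued lazily
on the first populated upper-level entry and the end is issued once after
the walk. All demo_* names are hypothetical; populated[i] == false stands in
for a none PUD. The real code additionally passes the address range to
mmu_notifier_range_init() and uses a non-zero range.start as the "started"
flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mmu notifier range; it only counts calls. */
struct demo_range {
	bool started;
	int start_calls;
	int end_calls;
};

static void demo_invalidate_start(struct demo_range *r)
{
	r->started = true;
	r->start_calls++;
}

static void demo_invalidate_end(struct demo_range *r)
{
	r->end_calls++;
}

/*
 * Walk n upper-level entries. Entries that are not populated are skipped
 * without ever touching the notifier. Returns the number of populated
 * entries that were processed.
 */
size_t demo_walk(const bool *populated, size_t n, struct demo_range *r)
{
	size_t visited = 0;

	assert(r != NULL && (populated != NULL || n == 0));
	r->started = false;
	r->start_calls = 0;
	r->end_calls = 0;

	for (size_t i = 0; i < n; i++) {
		if (!populated[i])
			continue;		/* nothing mapped here */
		if (!r->started)
			demo_invalidate_start(r);	/* first populated entry */
		visited++;			/* lower-level walk would go here */
	}

	if (r->started)
		demo_invalidate_end(r);		/* pair the single start */

	return visited;
}
```

A fully empty walk never calls the notifier, which is the property point 1)
relies on; any walk that finds at least one populated entry issues exactly
one start/end pair.
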
Cc: kvm@vger.kernel.org
Cc: Sean Christopherson
Cc: Paolo Bonzini
Cc: David Rientjes
Cc: Rik van Riel
Signed-off-by: Peter Xu
---
 mm/mprotect.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 21172272695e..fb8bf3ff7cd9 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -363,9 +363,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 	pmd_t *pmd;
 	unsigned long next;
 	long pages = 0;
-	struct mmu_notifier_range range;
-
-	range.start = 0;
 
 	pmd = pmd_offset(pud, addr);
 	do {
@@ -383,14 +380,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 		if (pmd_none(*pmd))
 			goto next;
 
-		/* invoke the mmu notifier if the pmd is populated */
-		if (!range.start) {
-			mmu_notifier_range_init(&range,
-				MMU_NOTIFY_PROTECTION_VMA, 0,
-				vma->vm_mm, addr, end);
-			mmu_notifier_invalidate_range_start(&range);
-		}
-
 		_pmd = pmdp_get_lockless(pmd);
 		if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd) || pmd_devmap(_pmd)) {
 			if ((next - addr != HPAGE_PMD_SIZE) ||
@@ -428,9 +417,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 		cond_resched();
 	} while (pmd++, addr = next, addr != end);
 
-	if (range.start)
-		mmu_notifier_invalidate_range_end(&range);
-
 	return pages;
 }
 
@@ -438,10 +424,13 @@ static inline long change_pud_range(struct mmu_gather *tlb,
 		struct vm_area_struct *vma, p4d_t *p4d, unsigned long addr,
 		unsigned long end, pgprot_t newprot, unsigned long cp_flags)
 {
+	struct mmu_notifier_range range;
 	pud_t *pud;
 	unsigned long next;
 	long pages = 0, ret;
 
+	range.start = 0;
+
 	pud = pud_offset(p4d, addr);
 	do {
 		next = pud_addr_end(addr, end);
@@ -450,10 +439,19 @@ static inline long change_pud_range(struct mmu_gather *tlb,
 			return ret;
 		if (pud_none_or_clear_bad(pud))
 			continue;
+		if (!range.start) {
+			mmu_notifier_range_init(&range,
+						MMU_NOTIFY_PROTECTION_VMA, 0,
+						vma->vm_mm, addr, end);
+			mmu_notifier_invalidate_range_start(&range);
+		}
 		pages += change_pmd_range(tlb, vma, pud, addr, next, newprot,
 					  cp_flags);
 	} while (pud++, addr = next, addr != end);
 
+	if (range.start)
+		mmu_notifier_invalidate_range_end(&range);
+
 	return pages;
 }
 
-- 
2.45.0

From nobody Tue Dec 16 19:40:02 2025
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: x86@kernel.org, Borislav Petkov, Dave Jiang, "Kirill A . Shutemov",
 Ingo Molnar, Oscar Salvador, peterx@redhat.com, Matthew Wilcox,
 Vlastimil Babka, Dan Williams, Andrew Morton, Hugh Dickins,
 Michael Ellerman, Dave Hansen, Thomas Gleixner,
 linuxppc-dev@lists.ozlabs.org, Christophe Leroy, Rik van Riel,
 Mel Gorman, "Aneesh Kumar K . V", Nicholas Piggin, Huang Ying
Subject: [PATCH 4/7] mm/powerpc: Add missing pud helpers
Date: Fri, 21 Jun 2024 10:25:01 -0400
Message-ID: <20240621142504.1940209-5-peterx@redhat.com>
In-Reply-To: <20240621142504.1940209-1-peterx@redhat.com>
References: <20240621142504.1940209-1-peterx@redhat.com>

These new helpers will be needed for pud entry updates soon.
Namely:

- pudp_invalidate()
- pud_modify()

Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Christophe Leroy
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Aneesh Kumar K.V
Signed-off-by: Peter Xu
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  3 +++
 arch/powerpc/mm/book3s64/pgtable.c           | 20 ++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 8f9432e3855a..fc628a984669 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1108,6 +1108,7 @@ extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
 extern pud_t pfn_pud(unsigned long pfn, pgprot_t pgprot);
 extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot);
 extern pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot);
+extern pud_t pud_modify(pud_t pud, pgprot_t newprot);
 extern void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 		       pmd_t *pmdp, pmd_t pmd);
 extern void set_pud_at(struct mm_struct *mm, unsigned long addr,
@@ -1368,6 +1369,8 @@ static inline pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm,
 #define __HAVE_ARCH_PMDP_INVALIDATE
 extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			     pmd_t *pmdp);
+extern pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
+			     pud_t *pudp);
 
 #define pmd_move_must_withdraw pmd_move_must_withdraw
 struct spinlock;
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 2975ea0841ba..c6ae969020e0 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -176,6 +176,17 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 	return __pmd(old_pmd);
 }
 
+pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
+		      pud_t *pudp)
+{
+	unsigned long old_pud;
+
+	VM_WARN_ON_ONCE(!pud_present(*pudp));
+	old_pud = pud_hugepage_update(vma->vm_mm, address, pudp, _PAGE_PRESENT, _PAGE_INVALID);
+	flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE);
+	return __pud(old_pud);
+}
+
 pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma,
 				   unsigned long addr, pmd_t *pmdp, int full)
 {
@@ -259,6 +270,15 @@ pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 	pmdv &= _HPAGE_CHG_MASK;
 	return pmd_set_protbits(__pmd(pmdv), newprot);
 }
+
+pud_t pud_modify(pud_t pud, pgprot_t newprot)
+{
+	unsigned long pudv;
+
+	pudv = pud_val(pud);
+	pudv &= _HPAGE_CHG_MASK;
+	return pud_set_protbits(__pud(pudv), newprot);
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 /* For use by kexec, called with MMU off */
-- 
2.45.0

From nobody Tue Dec 16 19:40:02 2025
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com 
header.b=P2Vq+vF6; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="P2Vq+vF6" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1718979920; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=k2ztUKph00cBWbOmplix2Q4QYxHwnhS7TcGpCnDIOcM=; b=P2Vq+vF6xvnJHNzX3LgLv0Q97lx6QB1z+kDYWzuJl6ik1phHyEvA1dWPAxAANuFsv15WUf MM9UKTws0fV5qaAEVmmyq1d3E3CXefBvkj1eFJ56qES3CmROiPluxHXOn5vQ3Ky8nbSMF1 nvE7zBTcLQ+Bq21xeOKdra6QenZaZQM= Received: from mail-oa1-f69.google.com (mail-oa1-f69.google.com [209.85.160.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-471-4KDroyRkP5u_-Z_f9GLT3Q-1; Fri, 21 Jun 2024 10:25:19 -0400 X-MC-Unique: 4KDroyRkP5u_-Z_f9GLT3Q-1 Received: by mail-oa1-f69.google.com with SMTP id 586e51a60fabf-25cbd68532fso343133fac.3 for ; Fri, 21 Jun 2024 07:25:19 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1718979918; x=1719584718; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=k2ztUKph00cBWbOmplix2Q4QYxHwnhS7TcGpCnDIOcM=; b=VQopFV1nPKgBGfQJb/rnLjGLZpCXyP0CTyrUT4cfUL+Bzv2JWWqpOMVTP3rxzNBisC 5EjJcFoxtVyFxFsstDwAsm7D+YNGHo5HZkfu18BYYG5u90lUffbNsZWDmBItk70kqyCp 77RyiKYCBVn/mZQSeuVSR9TnNlo56QUTCdSnDj6mNK9UQvGVBGkuhHtqboVx/iuEpJNH nvtFldd22TPqrh5nRtzk1K4c+Hwp7FQRfJpxNW5LIBFJC4ezAVL6+RjhdPDvnmkkgi1B 
QmwOxk0Uvj8kTlu6L61HYq6LZ+OZRsfPMtbIDEzE+p55azoLNlL+nOywUpP2IvfEtSLQ Vfrg== X-Gm-Message-State: AOJu0YznVEzvGLa3OdjV9aRVVQOiKi2r++nKC6GASxiAWOadSvky8OdC LhoTYM5nU/wImtL3MWJh/9BRZOKEWEI2Xu9UyhzAZtvihf08OAKIPp1OYg0y1H20YPK0CUeyhMJ H5tGGlcTljkge4NimxKd7wJ94l0Qd+LJiNeFjekTywCSddKNOplLR9rcjKdC7KOJp+iwhO4Lub9 oMeRarj9DhCVrziqHNEMoUhwFpFeDzHcdfJ5cvlFAAP88= X-Received: by 2002:a05:6871:14f:b0:254:cae6:a812 with SMTP id 586e51a60fabf-25c94d8a94emr9065133fac.3.1718979917984; Fri, 21 Jun 2024 07:25:17 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEpfu3cPye3+7RpVIl2JRxNk+CTOiRpoiTdFmb6dpb/ERi8sD9JFsePfCkLHiFMYZR2Pz0U9w== X-Received: by 2002:a05:6871:14f:b0:254:cae6:a812 with SMTP id 586e51a60fabf-25c94d8a94emr9065078fac.3.1718979917305; Fri, 21 Jun 2024 07:25:17 -0700 (PDT) Received: from x1n.redhat.com (pool-99-254-121-117.cpe.net.cable.rogers.com. [99.254.121.117]) by smtp.gmail.com with ESMTPSA id af79cd13be357-79bce944cb2sm90564785a.125.2024.06.21.07.25.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 21 Jun 2024 07:25:16 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: x86@kernel.org, Borislav Petkov , Dave Jiang , "Kirill A . Shutemov" , Ingo Molnar , Oscar Salvador , peterx@redhat.com, Matthew Wilcox , Vlastimil Babka , Dan Williams , Andrew Morton , Hugh Dickins , Michael Ellerman , Dave Hansen , Thomas Gleixner , linuxppc-dev@lists.ozlabs.org, Christophe Leroy , Rik van Riel , Mel Gorman , "Aneesh Kumar K . 
V" , Nicholas Piggin , Huang Ying
Subject: [PATCH 5/7] mm/x86: Make pud_leaf() only care about PSE bit
Date: Fri, 21 Jun 2024 10:25:02 -0400
Message-ID: <20240621142504.1940209-6-peterx@redhat.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240621142504.1940209-1-peterx@redhat.com>
References: <20240621142504.1940209-1-peterx@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

An entry should be reported as a PUD leaf even if it is PROT_NONE, in
which case the PRESENT bit is not set. Without this change I hit "bad pud"
when zapping a PROT_NONE PUD while testing 1G dax.

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: x86@kernel.org
Signed-off-by: Peter Xu
Acked-by: Dave Hansen
---
 arch/x86/include/asm/pgtable.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 65b8e5bb902c..25fc6d809572 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1073,8 +1073,7 @@ static inline pmd_t *pud_pgtable(pud_t pud)
 #define pud_leaf pud_leaf
 static inline bool pud_leaf(pud_t pud)
 {
-	return (pud_val(pud) & (_PAGE_PSE | _PAGE_PRESENT)) ==
-		(_PAGE_PSE | _PAGE_PRESENT);
+	return pud_val(pud) & _PAGE_PSE;
 }
 
 static inline int pud_bad(pud_t pud)
-- 
2.45.0

From nobody Tue Dec 16 19:40:02 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1A7C44D8AD for ; Fri, 21 Jun 2024 14:25:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718979925; cv=none;
b=bdsVSkAZDHEvfhbXn+RdhgSQ0gBtVHQOTXruj2xprF8/PPO0OfYILu+UA8t4t00+EqgblrsXO74uWe7Ty6xITXnp9C5r3YVP6jWJBkX4q89rwRvuUyyK/vJIERFfM44q9pEfTkodluiAMVQy6FHKQ110misZ5VKGoc8ddr/xoRw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718979925; c=relaxed/simple; bh=XjjA2tmFAAypYeLP8NdXol8grBVU43yRXPOg3GP+ySM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BxBdmATuHaXk04z6OVwlh058XbVlGD5jGlie8AWcS/6g8NDwAL0PD5sITxYSH9+diUB4kypazvYprfwdPl3ulqzB7j852wpYNT4/F1sSyCXC2k5FQmcRxPeJoUi0HOZ6f/SAvLxWrUV+Fttq1DOQxE3miixCeiDwzCh37QMOGEY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=NnfCgak6; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="NnfCgak6" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1718979923; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tqSfUWMuuFFpoXlX71uD6ModhPCWC1FSZp1xmlIx7N4=; b=NnfCgak6UafnbQ9z9xI2oOsEhyaT9T1OvvNMabvBSvUUUMClZyaJ01cAWsg6Bmb2FeHls2 6e9VEmxzewbeM7eYi6/D2AES00yi7G4iT3eUM0uT23C5Itkax7SbVSft5oDmdtEW+F+zGO /i9pYK8cCOqMI7Jxmj44xhK1bX/72rs= Received: from mail-ot1-f71.google.com (mail-ot1-f71.google.com [209.85.210.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-660--UcYftHGNaWP2vErBh2etw-1; Fri, 21 Jun 2024 10:25:22 -0400 X-MC-Unique: 
-UcYftHGNaWP2vErBh2etw-1 Received: by mail-ot1-f71.google.com with SMTP id 46e09a7af769-6f97a99d7dfso141990a34.2 for ; Fri, 21 Jun 2024 07:25:21 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1718979920; x=1719584720; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tqSfUWMuuFFpoXlX71uD6ModhPCWC1FSZp1xmlIx7N4=; b=R0sTpobXsS9+wz9Fc60GHW2OKqPYGON0vE5b8Bmp+KATd0aU4bgZn20oxIDCXgcPGP aAKsfXZp6Nmtm4p18eRL+95P+ftHmi6IU21UOpQxQpr5OQQYu6PymL3uwgyK3VqhPv5i iKmS9TUGQvOVtn3iJg8kOui09Yn1MXwmoGiPt4EgKZBOA/0G8YvsCSi3JT6X+UT9pTIX cZCRL2tKmeSZ4zTQOczd94G/fzf0sgBjc5ngRjeAOrhuGhEVLS1MX371Un8fWl8z++tj ylkyaJTydI4zZ1hgQTr16VlQQNuG2j0aRQIxpKcTb6tGJ7PlxiczEXSeUDQK8sdWCDwL L4MQ== X-Gm-Message-State: AOJu0YzTMUZPI91ny4vZlWeDGYR3wD3tuiLK40NW4TZeC8g1an7zdPq9 aw2Lm97+DHOGhWjSKz7BlTYEEmxScFD0wvIBaQ08A09XfZGODlC/dF/Ic2caGCwdV7kh7U3DPa3 WfUNqexNIvGIJRJDQbRqN3O4QiBbkGTJfxJ/dovC04gv+/9mG+4VMiGRCIIrZSi568f7E3Q0qzg c8QUlB4e1PAHw25q33G+TYUo0iOLXo2IFrhacJrtJvN9A= X-Received: by 2002:a05:6358:8096:b0:19f:81fb:131c with SMTP id e5c5f4694b2df-1a1fd1d513bmr911568855d.0.1718979920317; Fri, 21 Jun 2024 07:25:20 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHOaIBFrX4PIBPbd7NrgasVtJd+jvRQiQvrolpQX5hE+bY+MZMCcoynbRsNLrRaLePwagbsiQ== X-Received: by 2002:a05:6358:8096:b0:19f:81fb:131c with SMTP id e5c5f4694b2df-1a1fd1d513bmr911563455d.0.1718979919599; Fri, 21 Jun 2024 07:25:19 -0700 (PDT) Received: from x1n.redhat.com (pool-99-254-121-117.cpe.net.cable.rogers.com. [99.254.121.117]) by smtp.gmail.com with ESMTPSA id af79cd13be357-79bce944cb2sm90564785a.125.2024.06.21.07.25.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 21 Jun 2024 07:25:18 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: x86@kernel.org, Borislav Petkov , Dave Jiang , "Kirill A . 
Shutemov" , Ingo Molnar , Oscar Salvador , peterx@redhat.com, Matthew Wilcox , Vlastimil Babka , Dan Williams , Andrew Morton , Hugh Dickins , Michael Ellerman , Dave Hansen , Thomas Gleixner , linuxppc-dev@lists.ozlabs.org, Christophe Leroy , Rik van Riel , Mel Gorman , "Aneesh Kumar K . V" , Nicholas Piggin , Huang Ying
Subject: [PATCH 6/7] mm/x86: Add missing pud helpers
Date: Fri, 21 Jun 2024 10:25:03 -0400
Message-ID: <20240621142504.1940209-7-peterx@redhat.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240621142504.1940209-1-peterx@redhat.com>
References: <20240621142504.1940209-1-peterx@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

These new helpers will be needed for pud entry updates soon. Namely:

  - pudp_invalidate()
  - pud_modify()

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: x86@kernel.org
Signed-off-by: Peter Xu
---
 arch/x86/include/asm/pgtable.h | 36 ++++++++++++++++++++++++++++++++++
 arch/x86/mm/pgtable.c          | 11 +++++++++++
 2 files changed, 47 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 25fc6d809572..3c23077adca6 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -775,6 +775,12 @@ static inline pmd_t pmd_mkinvalid(pmd_t pmd)
 		      __pgprot(pmd_flags(pmd) & ~(_PAGE_PRESENT|_PAGE_PROTNONE)));
 }
 
+static inline pud_t pud_mkinvalid(pud_t pud)
+{
+	return pfn_pud(pud_pfn(pud),
+		       __pgprot(pud_flags(pud) & ~(_PAGE_PRESENT|_PAGE_PROTNONE)));
+}
+
 static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
@@ -839,6 +845,21 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 	return pmd_result;
 }
 
+static inline pud_t pud_modify(pud_t pud, pgprot_t newprot)
+{
+	pudval_t val = pud_val(pud), oldval = val;
+
+	/*
+	 * NOTE: no need to consider shadow stack complexities because it
+	 * doesn't support 1G mappings.
+	 */
+	val &= _HPAGE_CHG_MASK;
+	val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK;
+	val = flip_protnone_guard(oldval, val, PHYSICAL_PUD_PAGE_MASK);
+
+	return __pud(val);
+}
+
 /*
  * mprotect needs to preserve PAT and encryption bits when updating
  * vm_page_prot
@@ -1377,10 +1398,25 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 }
 #endif
 
+static inline pud_t pudp_establish(struct vm_area_struct *vma,
+		unsigned long address, pud_t *pudp, pud_t pud)
+{
+	if (IS_ENABLED(CONFIG_SMP)) {
+		return xchg(pudp, pud);
+	} else {
+		pud_t old = *pudp;
+		WRITE_ONCE(*pudp, pud);
+		return old;
+	}
+}
+
 #define __HAVE_ARCH_PMDP_INVALIDATE_AD
 extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
 				unsigned long address, pmd_t *pmdp);
 
+pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
+		      pud_t *pudp);
+
 /*
  * Page table pages are page-aligned. The lower half of the top
  * level is used for userspace and the top half for the kernel.
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 93e54ba91fbf..4e245a1526ad 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -641,6 +641,17 @@ pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
 }
 #endif
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
+		      pud_t *pudp)
+{
+	VM_WARN_ON_ONCE(!pud_present(*pudp));
+	pud_t old = pudp_establish(vma, address, pudp, pud_mkinvalid(*pudp));
+	flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE);
+	return old;
+}
+#endif
+
 /**
  * reserve_top_address - reserves a hole in the top of kernel address space
  * @reserve - size of hole to reserve
-- 
2.45.0

From nobody Tue Dec 16 19:40:02 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A7C5F433D6 for ; Fri, 21 Jun 2024 14:25:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718979927; cv=none; b=CqwgE9Tuzwz8e7c72syJ+PdXCiErJE5xqVl0+kzMtwrKpq7wORtkoHm5XrOgEm30LclKdgt5klKiXjxZOe0vgZSOiOQwUGcMwa8Ls4rBqN/YPw/uq5nFu625OrcmEpyuwsHQbydUnS1S3aJVpPJYByiEEPe+KFJM390eTIFm1C8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718979927; c=relaxed/simple; bh=nOXxENP+SihR2L2Tw2z5KRQQ1L5Z5qj+bFBtQ9AMgag=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=A5xMo1IVrdcJxqzsyib2HGQpO8dXjFWc30/hSsMPCfB/Si47sXQpX8b1hNmjBF2SYl6J8VzaHxy0CTmTdeD2cLie0tdIAUKDxqwrX2Nq3unqIusbCWL0gTXIoaMWEj2dxYtWdS3SLiwl/nIcA66mSMFO0jRAo4PZYaWyQ0CSHoM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com;
dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=QhDJRyda; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="QhDJRyda" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1718979924; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FloNdHDAMCUmgBKRbs+OcXuVLR4NlkKUkgqw/Oorwhs=; b=QhDJRyda6rNLejYdscfslEYaJxGeILdq153SuN9eyZ0LtXj6AiVP5gKTDJjuHEtS33WJWn 1gsisAKU5bPwDGucD3EtvtAvhf/r78i55N7iSW1OW7AJLxhtlMceL81kshrn8E43g0nJPn ZjLNyuYRJDlz60lYG8ctuFoc2fnu/4g= Received: from mail-qt1-f198.google.com (mail-qt1-f198.google.com [209.85.160.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-378-o5dTw0E7NY-6exYoyF7aiw-1; Fri, 21 Jun 2024 10:25:23 -0400 X-MC-Unique: o5dTw0E7NY-6exYoyF7aiw-1 Received: by mail-qt1-f198.google.com with SMTP id d75a77b69052e-43e2c5354f9so3561121cf.2 for ; Fri, 21 Jun 2024 07:25:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1718979922; x=1719584722; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=FloNdHDAMCUmgBKRbs+OcXuVLR4NlkKUkgqw/Oorwhs=; b=JO4jqCxIYySyHdzyVjXq3s5JZPIgVnaRZox5KAaRTv7nnTKDhyW1ekh7fpo5EID//5 BO99YbnpF407IjO6fnYt5yMg+7PbXIZTB9RqXdIf+6cvDyz+dCd856zRbCAibc72oHET 6jw97g7cG/aDwYvCJN2NZXIdaC6oVEguJhSDAfOJVNpiBx8YiQIe/jcs3LRZFE5ws9Mv 
lthZzWgfilS8elAWhF0Asla7kxfmlakGbbSdaQ/xrYRB0gPGecw5lktiarK4VUKZQweb PRx9OUMtXZD/EPQqoqCVVbpLPa2z7trqWVa/MCDwuaD8hMhFqtm2A3AMxvZdzQh/q6IB Y1ZA== X-Gm-Message-State: AOJu0YxV42dgSEWhtmoikB97b00+VAzlQ56xjIKxA5eqVGvYmtfnJ/CQ UDv3BPQdqY1r45NmQsSg7JiOW8Kxo2oytIRjYai3oUHGWYeE4PjxV5072phDwddlx9+L82pmyi1 AeguHuXuzZY2v1Sg/4bhcgH71fjRvu6no3HaKf/ZjY6aNnnUDu1Y2wT68O5AVnHRQpDACwFvo2v qUfGWdDItjqrfFm2qFk07sQ07acWer7bsfMFGr+RsS28c= X-Received: by 2002:a05:620a:4725:b0:79b:b8ef:160f with SMTP id af79cd13be357-79bb8ef1a25mr832758785a.0.1718979922257; Fri, 21 Jun 2024 07:25:22 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHKznxYW2FLJ0xJwWKaEtD8Tk/BHH+OIQZWLkXgxtzT/k39w+HZ9hDMrU5i6fKaXRUQoupH+Q== X-Received: by 2002:a05:620a:4725:b0:79b:b8ef:160f with SMTP id af79cd13be357-79bb8ef1a25mr832752985a.0.1718979921429; Fri, 21 Jun 2024 07:25:21 -0700 (PDT) Received: from x1n.redhat.com (pool-99-254-121-117.cpe.net.cable.rogers.com. [99.254.121.117]) by smtp.gmail.com with ESMTPSA id af79cd13be357-79bce944cb2sm90564785a.125.2024.06.21.07.25.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 21 Jun 2024 07:25:21 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: x86@kernel.org, Borislav Petkov , Dave Jiang , "Kirill A . Shutemov" , Ingo Molnar , Oscar Salvador , peterx@redhat.com, Matthew Wilcox , Vlastimil Babka , Dan Williams , Andrew Morton , Hugh Dickins , Michael Ellerman , Dave Hansen , Thomas Gleixner , linuxppc-dev@lists.ozlabs.org, Christophe Leroy , Rik van Riel , Mel Gorman , "Aneesh Kumar K . 
V" , Nicholas Piggin , Huang Ying
Subject: [PATCH 7/7] mm/mprotect: fix dax pud handlings
Date: Fri, 21 Jun 2024 10:25:04 -0400
Message-ID: <20240621142504.1940209-8-peterx@redhat.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240621142504.1940209-1-peterx@redhat.com>
References: <20240621142504.1940209-1-peterx@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

This is only relevant to the two architectures that support PUD dax,
namely x86_64 and ppc64. PUD THPs do not yet exist elsewhere, and hugetlb
PUDs do not count in this case.

DAX has had PUD mappings for years, but the change-protection path never
worked. Whenever the path is triggered (a simple test program would be:
call mprotect() on a 1G dev_dax mapping), the kernel reports "bad pud".
This patch should fix that.

The new change_huge_pud() tries to keep everything simple. For example, it
doesn't optimize the write bit, as that would need even more PUD helpers.
Taking one extra write fault per 1G range in the worst case is not too bad
anyway; it would matter more per PAGE_SIZE. Nor does it support
userfault-wp bits, as no supported PUD mapping uses them; file mappings
always need a split there.

The same applies to TLB shootdown: the pmd path (which was for x86 only)
has the trick of using the _ad() version of pmdp_invalidate*(), which can
avoid one redundant TLB flush, but let's leave that for later too. Again,
the larger the mapping, the smaller that effect.

Another thing worth mentioning is that this path needs to handle the
"retry" event for change_huge_pud() carefully (where it can return 0): it
isn't like change_huge_pmd(), as the pmd version is safe with all
conditions handled in change_pte_range() later, thanks to Hugh's new
pte_offset_map_lock().
In short, after the shmem THP collapse rework, change_pte_range() is simply
smarter than change_pmd_range(). For that reason, change_pud_range() needs
a proper retry when it races with something else and a huge PUD changes
from under us.

Cc: Dan Williams
Cc: Matthew Wilcox
Cc: Dave Jiang
Cc: Hugh Dickins
Cc: Kirill A. Shutemov
Cc: Vlastimil Babka
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: Michael Ellerman
Cc: Aneesh Kumar K.V
Cc: Oscar Salvador
Cc: x86@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Fixes: 27af67f35631 ("powerpc/book3s64/mm: enable transparent pud hugepage")
Signed-off-by: Peter Xu
---
 include/linux/huge_mm.h | 24 +++++++++++++++++++
 mm/huge_memory.c        | 52 +++++++++++++++++++++++++++++++++++++++++
 mm/mprotect.c           | 40 ++++++++++++++++++++++++-------
 3 files changed, 108 insertions(+), 8 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 212cca384d7e..b46673689f19 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -345,6 +345,17 @@ void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address,
 void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
 		unsigned long address);
 
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		    pud_t *pudp, unsigned long addr, pgprot_t newprot,
+		    unsigned long cp_flags);
+#else
+static inline int
+change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		pud_t *pudp, unsigned long addr, pgprot_t newprot,
+		unsigned long cp_flags) { return 0; }
+#endif
+
 #define split_huge_pud(__vma, __pud, __address)				\
 	do {								\
 		pud_t *____pud = (__pud);				\
@@ -588,6 +599,19 @@ static inline int next_order(unsigned long *orders, int prev)
 {
 	return 0;
 }
+
+static inline void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
+				    unsigned long address)
+{
+}
+
+static inline int change_huge_pud(struct mmu_gather *tlb,
+				  struct vm_area_struct *vma, pud_t *pudp,
+				  unsigned long addr, pgprot_t newprot,
+				  unsigned long cp_flags)
+{
+	return 0;
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 static inline int split_folio_to_list_to_order(struct folio *folio,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0fffaa58a47a..af8d83f4ce02 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2091,6 +2091,53 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	return ret;
 }
 
+/*
+ * Returns:
+ *
+ * - 0: if pud leaf changed from under us
+ * - 1: if pud can be skipped
+ * - HPAGE_PUD_NR: if pud was successfully processed
+ */
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		    pud_t *pudp, unsigned long addr, pgprot_t newprot,
+		    unsigned long cp_flags)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pud_t oldpud, entry;
+	spinlock_t *ptl;
+
+	tlb_change_page_size(tlb, HPAGE_PUD_SIZE);
+
+	/* NUMA balancing doesn't apply to dax */
+	if (cp_flags & MM_CP_PROT_NUMA)
+		return 1;
+
+	/*
+	 * Huge entries on userfault-wp only work with anonymous memory,
+	 * while we don't have anonymous PUDs yet.
+	 */
+	if (WARN_ON_ONCE(cp_flags & MM_CP_UFFD_WP_ALL))
+		return 1;
+
+	ptl = __pud_trans_huge_lock(pudp, vma);
+	if (!ptl)
+		return 0;
+
+	/*
+	 * Can't clear PUD or it can race with concurrent zapping. See
+	 * change_huge_pmd().
+	 */
+	oldpud = pudp_invalidate(vma, addr, pudp);
+	entry = pud_modify(oldpud, newprot);
+	set_pud_at(mm, addr, pudp, entry);
+	tlb_flush_pud_range(tlb, addr, HPAGE_PUD_SIZE);
+
+	spin_unlock(ptl);
+	return HPAGE_PUD_NR;
+}
+#endif
+
 #ifdef CONFIG_USERFAULTFD
 /*
  * The PT lock for src_pmd and dst_vma/src_vma (for reading) are locked by
@@ -2319,6 +2366,11 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
 	spin_unlock(ptl);
 	mmu_notifier_invalidate_range_end(&range);
 }
+#else
+void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
+		unsigned long address)
+{
+}
 #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
 
 static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
diff --git a/mm/mprotect.c b/mm/mprotect.c
index fb8bf3ff7cd9..694f13b83864 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -425,29 +425,53 @@ static inline long change_pud_range(struct mmu_gather *tlb,
 		unsigned long end, pgprot_t newprot, unsigned long cp_flags)
 {
 	struct mmu_notifier_range range;
-	pud_t *pud;
+	pud_t *pudp, pud;
 	unsigned long next;
 	long pages = 0, ret;
 
 	range.start = 0;
 
-	pud = pud_offset(p4d, addr);
+	pudp = pud_offset(p4d, addr);
 	do {
+again:
 		next = pud_addr_end(addr, end);
-		ret = change_prepare(vma, pud, pmd, addr, cp_flags);
-		if (ret)
-			return ret;
-		if (pud_none_or_clear_bad(pud))
+		ret = change_prepare(vma, pudp, pmd, addr, cp_flags);
+		if (ret) {
+			pages = ret;
+			break;
+		}
+
+		pud = READ_ONCE(*pudp);
+		if (pud_none(pud))
 			continue;
+
 		if (!range.start) {
 			mmu_notifier_range_init(&range,
 						MMU_NOTIFY_PROTECTION_VMA, 0,
 						vma->vm_mm, addr, end);
 			mmu_notifier_invalidate_range_start(&range);
 		}
-		pages += change_pmd_range(tlb, vma, pud, addr, next, newprot,
+
+		if (pud_leaf(pud)) {
+			if ((next - addr != PUD_SIZE) ||
+			    pgtable_split_needed(vma, cp_flags)) {
+				__split_huge_pud(vma, pudp, addr);
+				goto again;
+			} else {
+				ret = change_huge_pud(tlb, vma, pudp,
+						      addr, newprot, cp_flags);
+				if (ret == 0)
+					goto again;
+				/* huge pud was handled */
+				if (ret == HPAGE_PUD_NR)
+					pages += HPAGE_PUD_NR;
+				continue;
+			}
+		}
+
+		pages += change_pmd_range(tlb, vma, pudp, addr, next, newprot,
 					  cp_flags);
-	} while (pud++, addr = next, addr != end);
+	} while (pudp++, addr = next, addr != end);
 
 	if (range.start)
 		mmu_notifier_invalidate_range_end(&range);
-- 
2.45.0
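
The effect of the pud_leaf() change in patch 5/7 can be illustrated outside the kernel. The following is only a userspace sketch, not kernel code: the helper names (pud_leaf_old, pud_leaf_new, make_protnone_huge_pud) and the raw uint64_t entry type are hypothetical stand-ins for illustration, and the bit positions are assumed to mirror the x86 _PAGE_PRESENT, _PAGE_PSE and _PAGE_PROTNONE definitions. It shows why a PROT_NONE 1G entry, which keeps PSE but has PRESENT cleared, was not recognized as a leaf by the old check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the x86 flag bits; redefined here for illustration. */
#define PAGE_PRESENT  (UINT64_C(1) << 0)
#define PAGE_PSE      (UINT64_C(1) << 7)
#define PAGE_PROTNONE (UINT64_C(1) << 8)	/* shares the GLOBAL bit */

/* Hypothetical model of the check before patch 5/7: needs PSE and PRESENT. */
static bool pud_leaf_old(uint64_t pudval)
{
	return (pudval & (PAGE_PSE | PAGE_PRESENT)) ==
		(PAGE_PSE | PAGE_PRESENT);
}

/* Hypothetical model of the check after patch 5/7: PSE alone decides. */
static bool pud_leaf_new(uint64_t pudval)
{
	return (pudval & PAGE_PSE) != 0;
}

/*
 * Hypothetical helper: build a PROT_NONE huge entry from an address.
 * PSE and PROTNONE are set, PRESENT is left clear.
 */
static uint64_t make_protnone_huge_pud(uint64_t addr_bits)
{
	return addr_bits | PAGE_PSE | PAGE_PROTNONE;
}
```

With an entry built by make_protnone_huge_pud(), pud_leaf_old() reports false while pud_leaf_new() reports true; both agree once PAGE_PRESENT is set. In the series, it is the new behaviour that lets change_pud_range() route such an entry to change_huge_pud() rather than treating it as a pointer to a PMD table.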