From nobody Sun Feb 8 07:58:58 2026 Received: from mail-wm1-f41.google.com (mail-wm1-f41.google.com [209.85.128.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 386D41A4F37 for ; Tue, 13 Aug 2024 20:25:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723580745; cv=none; b=czyv3R5XRsZHp2z7FnxVx+oMeUmb+PZ2Fhcw40IPLIwyZFIUq/W27I643NIAYLmvJ8gbpjULLwtQEME1RCIvj/p4c8ss4GxjjhY9ZB3fIEAY8tc1GZQV6/5f1CoiBWNjx84LtcmRI3VxHGh7o5TrQMPkx1btPFQkyOk7NqOstWo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723580745; c=relaxed/simple; bh=NjEb/5kWU2ZKZsYUDCoXZhjVmSL4k/e2MGJXrVl34QQ=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=D+fN+MsWFr1BT/5RHLeUM+vJ6PaqaRg1UEyhl+c2gD8GRGmdjLm7ewIkAAdHhvUNcTAazABnTxMh3otDGfUxmdPaMx4fM54kHWPGDV3RxuA6V9CzZg+Kuq523aBM214PQxwkUjlKfe5D0UhGdWJYsRg2e+eHBeUy24e/10ex3Jc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=dxTo4p+5; arc=none smtp.client-ip=209.85.128.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="dxTo4p+5" Received: by mail-wm1-f41.google.com with SMTP id 5b1f17b1804b1-427fc9834deso9925e9.0 for ; Tue, 13 Aug 2024 13:25:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1723580742; x=1724185542; darn=vger.kernel.org; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=Xs2mtlmkE8RENg4xcjJxmH3AsBKwUjYWFgttp/hIQKk=; b=dxTo4p+5KkvQ9CsTVJgQfAcJ6atD/Bgl7HDK1yKBAWSd4rzYXBOR2R4aEdauCxMtvK OEon3H7Au0nLxfNEQ+/u2u+n2p++uDeMfMIQNpT1zl9mX8l2+cpqIekveBOX8V8lyDDI b4Qzy7IN1SoC7rXYCD5kDPWUZnXt1FdmznQvSIvV3NufpGXzZCK98oye3a7J9rEhXAEN lLG76oskUhWb08O/XyUa7LF9SRr76SQHpthLNiWutQEMK/fKX9p3DA7NV4cXxwcI3HtM 4l4qT/FByvIQGSSgcppm1g/f011ZQoVeiXOZSnWeYmGT7cI5xmxMwqo8kwaX6YjK9PGu AZpQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723580742; x=1724185542; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Xs2mtlmkE8RENg4xcjJxmH3AsBKwUjYWFgttp/hIQKk=; b=kiOq1VBVP6V4TXHsgjtP1/btCzdgnb0aWaKZsKfzbIBqS4eeToiaK+zx/wputK7dK+ pAiFcV8Z5o7tgZmCzXztOFBO1+d+DPiJMlXNznyuVSP47Aj4t5od3nvF5ymxGGu/U148 6WvjFfjzFR61VHxpt3hAkRrLXrY+3jYTjrsZI2yD98oYA/ADX2wQo34jtgBaSXkqY7wF a8WVBeBajQ7WwMnfIlkUCcE4hsjsh3BofAuH1CivxM5EEvdoWO1DOCaSXE3SGu8xDoGO sh7KE4fqHb99PUuqAjKnCuWLVXFsiJsXZpOQELPqn1Y32O0QCmtHNK7Uq8HqTBhEu6QJ 8MUw== X-Forwarded-Encrypted: i=1; AJvYcCUOQ6JmJvpSS+FmgNR3q6HCMpfxYgKPse9FQ/UfQungo8ju2BXkES8rNqEcFPM8NjEFYdtxVNvyq1yg/iGB+jq44I53ncPqCxwUMZy+ X-Gm-Message-State: AOJu0Yz3ELyDrWnODtC6ggKz6XtcZB/iCrANcjh0d/U5V3XQu7jA9jJT PT7svO2ZuEu1d6sbgVPz4M1c80lh5+EzJlOAipLlmkKdpl4WegUvNTaBmy2oTg== X-Google-Smtp-Source: AGHT+IFO4Kwu8LOnDCVoNHKE9IDgnB2lyd4k/4Eah6HV4M7+c2hknCrXNstVkl9ev/iPKDtfSvDJqg== X-Received: by 2002:a05:600c:1e24:b0:426:62a2:dfc with SMTP id 5b1f17b1804b1-429dec3dd98mr5415e9.5.1723580741697; Tue, 13 Aug 2024 13:25:41 -0700 (PDT) Received: from localhost ([2a00:79e0:9d:4:a608:a4cb:f4c2:6573]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-429deb4c1bbsm910225e9.20.2024.08.13.13.25.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Aug 2024 13:25:41 -0700 (PDT) From: Jann Horn Date: Tue, 13 Aug 2024 22:25:21 +0200 Subject: [PATCH v2 1/2] userfaultfd: Fix checks for huge PMDs Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20240813-uffd-thp-flip-fix-v2-1-5efa61078a41@google.com> References: <20240813-uffd-thp-flip-fix-v2-0-5efa61078a41@google.com> In-Reply-To: <20240813-uffd-thp-flip-fix-v2-0-5efa61078a41@google.com> To: Andrew Morton , Pavel Emelianov , Andrea Arcangeli , Hugh Dickins Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, David Hildenbrand , Qi Zheng , Jann Horn , stable@vger.kernel.org X-Mailer: b4 0.15-dev X-Developer-Signature: v=1; a=ed25519-sha256; t=1723580736; l=3813; i=jannh@google.com; s=20240730; h=from:subject:message-id; bh=NjEb/5kWU2ZKZsYUDCoXZhjVmSL4k/e2MGJXrVl34QQ=; b=BJybKHoOx2uHAvfMX0klbtA/6hnFPEr8ifxsqo6HrIRCT1MpuWH3+pMSr68JQN+RXy0lPCIuT FleIsksNrFQCttylm7aL6gN/0i9yu5T6ZOyquKvGCEyUVqVI7VwWUvq X-Developer-Key: i=jannh@google.com; a=ed25519; pk=AljNtGOzXeF6khBXDJVVvwSEkVDGnnZZYqfWhP1V+C8= This fixes two issues. I discovered that the following race can occur: mfill_atomic other thread =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D pmdp_get_lockless() [reads none pmd] __pte_alloc [no-op] BUG_ON(pmd_none(*dst_pmd)) I have experimentally verified this in a kernel with extra mdelay() calls; the BUG_ON(pmd_none(*dst_pmd)) triggers. On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow pte_offset_map[_lock]() to fail"), this can't lead to anything worse than a BUG_ON(), since the page table access helpers are actually designed to deal with page tables concurrently disappearing; but on older kernels (<=3D6.4), I think we could probably theoretically race past the two BUG_ON= () checks and end up treating a hugepage as a page table. The second issue is that, as Qi Zheng pointed out, there are other types of huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs (in particular, migration PMDs). On <=3D6.4, this is worse than the first issue: If mfill_atomic() runs on a PMD that contains a migration entry (which just requires winning a single, fairly wide race), it will pass the PMD to pte_offset_map_lock(), which assumes that the PMD points to a page table. Breakage follows: First, the kernel tries to take the PTE lock (which will crash or maybe worse if there is no "struct page" for the address bits in the migration entry PMD - I think at least on X86 there usually is no corresponding "struct page" thanks to the PTE inversion mitigation, amd64 looks different). If that didn't crash, the kernel would next try to write a PTE into what it wrongly thinks is a page table. As part of fixing these issues, get rid of the check for pmd_trans_huge() before __pte_alloc() - that's redundant, we're going to have to check for that after the __pte_alloc() anyway. Backport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels. Reported-by: Qi Zheng Closes: https://lore.kernel.org/r/59bf3c2e-d58b-41af-ab10-3e631d802229@byte= dance.com Cc: stable@vger.kernel.org Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY= |UFFDIO_ZEROPAGE preparation") Signed-off-by: Jann Horn Acked-by: David Hildenbrand Reviewed-by: Qi Zheng --- mm/userfaultfd.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index e54e5c8907fa..290b2a0d84ac 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -787,21 +787,23 @@ static __always_inline ssize_t mfill_atomic(struct us= erfaultfd_ctx *ctx, } =20 dst_pmdval =3D pmdp_get_lockless(dst_pmd); - /* - * If the dst_pmd is mapped as THP don't - * override it and just be strict. - */ - if (unlikely(pmd_trans_huge(dst_pmdval))) { - err =3D -EEXIST; - break; - } if (unlikely(pmd_none(dst_pmdval)) && unlikely(__pte_alloc(dst_mm, dst_pmd))) { err =3D -ENOMEM; break; } - /* If an huge pmd materialized from under us fail */ - if (unlikely(pmd_trans_huge(*dst_pmd))) { + dst_pmdval =3D pmdp_get_lockless(dst_pmd); + /* + * If the dst_pmd is THP don't override it and just be strict. + * (This includes the case where the PMD used to be THP and + * changed back to none after __pte_alloc().) + */ + if (unlikely(!pmd_present(dst_pmdval) || pmd_trans_huge(dst_pmdval) || + pmd_devmap(dst_pmdval))) { + err =3D -EEXIST; + break; + } + if (unlikely(pmd_bad(dst_pmdval))) { err =3D -EFAULT; break; } --=20 2.46.0.76.ge559c4bf1a-goog From nobody Sun Feb 8 07:58:58 2026 Received: from mail-wm1-f48.google.com (mail-wm1-f48.google.com [209.85.128.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 30A6F1A7047 for ; Tue, 13 Aug 2024 20:25:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.48 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723580746; cv=none; b=i11GydO4Qey7Wcp08tKpX/GIB9fWK4n5KX0Kak5x4ZMipzdnYtBiIt/a/L+5JGFjj2n6XCz5w9+4sT7iHEwJUhcQM+VIrIybP6L+zP35zyp9AAZQRYu2ZHysQx8uP2vjH9JPrADGJE7siuE+fjahpyT6eEBeobuj6aURvhxhxkg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723580746; c=relaxed/simple; bh=tzIAzt339urOnQP09jTAjXTel5g3L+lCAuxLVJJcPg8=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=p8O79PXa3iwMxQcDspdR93yCL5/SNpscPhXl9MstZavlc3hu5r3R2KuOYD+lhYvlUnASSxmFpMiRrW+01tjDyzre3HUWjNca5mDbjbX/F8sv2y8z3+0Eq/cPzRg/JiVo8DDcRMFIoJhbUOtc4BW4AaCyUSE576pt+k9gk2BoXjI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Nq6cFsC7; arc=none smtp.client-ip=209.85.128.48 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Nq6cFsC7" Received: by mail-wm1-f48.google.com with SMTP id 5b1f17b1804b1-428e12f6e56so9965e9.0 for ; Tue, 13 Aug 2024 13:25:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1723580743; x=1724185543; darn=vger.kernel.org; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=C8hC/8u5rHQIE5OTDIzO5nzE5ccZrI6SAOcdv8SAfDg=; b=Nq6cFsC7R9m1nkBGwXL8XV5lNp1CMHOmel6arhe4m3i2d0zk2qFJ/9rhLIOb5jUjem 180RAWApLDXqQ0v/ghxPslCcWt56gG1+NS3CyYRhEsULlQx9MMrhJLi1i/o2Xn4EHDoM TB/lz+NJ5tnf+0Y73MxL+A9FZWmfCat9Tcg/UEkuFsP9pht7SBTHtZEHmGf57Dpm0zCR FijL/PozcaZFA7abHphrB0tjAtaomVleSzOdkEUPQmPTxTgEbvLLYpmOao+X2JVoCzMJ Z7WbJUGWRSO2mfgg2gIPqQ04nFkUVOPB1VxozmIQkqLSPMLzpit63ZOjDRVVmEw2RyEs 93bQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723580743; x=1724185543; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=C8hC/8u5rHQIE5OTDIzO5nzE5ccZrI6SAOcdv8SAfDg=; b=BvuVL4ukaAuPdvd8/xff6EEuG40WYou4wIqvAOkWXuihrO6x5PwlbalWetmIl7jpEB rxorBEpyVZMsEGk5H8bipRqLkOInB23TQXcy7NdgYYslupwqnkg47fnPVbHwRDKjF7Jn LrV8cJUwHQhzNn5jPkvQSGdntl4GAyogYmCX+MsBLKzlfLl2liBe81PGtqbkyLywSyO6 LNG5U/AP3qu8f4q/hkTsauPDb0C/mof7oXNbFk2bUZq73E667OQ8VUAPrb1BnLycYViq FYUi/N7TdRSRbxFi1Neeb6k/JZZlD44jbRinFI6jKN3SrYq1PsxZsdbmJ81ifkO8SdaC ddsA== X-Forwarded-Encrypted: i=1; AJvYcCX0ErbFBbK0jYTbCDdTwRlogbug21uh4nDA1i/2HlEZQiTz6SKfa+wqs+l8bG9+DgzXG+jEyVQheC+U2H1wBE0l2DRTURFrQxzVr8kk X-Gm-Message-State: AOJu0YyT3vTKowJswKCyIxpDjRUouwGPE6wVYwlg+0t43aAl6G4hhfXN gjGAysQmO2BGDfAHHh7gdbvDy5AbWdxIQDwJtGqeJH2T9NgFeLsCpt/3AIxxQA== X-Google-Smtp-Source: AGHT+IGLLbA7isZyCSmnyKczKWq9+cyJGJEJpYUawegxPAI3pwvA6i9O/zVRBIs345jqiM6wxH4Ilg== X-Received: by 2002:a05:600c:501f:b0:426:7018:2e2f with SMTP id 5b1f17b1804b1-429dec3d255mr5555e9.5.1723580742844; Tue, 13 Aug 2024 13:25:42 -0700 (PDT) Received: from localhost ([2a00:79e0:9d:4:a608:a4cb:f4c2:6573]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-429d877f234sm26530545e9.1.2024.08.13.13.25.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Aug 2024 13:25:42 -0700 (PDT) From: Jann Horn Date: Tue, 13 Aug 2024 22:25:22 +0200 Subject: [PATCH v2 2/2] userfaultfd: Don't BUG_ON() if khugepaged yanks our page table Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20240813-uffd-thp-flip-fix-v2-2-5efa61078a41@google.com> References: <20240813-uffd-thp-flip-fix-v2-0-5efa61078a41@google.com> In-Reply-To: <20240813-uffd-thp-flip-fix-v2-0-5efa61078a41@google.com> To: Andrew Morton , Pavel Emelianov , Andrea Arcangeli , Hugh Dickins Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, David Hildenbrand , Qi Zheng , Jann Horn , stable@vger.kernel.org X-Mailer: b4 0.15-dev X-Developer-Signature: v=1; a=ed25519-sha256; t=1723580736; l=1397; i=jannh@google.com; s=20240730; h=from:subject:message-id; bh=tzIAzt339urOnQP09jTAjXTel5g3L+lCAuxLVJJcPg8=; b=U5pU3rpcbs7XfZhd6TtUWxvTgQAQAflkgJ5K+vZYgkKpqqGS3XYHTVRAJkLG94mCVkVX5F7fz XY0Lb4AZrHXC6jrhYrjoIfRyPG3EWQwDdn3PUdv78rzxnTrCvAGxNWq X-Developer-Key: i=jannh@google.com; a=ed25519; pk=AljNtGOzXeF6khBXDJVVvwSEkVDGnnZZYqfWhP1V+C8= Since khugepaged was changed to allow retracting page tables in file mappings without holding the mmap lock, these BUG_ON()s are wrong - get rid of them. We could also remove the preceding "if (unlikely(...))" block, but then we could reach pte_offset_map_lock() with transhuge pages not just for file mappings but also for anonymous mappings - which would probably be fine but I think is not necessarily expected. Cc: stable@vger.kernel.org Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or = vma lock") Reviewed-by: Qi Zheng Acked-by: David Hildenbrand Signed-off-by: Jann Horn --- mm/userfaultfd.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 290b2a0d84ac..acc56c75ba99 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -807,9 +807,10 @@ static __always_inline ssize_t mfill_atomic(struct use= rfaultfd_ctx *ctx, err =3D -EFAULT; break; } - - BUG_ON(pmd_none(*dst_pmd)); - BUG_ON(pmd_trans_huge(*dst_pmd)); + /* + * For shmem mappings, khugepaged is allowed to remove page + * tables under us; pte_offset_map_lock() will deal with that. + */ =20 err =3D mfill_atomic_pte(dst_pmd, dst_vma, dst_addr, src_addr, flags, &folio); --=20 2.46.0.76.ge559c4bf1a-goog