From: Andrew Cooper
To: LKML
Cc: Andrew Cooper, Marco Elver, Alexander Potapenko, Dmitry Vyukov,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    x86@kernel.org, "H. Peter Anvin", Andrew Morton, Jann Horn,
    kasan-dev@googlegroups.com
Subject: [PATCH] x86/kfence: Avoid writing L1TF-vulnerable PTEs
Date: Tue, 6 Jan 2026 18:04:26 +0000
Message-Id: <20260106180426.710013-1-andrew.cooper3@citrix.com>

For native, the choice of PTE is fine.  There's real memory backing the
non-present PTE.  However, for Xen PV, Xen complains:

  (XEN) d1 L1TF-vulnerable L1e 8010000018200066 - Shadowing

To explain, some background on Xen PV pagetables:

Xen PV guests control their own pagetables; they choose the new PTE
value and use hypercalls to make changes, so that Xen can audit them
for safety.

In addition to a regular reference count, Xen also maintains a type
reference count, e.g. SegDesc (referenced by vGDT/vLDT), Writable
(referenced with _PAGE_RW) or L{1..4} (referenced by vCR3 or a lower
pagetable level).  This prevents, for example, a page being inserted
into the pagetables while the guest still has a writable mapping of it.

For non-present mappings, all other bits become available to software
and typically contain metadata rather than a real frame address, so
there is nothing that a reference count could sensibly be tied to.  As
such, even if Xen could recognise the address as currently safe,
nothing would prevent that frame from changing owner to another VM in
the future.

When Xen detects a PV guest writing an L1TF-vulnerable PTE, it responds
by activating shadow paging.  This is normally only used for the live
phase of migration, and comes with a reasonable overhead.

KFENCE only cares about getting #PF to catch wild accesses; it doesn't
care about the value of non-present mappings.  Use a fully inverted PTE
to avoid hitting the slow path when running under Xen.

While adjusting the logic, take the opportunity to skip all actions if
the PTE is already in the right state, halve the number of PVOps
callouts, and skip TLB maintenance on a !P -> P transition, which
benefits the non-Xen cases too.

Fixes: 1dc0da6e9ec0 ("x86, kfence: enable KFENCE for x86")
Tested-by: Marco Elver
Signed-off-by: Andrew Cooper
Reviewed-by: Alexander Potapenko
---
CC: Alexander Potapenko
CC: Marco Elver
CC: Dmitry Vyukov
CC: Thomas Gleixner
CC: Ingo Molnar
CC: Borislav Petkov
CC: Dave Hansen
CC: x86@kernel.org
CC: "H. Peter Anvin"
CC: Andrew Morton
CC: Jann Horn
CC: kasan-dev@googlegroups.com
CC: linux-kernel@vger.kernel.org

v1:
 * First public posting.  This went to security@ first just in case,
   and then I got distracted with other things ahead of public posting.
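For illustration, here is a minimal user-space sketch (not part of the
patch) of why clearing _PAGE_PRESENT leaves an L1TF-vulnerable value
while inverting the whole PTE does not.  The "present" PTE value is an
assumption, inferred from the L1e in the Xen log line above; the bit
layout (bit 0 = Present, bits 12-51 = frame) is the standard x86-64
one:

    /*
     * Illustration only, not kernel code.  Clearing P reproduces the
     * exact value Xen complained about; inverting instead yields a
     * frame address with the high bits set, above cacheable memory.
     */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #define _PAGE_PRESENT  (1ULL << 0)
    #define PTE_PFN_MASK   0x000ffffffffff000ULL   /* bits 12-51 */

    int main(void)
    {
        uint64_t pte = 0x8010000018200067ULL;   /* assumed present PTE */

        /* Old behaviour: clear P only.  The frame bits still name
         * real, cacheable memory, which L1TF can speculatively load
         * through. */
        uint64_t cleared = pte & ~_PAGE_PRESENT;

        /* New behaviour: invert everything.  P ends up clear, and the
         * frame bits end up mostly set, far above cacheable memory. */
        uint64_t inverted = ~pte;

        printf("clear P : 0x%016" PRIx64 "  frame 0x%013" PRIx64 " (low, vulnerable)\n",
               cleared, (cleared & PTE_PFN_MASK) >> 12);
        printf("inverted: 0x%016" PRIx64 "  frame 0x%013" PRIx64 " (high, safe)\n",
               inverted, (inverted & PTE_PFN_MASK) >> 12);
        return 0;
    }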
---
 arch/x86/include/asm/kfence.h | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h
index ff5c7134a37a..acf9ffa1a171 100644
--- a/arch/x86/include/asm/kfence.h
+++ b/arch/x86/include/asm/kfence.h
@@ -42,10 +42,34 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect)
 {
 	unsigned int level;
 	pte_t *pte = lookup_address(addr, &level);
+	pteval_t val;
 
 	if (WARN_ON(!pte || level != PG_LEVEL_4K))
 		return false;
 
+	val = pte_val(*pte);
+
+	/*
+	 * protect requires making the page not-present.  If the PTE is
+	 * already in the right state, there's nothing to do.
+	 */
+	if (protect != !!(val & _PAGE_PRESENT))
+		return true;
+
+	/*
+	 * Otherwise, invert the entire PTE.  This avoids writing out an
+	 * L1TF-vulnerable PTE (not present, without the high address bits
+	 * set).
+	 */
+	set_pte(pte, __pte(~val));
+
+	/*
+	 * If the page was protected (non-present) and we're making it
+	 * present, there is no need to flush the TLB at all.
+	 */
+	if (!protect)
+		return true;
+
 	/*
 	 * We need to avoid IPIs, as we may get KFENCE allocations or faults
 	 * with interrupts disabled. Therefore, the below is best-effort, and
@@ -53,11 +77,6 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect)
 	 * lazy fault handling takes care of faults after the page is PRESENT.
 	 */
 
-	if (protect)
-		set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
-	else
-		set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
-
 	/*
	 * Flush this CPU's TLB, assuming whoever did the allocation/free is
	 * likely to continue running on this CPU.

base-commit: 7f98ab9da046865d57c102fd3ca9669a29845f67
-- 
2.39.5
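As a companion, a user-space model of the control flow in the hunk
above (sketch only; set_pte() and the CPU-local TLB flush are stubbed
out as prints, and the PTE value is the same assumed one as before).
It demonstrates the three properties the patch relies on: the
early-out when the PTE is already in the requested state, the
inversion round-trip (~~val == val) that lets unprotect restore the
original present PTE, and the TLB flush only being issued on the
present -> not-present transition:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define _PAGE_PRESENT (1ULL << 0)

    static bool model_protect_page(uint64_t *pte, bool protect)
    {
        uint64_t val = *pte;

        /* Nothing to do if the PTE is already in the requested state. */
        if (protect != !!(val & _PAGE_PRESENT))
            return true;

        *pte = ~val;                    /* stands in for set_pte() */
        printf("set_pte -> 0x%016llx\n", (unsigned long long)*pte);

        if (!protect)
            return true;                /* !P -> P: no flush needed */

        puts("flush local TLB");        /* stands in for the local flush */
        return true;
    }

    int main(void)
    {
        uint64_t pte = 0x8010000018200067ULL;   /* assumed present PTE */
        uint64_t orig = pte;

        model_protect_page(&pte, true);     /* protect: invert + flush */
        model_protect_page(&pte, true);     /* already protected: no-op */
        model_protect_page(&pte, false);    /* unprotect: invert back, no flush */

        printf("round-trip %s\n", pte == orig ? "ok" : "broken");
        return 0;
    }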