From: David Woodhouse
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    x86@kernel.org, "H . Peter Anvin", Eric Biederman, David Woodhouse,
    Sourabh Jain, Hari Bathini, Michael Ellerman, Thomas Zimmermann,
    Andrew Morton, Baoquan He, Yuntao Wang, David Kaplan, Tao Liu,
    "Kirill A . Shutemov", Kai Huang, Ard Biesheuvel, Josh Poimboeuf,
    Breno Leitao, Wei Yang, Rong Xu, Thomas Weißschuh,
    linux-kernel@vger.kernel.org, kexec@lists.infradead.org,
    Simon Horman, Dave Young, Peter Zijlstra, bsz@amazon.de,
    nathan@kernel.org
Subject: [PATCH v2 4/9] x86/kexec: Fix stack and handling of re-entry point for ::preserve_context
Date: Thu, 9 Jan 2025 14:04:16 +0000
Message-ID: <20250109140757.2841269-5-dwmw2@infradead.org>
In-Reply-To: <20250109140757.2841269-1-dwmw2@infradead.org>
References: <20250109140757.2841269-1-dwmw2@infradead.org>
X-Mailer: git-send-email 2.47.0
X-Mailing-List: linux-kernel@vger.kernel.org

From: David Woodhouse

A ::preserve_context kimage can be invoked more than once, and the entry
point can be different every time. When the callee returns to the kernel,
it leaves the address of its entry point for next time on the stack.

That being the case, one might reasonably assume that the caller would
allocate space for it on the stack frame before actually performing the
'call' into the callee.

Apparently not, though. Ever since the kjump code was first added in 2009,
it has set up a *new* stack at the top of the swap_page scratch page, then
just performed the 'call' without allocating any space for the re-entry
address to be returned. It then reads the re-entry point for next time
from 0(%rsp), which is actually the first qword of the page *after* the
swap page, and that page might not exist at all! And if the callee has
written to that location, then it will have corrupted memory it doesn't
own.

Correct this by pushing the entry point of the callee onto the stack
before calling it. The callee may then adjust it, or not, as it sees fit,
and subsequent invocations should work correctly either way.

Remove a stray push of zero to the *relocate_kernel* stack, which may have
been intended for this purpose, but which was actually just noise.

Also, loading the stack for the callee relied on the address of the swap
page being in %r10 without ever documenting that fact. Recent code changes
made that no longer true, so load it directly from the local
kexec_pa_swap_page variable instead.

Fixes: b3adabae8a96 ("x86/kexec: Drop page_list argument from relocate_kernel()")
Signed-off-by: David Woodhouse
---
 arch/x86/kernel/relocate_kernel_64.S | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 3ca3bf6b3f49..a95691b42c5c 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -113,8 +113,6 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 	 * %r13 original CR4 when relocate_kernel() was invoked
 	 */
 
-	/* set return address to 0 if not preserving context */
-	pushq	$0
 	/* store the start address on the stack */
 	pushq	%rdx
 
@@ -208,12 +206,19 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 
 .Lrelocate:
 	popq	%rdx
+
+	/* Use the swap page for the callee's stack */
+	movq	kexec_pa_swap_page(%rip), %r10
 	leaq	PAGE_SIZE(%r10), %rsp
+
+	/* push the existing entry point onto the callee's stack */
+	pushq	%rdx
+
 	ANNOTATE_RETPOLINE_SAFE
 	call	*%rdx
 
 	/* get the re-entry point of the peer system */
-	movq	0(%rsp), %rbp
+	popq	%rbp
 	leaq	relocate_kernel(%rip), %r8
 	movq	kexec_pa_swap_page(%rip), %r10
 	movq	pa_backup_pages_map(%rip), %rdi
@@ -247,6 +252,7 @@ SYM_CODE_START_LOCAL_NOALIGN(virtual_mapped)
 	lgdt	saved_context_gdt_desc(%rax)
 #endif
 
+	/* relocate_kernel() returns the re-entry point for next time */
 	movq	%rbp, %rax
 
 	popf
-- 
2.47.0
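
For readers who want to see the off-by-one-slot problem from the commit
message in isolation, here is a small user-space C model of the stack
handling. It is only a sketch and not kernel code: the array mem[], the
helpers push(), pop(), callee() and invoke(), and the RET_ADDR and CANARY
values are all hypothetical names made up for illustration. It also assumes
that the callee stores its next entry point in the slot directly above its
return address, which is one reading of "leaves the address of its entry
point for next time on the stack".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_WORDS 512                    /* 4096-byte page as 8-byte qwords */
#define RET_ADDR   UINT64_C(0x1111)       /* placeholder return address */
#define CANARY     UINT64_C(0xdeadbeef)   /* marks memory the caller does not own */

/*
 * mem[0 .. PAGE_WORDS-1] models the swap page; mem[PAGE_WORDS] models the
 * first qword of whatever happens to follow it in memory.
 */
static uint64_t mem[PAGE_WORDS + 1];
static size_t rsp;                        /* stack pointer as an index into mem[] */

static void push(uint64_t v) { mem[--rsp] = v; }
static uint64_t pop(void)    { return mem[rsp++]; }

/*
 * Hypothetical callee. On entry mem[rsp] holds the return address.
 * If next_entry is non-zero it is stored in the slot above the return
 * address; zero means "leave the slot untouched". Then it returns.
 */
static void callee(uint64_t next_entry)
{
	if (next_entry)
		mem[rsp + 1] = next_entry;
	(void)pop();                      /* 'ret' consumes the return address */
}

/*
 * Model one invocation. reserve_slot selects the patched behaviour.
 * Returns the re-entry point the caller reads back; *clobbered reports
 * whether the qword after the swap page was overwritten.
 */
static uint64_t invoke(int reserve_slot, uint64_t entry,
		       uint64_t next_entry, int *clobbered)
{
	uint64_t reentry;

	mem[PAGE_WORDS] = CANARY;
	rsp = PAGE_WORDS;                 /* leaq PAGE_SIZE(%r10), %rsp */
	if (reserve_slot)
		push(entry);              /* pushq %rdx (added by the patch) */
	push(RET_ADDR);                   /* call *%rdx */
	callee(next_entry);
	if (reserve_slot)
		reentry = pop();          /* popq %rbp (patched) */
	else
		reentry = mem[rsp];       /* movq 0(%rsp), %rbp (original) */
	*clobbered = (mem[PAGE_WORDS] != CANARY);
	return reentry;
}

/* Exercise both variants of the model. */
static void self_check(void)
{
	int clobbered;

	/* Original code: the value comes back, but via memory past the page. */
	assert(invoke(0, 0x1000, 0x2000, &clobbered) == 0x2000);
	assert(clobbered == 1);

	/* Patched code: the slot lives inside the swap page. */
	assert(invoke(1, 0x1000, 0x2000, &clobbered) == 0x2000);
	assert(clobbered == 0);

	/* Patched code, callee leaves the slot alone: entry point is reused. */
	assert(invoke(1, 0x1000, 0, &clobbered) == 0x1000);
	assert(clobbered == 0);
}
```

Calling self_check() from a main() exercises both variants. In this model
the unpatched path only "works" because the callee scribbles on the qword
beyond the swap page; when the callee leaves the slot alone, the unpatched
caller reads back whatever was already in that foreign qword.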