[PATCH] vt: discard stale unicode buffer on alt screen exit after resize

Liav Mordouch posted 1 patch 5 days, 21 hours ago
There is a newer version of this series
drivers/tty/vt/vt.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
[PATCH] vt: discard stale unicode buffer on alt screen exit after resize
Posted by Liav Mordouch 5 days, 21 hours ago
Hi,

Following up on the bug report I sent earlier. I've done some further digging
and found the root cause, and I'm including a patch that fixes it. I've tested
it and confirmed it resolves the crash on my system.

Root cause analysis:

The crash I reported in csi_J is caused by a size mismatch between
vc_uni_lines and the actual console dimensions. Here's what happens:

  1. Console starts at 80x25. vc_uni_lines is allocated for 80x25
     (25 pointers + 25*80 u32s, contiguous).

  2. Something enters the alternate screen (\033[?1049h). enter_alt_screen()
     saves vc_uni_lines (80x25) into vc_saved_uni_lines and sets
     vc_uni_lines = NULL.

  3. Console gets resized to 240x67 (amdgpu fbcon takes over). vc_do_resize()
     checks "if (vc->vc_uni_lines)" -- it's NULL, so it skips reallocation.
     But vc_saved_uni_lines (still 80x25) is never touched by the resize.

  4. Something leaves the alternate screen (\033[?1049l). leave_alt_screen()
     calls vc_uniscr_set(vc, vc->vc_saved_uni_lines), blindly restoring the
     stale 80x25 buffer. Now vc_uni_lines points to an 80x25 allocation but
     vc_rows=67, vc_cols=240.

  5. Next clear screen (csi_J CSI_J_VISIBLE) calls
     vc_uniscr_clear_lines(vc, 0, 67), which iterates 67 rows with
     memset32(..., 240). For rows 0-24, each row only has space for 80 u32s
     so memset32 overflows into adjacent rows. At row 25, vc_uni_lines[25]
     reads from the data area (past the 25-entry pointer array), getting
     0x0000002000000020 (two u32 space values read as a pointer) -- page
     fault, crash.

This is confirmed by the registers from the oops:
  RDI = 0x0000002000000020  (bogus pointer = two space chars)
  RCX = 0xf0 = 240          (vc_cols, count for memset32)
  RSI = 0xc8 = 200 = 25*8   (byte offset = row index 25)
  RDX = 0x218 = 536 = 67*8  (loop end = 67 rows)
  RAX = 0x20                 (space character, fill value)

The same faulting address (0x0000002000000020) showed up across 3 separate
crash boots, which makes sense since the memory layout is deterministic.

The fix: in leave_alt_screen(), check if the console dimensions changed while
in the alternate screen. If they did, discard the stale vc_saved_uni_lines
instead of restoring it. The unicode screen will be lazily rebuilt via
vc_uniscr_check() when next needed. This is consistent with how the regular
screen buffer restore in the same function already handles dimension
mismatches using min(saved_rows, current_rows).

I added a comment in the patch explaining the non-obvious reason for the
conditional -- without it, a future reader would have no context for why
vc_saved_uni_lines isn't unconditionally restored, and might "simplify" the
code back into the buggy version.

Tested on my system (Gentoo, 6.19.10, AMD Ryzen 5 5600X, RX 7800 XT with
amdgpu). Previously crashed 4 out of 5 boots, now boots cleanly with the
patch applied.

Note: writing of this email was assisted by AI for grammar and flow. Sorry in
advance if anything reads off.

---
 drivers/tty/vt/vt.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -1907,6 +1907,7 @@
 	unsigned int rows = min(vc->vc_saved_rows, vc->vc_rows);
 	unsigned int cols = min(vc->vc_saved_cols, vc->vc_cols);
 	u16 *src, *dest;
+	bool uni_lines_stale;
 
 	if (vc->vc_saved_screen == NULL)
 		return; /* Not inside an alt-screen */
@@ -1915,7 +1916,18 @@
 		dest = ((u16 *)vc->vc_origin) + r * vc->vc_cols;
 		memcpy(dest, src, 2 * cols);
 	}
-	vc_uniscr_set(vc, vc->vc_saved_uni_lines);
+	/*
+	 * If the console was resized while in the alternate screen,
+	 * vc_saved_uni_lines was allocated for the old dimensions.
+	 * Restoring it would cause out-of-bounds accesses. Discard it
+	 * and let the unicode screen be lazily rebuilt.
+	 */
+	uni_lines_stale = vc->vc_saved_rows != vc->vc_rows ||
+			  vc->vc_saved_cols != vc->vc_cols;
+	if (uni_lines_stale)
+		vc_uniscr_free(vc->vc_saved_uni_lines);
+	else
+		vc_uniscr_set(vc, vc->vc_saved_uni_lines);
 	vc->vc_saved_uni_lines = NULL;
 	restore_cur(vc);
 	/* Update the entire screen */