From: Feng Tang <feng.tang@linux.alibaba.com>
To: Andrew Morton, Petr Mladek, Steven Rostedt, paulmck@kernel.org, linux-kernel@vger.kernel.org
Cc: Feng Tang
Subject: [PATCH] lib/sys_info: add a simple timer based memory corruption detector
Date: Wed, 27 May 2026 11:43:24 +0800
Message-Id: <20260527034324.51136-1-feng.tang@linux.alibaba.com>

While debugging some nasty BIOS/hardware related memory corruption
issues, we found that using a periodic timer to monitor a specific
DRAM/MMIO physical address is very useful; it acts like a basic
software watchpoint. For those bugs, who changes (corrupts) the DRAM
or MMIO register, and when, is hard to trace, and sometimes even a
hardware JTAG debugger can't help (say, when the physical address
watchpoint doesn't work).
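
To make the polling idea concrete, here is a minimal userspace sketch of it. It is not the kernel code from this patch: `struct sw_watch`, `sw_watch_poll()` and `sw_watch_simulate()` are hypothetical names used only for illustration, and the "corruption" is simulated by a plain store between two polls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of one watched location. */
struct sw_watch {
	const volatile uint32_t *addr;	/* location being polled */
	uint32_t target;		/* 'magic' value that counts as a hit */
};

/* One polling step: true when the watched word equals the target value. */
static bool sw_watch_poll(const struct sw_watch *w)
{
	return w->addr && *w->addr == w->target;
}

/*
 * Simulate a corruption that lands right after poll number corrupt_after.
 * Returns the number of the poll that reports the hit, or -1 if no hit is
 * seen within max_polls polls.
 */
static int sw_watch_simulate(int corrupt_after, int max_polls)
{
	volatile uint32_t word = 0;
	struct sw_watch w = { .addr = &word, .target = 0xdeadbeefu };

	assert(max_polls > 0);

	for (int poll = 1; poll <= max_polls; poll++) {
		if (sw_watch_poll(&w))
			return poll;
		if (poll == corrupt_after)
			word = w.target;	/* the simulated corruption */
	}
	return -1;
}
```

For example, `sw_watch_simulate(3, 10)` returns 4: the write lands after the third poll and is only seen by the fourth, so the hit is always reported some time after the write itself.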

The biggest shortcoming is that it can never capture the exact point
of corruption the way a hardware watchpoint can, no matter how small
the timer interval is set. The idea is to get as close to that point
as possible, hoping the caught context has enough debug info (which
did help us in solving BIOS/hardware bugs).

The workflow is simple: after a suspected address is identified, start
a periodic timer that polls it to catch its value changing to the
target 'magic' value, then halt the CPU (better to have only one CPU
online), or panic, or print out system information, so that the error
environment is frozen for further checking, or let kexec/kdump record
the vmcore, etc.

All the settings are module parameters:

  watch_interval_ms: SW watchpoint check interval in ms.
  paddr_dram_to_watch: Physical DRAM address to monitor.
  target_dram_val: Value at the DRAM address that triggers the watchpoint.
  paddr_mmio_to_watch: Physical MMIO address to monitor. Must be 32-bit aligned.
  target_mmio_val: Value at the MMIO address that triggers the watchpoint.
  panic_on_hit: Trigger a kernel panic when the watchpoint condition hits.
  hang_on_hit: Halt the CPU when the watchpoint hits (wait for HW debugger).
  check_once: Stop watching after the first hit (default: true).

This RFC is trying to show the idea and get feedback, and there are
some todos:

  * merge the DRAM/MMIO interfaces and auto-detect whether the address
    is DRAM or MMIO
  * support changing the address at runtime
  * move the starting point earlier in the boot phase
  * currently it monitors 'changing to a value'; add support for
    'changing from a value'

Signed-off-by: Feng Tang
---
 lib/sys_info.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)

diff --git a/lib/sys_info.c b/lib/sys_info.c
index f32a06ec9ed4..90ddcf786b98 100644
--- a/lib/sys_info.c
+++ b/lib/sys_info.c
@@ -164,3 +164,107 @@ void sys_info(unsigned long si_mask)
 {
 	__sys_info(si_mask ? : kernel_si_mask);
 }
+
+#ifdef CONFIG_SW_WATCHPOINT
+
+/* default 100 ms interval */
+static unsigned long watch_interval_ms = 100;
+module_param(watch_interval_ms, ulong, 0644);
+MODULE_PARM_DESC(watch_interval_ms, "SW watchpoint check interval in ms");
+
+static unsigned long paddr_dram_to_watch;
+module_param(paddr_dram_to_watch, ulong, 0644);
+MODULE_PARM_DESC(paddr_dram_to_watch, "Physical DRAM address to watch");
+
+static unsigned long *vaddr_dram;
+
+static unsigned long target_dram_val;
+module_param(target_dram_val, ulong, 0644);
+MODULE_PARM_DESC(target_dram_val, "Target DRAM value to trigger watchpoint");
+
+/* The MMIO address should be 32b aligned */
+static unsigned long paddr_mmio_to_watch;
+module_param(paddr_mmio_to_watch, ulong, 0644);
+MODULE_PARM_DESC(paddr_mmio_to_watch, "Physical MMIO address to watch (32bit aligned)");
+
+static unsigned int *vaddr_mmio;
+
+static unsigned int target_mmio_val;
+module_param(target_mmio_val, uint, 0644);
+MODULE_PARM_DESC(target_mmio_val, "Target MMIO value to trigger watchpoint");
+
+static bool panic_on_hit;
+module_param(panic_on_hit, bool, 0644);
+MODULE_PARM_DESC(panic_on_hit, "Panic when watchpoint hits");
+
+static bool hang_on_hit;
+module_param(hang_on_hit, bool, 0644);
+MODULE_PARM_DESC(hang_on_hit, "Hang when watchpoint hits");
+
+/* Stop the watchpoint timer after first hit */
+static bool check_once = true;
+module_param(check_once, bool, 0644);
+MODULE_PARM_DESC(check_once, "Stop watching after first hit");
+
+static struct timer_list sw_watchpoint_timer;
+
+static void sw_watchpoint_timer_fn(struct timer_list *unused)
+{
+	bool hit = false;
+
+	if (vaddr_mmio && (*vaddr_mmio == target_mmio_val)) {
+		pr_info("MMIO [@0x%lx] hit the target value [0x%x]!\n",
+			paddr_mmio_to_watch, target_mmio_val);
+		hit = true;
+	}
+
+	if (vaddr_dram && (*vaddr_dram == target_dram_val)) {
+		pr_info("DRAM [@0x%lx] hit the target value [0x%lx]!\n",
+			paddr_dram_to_watch, target_dram_val);
+		hit = true;
+	}
+
+	if (hit) {
+		sys_info(0);
+
+		/* Useful for attaching HW debugger */
+		if (hang_on_hit) {
+			pr_warn("Will dead loop on this CPU\n");
+			while (1);
+		}
+
+		/* Could be used to trigger kexec/kdump */
+		if (panic_on_hit)
+			panic("SW watchpoint hit!");
+
+		if (check_once)
+			return;
+	}
+
+	mod_timer(&sw_watchpoint_timer, jiffies + msecs_to_jiffies(watch_interval_ms));
+}
+
+static int __init sw_watchpoint_timer_init(void)
+{
+	if (paddr_mmio_to_watch) {
+		vaddr_mmio = ioremap(paddr_mmio_to_watch & PAGE_MASK, PAGE_SIZE);
+		if (!vaddr_mmio)
+			return -ENOMEM;
+
+		vaddr_mmio += (paddr_mmio_to_watch % PAGE_SIZE) / 4;
+	}
+
+	if (paddr_dram_to_watch) {
+		vaddr_dram = phys_to_virt(paddr_dram_to_watch);
+		if (!vaddr_dram)
+			return -ENOMEM;
+	}
+
+	timer_setup(&sw_watchpoint_timer, sw_watchpoint_timer_fn, 0);
+	sw_watchpoint_timer.expires = jiffies + msecs_to_jiffies(watch_interval_ms);
+	add_timer(&sw_watchpoint_timer);
+
+	return 0;
+}
+core_initcall(sw_watchpoint_timer_init);
+#endif

base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d
-- 
2.39.5 (Apple Git-154)