From: Eugen Hristev
To: linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, tglx@linutronix.de, andersson@kernel.org,
	pmladek@suse.com, rdunlap@infradead.org, corbet@lwn.net,
	david@redhat.com, mhocko@suse.com
Cc: tudor.ambarus@linaro.org, mukesh.ojha@oss.qualcomm.com,
	linux-arm-kernel@lists.infradead.org, linux-hardening@vger.kernel.org,
	jonechou@google.com, rostedt@goodmis.org, linux-doc@vger.kernel.org,
	devicetree@vger.kernel.org, linux-remoteproc@vger.kernel.org,
	linux-arch@vger.kernel.org, tony.luck@intel.com, kees@kernel.org,
	Eugen Hristev
Subject: [PATCH 01/26] kernel: Introduce meminspect
Date: Wed, 19 Nov 2025 17:44:02 +0200
Message-ID: <20251119154427.1033475-2-eugen.hristev@linaro.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251119154427.1033475-1-eugen.hristev@linaro.org>
References: <20251119154427.1033475-1-eugen.hristev@linaro.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The inspection mechanism allows registration of a specific memory
area (or object) for later inspection.
Ranges are added to an inspection table, which can be requested and
analyzed by specific drivers.
Drivers interface with any hardware mechanism that allows inspection
of the data, including but not limited to: dumping for debugging,
creating a coredump, analysis, or statistical information.
Drivers can register a notifier to learn when new objects are
registered, or traverse the existing inspection table.
The inspection table is created ahead of time so that it can be used
later regardless of the state of the kernel (running, frozen, crashed,
or any particular state).

Signed-off-by: Eugen Hristev
---
 Documentation/dev-tools/index.rst      |   1 +
 Documentation/dev-tools/meminspect.rst | 139 ++++++++
 MAINTAINERS                            |   7 +
 include/asm-generic/vmlinux.lds.h      |  13 +
 include/linux/meminspect.h             | 261 ++++++++++++++
 init/Kconfig                           |   2 +
 kernel/Makefile                        |   1 +
 kernel/meminspect/Kconfig              |  20 ++
 kernel/meminspect/Makefile             |   3 +
 kernel/meminspect/meminspect.c         | 470 +++++++++++++++++++++++++
 10 files changed, 917 insertions(+)
 create mode 100644 Documentation/dev-tools/meminspect.rst
 create mode 100644 include/linux/meminspect.h
 create mode 100644 kernel/meminspect/Kconfig
 create mode 100644 kernel/meminspect/Makefile
 create mode 100644 kernel/meminspect/meminspect.c

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 4b8425e348ab..8ce605de8ee6 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -38,6 +38,7 @@ Documentation/process/debugging/index.rst
    gpio-sloppy-logic-analyzer
    autofdo
    propeller
+   meminspect
 
 
 .. only:: subproject and html
diff --git a/Documentation/dev-tools/meminspect.rst b/Documentation/dev-tools/meminspect.rst
new file mode 100644
index 000000000000..2a0bd4d6e448
--- /dev/null
+++ b/Documentation/dev-tools/meminspect.rst
@@ -0,0 +1,139 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========
+meminspect
+==========
+
+This document provides information about the meminspect feature.
+
+Overview
+========
+
+meminspect is a mechanism that allows the kernel to register a chunk of
+memory into a table, to be used at a later time for a specific
+inspection purpose like debugging, memory dumping or statistics.
+
+meminspect allows drivers to traverse the inspection table on demand,
+or to register a notifier to be called whenever an entry is added
+or removed.
+
+meminspect also aims to minimize the information required
+in case of a kernel problem. For example, a traditional debug method involves
+dumping the whole kernel memory and then inspecting it. meminspect allows
+users to select which memory is of interest, which helps this specific
+use case in production, where memory and connectivity are limited.
+
+Although the kernel has multiple internal mechanisms, meminspect fits
+a particular model which is not covered by the others.
+
+meminspect Internals
+====================
+
+API
+---
+
+Static memory can be registered at compile time, by instructing the compiler
+to create a separate section with annotation info.
+For each such annotated memory (usually a variable), a dedicated struct
+is created with the required information.
+To achieve this goal, some basic APIs are available:
+
+  MEMINSPECT_ENTRY(idx, sym, sz)
+is the basic macro that takes an ID, the symbol, and a size.
+
+To make it easier, some wrappers are also defined:
+  MEMINSPECT_SIMPLE_ENTRY(sym)
+will use the dedicated MEMINSPECT_ID_##sym with a size equal to sizeof(sym)
+
+  MEMINSPECT_NAMED_ENTRY(name, sym)
+will be a simple entry whose ID cannot be derived from the sym,
+so a name has to be provided
+
+  MEMINSPECT_AREA_ENTRY(sym, sz)
+this will register sym, but with the size given as sz, useful for e.g.
+arrays which do not have a fixed size at compile time.
+
+For dynamically allocated memory, or for other cases, the following APIs
+are defined:
+  meminspect_register_id_pa(enum meminspect_uid id, phys_addr_t zone,
+                            size_t size, unsigned int type);
+which takes the ID and the physical address.
+Similarly there are variations:
+  meminspect_register_pa() omits the ID
+  meminspect_register_id_va() requires the ID but takes a virtual address
+  meminspect_register_va() omits the ID and requires a virtual address
+
+If the ID is not given, the next available dynamic ID is allocated.
+
+To unregister a dynamic entry, the following APIs are defined:
+  meminspect_unregister_pa(phys_addr_t zone, size_t size);
+  meminspect_unregister_id(enum meminspect_uid id);
+  meminspect_unregister_va(va, size);
+
+All of the above expect the caller to hold the table lock; the
+meminspect_lock_*() variants take and release the lock around the call.
+
+
+meminspect drivers
+------------------
+
+Drivers are free to traverse the table by using a dedicated function
+meminspect_traverse(void *priv, MEMINSPECT_ITERATOR_CB cb)
+The callback will be called for each entry in the table.
+
+Drivers can also register a notifier with
+  meminspect_notifier_register()
+and unregister with
+  meminspect_notifier_unregister()
+to be called when an entry is added or removed.
+
+Data structures
+---------------
+
+The regions are stored in a simple fixed size array. It avoids
+memory allocation overhead. This is not performance critical nor does
+allocating a few hundred entries create a memory consumption problem.
+
+The static variables registered into meminspect are annotated into
+a dedicated .inspect_table memory section. This is then walked by meminspect
+at a later time and each entry is then copied into the main inspection table.
+
+meminspect Initialization
+-------------------------
+
+At any time, meminspect will be ready to accept region registration
+from any part of the kernel. The table does not require any initialization.
+If CONFIG_CRASH_DUMP is enabled, meminspect will create an ELF header
+corresponding to a core dump image, in which each region is added as a
+program header. In this scenario, the first region is this ELF header, and
+the second region is the vmcoreinfo ELF note.
+With this mechanism, the regions of the meminspect table, if dumped, can be
+concatenated to obtain a core image that is loadable with the `crash` tool.
+
+meminspect example
+==================
+
+A simple scenario for meminspect is the following:
+The kernel registers the linux_banner variable into meminspect with
+a simple annotation like:
+
+  MEMINSPECT_SIMPLE_ENTRY(linux_banner);
+
+The meminspect late initcall will parse the table created at compile time
+and copy the entry information into the inspection table.
+At a later point, any interested driver can call the traverse function to
+find out all entries in the table.
+A specific driver will then record the address and the size of the
+banner in a table of its own.
+That table is then written to a shared memory area that can be
+read by upper level firmware.
+When the kernel freezes (hypothetically), the kernel will no longer feed
+the watchdog. The watchdog will trigger a higher exception level interrupt
+which will be handled by the upper level firmware. This firmware will then
+read the shared memory table and find an entry with the start and size of
+the banner. It will then copy it for debugging purposes. The upper level
+firmware will then be able to provide useful debugging information,
+like in this example, the banner.
+
+As seen here, meminspect facilitates the interaction between the kernel
+and a specific firmware.
diff --git a/MAINTAINERS b/MAINTAINERS
index 545a4776795e..2cb2cc427c90 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16157,6 +16157,13 @@ F:	arch/*/include/asm/sync_core.h
 F:	include/uapi/linux/membarrier.h
 F:	kernel/sched/membarrier.c
 
+MEMINSPECT
+M:	Eugen Hristev
+S:	Maintained
+F:	Documentation/dev-tools/meminspect.rst
+F:	include/linux/meminspect.h
+F:	kernel/meminspect/*
+
 MEMBLOCK AND MEMORY MANAGEMENT INITIALIZATION
 M:	Mike Rapoport
 L:	linux-mm@kvack.org
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 8a9a2e732a65..713135d72c34 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -489,6 +489,8 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
 	FW_LOADER_BUILT_IN_DATA						\
 	TRACEDATA							\
 									\
+	MEMINSPECT_TABLE						\
+									\
 	PRINTK_INDEX							\
 									\
 	/* Kernel symbol table: Normal symbols */			\
@@ -893,6 +895,17 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
 #define TRACEDATA
 #endif
 
+#ifdef CONFIG_MEMINSPECT
+#define MEMINSPECT_TABLE						\
+	. = ALIGN(8);							\
+	.inspect_table : AT(ADDR(.inspect_table) - LOAD_OFFSET) {	\
+		BOUNDED_SECTION_POST_LABEL(.inspect_table,		\
+					   __inspect_table,, _end)	\
+	}
+#else
+#define MEMINSPECT_TABLE
+#endif
+
 #ifdef CONFIG_PRINTK_INDEX
 #define PRINTK_INDEX							\
 	.printk_index : AT(ADDR(.printk_index) - LOAD_OFFSET) {		\
diff --git a/include/linux/meminspect.h b/include/linux/meminspect.h
new file mode 100644
index 000000000000..e58b00079156
--- /dev/null
+++ b/include/linux/meminspect.h
@@ -0,0 +1,261 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _MEMINSPECT_H
+#define _MEMINSPECT_H
+
+#include
+
+enum meminspect_uid {
+	MEMINSPECT_ID_STATIC = 0,
+	MEMINSPECT_ID_ELF,
+	MEMINSPECT_ID_VMCOREINFO,
+	MEMINSPECT_ID_CONFIG,
+	MEMINSPECT_ID__totalram_pages,
+	MEMINSPECT_ID___cpu_possible_mask,
+	MEMINSPECT_ID___cpu_present_mask,
+	MEMINSPECT_ID___cpu_online_mask,
+	MEMINSPECT_ID___cpu_active_mask,
+	MEMINSPECT_ID_mem_section,
+	MEMINSPECT_ID_jiffies_64,
+	MEMINSPECT_ID_linux_banner,
+	MEMINSPECT_ID_nr_threads,
+	MEMINSPECT_ID_nr_irqs,
+	MEMINSPECT_ID_tainted_mask,
+	MEMINSPECT_ID_taint_flags,
+	MEMINSPECT_ID_node_states,
+	MEMINSPECT_ID___per_cpu_offset,
+	MEMINSPECT_ID_nr_swapfiles,
+	MEMINSPECT_ID_init_uts_ns,
+	MEMINSPECT_ID_printk_rb_static,
+	MEMINSPECT_ID_printk_rb_dynamic,
+	MEMINSPECT_ID_prb,
+	MEMINSPECT_ID_prb_descs,
+	MEMINSPECT_ID_prb_infos,
+	MEMINSPECT_ID_prb_data,
+	MEMINSPECT_ID_high_memory,
+	MEMINSPECT_ID_init_mm,
+	MEMINSPECT_ID_init_mm_pgd,
+	MEMINSPECT_ID__sinittext,
+	MEMINSPECT_ID__einittext,
+	MEMINSPECT_ID__end,
+	MEMINSPECT_ID__text,
+	MEMINSPECT_ID__stext,
+	MEMINSPECT_ID__etext,
+	MEMINSPECT_ID_kallsyms_num_syms,
+	MEMINSPECT_ID_kallsyms_relative_base,
+	MEMINSPECT_ID_kallsyms_offsets,
+	MEMINSPECT_ID_kallsyms_names,
+	MEMINSPECT_ID_kallsyms_token_table,
+	MEMINSPECT_ID_kallsyms_token_index,
+	MEMINSPECT_ID_kallsyms_markers,
+	MEMINSPECT_ID_kallsyms_seqs_of_names,
+	MEMINSPECT_ID_swapper_pg_dir,
+	MEMINSPECT_ID_DYNAMIC,
+	MEMINSPECT_ID_MAX = 201,
+};
+
+#define MEMINSPECT_TYPE_REGULAR		0
+
+#define MEMINSPECT_NOTIFIER_ADD		0
+#define MEMINSPECT_NOTIFIER_REMOVE	1
+
+/**
+ * struct inspect_entry - memory inspect entry information
+ * @id: unique id for this entry
+ * @va: virtual address for the memory (pointer)
+ * @pa: physical address for the memory
+ * @size: size of the memory area of this entry
+ * @type: type of the entry (class)
+ */
+struct inspect_entry {
+	enum meminspect_uid id;
+	void *va;
+	phys_addr_t pa;
+	size_t size;
+	unsigned int type;
+};
+
+typedef void (*MEMINSPECT_ITERATOR_CB)(void *priv, const struct inspect_entry *ie);
+
+#ifdef CONFIG_MEMINSPECT
+/* .inspect_table section table markers*/
+extern const struct inspect_entry __inspect_table[];
+extern const struct inspect_entry __inspect_table_end[];
+
+/*
+ * Annotate a static variable into inspection table.
+ * Can be called multiple times for the same ID, in which case
+ * multiple table entries will be created
+ */
+#define MEMINSPECT_ENTRY(idx, sym, sz)					\
+	static const struct inspect_entry __UNIQUE_ID(__inspect_entry_##idx)	\
+	__used __section(".inspect_table") = {	.id = idx,		\
+						.va = (void *)&(sym),	\
+						.size = (sz),		\
+					      }
+/*
+ * A simple entry is just a variable, the size of the entry is the variable size
+ * The variable can also be a pointer, the pointer itself is being added in this
+ * case.
+ */
+#define MEMINSPECT_SIMPLE_ENTRY(sym) \
+	MEMINSPECT_ENTRY(MEMINSPECT_ID_##sym, sym, sizeof(sym))
+/*
+ * In the case when `sym` is not a variable, but a member of a struct e.g.,
+ * and we cannot derive a name from it, a name must be provided.
+ */
+#define MEMINSPECT_NAMED_ENTRY(name, sym) \
+	MEMINSPECT_ENTRY(MEMINSPECT_ID_##name, sym, sizeof(sym))
+/*
+ * Create a more complex entry, by registering an arbitrary memory starting
+ * at sym. The size is provided as a parameter.
+ * This is used e.g. when the symbol is a start of an unknown sized array.
+ */
+#define MEMINSPECT_AREA_ENTRY(sym, sz) \
+	MEMINSPECT_ENTRY(MEMINSPECT_ID_##sym, sym, sz)
+
+/* Iterate through .inspect_table section entries */
+#define for_each_meminspect_entry(__entry)		\
+	for (__entry = __inspect_table;			\
+	     __entry < __inspect_table_end;		\
+	     __entry++)
+
+#else
+#define MEMINSPECT_ENTRY(...)
+#define MEMINSPECT_SIMPLE_ENTRY(...)
+#define MEMINSPECT_NAMED_ENTRY(...)
+#define MEMINSPECT_AREA_ENTRY(...)
+#endif
+
+#ifdef CONFIG_MEMINSPECT
+
+/*
+ * Dynamic helpers to register entries.
+ * These do not lock the table, so use with caution.
+ */
+void meminspect_register_id_pa(enum meminspect_uid id, phys_addr_t zone,
+			       size_t size, unsigned int type);
+void meminspect_table_lock(void);
+void meminspect_table_unlock(void);
+
+#define meminspect_register_pa(...) \
+	meminspect_register_id_pa(MEMINSPECT_ID_DYNAMIC, __VA_ARGS__, MEMINSPECT_TYPE_REGULAR)
+
+#define meminspect_register_id_va(id, va, size) \
+	meminspect_register_id_pa(id, virt_to_phys(va), size, MEMINSPECT_TYPE_REGULAR)
+
+#define meminspect_register_va(...) \
+	meminspect_register_id_va(MEMINSPECT_ID_DYNAMIC, __VA_ARGS__)
+
+void meminspect_unregister_pa(phys_addr_t zone, size_t size);
+void meminspect_unregister_id(enum meminspect_uid id);
+
+#define meminspect_unregister_va(va, size) \
+	meminspect_unregister_pa(virt_to_phys(va), size)
+
+void meminspect_traverse(void *priv, MEMINSPECT_ITERATOR_CB cb);
+
+/*
+ * Producers (registrants) are advised to use the locked API below
+ */
+#define meminspect_lock_register_pa(...) \
+	{ \
+		meminspect_table_lock(); \
+		meminspect_register_pa(__VA_ARGS__); \
+		meminspect_table_unlock(); \
+	}
+
+#define meminspect_lock_register_id_va(...) \
+	{ \
+		meminspect_table_lock(); \
+		meminspect_register_id_va(__VA_ARGS__); \
+		meminspect_table_unlock(); \
+	}
+
+#define meminspect_lock_register_va(...) \
+	{ \
+		meminspect_table_lock(); \
+		meminspect_register_va(__VA_ARGS__); \
+		meminspect_table_unlock(); \
+	}
+
+#define meminspect_lock_unregister_pa(...) \
+	{ \
+		meminspect_table_lock(); \
+		meminspect_unregister_pa(__VA_ARGS__); \
+		meminspect_table_unlock(); \
+	}
+
+#define meminspect_lock_unregister_va(...) \
+	{ \
+		meminspect_table_lock(); \
+		meminspect_unregister_va(__VA_ARGS__); \
+		meminspect_table_unlock(); \
+	}
+
+#define meminspect_lock_unregister_id(...) \
+	{ \
+		meminspect_table_lock(); \
+		meminspect_unregister_id(__VA_ARGS__); \
+		meminspect_table_unlock(); \
+	}
+
+#define meminspect_lock_traverse(...) \
+	{ \
+		meminspect_table_lock(); \
+		meminspect_traverse(__VA_ARGS__); \
+		meminspect_table_unlock(); \
+	}
+
+int meminspect_notifier_register(struct notifier_block *n);
+int meminspect_notifier_unregister(struct notifier_block *n);
+
+#else
+static inline void meminspect_register_id_pa(enum meminspect_uid id,
+					     phys_addr_t zone,
+					     size_t size, unsigned int type)
+{
+}
+
+static inline void meminspect_table_lock(void)
+{
+}
+
+static inline void meminspect_table_unlock(void)
+{
+}
+
+static inline void meminspect_unregister_pa(phys_addr_t zone, size_t size)
+{
+}
+
+static inline void meminspect_unregister_id(enum meminspect_uid id)
+{
+}
+
+static inline void meminspect_traverse(void *priv, MEMINSPECT_ITERATOR_CB cb)
+{
+}
+
+static inline int meminspect_notifier_register(struct notifier_block *n)
+{
+	return 0;
+}
+
+static inline int meminspect_notifier_unregister(struct notifier_block *n)
+{
+	return 0;
+}
+
+#define meminspect_register_pa(...)
+#define meminspect_register_id_va(...)
+#define meminspect_register_va(...)
+#define meminspect_lock_register_pa(...)
+#define meminspect_lock_register_va(...)
+#define meminspect_lock_register_id_va(...)
+#define meminspect_lock_traverse(...)
+#define meminspect_lock_unregister_va(...)
+#define meminspect_lock_unregister_pa(...)
+#define meminspect_lock_unregister_id(...)
+#endif
+
+#endif
diff --git a/init/Kconfig b/init/Kconfig
index cab3ad28ca49..d48647419944 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2138,6 +2138,8 @@ config TRACEPOINTS
 
 source "kernel/Kconfig.kexec"
 
+source "kernel/meminspect/Kconfig"
+
 endmenu		# General setup
 
 source "arch/Kconfig"
diff --git a/kernel/Makefile b/kernel/Makefile
index df3dd8291bb6..83ec5310dfd1 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -50,6 +50,7 @@ obj-y += locking/
 obj-y += power/
 obj-y += printk/
 obj-y += irq/
+obj-y += meminspect/
 obj-y += rcu/
 obj-y += livepatch/
 obj-y += dma/
diff --git a/kernel/meminspect/Kconfig b/kernel/meminspect/Kconfig
new file mode 100644
index 000000000000..8680fbf0e285
--- /dev/null
+++ b/kernel/meminspect/Kconfig
@@ -0,0 +1,20 @@
+# SPDX-License-Identifier: GPL-2.0
+
+config MEMINSPECT
+	bool "Allow the kernel to register memory regions for inspection purpose"
+	help
+	  The inspection mechanism allows registration of a specific memory
+	  area (or object) for later inspection.
+	  Ranges are added to an inspection table, which can be
+	  requested and analyzed by specific drivers.
+	  Drivers interface with any hardware mechanism that allows
+	  inspection of the data, including but not limited to: dumping
+	  for debugging, creating a coredump, analysis, or statistical
+	  information.
+	  The inspection table is created ahead of time so that it can be
+	  used later regardless of the state of the kernel (running, frozen,
+	  crashed, or any particular state).
+
+	  Note that modules using this feature must be rebuilt if this option
+	  changes.
+
diff --git a/kernel/meminspect/Makefile b/kernel/meminspect/Makefile
new file mode 100644
index 000000000000..09fd55e6d9cf
--- /dev/null
+++ b/kernel/meminspect/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_MEMINSPECT) += meminspect.o
diff --git a/kernel/meminspect/meminspect.c b/kernel/meminspect/meminspect.c
new file mode 100644
index 000000000000..0d9ad65ba92e
--- /dev/null
+++ b/kernel/meminspect/meminspect.c
@@ -0,0 +1,470 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static DEFINE_MUTEX(meminspect_lock);
+static struct inspect_entry inspect_entries[MEMINSPECT_ID_MAX];
+
+ATOMIC_NOTIFIER_HEAD(meminspect_notifier_list);
+
+#ifdef CONFIG_CRASH_DUMP
+
+#define CORE_STR "CORE"
+
+static struct elfhdr *ehdr;
+static size_t elf_offset;
+static bool elf_hdr_ready;
+
+static void append_kcore_note(char *notes, size_t *i, const char *name,
+			      unsigned int type, const void *desc,
+			      size_t descsz)
+{
+	struct elf_note *note = (struct elf_note *)&notes[*i];
+
+	note->n_namesz = strlen(name) + 1;
+	note->n_descsz = descsz;
+	note->n_type = type;
+	*i += sizeof(*note);
+	memcpy(&notes[*i], name, note->n_namesz);
+	*i = ALIGN(*i + note->n_namesz, 4);
+	memcpy(&notes[*i], desc, descsz);
+	*i = ALIGN(*i + descsz, 4);
+}
+
+static void append_kcore_note_nodesc(char *notes, size_t *i, const char *name,
+				     unsigned int type, size_t descsz)
+{
+	struct elf_note *note = (struct elf_note *)&notes[*i];
+
+	note->n_namesz = strlen(name) + 1;
+	note->n_descsz = descsz;
+	note->n_type = type;
+	*i += sizeof(*note);
+	memcpy(&notes[*i], name, note->n_namesz);
+	*i = ALIGN(*i + note->n_namesz, 4);
+}
+
+static struct elf_phdr *elf_phdr_entry_addr(struct elfhdr *ehdr, int idx)
+{
+	struct elf_phdr *ephdr = (struct elf_phdr *)((size_t)ehdr + ehdr->e_phoff);
+
+	return &ephdr[idx];
+}
+
+static int clear_elfheader(const struct inspect_entry *e)
+{
+	struct elf_phdr *phdr;
+	struct elf_phdr *tmp_phdr;
+	unsigned int phidx;
+	unsigned int i;
+
+	for (i = 0; i < ehdr->e_phnum; i++) {
+		phdr = elf_phdr_entry_addr(ehdr, i);
+		if (phdr->p_paddr == e->pa &&
+		    phdr->p_memsz == ALIGN(e->size, 4))
+			break;
+	}
+
+	if (i == ehdr->e_phnum) {
+		pr_debug("Cannot find program header entry in elf\n");
+		return -EINVAL;
+	}
+
+	phidx = i;
+
+	/* Clear program header */
+	tmp_phdr = elf_phdr_entry_addr(ehdr, phidx);
+	for (i = phidx; i < ehdr->e_phnum - 1; i++) {
+		tmp_phdr = elf_phdr_entry_addr(ehdr, i + 1);
+		phdr = elf_phdr_entry_addr(ehdr, i);
+		memcpy(phdr, tmp_phdr, sizeof(*phdr));
+		phdr->p_offset = phdr->p_offset - ALIGN(e->size, 4);
+	}
+	memset(tmp_phdr, 0, sizeof(*tmp_phdr));
+	ehdr->e_phnum--;
+
+	elf_offset -= ALIGN(e->size, 4);
+
+	return 0;
+}
+
+static void update_elfheader(const struct inspect_entry *e)
+{
+	struct elf_phdr *phdr;
+
+	phdr = elf_phdr_entry_addr(ehdr, ehdr->e_phnum++);
+
+	phdr->p_type = PT_LOAD;
+	phdr->p_offset = elf_offset;
+	phdr->p_vaddr = (elf_addr_t)e->va;
+	if (e->pa)
+		phdr->p_paddr = (elf_addr_t)e->pa;
+	else
+		phdr->p_paddr = (elf_addr_t)virt_to_phys(e->va);
+	phdr->p_filesz = phdr->p_memsz = ALIGN(e->size, 4);
+	phdr->p_flags = PF_R | PF_W;
+
+	elf_offset += ALIGN(e->size, 4);
+}
+
+/*
+ * This function prepares the elf header for the coredump image.
+ * Initially there is a single program header for the elf NOTE.
+ * The note contains the usual core dump information, and the vmcoreinfo.
+ */
+static int init_elfheader(void)
+{
+	struct elf_phdr *phdr;
+	void *notes;
+	unsigned int elfh_size, buf_sz;
+	unsigned int phdr_off;
+	size_t note_len, i = 0;
+	struct page *p;
+
+	struct elf_prstatus prstatus = {};
+	struct elf_prpsinfo prpsinfo = {
+		.pr_sname = 'R',
+		.pr_fname = "vmlinux",
+	};
+
+	/*
+	 * Header buffer contains:
+	 * ELF header, Note entry with PR status, PR ps info, and vmcoreinfo.
+	 * Also, MEMINSPECT_ID_MAX program headers.
+	 */
+	elfh_size = sizeof(*ehdr);
+	elfh_size += sizeof(struct elf_prstatus);
+	elfh_size += sizeof(struct elf_prpsinfo);
+	elfh_size += sizeof(VMCOREINFO_NOTE_NAME);
+	elfh_size += ALIGN(vmcoreinfo_size, 4);
+	elfh_size += (sizeof(*phdr)) * (MEMINSPECT_ID_MAX);
+
+	elfh_size = ALIGN(elfh_size, 4);
+
+	/* Length of the note is made of :
+	 * 3 elf notes structs (prstatus, prpsinfo, vmcoreinfo)
+	 * 3 notes names (2 core strings, 1 vmcoreinfo name)
+	 * sizeof each note
+	 */
+	note_len = (3 * sizeof(struct elf_note) +
+		    2 * ALIGN(sizeof(CORE_STR), 4) +
+		    VMCOREINFO_NOTE_NAME_BYTES +
+		    ALIGN(sizeof(struct elf_prstatus), 4) +
+		    ALIGN(sizeof(struct elf_prpsinfo), 4) +
+		    ALIGN(vmcoreinfo_size, 4));
+
+	buf_sz = elfh_size + note_len - ALIGN(vmcoreinfo_size, 4);
+
+	/* Never freed */
+	p = dma_alloc_from_contiguous(NULL, buf_sz >> PAGE_SHIFT,
+				      get_order(buf_sz), true);
+	if (!p)
+		return -ENOMEM;
+
+	ehdr = dma_common_contiguous_remap(p, buf_sz,
+			pgprot_decrypted(pgprot_dmacoherent(PAGE_KERNEL)),
+			__builtin_return_address(0));
+	if (!ehdr) {
+		dma_release_from_contiguous(NULL, p, buf_sz >> PAGE_SHIFT);
+		return -ENOMEM;
+	}
+
+	memset(ehdr, 0, elfh_size);
+
+	/* Assign Program headers offset, it's right after the elf header. */
+	phdr = (struct elf_phdr *)(ehdr + 1);
+	phdr_off = sizeof(*ehdr);
+
+	memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+	ehdr->e_ident[EI_CLASS] = ELF_CLASS;
+	ehdr->e_ident[EI_DATA] = ELF_DATA;
+	ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+	ehdr->e_ident[EI_OSABI] = ELF_OSABI;
+	ehdr->e_type = ET_CORE;
+	ehdr->e_machine = ELF_ARCH;
+	ehdr->e_version = EV_CURRENT;
+	ehdr->e_ehsize = sizeof(*ehdr);
+	ehdr->e_phentsize = sizeof(*phdr);
+
+	elf_offset = elfh_size;
+
+	notes = (void *)(((char *)ehdr) + elf_offset);
+
+	/* we have a single program header now */
+	ehdr->e_phnum = 1;
+
+	phdr->p_type = PT_NOTE;
+	phdr->p_offset = elf_offset;
+	phdr->p_filesz = note_len;
+
+	/* advance elf offset */
+	elf_offset += note_len;
+
+	strscpy(prpsinfo.pr_psargs, saved_command_line,
+		sizeof(prpsinfo.pr_psargs));
+
+	append_kcore_note(notes, &i, CORE_STR, NT_PRSTATUS, &prstatus,
+			  sizeof(prstatus));
+	append_kcore_note(notes, &i, CORE_STR, NT_PRPSINFO, &prpsinfo,
+			  sizeof(prpsinfo));
+	append_kcore_note_nodesc(notes, &i, VMCOREINFO_NOTE_NAME, 0,
+				 ALIGN(vmcoreinfo_size, 4));
+
+	ehdr->e_phoff = phdr_off;
+
+	/* This is the first coredump region, the ELF header */
+	meminspect_register_id_pa(MEMINSPECT_ID_ELF, page_to_phys(p),
+				  buf_sz, MEMINSPECT_TYPE_REGULAR);
+
+	/*
+	 * The second region is the vmcoreinfo, which goes right after.
+	 * It's being registered through vmcoreinfo.
+	 */
+
+	return 0;
+}
+#endif
+
+/**
+ * meminspect_unregister_id() - Unregister region from inspection table.
+ * @id: region's id in the table
+ *
+ * Return: None
+ */
+void meminspect_unregister_id(enum meminspect_uid id)
+{
+	struct inspect_entry *e;
+
+	WARN_ON(!mutex_is_locked(&meminspect_lock));
+
+	e = &inspect_entries[id];
+	if (!e->id)
+		return;
+
+	atomic_notifier_call_chain(&meminspect_notifier_list,
+				   MEMINSPECT_NOTIFIER_REMOVE, e);
+#ifdef CONFIG_CRASH_DUMP
+	if (elf_hdr_ready)
+		clear_elfheader(e);
+#endif
+	memset(e, 0, sizeof(*e));
+}
+EXPORT_SYMBOL_GPL(meminspect_unregister_id);
+
+/**
+ * meminspect_unregister_pa() - Unregister region from inspection table.
+ * @pa: Physical address of the memory region to remove
+ * @size: Size of the memory region to remove
+ *
+ * Return: None
+ */
+void meminspect_unregister_pa(phys_addr_t pa, size_t size)
+{
+	struct inspect_entry *e;
+	enum meminspect_uid i;
+
+	WARN_ON(!mutex_is_locked(&meminspect_lock));
+
+	for (i = MEMINSPECT_ID_STATIC; i < MEMINSPECT_ID_MAX; i++) {
+		e = &inspect_entries[i];
+		if (e->pa != pa)
+			continue;
+		if (e->size != size)
+			continue;
+		meminspect_unregister_id(e->id);
+		return;
+	}
+}
+EXPORT_SYMBOL_GPL(meminspect_unregister_pa);
+
+/**
+ * meminspect_register_id_pa() - Register region into inspection table
+ *	with given ID and physical address.
+ * @req_id: Requested unique meminspect_uid that identifies the region
+ *	This can be MEMINSPECT_ID_DYNAMIC, in which case the function will
+ *	find an unused ID and register with it.
+ * @pa: physical address of the memory region
+ * @size: region size
+ * @type: region type
+ *
+ * Return: None
+ */
+void meminspect_register_id_pa(enum meminspect_uid req_id, phys_addr_t pa,
+			       size_t size, unsigned int type)
+{
+	struct inspect_entry *e;
+	enum meminspect_uid uid = req_id;
+
+	WARN_ON(!mutex_is_locked(&meminspect_lock));
+
+	if (uid < MEMINSPECT_ID_STATIC || uid >= MEMINSPECT_ID_MAX)
+		return;
+
+	if (uid == MEMINSPECT_ID_DYNAMIC)
+		while (uid < MEMINSPECT_ID_MAX) {
+			if (!inspect_entries[uid].id)
+				break;
+			uid++;
+		}
+
+	if (uid == MEMINSPECT_ID_MAX)
+		return;
+
+	e = &inspect_entries[uid];
+
+	if (e->id)
+		meminspect_unregister_id(e->id);
+
+	e->pa = pa;
+	e->va = phys_to_virt(pa);
+	e->size = size;
+	e->id = uid;
+	e->type = type;
+#ifdef CONFIG_CRASH_DUMP
+	if (elf_hdr_ready)
+		update_elfheader(e);
+#endif
+	atomic_notifier_call_chain(&meminspect_notifier_list,
+				   MEMINSPECT_NOTIFIER_ADD, e);
+}
+EXPORT_SYMBOL_GPL(meminspect_register_id_pa);
+
+/**
+ * meminspect_table_lock() - Lock the mutex on the inspection table
+ *
+ * Return: None
+ */
+void meminspect_table_lock(void)
+{
+	mutex_lock(&meminspect_lock);
+}
+EXPORT_SYMBOL_GPL(meminspect_table_lock);
+
+/**
+ * meminspect_table_unlock() - Unlock the mutex on the inspection table
+ *
+ * Return: None
+ */
+void meminspect_table_unlock(void)
+{
+	mutex_unlock(&meminspect_lock);
+}
+EXPORT_SYMBOL_GPL(meminspect_table_unlock);
+
+/**
+ * meminspect_traverse() - Traverse the meminspect table and call the
+ *	callback function for each valid entry.
+ * @priv: private data to be passed to the callback
+ * @cb: meminspect iterator callback that should be called for each entry
+ *
+ * Return: None
+ */
+void meminspect_traverse(void *priv, MEMINSPECT_ITERATOR_CB cb)
+{
+	const struct inspect_entry *e;
+	int i;
+
+	WARN_ON(!mutex_is_locked(&meminspect_lock));
+
+	for (i = MEMINSPECT_ID_STATIC; i < MEMINSPECT_ID_MAX; i++) {
+		e = &inspect_entries[i];
+		if (e->id)
+			cb(priv, e);
+	}
+}
+EXPORT_SYMBOL_GPL(meminspect_traverse);
+
+/**
+ * meminspect_notifier_register() - Register a notifier to meminspect table
+ * @n: notifier block to register. This will be called whenever an entry
+ *	is being added or removed.
+ *
+ * Return: errno
+ */
+int meminspect_notifier_register(struct notifier_block *n)
+{
+	return atomic_notifier_chain_register(&meminspect_notifier_list, n);
+}
+EXPORT_SYMBOL_GPL(meminspect_notifier_register);
+
+/**
+ * meminspect_notifier_unregister() - Unregister a previously registered
+ *	notifier from meminspect table.
+ * @n: notifier block to unregister.
+ *
+ * Return: errno
+ */
+int meminspect_notifier_unregister(struct notifier_block *n)
+{
+	return atomic_notifier_chain_unregister(&meminspect_notifier_list, n);
+}
+EXPORT_SYMBOL_GPL(meminspect_notifier_unregister);
+
+#ifdef CONFIG_CRASH_DUMP
+static int __init meminspect_prepare_crashdump(void)
+{
+	const struct inspect_entry *e;
+	int ret;
+	enum meminspect_uid i;
+
+	ret = init_elfheader();
+
+	if (ret < 0)
+		return ret;
+
+	/*
+	 * Some regions may have been registered very early.
+	 * Update the elf header for all existing regions,
+	 * except for MEMINSPECT_ID_ELF and MEMINSPECT_ID_VMCOREINFO,
+	 * those are included in the ELF header upon its creation.
+	 */
+	for (i = MEMINSPECT_ID_VMCOREINFO + 1; i < MEMINSPECT_ID_MAX; i++) {
+		e = &inspect_entries[i];
+		if (e->id)
+			update_elfheader(e);
+	}
+
+	elf_hdr_ready = true;
+
+	return 0;
+}
+#endif
+
+static int __init meminspect_prepare_table(void)
+{
+	const struct inspect_entry *e;
+	enum meminspect_uid i;
+
+	meminspect_table_lock();
+	/*
+	 * First, copy all entries from the compiler built table.
+	 * In case some entries are registered multiple times,
+	 * the last chronological entry will be stored.
+	 * Previously registered entries will be dropped.
+	 */
+	for_each_meminspect_entry(e) {
+		inspect_entries[e->id] = *e;
+	}
+#ifdef CONFIG_CRASH_DUMP
+	meminspect_prepare_crashdump();
+#endif
+	/* if we have early notifiers registered, call them now */
+	for (i = MEMINSPECT_ID_STATIC; i < MEMINSPECT_ID_MAX; i++)
+		if (inspect_entries[i].id)
+			atomic_notifier_call_chain(&meminspect_notifier_list,
+						   MEMINSPECT_NOTIFIER_ADD,
+						   &inspect_entries[i]);
+	meminspect_table_unlock();
+
+	pr_debug("Memory inspection table initialized\n");
+
+	return 0;
+}
+late_initcall(meminspect_prepare_table);
-- 
2.43.0