From nobody Sat Feb  7 10:08:18 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 31C3F2DC339;
	Thu, 18 Dec 2025 20:42:47 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1766090567; cv=none;
 b=P1pdwJoeNgiBvUx2Q52uTMNAzFN0VnF6gFzpuKXd3ejpmqgLA4/rb3hpeJiKVedUoH5qyPYynSusKVqpetDQYbS08tzk/HJaLeePbpvTpFkpM/3US9ehVXlE9wZrU/6pfSVGdyy0oiTup8dM3+LoK1A5xOYCU0RRjAQJ/PAjH2c=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1766090567; c=relaxed/simple;
	bh=9NMc/Q7aNJkd1GmaQWWzsh1IwkmCBCQ1oNUHN/qbtOE=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
 b=SR5BUoLt7NIbThXrTjmBTCn0fz15tQwGn0AwO70Uj71IY6Umh+fuOaREQSjTlw19QcundvI2rGvOz0CRux88Wl1bhcOZ9MEMMoUiamAt2CGkX5vGjwvZuXmqwVFBSLI0Gnk6+TJNe+iFI69OeCHvte3bVcVSoC/uz9bWxfEa8vk=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=UwiyQujg; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="UwiyQujg"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3EB38C16AAE;
	Thu, 18 Dec 2025 20:42:46 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1766090567;
	bh=9NMc/Q7aNJkd1GmaQWWzsh1IwkmCBCQ1oNUHN/qbtOE=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=UwiyQujglDn+exdsGTSJv/gyPtwLNMpqMNpsqEvJU3/EizlQD+LVkqxRtqis2T/Un
	 W2ZDP80RXvrEl45b2pqZ3gTtVOntnL6L0DSDlXzCsD1v7gYvT4qG5ZtbdUwfmNiPPD
	 Lfvse18ZGg6bMwV82idsOLqqYvZ5gIEP/9GmV717WDF+rrxAtpDUpxTHQ4ltpC/avq
	 vD3dqU8MMxrAWBt01aDHipTiZLxlts7asjkea6Txl2ql/CFFJlyDf0t5EiTuiOGIbT
	 W4n+bShvpAU6tiyvqb+z41JmpPIGqv44vq5A/yg4HM9nMsfyZS+J8UDsP1hz1/GP4H
	 c/AJRaxhK8RBg==
From: Sasha Levin <sashal@kernel.org>
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	tools@kernel.org,
	gpaoloni@redhat.com,
	Sasha Levin <sashal@kernel.org>
Subject: [RFC PATCH v5 01/15] kernel/api: introduce kernel API specification
 framework
Date: Thu, 18 Dec 2025 15:42:23 -0500
Message-ID: <20251218204239.4159453-2-sashal@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Add a framework for formally documenting kernel APIs with inline
specifications. This framework provides:

- Structured API documentation with parameter specifications, return
  values, error conditions, and execution context requirements
- Runtime validation capabilities for debugging (CONFIG_KAPI_RUNTIME_CHECKS)
- Export of specifications via debugfs for tooling integration
- Support for both internal kernel APIs and system calls

The framework stores specifications in a dedicated ELF section and
provides infrastructure for:
- Compile-time validation of specifications
- Runtime querying of API documentation
- Machine-readable export formats
- Integration with existing SYSCALL_DEFINE macros

This commit introduces the core infrastructure without modifying any
existing APIs. Subsequent patches will add specifications to individual
subsystems.

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .gitignore                                  |    1 +
 Documentation/dev-tools/kernel-api-spec.rst |  507 ++++++
 MAINTAINERS                                 |    9 +
 include/asm-generic/vmlinux.lds.h           |   28 +
 include/linux/kernel_api_spec.h             | 1597 +++++++++++++++++++
 include/linux/syscall_api_spec.h            |  198 +++
 include/linux/syscalls.h                    |   38 +
 init/Kconfig                                |    2 +
 kernel/Makefile                             |    3 +
 kernel/api/Kconfig                          |   35 +
 kernel/api/Makefile                         |   29 +
 kernel/api/kernel_api_spec.c                | 1185 ++++++++++++++
 scripts/generate_api_specs.sh               |   18 +
 13 files changed, 3650 insertions(+)
 create mode 100644 Documentation/dev-tools/kernel-api-spec.rst
 create mode 100644 include/linux/kernel_api_spec.h
 create mode 100644 include/linux/syscall_api_spec.h
 create mode 100644 kernel/api/Kconfig
 create mode 100644 kernel/api/Makefile
 create mode 100644 kernel/api/kernel_api_spec.c
 create mode 100755 scripts/generate_api_specs.sh

diff --git a/.gitignore b/.gitignore
index 3a7241c941f5e..7130001e444f1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -12,6 +12,7 @@
 #
 .*
 *.a
+*.apispec.h
 *.asn1.[ch]
 *.bin
 *.bz2
diff --git a/Documentation/dev-tools/kernel-api-spec.rst b/Documentation/de=
v-tools/kernel-api-spec.rst
new file mode 100644
index 0000000000000..3a63f6711e27b
--- /dev/null
+++ b/Documentation/dev-tools/kernel-api-spec.rst
@@ -0,0 +1,507 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+Kernel API Specification Framework
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+
+:Author: Sasha Levin <sashal@kernel.org>
+:Date: June 2025
+
+.. contents:: Table of Contents
+   :depth: 3
+   :local:
+
+Introduction
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+
+The Kernel API Specification Framework (KAPI) provides a comprehensive sys=
tem for
+formally documenting, validating, and introspecting kernel APIs. This fram=
ework
+addresses the long-standing challenge of maintaining accurate, machine-rea=
dable
+documentation for the thousands of internal kernel APIs and system calls.
+
+Purpose and Goals
+-----------------
+
+The framework aims to:
+
+1. **Improve API Documentation**: Provide structured, inline documentation=
 that
+   lives alongside the code and is maintained as part of the development p=
rocess.
+
+2. **Enable Runtime Validation**: Optionally validate API usage at runtime=
 to catch
+   common programming errors during development and testing.
+
+3. **Support Tooling**: Export API specifications in machine-readable form=
ats for
+   use by static analyzers, documentation generators, and development tool=
s.
+
+4. **Enhance Debugging**: Provide detailed API information at runtime thro=
ugh debugfs
+   for debugging and introspection.
+
+5. **Formalize Contracts**: Explicitly document API contracts including pa=
rameter
+   constraints, execution contexts, locking requirements, and side effects.
+
+Architecture Overview
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+
+Components
+----------
+
+The framework consists of several key components:
+
+1. **Core Framework** (``kernel/api/kernel_api_spec.c``)
+
+   - API specification registration and storage
+   - Runtime validation engine
+   - Specification lookup and querying
+
+2. **DebugFS Interface** (``kernel/api/kapi_debugfs.c``)
+
+   - Runtime introspection via ``/sys/kernel/debug/kapi/``
+   - JSON and XML export formats
+   - Per-API detailed information
+
+3. **IOCTL Support** (``kernel/api/ioctl_validation.c``)
+
+   - Extended framework for IOCTL specifications
+   - Automatic validation wrappers
+   - Structure field validation
+
+4. **Specification Macros** (``include/linux/kernel_api_spec.h``)
+
+   - Declarative macros for API documentation
+   - Type-safe parameter specifications
+   - Context and constraint definitions
+
+Data Model
+----------
+
+The framework uses a hierarchical data model::
+
+    kernel_api_spec
+    =E2=94=9C=E2=94=80=E2=94=80 Basic Information
+    =E2=94=82   =E2=94=9C=E2=94=80=E2=94=80 name (API function name)
+    =E2=94=82   =E2=94=9C=E2=94=80=E2=94=80 version (specification version)
+    =E2=94=82   =E2=94=9C=E2=94=80=E2=94=80 description (human-readable de=
scription)
+    =E2=94=82   =E2=94=94=E2=94=80=E2=94=80 kernel_version (when API was i=
ntroduced)
+    =E2=94=82
+    =E2=94=9C=E2=94=80=E2=94=80 Parameters (up to 16)
+    =E2=94=82   =E2=94=94=E2=94=80=E2=94=80 kapi_param_spec
+    =E2=94=82       =E2=94=9C=E2=94=80=E2=94=80 name
+    =E2=94=82       =E2=94=9C=E2=94=80=E2=94=80 type (int, pointer, string=
, etc.)
+    =E2=94=82       =E2=94=9C=E2=94=80=E2=94=80 direction (in, out, inout)
+    =E2=94=82       =E2=94=9C=E2=94=80=E2=94=80 constraints (range, mask, =
enum values)
+    =E2=94=82       =E2=94=94=E2=94=80=E2=94=80 validation rules
+    =E2=94=82
+    =E2=94=9C=E2=94=80=E2=94=80 Return Value
+    =E2=94=82   =E2=94=94=E2=94=80=E2=94=80 kapi_return_spec
+    =E2=94=82       =E2=94=9C=E2=94=80=E2=94=80 type
+    =E2=94=82       =E2=94=9C=E2=94=80=E2=94=80 success conditions
+    =E2=94=82       =E2=94=94=E2=94=80=E2=94=80 validation rules
+    =E2=94=82
+    =E2=94=9C=E2=94=80=E2=94=80 Error Conditions (up to 32)
+    =E2=94=82   =E2=94=94=E2=94=80=E2=94=80 kapi_error_spec
+    =E2=94=82       =E2=94=9C=E2=94=80=E2=94=80 error code
+    =E2=94=82       =E2=94=9C=E2=94=80=E2=94=80 condition description
+    =E2=94=82       =E2=94=94=E2=94=80=E2=94=80 recovery advice
+    =E2=94=82
+    =E2=94=9C=E2=94=80=E2=94=80 Execution Context
+    =E2=94=82   =E2=94=9C=E2=94=80=E2=94=80 allowed contexts (process, int=
errupt, etc.)
+    =E2=94=82   =E2=94=9C=E2=94=80=E2=94=80 locking requirements
+    =E2=94=82   =E2=94=94=E2=94=80=E2=94=80 preemption/interrupt state
+    =E2=94=82
+    =E2=94=94=E2=94=80=E2=94=80 Side Effects
+        =E2=94=9C=E2=94=80=E2=94=80 memory allocation
+        =E2=94=9C=E2=94=80=E2=94=80 state changes
+        =E2=94=94=E2=94=80=E2=94=80 signal handling
+
+Usage Guide
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+
+Basic API Specification
+-----------------------
+
+To document a kernel API, use the specification macros in the implementati=
on file:
+
+.. code-block:: c
+
+    #include <linux/kernel_api_spec.h>
+
+    KAPI_DEFINE_SPEC(kmalloc_spec, kmalloc, "3.0")
+    KAPI_DESCRIPTION("Allocate kernel memory")
+    KAPI_PARAM(0, size, KAPI_TYPE_SIZE_T, KAPI_DIR_IN,
+               "Number of bytes to allocate")
+    KAPI_PARAM_RANGE(0, 0, KMALLOC_MAX_SIZE)
+    KAPI_PARAM(1, flags, KAPI_TYPE_FLAGS, KAPI_DIR_IN,
+               "Allocation flags (GFP_*)")
+    KAPI_PARAM_MASK(1, __GFP_BITS_MASK)
+    KAPI_RETURN(KAPI_TYPE_POINTER, "Pointer to allocated memory or NULL")
+    KAPI_ERROR(ENOMEM, "Out of memory")
+    KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SOFTIRQ | KAPI_CTX_HARDIRQ)
+    KAPI_SIDE_EFFECT("Allocates memory from kernel heap")
+    KAPI_LOCK_NOT_REQUIRED("Any lock")
+    KAPI_END_SPEC
+
+    void *kmalloc(size_t size, gfp_t flags)
+    {
+        /* Implementation */
+    }
+
+System Call Specification
+-------------------------
+
+System calls use specialized macros:
+
+.. code-block:: c
+
+    KAPI_DEFINE_SYSCALL_SPEC(open_spec, open, "1.0")
+    KAPI_DESCRIPTION("Open a file")
+    KAPI_PARAM(0, pathname, KAPI_TYPE_USER_STRING, KAPI_DIR_IN,
+               "Path to file")
+    KAPI_PARAM_PATH(0, PATH_MAX)
+    KAPI_PARAM(1, flags, KAPI_TYPE_FLAGS, KAPI_DIR_IN,
+               "Open flags (O_*)")
+    KAPI_PARAM(2, mode, KAPI_TYPE_MODE_T, KAPI_DIR_IN,
+               "File permissions (if creating)")
+    KAPI_RETURN(KAPI_TYPE_INT, "File descriptor or -1")
+    KAPI_ERROR(EACCES, "Permission denied")
+    KAPI_ERROR(ENOENT, "File does not exist")
+    KAPI_ERROR(EMFILE, "Too many open files")
+    KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+    KAPI_SIGNAL(EINTR, "Open can be interrupted by signal")
+    KAPI_END_SYSCALL_SPEC
+
+IOCTL Specification
+-------------------
+
+IOCTLs have extended support for structure validation:
+
+.. code-block:: c
+
+    KAPI_DEFINE_IOCTL_SPEC(vidioc_querycap_spec, VIDIOC_QUERYCAP,
+                           "VIDIOC_QUERYCAP",
+                           sizeof(struct v4l2_capability),
+                           sizeof(struct v4l2_capability),
+                           "video_fops")
+    KAPI_DESCRIPTION("Query device capabilities")
+    KAPI_IOCTL_FIELD(driver, KAPI_TYPE_CHAR_ARRAY, KAPI_DIR_OUT,
+                     "Driver name", 16)
+    KAPI_IOCTL_FIELD(card, KAPI_TYPE_CHAR_ARRAY, KAPI_DIR_OUT,
+                     "Device name", 32)
+    KAPI_IOCTL_FIELD(version, KAPI_TYPE_U32, KAPI_DIR_OUT,
+                     "Driver version")
+    KAPI_IOCTL_FIELD(capabilities, KAPI_TYPE_FLAGS, KAPI_DIR_OUT,
+                     "Device capabilities")
+    KAPI_END_IOCTL_SPEC
+
+Runtime Validation
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+
+Enabling Validation
+-------------------
+
+Runtime validation is controlled by kernel configuration:
+
+1. Enable ``CONFIG_KAPI_SPEC`` to build the framework
+2. Enable ``CONFIG_KAPI_RUNTIME_CHECKS`` for runtime validation
+3. Optionally enable ``CONFIG_KAPI_SPEC_DEBUGFS`` for debugfs interface
+
+Validation Modes
+----------------
+
+The framework supports several validation modes:
+
+.. code-block:: c
+
+    /* Enable validation for specific API */
+    kapi_enable_validation("kmalloc");
+
+    /* Enable validation for all APIs */
+    kapi_enable_all_validation();
+
+    /* Set validation level */
+    kapi_set_validation_level(KAPI_VALIDATE_FULL);
+
+Validation Levels:
+
+- ``KAPI_VALIDATE_NONE``: No validation
+- ``KAPI_VALIDATE_BASIC``: Type and NULL checks only
+- ``KAPI_VALIDATE_NORMAL``: Basic + range and constraint checks
+- ``KAPI_VALIDATE_FULL``: All checks including custom validators
+
+Custom Validators
+-----------------
+
+APIs can register custom validation functions:
+
+.. code-block:: c
+
+    static bool validate_buffer_size(const struct kapi_param_spec *spec,
+                                     const void *value, void *context)
+    {
+        size_t size =3D *(size_t *)value;
+        struct my_context *ctx =3D context;
+
+        return size > 0 && size <=3D ctx->max_buffer_size;
+    }
+
+    KAPI_PARAM_CUSTOM_VALIDATOR(0, validate_buffer_size)
+
+DebugFS Interface
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+
+The debugfs interface provides runtime access to API specifications:
+
+Directory Structure
+-------------------
+
+::
+
+    /sys/kernel/debug/kapi/
+    =E2=94=9C=E2=94=80=E2=94=80 apis/                    # All registered =
APIs
+    =E2=94=82   =E2=94=9C=E2=94=80=E2=94=80 kmalloc/
+    =E2=94=82   =E2=94=82   =E2=94=9C=E2=94=80=E2=94=80 specification   # =
Human-readable spec
+    =E2=94=82   =E2=94=82   =E2=94=9C=E2=94=80=E2=94=80 json           # J=
SON format
+    =E2=94=82   =E2=94=82   =E2=94=94=E2=94=80=E2=94=80 xml            # X=
ML format
+    =E2=94=82   =E2=94=94=E2=94=80=E2=94=80 open/
+    =E2=94=82       =E2=94=94=E2=94=80=E2=94=80 ...
+    =E2=94=9C=E2=94=80=E2=94=80 summary                  # Overview of all=
 APIs
+    =E2=94=9C=E2=94=80=E2=94=80 validation/              # Validation cont=
rols
+    =E2=94=82   =E2=94=9C=E2=94=80=E2=94=80 enabled             # Global e=
nable/disable
+    =E2=94=82   =E2=94=9C=E2=94=80=E2=94=80 level               # Validati=
on level
+    =E2=94=82   =E2=94=94=E2=94=80=E2=94=80 stats               # Validati=
on statistics
+    =E2=94=94=E2=94=80=E2=94=80 export/                  # Bulk export opt=
ions
+        =E2=94=9C=E2=94=80=E2=94=80 all.json            # All specs in JSON
+        =E2=94=94=E2=94=80=E2=94=80 all.xml             # All specs in XML
+
+Usage Examples
+--------------
+
+Query specific API::
+
+    $ cat /sys/kernel/debug/kapi/apis/kmalloc/specification
+    API: kmalloc
+    Version: 3.0
+    Description: Allocate kernel memory
+
+    Parameters:
+      [0] size (size_t, in): Number of bytes to allocate
+          Range: 0 - 4194304
+      [1] flags (flags, in): Allocation flags (GFP_*)
+          Mask: 0x1ffffff
+
+    Returns: pointer - Pointer to allocated memory or NULL
+
+    Errors:
+      ENOMEM: Out of memory
+
+    Context: process, softirq, hardirq
+
+    Side Effects:
+      - Allocates memory from kernel heap
+
+Export all specifications::
+
+    $ cat /sys/kernel/debug/kapi/export/all.json > kernel-apis.json
+
+Enable validation for specific API::
+
+    $ echo 1 > /sys/kernel/debug/kapi/apis/kmalloc/validate
+
+Performance Considerations
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D
+
+Memory Overhead
+---------------
+
+Each API specification consumes approximately 2-4KB of memory. With thousa=
nds
+of kernel APIs, this can add up to several megabytes. Consider:
+
+1. Building with ``CONFIG_KAPI_SPEC=3Dn`` for production kernels
+2. Using ``__init`` annotations for APIs only used during boot
+3. Implementing lazy loading for rarely used specifications
+
+Runtime Overhead
+----------------
+
+When ``CONFIG_KAPI_RUNTIME_CHECKS`` is enabled:
+
+- Each validated API call adds 50-200ns overhead
+- Complex validations (custom validators) may add more
+- Use validation only in development/testing kernels
+
+Optimization Strategies
+-----------------------
+
+1. **Compile-time optimization**: When validation is disabled, all
+   validation code is optimized away by the compiler.
+
+2. **Selective validation**: Enable validation only for specific APIs
+   or subsystems under test.
+
+3. **Caching**: The framework caches validation results for repeated
+   calls with identical parameters.
+
+Documentation Generation
+------------------------
+
+The framework exports specifications via debugfs that can be used
+to generate documentation. Tools for automatic documentation generation
+from specifications are planned for future development.
+
+IDE Integration
+---------------
+
+Modern IDEs can use the JSON export for:
+
+- Parameter hints
+- Type checking
+- Context validation
+- Error code documentation
+
+Testing Framework
+-----------------
+
+The framework includes test helpers::
+
+    #ifdef CONFIG_KAPI_TESTING
+    /* Verify API behaves according to specification */
+    kapi_test_api("kmalloc", test_cases);
+    #endif
+
+Best Practices
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+
+Writing Specifications
+----------------------
+
+1. **Be Comprehensive**: Document all parameters, errors, and side effects
+2. **Keep Updated**: Update specs when API behavior changes
+3. **Use Examples**: Include usage examples in descriptions
+4. **Validate Constraints**: Define realistic constraints for parameters
+5. **Document Context**: Clearly specify allowed execution contexts
+
+Maintenance
+-----------
+
+1. **Version Specifications**: Increment version when API changes
+2. **Deprecation**: Mark deprecated APIs and suggest replacements
+3. **Cross-reference**: Link related APIs in descriptions
+4. **Test Specifications**: Verify specs match implementation
+
+Common Patterns
+---------------
+
+**Optional Parameters**::
+
+    KAPI_PARAM(2, optional_arg, KAPI_TYPE_POINTER, KAPI_DIR_IN,
+               "Optional argument (may be NULL)")
+    KAPI_PARAM_OPTIONAL(2)
+
+**Variable Arguments**::
+
+    KAPI_PARAM(1, fmt, KAPI_TYPE_FORMAT_STRING, KAPI_DIR_IN,
+               "Printf-style format string")
+    KAPI_PARAM_VARIADIC(2, "Format arguments")
+
+**Callback Functions**::
+
+    KAPI_PARAM(1, callback, KAPI_TYPE_FUNCTION_PTR, KAPI_DIR_IN,
+               "Callback function")
+    KAPI_PARAM_CALLBACK(1, "int (*)(void *data)", "data")
+
+Troubleshooting
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+
+Common Issues
+-------------
+
+**Specification Not Found**::
+
+    kernel: KAPI: Specification for 'my_api' not found
+
+    Solution: Ensure KAPI_DEFINE_SPEC is in the same translation unit
+    as the function implementation.
+
+**Validation Failures**::
+
+    kernel: KAPI: Validation failed for kmalloc parameter 'size':
+            value 5242880 exceeds maximum 4194304
+
+    Solution: Check parameter constraints or adjust specification if
+    the constraint is incorrect.
+
+**Build Errors**::
+
+    error: 'KAPI_TYPE_UNKNOWN' undeclared
+
+    Solution: Include <linux/kernel_api_spec.h> and ensure
+    CONFIG_KAPI_SPEC is enabled.
+
+Debug Options
+-------------
+
+Enable verbose debugging::
+
+    echo 8 > /proc/sys/kernel/printk
+    echo 1 > /sys/kernel/debug/kapi/debug/verbose
+
+Future Directions
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+
+Planned Features
+----------------
+
+1. **Automatic Extraction**: Tool to extract specifications from existing
+   kernel-doc comments
+
+2. **Contract Verification**: Static analysis to verify implementation
+   matches specification
+
+3. **Performance Profiling**: Measure actual API performance against
+   documented expectations
+
+4. **Fuzzing Integration**: Use specifications to guide intelligent
+   fuzzing of kernel APIs
+
+5. **Version Compatibility**: Track API changes across kernel versions
+
+Research Areas
+--------------
+
+1. **Formal Verification**: Use specifications for mathematical proofs
+   of correctness
+
+2. **Runtime Monitoring**: Detect specification violations in production
+   with minimal overhead
+
+3. **API Evolution**: Analyze how kernel APIs change over time
+
+4. **Security Applications**: Use specifications for security policy
+   enforcement
+
+Contributing
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+
+Submitting Specifications
+-------------------------
+
+1. Add specifications to the same file as the API implementation
+2. Follow existing patterns and naming conventions
+3. Test with CONFIG_KAPI_RUNTIME_CHECKS enabled
+4. Verify debugfs output is correct
+5. Run scripts/checkpatch.pl on your changes
+
+Review Criteria
+---------------
+
+Specifications will be reviewed for:
+
+1. **Completeness**: All parameters and errors documented
+2. **Accuracy**: Specification matches implementation
+3. **Clarity**: Descriptions are clear and helpful
+4. **Consistency**: Follows framework conventions
+5. **Performance**: No unnecessary runtime overhead
+
+Contact
+-------
+
+- Maintainer: Sasha Levin <sashal@kernel.org>
diff --git a/MAINTAINERS b/MAINTAINERS
index 5b11839cba9de..14cd8b3c95e40 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13647,6 +13647,15 @@ W:	https://linuxtv.org
 T:	git git://linuxtv.org/media.git
 F:	drivers/media/radio/radio-keene*
=20
+KERNEL API SPECIFICATION FRAMEWORK (KAPI)
+M:	Sasha Levin <sashal@kernel.org>
+L:	linux-api@vger.kernel.org
+S:	Maintained
+F:	Documentation/admin-guide/kernel-api-spec.rst
+F:	include/linux/kernel_api_spec.h
+F:	kernel/api/
+F:	scripts/extract-kapi-spec.sh
+
 KERNEL AUTOMOUNTER
 M:	Ian Kent <raven@themaw.net>
 L:	autofs@vger.kernel.org
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinu=
x.lds.h
index 8ca130af301fc..658a14f8bf309 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -296,6 +296,33 @@
 #define TRACE_SYSCALLS()
 #endif
=20
+#ifdef CONFIG_KAPI_SPEC
+/*
+ * KAPI_SPECS - Include kernel API specifications in current section
+ *
+ * The .kapi_specs input section has 32-byte alignment requirement from
+ * the compiler, so we must align to 32 bytes before setting the start
+ * symbol to avoid padding between the symbol and actual data.
+ */
+#define KAPI_SPECS()				\
+	. =3D ALIGN(32);				\
+	__start_kapi_specs =3D .;			\
+	KEEP(*(.kapi_specs))			\
+	__stop_kapi_specs =3D .;
+
+/* For placing KAPI specs in a dedicated section */
+#define KAPI_SPECS_SECTION()			\
+	.kapi_specs : AT(ADDR(.kapi_specs) - LOAD_OFFSET) {	\
+		. =3D ALIGN(32);			\
+		__start_kapi_specs =3D .;		\
+		KEEP(*(.kapi_specs))		\
+		__stop_kapi_specs =3D .;		\
+	}
+#else
+#define KAPI_SPECS()
+#define KAPI_SPECS_SECTION()
+#endif
+
 #ifdef CONFIG_BPF_EVENTS
 #define BPF_RAW_TP() STRUCT_ALIGN();				\
 	BOUNDED_SECTION_BY(__bpf_raw_tp_map, __bpf_raw_tp)
@@ -485,6 +512,7 @@
 		. =3D ALIGN(8);						\
 		BOUNDED_SECTION_BY(__tracepoints_ptrs, ___tracepoints_ptrs) \
 		*(__tracepoints_strings)/* Tracepoints: strings */	\
+		KAPI_SPECS()						\
 	}								\
 									\
 	.rodata1          : AT(ADDR(.rodata1) - LOAD_OFFSET) {		\
diff --git a/include/linux/kernel_api_spec.h b/include/linux/kernel_api_spe=
c.h
new file mode 100644
index 0000000000000..b3460d156602e
--- /dev/null
+++ b/include/linux/kernel_api_spec.h
@@ -0,0 +1,1597 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * kernel_api_spec.h - Kernel API Formal Specification Framework
+ *
+ * This framework provides structures and macros to formally specify kerne=
l APIs
+ * in both human and machine-readable formats. It supports comprehensive d=
ocumentation
+ * of function signatures, parameters, return values, error conditions, an=
d constraints.
+ */
+
+#ifndef _LINUX_KERNEL_API_SPEC_H
+#define _LINUX_KERNEL_API_SPEC_H
+
+#include <linux/types.h>
+#include <linux/stringify.h>
+#include <linux/compiler.h>
+#include <linux/errno.h>
+
+struct sigaction;
+
+#define KAPI_MAX_PARAMS		16
+#define KAPI_MAX_ERRORS		32
+#define KAPI_MAX_CONSTRAINTS	32
+#define KAPI_MAX_SIGNALS	32
+#define KAPI_MAX_NAME_LEN	128
+#define KAPI_MAX_DESC_LEN	512
+#define KAPI_MAX_CAPABILITIES	8
+#define KAPI_MAX_SOCKET_STATES	16
+#define KAPI_MAX_PROTOCOL_BEHAVIORS	8
+#define KAPI_MAX_NET_ERRORS	16
+#define KAPI_MAX_SOCKOPTS	16
+#define KAPI_MAX_ADDR_FAMILIES	8
+
+/* Magic numbers for section validation (ASCII mnemonics) */
+#define KAPI_MAGIC_PARAMS	0x4B415031	/* 'KAP1' */
+#define KAPI_MAGIC_RETURN	0x4B415232	/* 'KAR2' */
+#define KAPI_MAGIC_ERRORS	0x4B414533	/* 'KAE3' */
+#define KAPI_MAGIC_LOCKS	0x4B414C34	/* 'KAL4' */
+#define KAPI_MAGIC_CONSTRAINTS	0x4B414335	/* 'KAC5' */
+#define KAPI_MAGIC_INFO		0x4B414936	/* 'KAI6' */
+#define KAPI_MAGIC_SIGNALS	0x4B415337	/* 'KAS7' */
+#define KAPI_MAGIC_SIGMASK	0x4B414D38	/* 'KAM8' */
+#define KAPI_MAGIC_STRUCTS	0x4B415439	/* 'KAT9' */
+#define KAPI_MAGIC_EFFECTS	0x4B414641	/* 'KAFA' */
+#define KAPI_MAGIC_TRANS	0x4B415442	/* 'KATB' */
+#define KAPI_MAGIC_CAPS		0x4B414343	/* 'KACC' */
+
+/**
+ * enum kapi_param_type - Parameter type classification
+ * @KAPI_TYPE_VOID: void type
+ * @KAPI_TYPE_INT: Integer types (int, long, etc.)
+ * @KAPI_TYPE_UINT: Unsigned integer types
+ * @KAPI_TYPE_PTR: Pointer types
+ * @KAPI_TYPE_STRUCT: Structure types
+ * @KAPI_TYPE_UNION: Union types
+ * @KAPI_TYPE_ENUM: Enumeration types
+ * @KAPI_TYPE_FUNC_PTR: Function pointer types
+ * @KAPI_TYPE_ARRAY: Array types
+ * @KAPI_TYPE_FD: File descriptor - validated in process context
+ * @KAPI_TYPE_USER_PTR: User space pointer - validated for access and size
+ * @KAPI_TYPE_PATH: Pathname - validated for access and path limits
+ * @KAPI_TYPE_CUSTOM: Custom/complex types
+ */
+enum kapi_param_type {
+	KAPI_TYPE_VOID =3D 0,
+	KAPI_TYPE_INT,
+	KAPI_TYPE_UINT,
+	KAPI_TYPE_PTR,
+	KAPI_TYPE_STRUCT,
+	KAPI_TYPE_UNION,
+	KAPI_TYPE_ENUM,
+	KAPI_TYPE_FUNC_PTR,
+	KAPI_TYPE_ARRAY,
+	KAPI_TYPE_FD,		/* File descriptor - validated in process context */
+	KAPI_TYPE_USER_PTR,	/* User space pointer - validated for access and size=
 */
+	KAPI_TYPE_PATH,		/* Pathname - validated for access and path limits */
+	KAPI_TYPE_CUSTOM,
+};
+
+/**
+ * enum kapi_param_flags - Parameter attribute flags
+ * @KAPI_PARAM_IN: Input parameter
+ * @KAPI_PARAM_OUT: Output parameter
+ * @KAPI_PARAM_INOUT: Input/output parameter
+ * @KAPI_PARAM_OPTIONAL: Optional parameter (can be NULL)
+ * @KAPI_PARAM_CONST: Const qualified parameter
+ * @KAPI_PARAM_VOLATILE: Volatile qualified parameter
+ * @KAPI_PARAM_USER: User space pointer
+ * @KAPI_PARAM_DMA: DMA-capable memory required
+ * @KAPI_PARAM_ALIGNED: Alignment requirements
+ */
+enum kapi_param_flags {
+	KAPI_PARAM_IN		=3D (1 << 0),
+	KAPI_PARAM_OUT		=3D (1 << 1),
+	KAPI_PARAM_INOUT	=3D (1 << 2),
+	KAPI_PARAM_OPTIONAL	=3D (1 << 3),
+	KAPI_PARAM_CONST	=3D (1 << 4),
+	KAPI_PARAM_VOLATILE	=3D (1 << 5),
+	KAPI_PARAM_USER		=3D (1 << 6),
+	KAPI_PARAM_DMA		=3D (1 << 7),
+	KAPI_PARAM_ALIGNED	=3D (1 << 8),
+};
+
+/**
+ * enum kapi_context_flags - Function execution context flags
+ * @KAPI_CTX_PROCESS: Can be called from process context
+ * @KAPI_CTX_SOFTIRQ: Can be called from softirq context
+ * @KAPI_CTX_HARDIRQ: Can be called from hardirq context
+ * @KAPI_CTX_NMI: Can be called from NMI context
+ * @KAPI_CTX_ATOMIC: Must be called in atomic context
+ * @KAPI_CTX_SLEEPABLE: May sleep
+ * @KAPI_CTX_PREEMPT_DISABLED: Requires preemption disabled
+ * @KAPI_CTX_IRQ_DISABLED: Requires interrupts disabled
+ */
+enum kapi_context_flags {
+	KAPI_CTX_PROCESS	=3D (1 << 0),
+	KAPI_CTX_SOFTIRQ	=3D (1 << 1),
+	KAPI_CTX_HARDIRQ	=3D (1 << 2),
+	KAPI_CTX_NMI		=3D (1 << 3),
+	KAPI_CTX_ATOMIC		=3D (1 << 4),
+	KAPI_CTX_SLEEPABLE	=3D (1 << 5),
+	KAPI_CTX_PREEMPT_DISABLED =3D (1 << 6),
+	KAPI_CTX_IRQ_DISABLED	=3D (1 << 7),
+};
+
+/**
+ * enum kapi_lock_type - Lock types used/required by the function
+ * @KAPI_LOCK_NONE: No locking requirements
+ * @KAPI_LOCK_MUTEX: Mutex lock
+ * @KAPI_LOCK_SPINLOCK: Spinlock
+ * @KAPI_LOCK_RWLOCK: Read-write lock
+ * @KAPI_LOCK_SEQLOCK: Sequence lock
+ * @KAPI_LOCK_RCU: RCU lock
+ * @KAPI_LOCK_SEMAPHORE: Semaphore
+ * @KAPI_LOCK_CUSTOM: Custom locking mechanism
+ */
+enum kapi_lock_type {
+	KAPI_LOCK_NONE =3D 0,
+	KAPI_LOCK_MUTEX,
+	KAPI_LOCK_SPINLOCK,
+	KAPI_LOCK_RWLOCK,
+	KAPI_LOCK_SEQLOCK,
+	KAPI_LOCK_RCU,
+	KAPI_LOCK_SEMAPHORE,
+	KAPI_LOCK_CUSTOM,
+};
+
+/**
+ * enum kapi_constraint_type - Types of parameter constraints
+ * @KAPI_CONSTRAINT_NONE: No constraint
+ * @KAPI_CONSTRAINT_RANGE: Numeric range constraint
+ * @KAPI_CONSTRAINT_MASK: Bitmask constraint
+ * @KAPI_CONSTRAINT_ENUM: Enumerated values constraint
+ * @KAPI_CONSTRAINT_ALIGNMENT: Alignment constraint (must be aligned to sp=
ecified boundary)
+ * @KAPI_CONSTRAINT_POWER_OF_TWO: Value must be a power of two
+ * @KAPI_CONSTRAINT_PAGE_ALIGNED: Value must be page-aligned
+ * @KAPI_CONSTRAINT_NONZERO: Value must be non-zero
+ * @KAPI_CONSTRAINT_USER_STRING: Userspace null-terminated string with len=
gth range
+ * @KAPI_CONSTRAINT_USER_PATH: Userspace pathname string (validated for ac=
cessibility and PATH_MAX)
+ * @KAPI_CONSTRAINT_USER_PTR: Userspace pointer (validated for accessibili=
ty and size)
+ * @KAPI_CONSTRAINT_CUSTOM: Custom validation function
+ */
+enum kapi_constraint_type {
+	KAPI_CONSTRAINT_NONE =3D 0,
+	KAPI_CONSTRAINT_RANGE,
+	KAPI_CONSTRAINT_MASK,
+	KAPI_CONSTRAINT_ENUM,
+	KAPI_CONSTRAINT_ALIGNMENT,
+	KAPI_CONSTRAINT_POWER_OF_TWO,
+	KAPI_CONSTRAINT_PAGE_ALIGNED,
+	KAPI_CONSTRAINT_NONZERO,
+	KAPI_CONSTRAINT_USER_STRING,
+	KAPI_CONSTRAINT_USER_PATH,
+	KAPI_CONSTRAINT_USER_PTR,
+	KAPI_CONSTRAINT_CUSTOM,
+};
+
+/**
+ * struct kapi_param_spec - Parameter specification
+ * @name: Parameter name
+ * @type_name: Type name as string
+ * @type: Parameter type classification
+ * @flags: Parameter attribute flags
+ * @size: Size in bytes (for arrays/buffers)
+ * @alignment: Required alignment
+ * @min_value: Minimum valid value (for numeric types)
+ * @max_value: Maximum valid value (for numeric types)
+ * @valid_mask: Valid bits mask (for flag parameters)
+ * @enum_values: Array of valid enumerated values
+ * @enum_count: Number of valid enumerated values
+ * @constraint_type: Type of constraint applied
+ * @validate: Custom validation function
+ * @description: Human-readable description
+ * @constraints: Additional constraints description
+ * @size_param_idx: Index of parameter that determines size (-1 if fixed s=
ize)
+ * @size_multiplier: Multiplier for size calculation (e.g., sizeof(struct))
+ */
+struct kapi_param_spec {
+	char name[KAPI_MAX_NAME_LEN];
+	char type_name[KAPI_MAX_NAME_LEN];
+	enum kapi_param_type type;
+	u32 flags;
+	size_t size;
+	size_t alignment;
+	s64 min_value;
+	s64 max_value;
+	u64 valid_mask;
+	const s64 *enum_values;
+	u32 enum_count;
+	enum kapi_constraint_type constraint_type;
+	bool (*validate)(s64 value);
+	char description[KAPI_MAX_DESC_LEN];
+	char constraints[KAPI_MAX_DESC_LEN];
+	int size_param_idx;	/* Index of param that determines size, -1 if N/A */
+	size_t size_multiplier;	/* Size per unit (e.g., sizeof(struct epoll_event=
)) */
+} __attribute__((packed));
+
+/**
+ * struct kapi_error_spec - Error condition specification
+ * @error_code: Error code value
+ * @name: Error code name (e.g., "EINVAL")
+ * @condition: Condition that triggers this error
+ * @description: Detailed error description
+ */
+struct kapi_error_spec {
+	int error_code;
+	char name[KAPI_MAX_NAME_LEN];
+	char condition[KAPI_MAX_DESC_LEN];
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_return_check_type - Return value check types
+ * @KAPI_RETURN_EXACT: Success is an exact value
+ * @KAPI_RETURN_RANGE: Success is within a range
+ * @KAPI_RETURN_ERROR_CHECK: Success is when NOT in error list
+ * @KAPI_RETURN_FD: Return value is a file descriptor (>=3D 0 is success)
+ * @KAPI_RETURN_CUSTOM: Custom validation function
+ * @KAPI_RETURN_NO_RETURN: Function does not return (e.g., exec on success)
+ */
+enum kapi_return_check_type {
+	KAPI_RETURN_EXACT,
+	KAPI_RETURN_RANGE,
+	KAPI_RETURN_ERROR_CHECK,
+	KAPI_RETURN_FD,
+	KAPI_RETURN_CUSTOM,
+	KAPI_RETURN_NO_RETURN,
+};
+
+/**
+ * struct kapi_return_spec - Return value specification
+ * @type_name: Return type name
+ * @type: Return type classification
+ * @check_type: Type of success check to perform
+ * @success_value: Exact value indicating success (for EXACT)
+ * @success_min: Minimum success value (for RANGE)
+ * @success_max: Maximum success value (for RANGE)
+ * @error_values: Array of error values (for ERROR_CHECK)
+ * @error_count: Number of error values
+ * @is_success: Custom function to check success
+ * @description: Return value description
+ */
+struct kapi_return_spec {
+	char type_name[KAPI_MAX_NAME_LEN];
+	enum kapi_param_type type;
+	enum kapi_return_check_type check_type;
+	s64 success_value;
+	s64 success_min;
+	s64 success_max;
+	const s64 *error_values;
+	u32 error_count;
+	bool (*is_success)(s64 retval);
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_lock_scope - Lock acquisition/release scope
+ * @KAPI_LOCK_INTERNAL: Lock is acquired and released within the function =
(common case)
+ * @KAPI_LOCK_ACQUIRES: Function acquires lock but does not release it
+ * @KAPI_LOCK_RELEASES: Function releases lock (must be held on entry)
+ * @KAPI_LOCK_CALLER_HELD: Lock must be held by caller throughout the call
+ */
+enum kapi_lock_scope {
+	KAPI_LOCK_INTERNAL =3D 0,
+	KAPI_LOCK_ACQUIRES,
+	KAPI_LOCK_RELEASES,
+	KAPI_LOCK_CALLER_HELD,
+};
+
+/**
+ * struct kapi_lock_spec - Lock requirement specification
+ * @lock_name: Name of the lock
+ * @lock_type: Type of lock
+ * @scope: Lock scope (internal, acquires, releases, or caller-held)
+ * @description: Additional lock requirements
+ */
+struct kapi_lock_spec {
+	char lock_name[KAPI_MAX_NAME_LEN];
+	enum kapi_lock_type lock_type;
+	enum kapi_lock_scope scope;
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_constraint_spec - Additional constraint specification
+ * @name: Constraint name
+ * @description: Constraint description
+ * @expression: Formal expression (if applicable)
+ */
+struct kapi_constraint_spec {
+	char name[KAPI_MAX_NAME_LEN];
+	char description[KAPI_MAX_DESC_LEN];
+	char expression[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_signal_direction - Signal flow direction
+ * @KAPI_SIGNAL_RECEIVE: Function may receive this signal
+ * @KAPI_SIGNAL_SEND: Function may send this signal
+ * @KAPI_SIGNAL_HANDLE: Function handles this signal specially
+ * @KAPI_SIGNAL_BLOCK: Function blocks this signal
+ * @KAPI_SIGNAL_IGNORE: Function ignores this signal
+ */
+enum kapi_signal_direction {
+	KAPI_SIGNAL_RECEIVE	=3D (1 << 0),
+	KAPI_SIGNAL_SEND	=3D (1 << 1),
+	KAPI_SIGNAL_HANDLE	=3D (1 << 2),
+	KAPI_SIGNAL_BLOCK	=3D (1 << 3),
+	KAPI_SIGNAL_IGNORE	=3D (1 << 4),
+};
+
+/**
+ * enum kapi_signal_action - What the function does with the signal
+ * @KAPI_SIGNAL_ACTION_DEFAULT: Default signal action applies
+ * @KAPI_SIGNAL_ACTION_TERMINATE: Causes termination
+ * @KAPI_SIGNAL_ACTION_COREDUMP: Causes termination with core dump
+ * @KAPI_SIGNAL_ACTION_STOP: Stops the process
+ * @KAPI_SIGNAL_ACTION_CONTINUE: Continues a stopped process
+ * @KAPI_SIGNAL_ACTION_CUSTOM: Custom handling described in notes
+ * @KAPI_SIGNAL_ACTION_RETURN: Returns from syscall with EINTR
+ * @KAPI_SIGNAL_ACTION_RESTART: Restarts the syscall
+ * @KAPI_SIGNAL_ACTION_QUEUE: Queues the signal for later delivery
+ * @KAPI_SIGNAL_ACTION_DISCARD: Discards the signal
+ * @KAPI_SIGNAL_ACTION_TRANSFORM: Transforms to another signal
+ */
+enum kapi_signal_action {
+	KAPI_SIGNAL_ACTION_DEFAULT =3D 0,
+	KAPI_SIGNAL_ACTION_TERMINATE,
+	KAPI_SIGNAL_ACTION_COREDUMP,
+	KAPI_SIGNAL_ACTION_STOP,
+	KAPI_SIGNAL_ACTION_CONTINUE,
+	KAPI_SIGNAL_ACTION_CUSTOM,
+	KAPI_SIGNAL_ACTION_RETURN,
+	KAPI_SIGNAL_ACTION_RESTART,
+	KAPI_SIGNAL_ACTION_QUEUE,
+	KAPI_SIGNAL_ACTION_DISCARD,
+	KAPI_SIGNAL_ACTION_TRANSFORM,
+};
+
+/**
+ * struct kapi_signal_spec - Signal specification
+ * @signal_num: Signal number (e.g., SIGKILL, SIGTERM)
+ * @signal_name: Signal name as string
+ * @direction: Direction flags (OR of kapi_signal_direction)
+ * @action: What happens when signal is received
+ * @target: Description of target process/thread for sent signals
+ * @condition: Condition under which signal is sent/received/handled
+ * @description: Detailed description of signal handling
+ * @restartable: Whether syscall is restartable after this signal
+ * @sa_flags_required: Required signal action flags (SA_*)
+ * @sa_flags_forbidden: Forbidden signal action flags
+ * @error_on_signal: Error code returned when signal occurs (-EINTR, etc)
+ * @transform_to: Signal number to transform to (if action is TRANSFORM)
+ * @timing: When signal can occur ("entry", "during", "exit", "anytime")
+ * @priority: Signal handling priority (lower processed first)
+ * @interruptible: Whether this operation is interruptible by this signal
+ * @queue_behavior: How signal is queued ("realtime", "standard", "coalesc=
e")
+ * @state_required: Required process state for signal to be delivered
+ * @state_forbidden: Forbidden process state for signal delivery
+ */
+struct kapi_signal_spec {
+	int signal_num;
+	char signal_name[32];
+	u32 direction;
+	enum kapi_signal_action action;
+	char target[KAPI_MAX_DESC_LEN];
+	char condition[KAPI_MAX_DESC_LEN];
+	char description[KAPI_MAX_DESC_LEN];
+	bool restartable;
+	u32 sa_flags_required;
+	u32 sa_flags_forbidden;
+	int error_on_signal;
+	int transform_to;
+	char timing[32];
+	u8 priority;
+	bool interruptible;
+	char queue_behavior[128];
+	u32 state_required;
+	u32 state_forbidden;
+} __attribute__((packed));
+
+/**
+ * struct kapi_signal_mask_spec - Signal mask specification
+ * @mask_name: Name of the signal mask
+ * @signals: Array of signal numbers in the mask
+ * @signal_count: Number of signals in the mask
+ * @description: Description of what this mask represents
+ */
+struct kapi_signal_mask_spec {
+	char mask_name[KAPI_MAX_NAME_LEN];
+	int signals[KAPI_MAX_SIGNALS];
+	u32 signal_count;
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_struct_field - Structure field specification
+ * @name: Field name
+ * @type: Field type classification
+ * @type_name: Type name as string
+ * @offset: Offset within structure
+ * @size: Size of field in bytes
+ * @flags: Field attribute flags
+ * @constraint_type: Type of constraint applied
+ * @min_value: Minimum valid value (for numeric types)
+ * @max_value: Maximum valid value (for numeric types)
+ * @valid_mask: Valid bits mask (for flag fields)
+ * @enum_values: Comma-separated list of valid enum values (for enum types)
+ * @description: Field description
+ */
+struct kapi_struct_field {
+	char name[KAPI_MAX_NAME_LEN];
+	enum kapi_param_type type;
+	char type_name[KAPI_MAX_NAME_LEN];
+	size_t offset;
+	size_t size;
+	u32 flags;
+	enum kapi_constraint_type constraint_type;
+	s64 min_value;
+	s64 max_value;
+	u64 valid_mask;
+	char enum_values[KAPI_MAX_DESC_LEN];	/* Comma-separated list of valid enu=
m values */
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_struct_spec - Structure type specification
+ * @name: Structure name
+ * @size: Total size of structure
+ * @alignment: Required alignment
+ * @field_count: Number of fields
+ * @fields: Field specifications
+ * @description: Structure description
+ */
+struct kapi_struct_spec {
+	char name[KAPI_MAX_NAME_LEN];
+	size_t size;
+	size_t alignment;
+	u32 field_count;
+	struct kapi_struct_field fields[KAPI_MAX_PARAMS];
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_capability_action - What the capability allows
+ * @KAPI_CAP_BYPASS_CHECK: Bypasses a check entirely
+ * @KAPI_CAP_INCREASE_LIMIT: Increases or removes a limit
+ * @KAPI_CAP_OVERRIDE_RESTRICTION: Overrides a restriction
+ * @KAPI_CAP_GRANT_PERMISSION: Grants permission that would otherwise be d=
enied
+ * @KAPI_CAP_MODIFY_BEHAVIOR: Changes the behavior of the operation
+ * @KAPI_CAP_ACCESS_RESOURCE: Allows access to restricted resources
+ * @KAPI_CAP_PERFORM_OPERATION: Allows performing a privileged operation
+ */
+enum kapi_capability_action {
+	KAPI_CAP_BYPASS_CHECK =3D 0,
+	KAPI_CAP_INCREASE_LIMIT,
+	KAPI_CAP_OVERRIDE_RESTRICTION,
+	KAPI_CAP_GRANT_PERMISSION,
+	KAPI_CAP_MODIFY_BEHAVIOR,
+	KAPI_CAP_ACCESS_RESOURCE,
+	KAPI_CAP_PERFORM_OPERATION,
+};
+
+/**
+ * struct kapi_capability_spec - Capability requirement specification
+ * @capability: The capability constant (e.g., CAP_IPC_LOCK)
+ * @cap_name: Capability name as string
+ * @action: What the capability allows (kapi_capability_action)
+ * @allows: Description of what the capability allows
+ * @without_cap: What happens without the capability
+ * @check_condition: Condition when capability is checked
+ * @priority: Check priority (lower checked first)
+ * @alternative: Alternative capabilities that can be used
+ * @alternative_count: Number of alternative capabilities
+ */
+struct kapi_capability_spec {
+	int capability;
+	char cap_name[KAPI_MAX_NAME_LEN];
+	enum kapi_capability_action action;
+	char allows[KAPI_MAX_DESC_LEN];
+	char without_cap[KAPI_MAX_DESC_LEN];
+	char check_condition[KAPI_MAX_DESC_LEN];
+	u8 priority;
+	int alternative[KAPI_MAX_CAPABILITIES];
+	u32 alternative_count;
+} __attribute__((packed));
+
+/**
+ * enum kapi_side_effect_type - Types of side effects
+ * @KAPI_EFFECT_NONE: No side effects
+ * @KAPI_EFFECT_ALLOC_MEMORY: Allocates memory
+ * @KAPI_EFFECT_FREE_MEMORY: Frees memory
+ * @KAPI_EFFECT_MODIFY_STATE: Modifies global/shared state
+ * @KAPI_EFFECT_SIGNAL_SEND: Sends signals
+ * @KAPI_EFFECT_FILE_POSITION: Modifies file position
+ * @KAPI_EFFECT_LOCK_ACQUIRE: Acquires locks
+ * @KAPI_EFFECT_LOCK_RELEASE: Releases locks
+ * @KAPI_EFFECT_RESOURCE_CREATE: Creates system resources (FDs, PIDs, etc)
+ * @KAPI_EFFECT_RESOURCE_DESTROY: Destroys system resources
+ * @KAPI_EFFECT_SCHEDULE: May cause scheduling/context switch
+ * @KAPI_EFFECT_HARDWARE: Interacts with hardware
+ * @KAPI_EFFECT_NETWORK: Network I/O operation
+ * @KAPI_EFFECT_FILESYSTEM: Filesystem modification
+ * @KAPI_EFFECT_PROCESS_STATE: Modifies process state
+ * @KAPI_EFFECT_IRREVERSIBLE: Effect cannot be undone
+ */
+enum kapi_side_effect_type {
+	KAPI_EFFECT_NONE =3D 0,
+	KAPI_EFFECT_ALLOC_MEMORY =3D (1 << 0),
+	KAPI_EFFECT_FREE_MEMORY =3D (1 << 1),
+	KAPI_EFFECT_MODIFY_STATE =3D (1 << 2),
+	KAPI_EFFECT_SIGNAL_SEND =3D (1 << 3),
+	KAPI_EFFECT_FILE_POSITION =3D (1 << 4),
+	KAPI_EFFECT_LOCK_ACQUIRE =3D (1 << 5),
+	KAPI_EFFECT_LOCK_RELEASE =3D (1 << 6),
+	KAPI_EFFECT_RESOURCE_CREATE =3D (1 << 7),
+	KAPI_EFFECT_RESOURCE_DESTROY =3D (1 << 8),
+	KAPI_EFFECT_SCHEDULE =3D (1 << 9),
+	KAPI_EFFECT_HARDWARE =3D (1 << 10),
+	KAPI_EFFECT_NETWORK =3D (1 << 11),
+	KAPI_EFFECT_FILESYSTEM =3D (1 << 12),
+	KAPI_EFFECT_PROCESS_STATE =3D (1 << 13),
+	KAPI_EFFECT_IRREVERSIBLE =3D (1 << 14),
+};
+
+/**
+ * struct kapi_side_effect - Side effect specification
+ * @type: Bitmask of effect types
+ * @target: What is affected (e.g., "process memory", "file descriptor tab=
le")
+ * @condition: Condition under which effect occurs
+ * @description: Detailed description of the effect
+ * @reversible: Whether the effect can be undone
+ */
+struct kapi_side_effect {
+	u32 type;
+	char target[KAPI_MAX_NAME_LEN];
+	char condition[KAPI_MAX_DESC_LEN];
+	char description[KAPI_MAX_DESC_LEN];
+	bool reversible;
+} __attribute__((packed));
+
+/**
+ * struct kapi_state_transition - State transition specification
+ * @from_state: Starting state description
+ * @to_state: Ending state description
+ * @condition: Condition for transition
+ * @object: Object whose state changes
+ * @description: Detailed description
+ */
+struct kapi_state_transition {
+	char from_state[KAPI_MAX_NAME_LEN];
+	char to_state[KAPI_MAX_NAME_LEN];
+	char condition[KAPI_MAX_DESC_LEN];
+	char object[KAPI_MAX_NAME_LEN];
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+#define KAPI_MAX_STRUCT_SPECS	8
+#define KAPI_MAX_SIDE_EFFECTS	32
+#define KAPI_MAX_STATE_TRANS	8
+
+/**
+ * enum kapi_socket_state - Socket states for state machine
+ */
+enum kapi_socket_state {
+	KAPI_SOCK_STATE_UNSPEC =3D 0,
+	KAPI_SOCK_STATE_CLOSED,
+	KAPI_SOCK_STATE_OPEN,
+	KAPI_SOCK_STATE_BOUND,
+	KAPI_SOCK_STATE_LISTEN,
+	KAPI_SOCK_STATE_SYN_SENT,
+	KAPI_SOCK_STATE_SYN_RECV,
+	KAPI_SOCK_STATE_ESTABLISHED,
+	KAPI_SOCK_STATE_FIN_WAIT1,
+	KAPI_SOCK_STATE_FIN_WAIT2,
+	KAPI_SOCK_STATE_CLOSE_WAIT,
+	KAPI_SOCK_STATE_CLOSING,
+	KAPI_SOCK_STATE_LAST_ACK,
+	KAPI_SOCK_STATE_TIME_WAIT,
+	KAPI_SOCK_STATE_CONNECTED,
+	KAPI_SOCK_STATE_DISCONNECTED,
+};
+
+/**
+ * enum kapi_socket_protocol - Socket protocol types
+ */
+enum kapi_socket_protocol {
+	KAPI_PROTO_TCP		=3D (1 << 0),
+	KAPI_PROTO_UDP		=3D (1 << 1),
+	KAPI_PROTO_UNIX		=3D (1 << 2),
+	KAPI_PROTO_RAW		=3D (1 << 3),
+	KAPI_PROTO_PACKET	=3D (1 << 4),
+	KAPI_PROTO_NETLINK	=3D (1 << 5),
+	KAPI_PROTO_SCTP		=3D (1 << 6),
+	KAPI_PROTO_DCCP		=3D (1 << 7),
+	KAPI_PROTO_ALL		=3D 0xFFFFFFFF,
+};
+
+/**
+ * enum kapi_buffer_behavior - Network buffer handling behaviors
+ */
+enum kapi_buffer_behavior {
+	KAPI_BUF_PEEK		=3D (1 << 0),
+	KAPI_BUF_TRUNCATE	=3D (1 << 1),
+	KAPI_BUF_SCATTER	=3D (1 << 2),
+	KAPI_BUF_ZERO_COPY	=3D (1 << 3),
+	KAPI_BUF_KERNEL_ALLOC	=3D (1 << 4),
+	KAPI_BUF_DMA_CAPABLE	=3D (1 << 5),
+	KAPI_BUF_FRAGMENT	=3D (1 << 6),
+};
+
+/**
+ * enum kapi_async_behavior - Asynchronous operation behaviors
+ */
+enum kapi_async_behavior {
+	KAPI_ASYNC_BLOCK	=3D 0,
+	KAPI_ASYNC_NONBLOCK	=3D (1 << 0),
+	KAPI_ASYNC_POLL_READY	=3D (1 << 1),
+	KAPI_ASYNC_SIGNAL_DRIVEN =3D (1 << 2),
+	KAPI_ASYNC_AIO		=3D (1 << 3),
+	KAPI_ASYNC_IO_URING	=3D (1 << 4),
+	KAPI_ASYNC_EPOLL	=3D (1 << 5),
+};
+
+/**
+ * struct kapi_socket_state_spec - Socket state requirement/transition
+ */
+struct kapi_socket_state_spec {
+	enum kapi_socket_state required_states[KAPI_MAX_SOCKET_STATES];
+	u32 required_state_count;
+	enum kapi_socket_state forbidden_states[KAPI_MAX_SOCKET_STATES];
+	u32 forbidden_state_count;
+	enum kapi_socket_state resulting_state;
+	char state_condition[KAPI_MAX_DESC_LEN];
+	u32 applicable_protocols;
+} __attribute__((packed));
+
+/**
+ * struct kapi_protocol_behavior - Protocol-specific behavior
+ */
+struct kapi_protocol_behavior {
+	u32 applicable_protocols;
+	char behavior[KAPI_MAX_DESC_LEN];
+	s64 protocol_flags;
+	char flag_description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_buffer_spec - Network buffer specification
+ */
+struct kapi_buffer_spec {
+	u32 buffer_behaviors;
+	size_t min_buffer_size;
+	size_t max_buffer_size;
+	size_t optimal_buffer_size;
+	char fragmentation_rules[KAPI_MAX_DESC_LEN];
+	bool can_partial_transfer;
+	char partial_transfer_rules[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_async_spec - Asynchronous behavior specification
+ */
+struct kapi_async_spec {
+	enum kapi_async_behavior supported_modes;
+	int nonblock_errno;
+	u32 poll_events_in;
+	u32 poll_events_out;
+	char completion_condition[KAPI_MAX_DESC_LEN];
+	bool supports_timeout;
+	char timeout_behavior[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_addr_family_spec - Address family specification
+ */
+struct kapi_addr_family_spec {
+	int family;
+	char family_name[32];
+	size_t addr_struct_size;
+	size_t min_addr_len;
+	size_t max_addr_len;
+	char addr_format[KAPI_MAX_DESC_LEN];
+	bool supports_wildcard;
+	bool supports_multicast;
+	bool supports_broadcast;
+	char special_addresses[KAPI_MAX_DESC_LEN];
+	u32 port_range_min;
+	u32 port_range_max;
+} __attribute__((packed));
+
+/**
+ * struct kernel_api_spec - Complete kernel API specification
+ * @name: Function name
+ * @version: API version
+ * @description: Brief description
+ * @long_description: Detailed description
+ * @context_flags: Execution context flags
+ * @param_count: Number of parameters
+ * @params: Parameter specifications
+ * @return_spec: Return value specification
+ * @error_count: Number of possible errors
+ * @errors: Error specifications
+ * @lock_count: Number of lock specifications
+ * @locks: Lock requirement specifications
+ * @constraint_count: Number of additional constraints
+ * @constraints: Additional constraint specifications
+ * @examples: Usage examples
+ * @notes: Additional notes
+ * @since_version: Kernel version when introduced
+ * @signal_count: Number of signal specifications
+ * @signals: Signal handling specifications
+ * @signal_mask_count: Number of signal mask specifications
+ * @signal_masks: Signal mask specifications
+ * @struct_spec_count: Number of structure specifications
+ * @struct_specs: Structure type specifications
+ * @side_effect_count: Number of side effect specifications
+ * @side_effects: Side effect specifications
+ * @state_trans_count: Number of state transition specifications
+ * @state_transitions: State transition specifications
+ */
+struct kernel_api_spec {
+	char name[KAPI_MAX_NAME_LEN];
+	u32 version;
+	char description[KAPI_MAX_DESC_LEN];
+	char long_description[KAPI_MAX_DESC_LEN * 4];
+	u32 context_flags;
+
+	/* Parameters */
+	u32 param_magic;  /* 0x4B415031 =3D 'KAP1' */
+	u32 param_count;
+	struct kapi_param_spec params[KAPI_MAX_PARAMS];
+
+	/* Return value */
+	u32 return_magic; /* 0x4B415232 =3D 'KAR2' */
+	struct kapi_return_spec return_spec;
+
+	/* Errors */
+	u32 error_magic;  /* 0x4B414533 =3D 'KAE3' */
+	u32 error_count;
+	struct kapi_error_spec errors[KAPI_MAX_ERRORS];
+
+	/* Locking */
+	u32 lock_magic;   /* 0x4B414C34 =3D 'KAL4' */
+	u32 lock_count;
+	struct kapi_lock_spec locks[KAPI_MAX_CONSTRAINTS];
+
+	/* Constraints */
+	u32 constraint_magic; /* 0x4B414335 =3D 'KAC5' */
+	u32 constraint_count;
+	struct kapi_constraint_spec constraints[KAPI_MAX_CONSTRAINTS];
+
+	/* Additional information */
+	u32 info_magic;   /* 0x4B414936 =3D 'KAI6' */
+	char examples[KAPI_MAX_DESC_LEN * 2];
+	char notes[KAPI_MAX_DESC_LEN * 2];
+	char since_version[32];
+
+	/* Signal specifications */
+	u32 signal_magic; /* 0x4B415337 =3D 'KAS7' */
+	u32 signal_count;
+	struct kapi_signal_spec signals[KAPI_MAX_SIGNALS];
+
+	/* Signal mask specifications */
+	u32 sigmask_magic; /* 0x4B414D38 =3D 'KAM8' */
+	u32 signal_mask_count;
+	struct kapi_signal_mask_spec signal_masks[KAPI_MAX_SIGNALS];
+
+	/* Structure specifications */
+	u32 struct_magic; /* 0x4B415439 =3D 'KAT9' */
+	u32 struct_spec_count;
+	struct kapi_struct_spec struct_specs[KAPI_MAX_STRUCT_SPECS];
+
+	/* Side effects */
+	u32 effect_magic; /* 0x4B414641 =3D 'KAFA' */
+	u32 side_effect_count;
+	struct kapi_side_effect side_effects[KAPI_MAX_SIDE_EFFECTS];
+
+	/* State transitions */
+	u32 trans_magic;  /* 0x4B415442 =3D 'KATB' */
+	u32 state_trans_count;
+	struct kapi_state_transition state_transitions[KAPI_MAX_STATE_TRANS];
+
+	/* Capability specifications */
+	u32 cap_magic;    /* 0x4B414343 =3D 'KACC' */
+	u32 capability_count;
+	struct kapi_capability_spec capabilities[KAPI_MAX_CAPABILITIES];
+
+	/* Extended fields for socket and network operations */
+	struct kapi_socket_state_spec socket_state;
+	struct kapi_protocol_behavior protocol_behaviors[KAPI_MAX_PROTOCOL_BEHAVI=
ORS];
+	u32 protocol_behavior_count;
+	struct kapi_buffer_spec buffer_spec;
+	struct kapi_async_spec async_spec;
+	struct kapi_addr_family_spec addr_families[KAPI_MAX_ADDR_FAMILIES];
+	u32 addr_family_count;
+
+	/* Operation characteristics */
+	bool is_connection_oriented;
+	bool is_message_oriented;
+	bool supports_oob_data;
+	bool supports_peek;
+	bool supports_select_poll;
+	bool is_reentrant;
+
+	/* Semantic descriptions */
+	char connection_establishment[KAPI_MAX_DESC_LEN];
+	char connection_termination[KAPI_MAX_DESC_LEN];
+	char data_transfer_semantics[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/* Macros for defining API specifications */
+
+/**
+ * DEFINE_KERNEL_API_SPEC - Define a kernel API specification
+ * @func_name: Function name to specify
+ */
+#define DEFINE_KERNEL_API_SPEC(func_name) \
+	static struct kernel_api_spec __kapi_spec_##func_name \
+	__used __section(".kapi_specs") =3D {	\
+		.name =3D __stringify(func_name),	\
+		.version =3D 1,
+
+#define KAPI_END_SPEC };
+
+/**
+ * KAPI_DESCRIPTION - Set API description
+ * @desc: Description string
+ */
+#define KAPI_DESCRIPTION(desc) \
+	.description =3D desc,
+
+/**
+ * KAPI_LONG_DESC - Set detailed API description
+ * @desc: Detailed description string
+ */
+#define KAPI_LONG_DESC(desc) \
+	.long_description =3D desc,
+
+/**
+ * KAPI_CONTEXT - Set execution context flags
+ * @flags: Context flags (OR'ed KAPI_CTX_* values)
+ */
+#define KAPI_CONTEXT(flags) \
+	.context_flags =3D flags,
+
+/**
+ * KAPI_PARAM - Define a parameter specification
+ * @idx: Parameter index (0-based)
+ * @pname: Parameter name
+ * @ptype: Type name string
+ * @pdesc: Parameter description
+ */
+#define KAPI_PARAM(idx, pname, ptype, pdesc) \
+	.params[idx] =3D {			\
+		.name =3D pname,			\
+		.type_name =3D ptype,		\
+		.description =3D pdesc,		\
+		.size_param_idx =3D -1,		/* Default: no dynamic sizing */
+
+#define KAPI_PARAM_TYPE(ptype) \
+		.type =3D ptype,
+
+#define KAPI_PARAM_FLAGS(pflags) \
+		.flags =3D pflags,
+
+#define KAPI_PARAM_SIZE(psize) \
+		.size =3D psize,
+
+#define KAPI_PARAM_RANGE(pmin, pmax) \
+		.min_value =3D pmin,	\
+		.max_value =3D pmax,
+
+#define KAPI_PARAM_CONSTRAINT_TYPE(ctype) \
+		.constraint_type =3D ctype,
+
+#define KAPI_PARAM_CONSTRAINT(desc) \
+		.constraints =3D desc,
+
+#define KAPI_PARAM_VALID_MASK(mask) \
+		.valid_mask =3D mask,
+
+#define KAPI_PARAM_ENUM_VALUES(values) \
+		.enum_values =3D values, \
+		.enum_count =3D ARRAY_SIZE(values),
+
+#define KAPI_PARAM_ALIGNMENT(align) \
+		.alignment =3D align,
+
+#define KAPI_PARAM_SIZE_PARAM(idx) \
+		.size_param_idx =3D idx,
+
+#define KAPI_PARAM_END },
+
+/**
+ * KAPI_PARAM_COUNT - Set the number of parameters
+ * @n: Number of parameters
+ */
+#define KAPI_PARAM_COUNT(n) \
+	.param_magic =3D KAPI_MAGIC_PARAMS, \
+	.param_count =3D n,
+
+/**
+ * KAPI_RETURN - Define return value specification
+ * @rtype: Return type name
+ * @rdesc: Return value description
+ */
+#define KAPI_RETURN(rtype, rdesc) \
+	.return_spec =3D {		\
+		.type_name =3D rtype,	\
+		.description =3D rdesc,
+
+#define KAPI_RETURN_SUCCESS(val) \
+		.success_value =3D val,
+
+#define KAPI_RETURN_TYPE(rtype) \
+		.type =3D rtype,
+
+#define KAPI_RETURN_CHECK_TYPE(ctype) \
+		.check_type =3D ctype,
+
+#define KAPI_RETURN_ERROR_VALUES(values) \
+		.error_values =3D values,
+
+#define KAPI_RETURN_ERROR_COUNT(count) \
+		.error_count =3D count,
+
+#define KAPI_RETURN_SUCCESS_RANGE(min, max) \
+		.success_min =3D min, \
+		.success_max =3D max,
+
+#define KAPI_RETURN_END },
+
+/**
+ * KAPI_ERROR - Define an error condition
+ * @idx: Error index
+ * @ecode: Error code value
+ * @ename: Error name
+ * @econd: Error condition
+ * @edesc: Error description
+ */
+#define KAPI_ERROR(idx, ecode, ename, econd, edesc) \
+	.errors[idx] =3D {			\
+		.error_code =3D ecode,		\
+		.name =3D ename,			\
+		.condition =3D econd,		\
+		.description =3D edesc,		\
+	},
+
+/**
+ * KAPI_ERROR_COUNT - Set the number of errors
+ * @n: Number of errors
+ */
+#define KAPI_ERROR_COUNT(n) \
+	.error_magic =3D KAPI_MAGIC_ERRORS, \
+	.error_count =3D n,
+
+/**
+ * KAPI_LOCK - Define a lock requirement
+ * @idx: Lock index
+ * @lname: Lock name
+ * @ltype: Lock type
+ */
+#define KAPI_LOCK(idx, lname, ltype) \
+	.locks[idx] =3D {			\
+		.lock_name =3D lname,	\
+		.lock_type =3D ltype,
+
+#define KAPI_LOCK_ACQUIRED \
+		.acquired =3D true,
+
+#define KAPI_LOCK_RELEASED \
+		.released =3D true,
+
+#define KAPI_LOCK_HELD_ENTRY \
+		.held_on_entry =3D true,
+
+#define KAPI_LOCK_HELD_EXIT \
+		.held_on_exit =3D true,
+
+#define KAPI_LOCK_DESC(ldesc) \
+		.description =3D ldesc,
+
+#define KAPI_LOCK_END },
+
+/**
+ * KAPI_CONSTRAINT - Define an additional constraint
+ * @idx: Constraint index
+ * @cname: Constraint name
+ * @cdesc: Constraint description
+ */
+#define KAPI_CONSTRAINT(idx, cname, cdesc) \
+	.constraints[idx] =3D {		\
+		.name =3D cname,		\
+		.description =3D cdesc,
+
+#define KAPI_CONSTRAINT_EXPR(expr) \
+		.expression =3D expr,
+
+#define KAPI_CONSTRAINT_END },
+
+/**
+ * KAPI_EXAMPLES - Set API usage examples
+ * @examples: Examples string
+ */
+#define KAPI_EXAMPLES(ex) \
+	.info_magic =3D KAPI_MAGIC_INFO, \
+	.examples =3D ex,
+
+/**
+ * KAPI_NOTES - Set API notes
+ * @notes: Notes string
+ */
+#define KAPI_NOTES(n) \
+	.notes =3D n,
+
+
+/**
+ * KAPI_SIGNAL - Define a signal specification
+ * @idx: Signal index
+ * @signum: Signal number (e.g., SIGKILL)
+ * @signame: Signal name string
+ * @dir: Direction flags
+ * @act: Action taken
+ */
+#define KAPI_SIGNAL(idx, signum, signame, dir, act) \
+	.signals[idx] =3D {			\
+		.signal_num =3D signum,		\
+		.signal_name =3D signame,		\
+		.direction =3D dir,		\
+		.action =3D act,
+
+#define KAPI_SIGNAL_TARGET(tgt) \
+		.target =3D tgt,
+
+#define KAPI_SIGNAL_CONDITION(cond) \
+		.condition =3D cond,
+
+#define KAPI_SIGNAL_DESC(desc) \
+		.description =3D desc,
+
+#define KAPI_SIGNAL_RESTARTABLE \
+		.restartable =3D true,
+
+#define KAPI_SIGNAL_SA_FLAGS_REQ(flags) \
+		.sa_flags_required =3D flags,
+
+#define KAPI_SIGNAL_SA_FLAGS_FORBID(flags) \
+		.sa_flags_forbidden =3D flags,
+
+#define KAPI_SIGNAL_ERROR(err) \
+		.error_on_signal =3D err,
+
+#define KAPI_SIGNAL_TRANSFORM(sig) \
+		.transform_to =3D sig,
+
+#define KAPI_SIGNAL_TIMING(when) \
+		.timing =3D when,
+
+#define KAPI_SIGNAL_PRIORITY(prio) \
+		.priority =3D prio,
+
+#define KAPI_SIGNAL_INTERRUPTIBLE \
+		.interruptible =3D true,
+
+#define KAPI_SIGNAL_QUEUE(behavior) \
+		.queue_behavior =3D behavior,
+
+#define KAPI_SIGNAL_STATE_REQ(state) \
+		.state_required =3D state,
+
+#define KAPI_SIGNAL_STATE_FORBID(state) \
+		.state_forbidden =3D state,
+
+#define KAPI_SIGNAL_END },
+
+#define KAPI_SIGNAL_COUNT(n) \
+	.signal_magic =3D KAPI_MAGIC_SIGNALS, \
+	.signal_count =3D n,
+
+/**
+ * KAPI_SIGNAL_MASK - Define a signal mask specification
+ * @idx: Mask index
+ * @name: Mask name
+ * @desc: Mask description
+ */
+#define KAPI_SIGNAL_MASK(idx, name, desc) \
+	.signal_masks[idx] =3D {		\
+		.mask_name =3D name,	\
+		.description =3D desc,
+
+/*
+ * KAPI_SIGNAL_MASK_SIGNALS - Specify signals in a signal mask
+ * @...: Variadic list of signal numbers
+ *
+ * Usage:
+ *   KAPI_SIGNAL_MASK(0, "blocked", "Signals blocked during operation")
+ *   KAPI_SIGNAL_MASK_SIGNALS(SIGINT, SIGTERM, SIGQUIT)
+ *   KAPI_SIGNAL_MASK_END
+ */
+#define KAPI_SIGNAL_MASK_SIGNALS(...) \
+		.signals =3D { __VA_ARGS__ }, \
+		.signal_count =3D sizeof((int[]){ __VA_ARGS__ }) / sizeof(int),
+
+#define KAPI_SIGNAL_MASK_END },
+
+/**
+ * KAPI_STRUCT_SPEC - Define a structure specification
+ * @idx: Structure spec index
+ * @sname: Structure name
+ * @sdesc: Structure description
+ */
+#define KAPI_STRUCT_SPEC(idx, sname, sdesc) \
+	.struct_specs[idx] =3D {		\
+		.name =3D #sname,		\
+		.description =3D sdesc,
+
+#define KAPI_STRUCT_SIZE(ssize, salign) \
+		.size =3D ssize,		\
+		.alignment =3D salign,
+
+#define KAPI_STRUCT_FIELD_COUNT(n) \
+		.field_count =3D n,
+
+/**
+ * KAPI_STRUCT_FIELD - Define a structure field
+ * @fidx: Field index
+ * @fname: Field name
+ * @ftype: Field type (KAPI_TYPE_*)
+ * @ftype_name: Type name as string
+ * @fdesc: Field description
+ */
+#define KAPI_STRUCT_FIELD(fidx, fname, ftype, ftype_name, fdesc) \
+		.fields[fidx] =3D {	\
+			.name =3D fname,	\
+			.type =3D ftype,	\
+			.type_name =3D ftype_name, \
+			.description =3D fdesc,
+
+#define KAPI_FIELD_OFFSET(foffset) \
+			.offset =3D foffset,
+
+#define KAPI_FIELD_SIZE(fsize) \
+			.size =3D fsize,
+
+#define KAPI_FIELD_FLAGS(fflags) \
+			.flags =3D fflags,
+
+#define KAPI_FIELD_CONSTRAINT_RANGE(min, max) \
+			.constraint_type =3D KAPI_CONSTRAINT_RANGE, \
+			.min_value =3D min, \
+			.max_value =3D max,
+
+#define KAPI_FIELD_CONSTRAINT_MASK(mask) \
+			.constraint_type =3D KAPI_CONSTRAINT_MASK, \
+			.valid_mask =3D mask,
+
+#define KAPI_FIELD_CONSTRAINT_ENUM(values) \
+			.constraint_type =3D KAPI_CONSTRAINT_ENUM, \
+			.enum_values =3D values,
+
+#define KAPI_STRUCT_FIELD_END },
+
+#define KAPI_STRUCT_SPEC_END },
+
+/* Counter for structure specifications */
+#define KAPI_STRUCT_SPEC_COUNT(n) \
+	.struct_magic =3D KAPI_MAGIC_STRUCTS, \
+	.struct_spec_count =3D n,
+
+/* Additional lock-related macros */
+#define KAPI_LOCK_COUNT(n) \
+	.lock_magic =3D KAPI_MAGIC_LOCKS, \
+	.lock_count =3D n,
+
+/**
+ * KAPI_SIDE_EFFECT - Define a side effect
+ * @idx: Side effect index
+ * @etype: Effect type bitmask (OR'ed KAPI_EFFECT_* values)
+ * @etarget: What is affected
+ * @edesc: Effect description
+ */
+#define KAPI_SIDE_EFFECT(idx, etype, etarget, edesc) \
+	.side_effects[idx] =3D {		\
+		.type =3D etype,		\
+		.target =3D etarget,	\
+		.description =3D edesc,	\
+		.reversible =3D false,	/* Default to non-reversible */
+
+#define KAPI_EFFECT_CONDITION(cond) \
+		.condition =3D cond,
+
+#define KAPI_EFFECT_REVERSIBLE \
+		.reversible =3D true,
+
+#define KAPI_SIDE_EFFECT_END },
+
+/**
+ * KAPI_STATE_TRANS - Define a state transition
+ * @idx: State transition index
+ * @obj: Object whose state changes
+ * @from: From state
+ * @to: To state
+ * @desc: Transition description
+ */
+#define KAPI_STATE_TRANS(idx, obj, from, to, desc) \
+	.state_transitions[idx] =3D {	\
+		.object =3D obj,		\
+		.from_state =3D from,	\
+		.to_state =3D to,		\
+		.description =3D desc,
+
+#define KAPI_STATE_TRANS_COND(cond) \
+		.condition =3D cond,
+
+#define KAPI_STATE_TRANS_END },
+
+/* Counters for side effects and state transitions */
+#define KAPI_SIDE_EFFECT_COUNT(n) \
+	.effect_magic =3D KAPI_MAGIC_EFFECTS, \
+	.side_effect_count =3D n,
+
+#define KAPI_STATE_TRANS_COUNT(n) \
+	.trans_magic =3D KAPI_MAGIC_TRANS, \
+	.state_trans_count =3D n,
+
+/* Helper macros for common side effect patterns */
+#define KAPI_EFFECTS_MEMORY	(KAPI_EFFECT_ALLOC_MEMORY | KAPI_EFFECT_FREE_M=
EMORY)
+#define KAPI_EFFECTS_LOCKING	(KAPI_EFFECT_LOCK_ACQUIRE | KAPI_EFFECT_LOCK_=
RELEASE)
+#define KAPI_EFFECTS_RESOURCES	(KAPI_EFFECT_RESOURCE_CREATE | KAPI_EFFECT_=
RESOURCE_DESTROY)
+#define KAPI_EFFECTS_IO		(KAPI_EFFECT_NETWORK | KAPI_EFFECT_FILESYSTEM)
+
+/*
+ * Helper macros for combining common parameter flag patterns.
+ * Note: KAPI_PARAM_IN, KAPI_PARAM_OUT, KAPI_PARAM_INOUT, and KAPI_PARAM_O=
PTIONAL
+ * are already defined in enum kapi_param_flags - use those directly.
+ */
+#define KAPI_PARAM_FLAGS_INOUT	(KAPI_PARAM_IN | KAPI_PARAM_OUT)
+#define KAPI_PARAM_FLAGS_USER	(KAPI_PARAM_USER | KAPI_PARAM_IN)
+
+/* Common signal timing constants */
+#define KAPI_SIGNAL_TIME_ENTRY		"entry"
+#define KAPI_SIGNAL_TIME_DURING		"during"
+#define KAPI_SIGNAL_TIME_EXIT		"exit"
+#define KAPI_SIGNAL_TIME_ANYTIME	"anytime"
+#define KAPI_SIGNAL_TIME_BLOCKING	"while_blocked"
+#define KAPI_SIGNAL_TIME_SLEEPING	"while_sleeping"
+#define KAPI_SIGNAL_TIME_BEFORE		"before"
+#define KAPI_SIGNAL_TIME_AFTER		"after"
+
+/* Common signal queue behaviors */
+#define KAPI_SIGNAL_QUEUE_STANDARD	"standard"
+#define KAPI_SIGNAL_QUEUE_REALTIME	"realtime"
+#define KAPI_SIGNAL_QUEUE_COALESCE	"coalesce"
+#define KAPI_SIGNAL_QUEUE_REPLACE	"replace"
+#define KAPI_SIGNAL_QUEUE_DISCARD	"discard"
+
+/* Process state flags for signal delivery */
+#define KAPI_SIGNAL_STATE_RUNNING	(1 << 0)
+#define KAPI_SIGNAL_STATE_SLEEPING	(1 << 1)
+#define KAPI_SIGNAL_STATE_STOPPED	(1 << 2)
+#define KAPI_SIGNAL_STATE_TRACED	(1 << 3)
+#define KAPI_SIGNAL_STATE_ZOMBIE	(1 << 4)
+#define KAPI_SIGNAL_STATE_DEAD		(1 << 5)
+
+/* Capability specification macros */
+
+/**
+ * KAPI_CAPABILITY - Define a capability requirement
+ * @idx: Capability index
+ * @cap: Capability constant (e.g., CAP_IPC_LOCK)
+ * @name: Capability name string
+ * @act: Action type (kapi_capability_action)
+ */
+#define KAPI_CAPABILITY(idx, cap, name, act) \
+	.capabilities[idx] =3D {		\
+		.capability =3D cap,	\
+		.cap_name =3D name,	\
+		.action =3D act,
+
+#define KAPI_CAP_ALLOWS(desc) \
+		.allows =3D desc,
+
+#define KAPI_CAP_WITHOUT(desc) \
+		.without_cap =3D desc,
+
+#define KAPI_CAP_CONDITION(cond) \
+		.check_condition =3D cond,
+
+#define KAPI_CAP_PRIORITY(prio) \
+		.priority =3D prio,
+
+#define KAPI_CAP_ALTERNATIVE(caps, count) \
+		.alternative =3D caps,	\
+		.alternative_count =3D count,
+
+#define KAPI_CAPABILITY_END },
+
+/* Counter for capability specifications */
+#define KAPI_CAPABILITY_COUNT(n) \
+	.cap_magic =3D KAPI_MAGIC_CAPS, \
+	.capability_count =3D n,
+
+/* Common signal patterns for syscalls */
+#define KAPI_SIGNAL_INTERRUPTIBLE_SLEEP \
+	KAPI_SIGNAL(0, SIGINT, "SIGINT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_=
RETURN) \
+		KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_SLEEPING) \
+		KAPI_SIGNAL_ERROR(-EINTR) \
+		KAPI_SIGNAL_RESTARTABLE \
+		KAPI_SIGNAL_DESC("Interrupts sleep, returns -EINTR") \
+	KAPI_SIGNAL_END, \
+	KAPI_SIGNAL(1, SIGTERM, "SIGTERM", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTIO=
N_RETURN) \
+		KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_SLEEPING) \
+		KAPI_SIGNAL_ERROR(-EINTR) \
+		KAPI_SIGNAL_RESTARTABLE \
+		KAPI_SIGNAL_DESC("Interrupts sleep, returns -EINTR") \
+	KAPI_SIGNAL_END
+
+#define KAPI_SIGNAL_FATAL_DEFAULT \
+	KAPI_SIGNAL(2, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTIO=
N_TERMINATE) \
+		KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME) \
+		KAPI_SIGNAL_PRIORITY(0) \
+		KAPI_SIGNAL_DESC("Process terminated immediately") \
+	KAPI_SIGNAL_END
+
+#define KAPI_SIGNAL_STOP_CONT \
+	KAPI_SIGNAL(3, SIGSTOP, "SIGSTOP", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTIO=
N_STOP) \
+		KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME) \
+		KAPI_SIGNAL_DESC("Process stopped") \
+	KAPI_SIGNAL_END, \
+	KAPI_SIGNAL(4, SIGCONT, "SIGCONT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTIO=
N_CONTINUE) \
+		KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME) \
+		KAPI_SIGNAL_DESC("Process continued") \
+	KAPI_SIGNAL_END
+
+/* Validation and runtime checking */
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+bool kapi_validate_params(const struct kernel_api_spec *spec, ...);
+bool kapi_validate_param(const struct kapi_param_spec *param_spec, s64 val=
ue);
+bool kapi_validate_param_with_context(const struct kapi_param_spec *param_=
spec,
+				       s64 value, const s64 *all_params, int param_count);
+int kapi_validate_syscall_param(const struct kernel_api_spec *spec,
+				int param_idx, s64 value);
+int kapi_validate_syscall_params(const struct kernel_api_spec *spec,
+				 const s64 *params, int param_count);
+bool kapi_check_return_success(const struct kapi_return_spec *return_spec,=
 s64 retval);
+bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 re=
tval);
+int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 r=
etval);
+void kapi_check_context(const struct kernel_api_spec *spec);
+void kapi_check_locks(const struct kernel_api_spec *spec);
+bool kapi_check_signal_allowed(const struct kernel_api_spec *spec, int sig=
num);
+bool kapi_validate_signal_action(const struct kernel_api_spec *spec, int s=
ignum,
+				 struct sigaction *act);
+int kapi_get_signal_error(const struct kernel_api_spec *spec, int signum);
+bool kapi_is_signal_restartable(const struct kernel_api_spec *spec, int si=
gnum);
+#else
+static inline bool kapi_validate_params(const struct kernel_api_spec *spec=
, ...)
+{
+	return true;
+}
+static inline bool kapi_validate_param(const struct kapi_param_spec *param=
_spec, s64 value)
+{
+	return true;
+}
+static inline bool kapi_validate_param_with_context(const struct kapi_para=
m_spec *param_spec,
+						     s64 value, const s64 *all_params, int param_count)
+{
+	return true;
+}
+static inline int kapi_validate_syscall_param(const struct kernel_api_spec=
 *spec,
+					       int param_idx, s64 value)
+{
+	return 0;
+}
+static inline int kapi_validate_syscall_params(const struct kernel_api_spe=
c *spec,
+					       const s64 *params, int param_count)
+{
+	return 0;
+}
+static inline bool kapi_check_return_success(const struct kapi_return_spec=
 *return_spec, s64 retval)
+{
+	return true;
+}
+static inline bool kapi_validate_return_value(const struct kernel_api_spec=
 *spec, s64 retval)
+{
+	return true;
+}
+static inline int kapi_validate_syscall_return(const struct kernel_api_spe=
c *spec, s64 retval)
+{
+	return 0;
+}
+static inline void kapi_check_context(const struct kernel_api_spec *spec) =
{}
+static inline void kapi_check_locks(const struct kernel_api_spec *spec) {}
+static inline bool kapi_check_signal_allowed(const struct kernel_api_spec =
*spec, int signum)
+{
+	return true;
+}
+static inline bool kapi_validate_signal_action(const struct kernel_api_spe=
c *spec, int signum,
+					       struct sigaction *act)
+{
+	return true;
+}
+static inline int kapi_get_signal_error(const struct kernel_api_spec *spec=
, int signum)
+{
+	return -EINTR;
+}
+static inline bool kapi_is_signal_restartable(const struct kernel_api_spec=
 *spec, int signum)
+{
+	return false;
+}
+#endif
+
+/* Export/query functions */
+const struct kernel_api_spec *kapi_get_spec(const char *name);
+int kapi_export_json(const struct kernel_api_spec *spec, char *buf, size_t=
 size);
+void kapi_print_spec(const struct kernel_api_spec *spec);
+
+/* Registration for dynamic APIs */
+int kapi_register_spec(struct kernel_api_spec *spec);
+void kapi_unregister_spec(const char *name);
+
+/* Helper to get parameter constraint info */
+static inline bool kapi_get_param_constraint(const char *api_name, int par=
am_idx,
+					      enum kapi_constraint_type *type,
+					      u64 *valid_mask, s64 *min_val, s64 *max_val)
+{
+	const struct kernel_api_spec *spec =3D kapi_get_spec(api_name);
+
+	if (!spec || param_idx >=3D spec->param_count)
+		return false;
+
+	if (type)
+		*type =3D spec->params[param_idx].constraint_type;
+	if (valid_mask)
+		*valid_mask =3D spec->params[param_idx].valid_mask;
+	if (min_val)
+		*min_val =3D spec->params[param_idx].min_value;
+	if (max_val)
+		*max_val =3D spec->params[param_idx].max_value;
+
+	return true;
+}
+
+/* Socket state requirement macros */
+#define KAPI_SOCKET_STATE_REQ(...) \
+	.socket_state =3D { \
+		.required_states =3D { __VA_ARGS__ }, \
+		.required_state_count =3D sizeof((enum kapi_socket_state[]){__VA_ARGS__}=
)/sizeof(enum kapi_socket_state),
+
+#define KAPI_SOCKET_STATE_FORBID(...) \
+		.forbidden_states =3D { __VA_ARGS__ }, \
+		.forbidden_state_count =3D sizeof((enum kapi_socket_state[]){__VA_ARGS__=
})/sizeof(enum kapi_socket_state),
+
+#define KAPI_SOCKET_STATE_RESULT(state) \
+		.resulting_state =3D state,
+
+#define KAPI_SOCKET_STATE_COND(cond) \
+		.state_condition =3D cond,
+
+#define KAPI_SOCKET_STATE_PROTOS(protos) \
+		.applicable_protocols =3D protos,
+
+#define KAPI_SOCKET_STATE_END },
+
+/* Protocol behavior macros */
+#define KAPI_PROTOCOL_BEHAVIOR(idx, protos, desc) \
+	.protocol_behaviors[idx] =3D { \
+		.applicable_protocols =3D protos, \
+		.behavior =3D desc,
+
+#define KAPI_PROTOCOL_FLAGS(flags, desc) \
+		.protocol_flags =3D flags, \
+		.flag_description =3D desc,
+
+#define KAPI_PROTOCOL_BEHAVIOR_END },
+
+/* Async behavior macros */
+#define KAPI_ASYNC_SPEC(modes, errno) \
+	.async_spec =3D { \
+		.supported_modes =3D modes, \
+		.nonblock_errno =3D errno,
+
+#define KAPI_ASYNC_POLL(in, out) \
+		.poll_events_in =3D in, \
+		.poll_events_out =3D out,
+
+#define KAPI_ASYNC_COMPLETION(cond) \
+		.completion_condition =3D cond,
+
+#define KAPI_ASYNC_TIMEOUT(supported, desc) \
+		.supports_timeout =3D supported, \
+		.timeout_behavior =3D desc,
+
+#define KAPI_ASYNC_END },
+
+/* Buffer behavior macros */
+#define KAPI_BUFFER_SPEC(behaviors) \
+	.buffer_spec =3D { \
+		.buffer_behaviors =3D behaviors,
+
+#define KAPI_BUFFER_SIZE(min, max, optimal) \
+		.min_buffer_size =3D min, \
+		.max_buffer_size =3D max, \
+		.optimal_buffer_size =3D optimal,
+
+#define KAPI_BUFFER_PARTIAL(allowed, rules) \
+		.can_partial_transfer =3D allowed, \
+		.partial_transfer_rules =3D rules,
+
+#define KAPI_BUFFER_FRAGMENT(rules) \
+		.fragmentation_rules =3D rules,
+
+#define KAPI_BUFFER_END },
+
+/* Address family macros */
+#define KAPI_ADDR_FAMILY(idx, fam, name, struct_sz, min_len, max_len) \
+	.addr_families[idx] =3D { \
+		.family =3D fam, \
+		.family_name =3D name, \
+		.addr_struct_size =3D struct_sz, \
+		.min_addr_len =3D min_len, \
+		.max_addr_len =3D max_len,
+
+#define KAPI_ADDR_FORMAT(fmt) \
+		.addr_format =3D fmt,
+
+#define KAPI_ADDR_FEATURES(wildcard, multicast, broadcast) \
+		.supports_wildcard =3D wildcard, \
+		.supports_multicast =3D multicast, \
+		.supports_broadcast =3D broadcast,
+
+#define KAPI_ADDR_SPECIAL(addrs) \
+		.special_addresses =3D addrs,
+
+#define KAPI_ADDR_PORTS(min, max) \
+		.port_range_min =3D min, \
+		.port_range_max =3D max,
+
+#define KAPI_ADDR_FAMILY_END },
+
+#define KAPI_ADDR_FAMILY_COUNT(n) \
+	.addr_family_count =3D n,
+
+#define KAPI_PROTOCOL_BEHAVIOR_COUNT(n) \
+	.protocol_behavior_count =3D n,
+
+#define KAPI_CONSTRAINT_COUNT(n) \
+	.constraint_magic =3D KAPI_MAGIC_CONSTRAINTS, \
+	.constraint_count =3D n,
+
+/* Network operation characteristics macros */
+#define KAPI_NET_CONNECTION_ORIENTED \
+	.is_connection_oriented =3D true,
+
+#define KAPI_NET_MESSAGE_ORIENTED \
+	.is_message_oriented =3D true,
+
+#define KAPI_NET_SUPPORTS_OOB \
+	.supports_oob_data =3D true,
+
+#define KAPI_NET_SUPPORTS_PEEK \
+	.supports_peek =3D true,
+
+#define KAPI_NET_REENTRANT \
+	.is_reentrant =3D true,
+
+/* Semantic description macros */
+#define KAPI_NET_CONN_ESTABLISH(desc) \
+	.connection_establishment =3D desc,
+
+#define KAPI_NET_CONN_TERMINATE(desc) \
+	.connection_termination =3D desc,
+
+#define KAPI_NET_DATA_TRANSFER(desc) \
+	.data_transfer_semantics =3D desc,
+
+#endif /* _LINUX_KERNEL_API_SPEC_H */
diff --git a/include/linux/syscall_api_spec.h b/include/linux/syscall_api_s=
pec.h
new file mode 100644
index 0000000000000..b7f9ba0f978ab
--- /dev/null
+++ b/include/linux/syscall_api_spec.h
@@ -0,0 +1,198 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * syscall_api_spec.h - System Call API Specification Integration
+ *
+ * This header extends the SYSCALL_DEFINEX macros to support inline API sp=
ecifications,
+ * allowing syscall documentation to be written alongside the implementati=
on in a
+ * human-readable and machine-parseable format.
+ */
+
+#ifndef _LINUX_SYSCALL_API_SPEC_H
+#define _LINUX_SYSCALL_API_SPEC_H
+
+#include <linux/kernel_api_spec.h>
+
+
+
+/* Automatic syscall validation infrastructure */
+/*
+ * The validation is now integrated directly into the SYSCALL_DEFINEx macr=
os
+ * in syscalls.h when CONFIG_KAPI_RUNTIME_CHECKS is enabled.
+ *
+ * The validation happens in the __do_kapi_sys##name wrapper function whic=
h:
+ * 1. Validates all parameters before calling the actual syscall
+ * 2. Calls the real syscall implementation
+ * 3. Validates the return value
+ * 4. Returns the result
+ */
+
+
+/*
+ * Helper macros for common syscall patterns
+ */
+
+/* For syscalls that can sleep */
+#define KAPI_SYSCALL_SLEEPABLE \
+	KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+/* For syscalls that must be atomic */
+#define KAPI_SYSCALL_ATOMIC \
+	KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_ATOMIC)
+
+/* Common parameter specifications */
+#define KAPI_PARAM_FD(idx, desc) \
+	KAPI_PARAM(idx, "fd", "int", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_IN) \
+		.type =3D KAPI_TYPE_FD, \
+		.constraint_type =3D KAPI_CONSTRAINT_NONE, \
+	KAPI_PARAM_END
+
+#define KAPI_PARAM_USER_BUF(idx, name, desc) \
+	KAPI_PARAM(idx, name, "void __user *", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+	KAPI_PARAM_END
+
+/**
+ * KAPI_PARAM_USER_STRUCT - Define a userspace struct pointer parameter
+ * @idx: Parameter index (0-based)
+ * @name: Parameter name
+ * @struct_type: The struct type (e.g., struct iocb)
+ * @desc: Parameter description
+ *
+ * This macro defines a parameter that is a userspace pointer to a struct.
+ * The pointer will be validated to ensure:
+ * - The pointer is accessible in userspace
+ * - The memory region of sizeof(struct_type) bytes is accessible
+ */
+#define KAPI_PARAM_USER_STRUCT(idx, name, struct_type, desc) \
+	KAPI_PARAM(idx, name, #struct_type " __user *", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+		.type =3D KAPI_TYPE_USER_PTR, \
+		.size =3D sizeof(struct_type), \
+		.constraint_type =3D KAPI_CONSTRAINT_USER_PTR, \
+	KAPI_PARAM_END
+
+/**
+ * KAPI_PARAM_USER_PTR_SIZED - Define a userspace pointer with explicit si=
ze
+ * @idx: Parameter index (0-based)
+ * @name: Parameter name
+ * @ptr_size: Size in bytes of the memory region
+ * @desc: Parameter description
+ *
+ * This macro defines a parameter that is a userspace pointer to a memory
+ * region of a specific size. The pointer will be validated to ensure:
+ * - The pointer is accessible in userspace
+ * - The memory region of ptr_size bytes is accessible
+ */
+#define KAPI_PARAM_USER_PTR_SIZED(idx, name, ptr_size, desc) \
+	KAPI_PARAM(idx, name, "void __user *", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+		.type =3D KAPI_TYPE_USER_PTR, \
+		.size =3D ptr_size, \
+		.constraint_type =3D KAPI_CONSTRAINT_USER_PTR, \
+	KAPI_PARAM_END
+
+/**
+ * KAPI_PARAM_USER_STRING - Define a userspace null-terminated string para=
meter
+ * @idx: Parameter index (0-based)
+ * @name: Parameter name
+ * @min_len: Minimum string length (excluding null terminator)
+ * @max_len: Maximum string length (excluding null terminator)
+ * @desc: Parameter description
+ *
+ * This macro defines a parameter that is a userspace pointer to a
+ * null-terminated string. The string will be validated to ensure:
+ * - The pointer is accessible in userspace
+ * - The string length (excluding null terminator) is within [min_len, max=
_len]
+ */
+#define KAPI_PARAM_USER_STRING(idx, name, min_len, max_len, desc) \
+	KAPI_PARAM(idx, name, "const char __user *", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+		.type =3D KAPI_TYPE_USER_PTR, \
+		.constraint_type =3D KAPI_CONSTRAINT_USER_STRING, \
+		.min_value =3D min_len, \
+		.max_value =3D max_len, \
+	KAPI_PARAM_END
+
+/**
+ * KAPI_PARAM_USER_PATH - Define a userspace pathname parameter
+ * @idx: Parameter index (0-based)
+ * @name: Parameter name
+ * @desc: Parameter description
+ *
+ * This macro defines a parameter that is a userspace pointer to a
+ * null-terminated pathname string. The path will be validated to ensure:
+ * - The pointer is accessible in userspace
+ * - The path is a valid null-terminated string
+ * - The path length does not exceed PATH_MAX (4096 bytes)
+ */
+#define KAPI_PARAM_USER_PATH(idx, name, desc) \
+	KAPI_PARAM(idx, name, "const char __user *", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+		.type =3D KAPI_TYPE_PATH, \
+		.constraint_type =3D KAPI_CONSTRAINT_USER_PATH, \
+	KAPI_PARAM_END
+
+#define KAPI_PARAM_SIZE_T(idx, name, desc) \
+	KAPI_PARAM(idx, name, "size_t", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_IN) \
+		KAPI_PARAM_RANGE(0, SIZE_MAX) \
+	KAPI_PARAM_END
+
+/* Common error specifications */
+#define KAPI_ERROR_EBADF(idx) \
+	KAPI_ERROR(idx, -EBADF, "EBADF", "Invalid file descriptor", \
+		   "The file descriptor is not valid or has been closed")
+
+#define KAPI_ERROR_EINVAL(idx, condition) \
+	KAPI_ERROR(idx, -EINVAL, "EINVAL", condition, \
+		   "Invalid argument provided")
+
+#define KAPI_ERROR_ENOMEM(idx) \
+	KAPI_ERROR(idx, -ENOMEM, "ENOMEM", "Insufficient memory", \
+		   "Cannot allocate memory for the operation")
+
+#define KAPI_ERROR_EPERM(idx) \
+	KAPI_ERROR(idx, -EPERM, "EPERM", "Operation not permitted", \
+		   "The calling process does not have the required permissions")
+
+#define KAPI_ERROR_EFAULT(idx) \
+	KAPI_ERROR(idx, -EFAULT, "EFAULT", "Bad address", \
+		   "Invalid user space address provided")
+
+/* Standard return value specifications */
+#define KAPI_RETURN_SUCCESS_ZERO \
+	KAPI_RETURN("long", "0 on success, negative error code on failure") \
+		KAPI_RETURN_SUCCESS(0, "=3D=3D 0") \
+	KAPI_RETURN_END
+
+#define KAPI_RETURN_FD_SPEC \
+	KAPI_RETURN("long", "File descriptor on success, negative error code on f=
ailure") \
+		.check_type =3D KAPI_RETURN_FD, \
+	KAPI_RETURN_END
+
+#define KAPI_RETURN_COUNT \
+	KAPI_RETURN("long", "Number of bytes processed on success, negative error=
 code on failure") \
+		KAPI_RETURN_SUCCESS(0, ">=3D 0") \
+	KAPI_RETURN_END
+
+/* KAPI_ERROR_COUNT and KAPI_PARAM_COUNT are now defined in kernel_api_spe=
c.h */
+
+/**
+ * KAPI_SINCE_VERSION - Set the since version
+ * @version: Version string when the API was introduced
+ */
+#define KAPI_SINCE_VERSION(version) \
+	.since_version =3D version,
+
+
+/**
+ * KAPI_SIGNAL_MASK_COUNT - Set the signal mask count
+ * @count: Number of signal masks defined
+ */
+#define KAPI_SIGNAL_MASK_COUNT(count) \
+	.signal_mask_count =3D count,
+
+
+
+#endif /* _LINUX_SYSCALL_API_SPEC_H */
\ No newline at end of file
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index cf84d98964b2f..eafda2f509999 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -89,6 +89,7 @@ struct file_attr;
 #include <linux/bug.h>
 #include <linux/sem.h>
 #include <asm/siginfo.h>
+#include <linux/syscall_api_spec.h>
 #include <linux/unistd.h>
 #include <linux/quota.h>
 #include <linux/key.h>
@@ -134,6 +135,7 @@ struct file_attr;
 #define __SC_TYPE(t, a)	t
 #define __SC_ARGS(t, a)	a
 #define __SC_TEST(t, a) (void)BUILD_BUG_ON_ZERO(!__TYPE_IS_LL(t) && sizeof=
(t) > sizeof(long))
+#define __SC_CAST_TO_S64(t, a)	(s64)(a)
=20
 #ifdef CONFIG_FTRACE_SYSCALLS
 #define __SC_STR_ADECL(t, a)	#a
@@ -244,6 +246,41 @@ static inline int is_syscall_trace_event(struct trace_=
event_call *tp_event)
  * done within __do_sys_*().
  */
 #ifndef __SYSCALL_DEFINEx
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+#define __SYSCALL_DEFINEx(x, name, ...)					\
+	__diag_push();							\
+	__diag_ignore(GCC, 8, "-Wattribute-alias",			\
+		      "Type aliasing is used to sanitize syscall arguments");\
+	asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))	\
+		__attribute__((alias(__stringify(__se_sys##name))));	\
+	ALLOW_ERROR_INJECTION(sys##name, ERRNO);			\
+	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
+	static inline long __do_kapi_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \
+	asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__));	\
+	asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__))	\
+	{								\
+		long ret =3D __do_kapi_sys##name(__MAP(x,__SC_CAST,__VA_ARGS__));\
+		__MAP(x,__SC_TEST,__VA_ARGS__);				\
+		__PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));	\
+		return ret;						\
+	}								\
+	__diag_pop();							\
+	static inline long __do_kapi_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
+	{								\
+		const struct kernel_api_spec *__spec =3D kapi_get_spec("sys_" #name); \
+		if (__spec) {						\
+			s64 __params[x] =3D { __MAP(x,__SC_CAST_TO_S64,__VA_ARGS__) }; \
+			int __ret =3D kapi_validate_syscall_params(__spec, __params, x); \
+			if (__ret) return __ret;			\
+		}							\
+		long ret =3D __do_sys##name(__MAP(x,__SC_ARGS,__VA_ARGS__));	\
+		if (__spec) {						\
+			kapi_validate_syscall_return(__spec, (s64)ret); \
+		}							\
+		return ret;						\
+	}								\
+	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
+#else /* !CONFIG_KAPI_RUNTIME_CHECKS */
 #define __SYSCALL_DEFINEx(x, name, ...)					\
 	__diag_push();							\
 	__diag_ignore(GCC, 8, "-Wattribute-alias",			\
@@ -262,6 +299,7 @@ static inline int is_syscall_trace_event(struct trace_e=
vent_call *tp_event)
 	}								\
 	__diag_pop();							\
 	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
 #endif /* __SYSCALL_DEFINEx */
=20
 /* For split 64-bit arguments on 32-bit architectures */
diff --git a/init/Kconfig b/init/Kconfig
index fa79feb8fe57b..6a2a88de3aa84 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2191,6 +2191,8 @@ source "kernel/Kconfig.kexec"
=20
 source "kernel/liveupdate/Kconfig"
=20
+source "kernel/api/Kconfig"
+
 endmenu		# General setup
=20
 source "arch/Kconfig"
diff --git a/kernel/Makefile b/kernel/Makefile
index e83669841b8cc..0531ed6973619 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -57,6 +57,9 @@ obj-y +=3D dma/
 obj-y +=3D entry/
 obj-y +=3D unwind/
 obj-$(CONFIG_MODULES) +=3D module/
+obj-$(CONFIG_KAPI_SPEC) +=3D api/
+# Ensure api/ is always cleaned even when CONFIG_KAPI_SPEC is not set
+obj- +=3D api/
=20
 obj-$(CONFIG_KCMP) +=3D kcmp.o
 obj-$(CONFIG_FREEZER) +=3D freezer.o
diff --git a/kernel/api/Kconfig b/kernel/api/Kconfig
new file mode 100644
index 0000000000000..fde25ec70e134
--- /dev/null
+++ b/kernel/api/Kconfig
@@ -0,0 +1,35 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Kernel API Specification Framework Configuration
+#
+
+config KAPI_SPEC
+	bool "Kernel API Specification Framework"
+	help
+	  This option enables the kernel API specification framework,
+	  which provides formal documentation of kernel APIs in both
+	  human and machine-readable formats.
+
+	  The framework allows developers to document APIs inline with
+	  their implementation, including parameter specifications,
+	  return values, error conditions, locking requirements, and
+	  execution context constraints.
+
+	  When enabled, API specifications can be queried at runtime
+	  and exported in various formats (JSON, XML) through debugfs.
+
+	  If unsure, say N.
+
+config KAPI_RUNTIME_CHECKS
+	bool "Runtime API specification checks"
+	depends on KAPI_SPEC
+	depends on DEBUG_KERNEL
+	help
+	  Enable runtime validation of API usage against specifications.
+	  This includes checking execution context requirements, parameter
+	  validation, and lock state verification.
+
+	  This adds overhead and should only be used for debugging and
+	  development. The checks use WARN_ONCE to report violations.
+
+	  If unsure, say N.
diff --git a/kernel/api/Makefile b/kernel/api/Makefile
new file mode 100644
index 0000000000000..acab17c78afa3
--- /dev/null
+++ b/kernel/api/Makefile
@@ -0,0 +1,29 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the Kernel API Specification Framework
+#
+
+# Core API specification framework
+obj-$(CONFIG_KAPI_SPEC)		+=3D kernel_api_spec.o
+
+# Auto-generated API specifications collector
+ifeq ($(CONFIG_KAPI_SPEC),y)
+obj-$(CONFIG_KAPI_SPEC)		+=3D generated_api_specs.o
+
+# Find all potential apispec files (this is evaluated at make time)
+apispec-files :=3D $(shell find $(objtree) -name "*.apispec.h" -type f 2>/=
dev/null)
+
+# Generate the collector file
+# Note: FORCE ensures this is always regenerated to pick up new apispec fi=
les
+$(obj)/generated_api_specs.c: $(srctree)/scripts/generate_api_specs.sh FOR=
CE
+	$(Q)$(CONFIG_SHELL) $< $(srctree) $(objtree) > $@
+
+targets +=3D generated_api_specs.c
+
+# Add explicit dependency on the generator script
+$(obj)/generated_api_specs.o: $(obj)/generated_api_specs.c
+endif
+
+# Always clean generated files, regardless of config
+clean-files +=3D generated_api_specs.c
+
diff --git a/kernel/api/kernel_api_spec.c b/kernel/api/kernel_api_spec.c
new file mode 100644
index 0000000000000..252000db38d2c
--- /dev/null
+++ b/kernel/api/kernel_api_spec.c
@@ -0,0 +1,1185 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * kernel_api_spec.c - Kernel API Specification Framework Implementation
+ *
+ * Provides runtime support for kernel API specifications including valida=
tion,
+ * export to various formats, and querying capabilities.
+ */
+
+#include <linux/kernel.h>
+#include <linux/kernel_api_spec.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/export.h>
+#include <linux/preempt.h>
+#include <linux/hardirq.h>
+#include <linux/file.h>
+#include <linux/fdtable.h>
+#include <linux/uaccess.h>
+#include <linux/limits.h>
+#include <linux/fcntl.h>
+#include <linux/mm.h>
+
+/* Section where API specifications are stored */
+extern struct kernel_api_spec __start_kapi_specs[];
+extern struct kernel_api_spec __stop_kapi_specs[];
+
+/* Dynamic API registration */
+static LIST_HEAD(dynamic_api_specs);
+static DEFINE_MUTEX(api_spec_mutex);
+
+struct dynamic_api_spec {
+	struct list_head list;
+	struct kernel_api_spec *spec;
+};
+
+/*
+ * __kapi_find_spec_locked - Internal lookup, caller must hold api_spec_mu=
tex
+ */
+static const struct kernel_api_spec *__kapi_find_spec_locked(const char *n=
ame)
+{
+	struct kernel_api_spec *spec;
+	struct dynamic_api_spec *dyn_spec;
+
+	/* Search static specifications */
+	for (spec =3D __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+		if (strcmp(spec->name, name) =3D=3D 0)
+			return spec;
+	}
+
+	/* Search dynamic specifications (mutex already held) */
+	list_for_each_entry(dyn_spec, &dynamic_api_specs, list) {
+		if (strcmp(dyn_spec->spec->name, name) =3D=3D 0)
+			return dyn_spec->spec;
+	}
+
+	return NULL;
+}
+
+/**
+ * kapi_get_spec - Get API specification by name
+ * @name: Function name to look up
+ *
+ * Return: Pointer to API specification or NULL if not found
+ */
+const struct kernel_api_spec *kapi_get_spec(const char *name)
+{
+	const struct kernel_api_spec *spec;
+
+	mutex_lock(&api_spec_mutex);
+	spec =3D __kapi_find_spec_locked(name);
+	mutex_unlock(&api_spec_mutex);
+
+	return spec;
+}
+EXPORT_SYMBOL_GPL(kapi_get_spec);
+
+/**
+ * kapi_register_spec - Register a dynamic API specification
+ * @spec: API specification to register
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int kapi_register_spec(struct kernel_api_spec *spec)
+{
+	struct dynamic_api_spec *dyn_spec;
+	int ret =3D 0;
+
+	if (!spec || !spec->name[0])
+		return -EINVAL;
+
+	dyn_spec =3D kzalloc(sizeof(*dyn_spec), GFP_KERNEL);
+	if (!dyn_spec)
+		return -ENOMEM;
+
+	dyn_spec->spec =3D spec;
+
+	mutex_lock(&api_spec_mutex);
+
+	/* Check if already exists while holding lock to prevent races */
+	if (__kapi_find_spec_locked(spec->name)) {
+		ret =3D -EEXIST;
+		kfree(dyn_spec);
+	} else {
+		list_add_tail(&dyn_spec->list, &dynamic_api_specs);
+	}
+
+	mutex_unlock(&api_spec_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kapi_register_spec);
+
+/**
+ * kapi_unregister_spec - Unregister a dynamic API specification
+ * @name: Name of API to unregister
+ */
+void kapi_unregister_spec(const char *name)
+{
+	struct dynamic_api_spec *dyn_spec, *tmp;
+
+	mutex_lock(&api_spec_mutex);
+	list_for_each_entry_safe(dyn_spec, tmp, &dynamic_api_specs, list) {
+		if (strcmp(dyn_spec->spec->name, name) =3D=3D 0) {
+			list_del(&dyn_spec->list);
+			kfree(dyn_spec);
+			break;
+		}
+	}
+	mutex_unlock(&api_spec_mutex);
+}
+EXPORT_SYMBOL_GPL(kapi_unregister_spec);
+
+/**
+ * param_type_to_string - Convert parameter type to string
+ * @type: Parameter type
+ *
+ * Return: String representation of type
+ */
+static const char *param_type_to_string(enum kapi_param_type type)
+{
+	static const char * const type_names[] =3D {
+		[KAPI_TYPE_VOID] =3D "void",
+		[KAPI_TYPE_INT] =3D "int",
+		[KAPI_TYPE_UINT] =3D "uint",
+		[KAPI_TYPE_PTR] =3D "pointer",
+		[KAPI_TYPE_STRUCT] =3D "struct",
+		[KAPI_TYPE_UNION] =3D "union",
+		[KAPI_TYPE_ENUM] =3D "enum",
+		[KAPI_TYPE_FUNC_PTR] =3D "function_pointer",
+		[KAPI_TYPE_ARRAY] =3D "array",
+		[KAPI_TYPE_FD] =3D "file_descriptor",
+		[KAPI_TYPE_USER_PTR] =3D "user_pointer",
+		[KAPI_TYPE_PATH] =3D "pathname",
+		[KAPI_TYPE_CUSTOM] =3D "custom",
+	};
+
+	if (type >=3D ARRAY_SIZE(type_names))
+		return "unknown";
+
+	return type_names[type];
+}
+
+/**
+ * lock_type_to_string - Convert lock type to string
+ * @type: Lock type
+ *
+ * Return: String representation of lock type
+ */
+static const char *lock_type_to_string(enum kapi_lock_type type)
+{
+	static const char * const lock_names[] =3D {
+		[KAPI_LOCK_NONE] =3D "none",
+		[KAPI_LOCK_MUTEX] =3D "mutex",
+		[KAPI_LOCK_SPINLOCK] =3D "spinlock",
+		[KAPI_LOCK_RWLOCK] =3D "rwlock",
+		[KAPI_LOCK_SEQLOCK] =3D "seqlock",
+		[KAPI_LOCK_RCU] =3D "rcu",
+		[KAPI_LOCK_SEMAPHORE] =3D "semaphore",
+		[KAPI_LOCK_CUSTOM] =3D "custom",
+	};
+
+	if (type >=3D ARRAY_SIZE(lock_names))
+		return "unknown";
+
+	return lock_names[type];
+}
+
+/**
+ * lock_scope_to_string - Convert lock scope to string
+ * @scope: Lock scope
+ *
+ * Return: String representation of lock scope
+ */
+static const char *lock_scope_to_string(enum kapi_lock_scope scope)
+{
+	static const char * const scope_names[] =3D {
+		[KAPI_LOCK_INTERNAL] =3D "internal",
+		[KAPI_LOCK_ACQUIRES] =3D "acquires",
+		[KAPI_LOCK_RELEASES] =3D "releases",
+		[KAPI_LOCK_CALLER_HELD] =3D "caller_held",
+	};
+
+	if (scope >=3D ARRAY_SIZE(scope_names))
+		return "unknown";
+
+	return scope_names[scope];
+}
+
+/**
+ * return_check_type_to_string - Convert return check type to string
+ * @type: Return check type
+ *
+ * Return: String representation of return check type
+ */
+static const char *return_check_type_to_string(enum kapi_return_check_type=
 type)
+{
+	static const char * const check_names[] =3D {
+		[KAPI_RETURN_EXACT] =3D "exact",
+		[KAPI_RETURN_RANGE] =3D "range",
+		[KAPI_RETURN_ERROR_CHECK] =3D "error_check",
+		[KAPI_RETURN_FD] =3D "file_descriptor",
+		[KAPI_RETURN_CUSTOM] =3D "custom",
+		[KAPI_RETURN_NO_RETURN] =3D "no_return",
+	};
+
+	if (type >=3D ARRAY_SIZE(check_names))
+		return "unknown";
+
+	return check_names[type];
+}
+
+/**
+ * capability_action_to_string - Convert capability action to string
+ * @action: Capability action
+ *
+ * Return: String representation of capability action
+ */
+static const char *capability_action_to_string(enum kapi_capability_action=
 action)
+{
+	static const char * const action_names[] =3D {
+		[KAPI_CAP_BYPASS_CHECK] =3D "bypass_check",
+		[KAPI_CAP_INCREASE_LIMIT] =3D "increase_limit",
+		[KAPI_CAP_OVERRIDE_RESTRICTION] =3D "override_restriction",
+		[KAPI_CAP_GRANT_PERMISSION] =3D "grant_permission",
+		[KAPI_CAP_MODIFY_BEHAVIOR] =3D "modify_behavior",
+		[KAPI_CAP_ACCESS_RESOURCE] =3D "access_resource",
+		[KAPI_CAP_PERFORM_OPERATION] =3D "perform_operation",
+	};
+
+	if (action >=3D ARRAY_SIZE(action_names))
+		return "unknown";
+
+	return action_names[action];
+}
+
+/**
+ * kapi_export_json - Export API specification to JSON format
+ * @spec: API specification to export
+ * @buf: Buffer to write JSON to
+ * @size: Size of buffer
+ *
+ * Return: Number of bytes written or negative error
+ */
+int kapi_export_json(const struct kernel_api_spec *spec, char *buf, size_t=
 size)
+{
+	int ret =3D 0;
+	int i;
+
+	if (!spec || !buf || size =3D=3D 0)
+		return -EINVAL;
+
+	ret =3D scnprintf(buf, size,
+		"{\n"
+		"  \"name\": \"%s\",\n"
+		"  \"version\": %u,\n"
+		"  \"description\": \"%s\",\n"
+		"  \"long_description\": \"%s\",\n"
+		"  \"context_flags\": \"0x%x\",\n",
+		spec->name,
+		spec->version,
+		spec->description,
+		spec->long_description,
+		spec->context_flags);
+
+	/* Parameters */
+	ret +=3D scnprintf(buf + ret, size - ret,
+		"  \"parameters\": [\n");
+
+	for (i =3D 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+		const struct kapi_param_spec *param =3D &spec->params[i];
+
+		ret +=3D scnprintf(buf + ret, size - ret,
+			"    {\n"
+			"      \"name\": \"%s\",\n"
+			"      \"type\": \"%s\",\n"
+			"      \"type_class\": \"%s\",\n"
+			"      \"flags\": \"0x%x\",\n"
+			"      \"description\": \"%s\"\n"
+			"    }%s\n",
+			param->name,
+			param->type_name,
+			param_type_to_string(param->type),
+			param->flags,
+			param->description,
+			(i < spec->param_count - 1) ? "," : "");
+	}
+
+	ret +=3D scnprintf(buf + ret, size - ret, "  ],\n");
+
+	/* Return value */
+	ret +=3D scnprintf(buf + ret, size - ret,
+		"  \"return\": {\n"
+		"    \"type\": \"%s\",\n"
+		"    \"type_class\": \"%s\",\n"
+		"    \"check_type\": \"%s\",\n",
+		spec->return_spec.type_name,
+		param_type_to_string(spec->return_spec.type),
+		return_check_type_to_string(spec->return_spec.check_type));
+
+	switch (spec->return_spec.check_type) {
+	case KAPI_RETURN_EXACT:
+		ret +=3D scnprintf(buf + ret, size - ret,
+			"    \"success_value\": %lld,\n",
+			spec->return_spec.success_value);
+		break;
+	case KAPI_RETURN_RANGE:
+		ret +=3D scnprintf(buf + ret, size - ret,
+			"    \"success_min\": %lld,\n"
+			"    \"success_max\": %lld,\n",
+			spec->return_spec.success_min,
+			spec->return_spec.success_max);
+		break;
+	case KAPI_RETURN_ERROR_CHECK:
+		ret +=3D scnprintf(buf + ret, size - ret,
+			"    \"error_count\": %u,\n",
+			spec->return_spec.error_count);
+		break;
+	default:
+		break;
+	}
+
+	ret +=3D scnprintf(buf + ret, size - ret,
+		"    \"description\": \"%s\"\n"
+		"  },\n",
+		spec->return_spec.description);
+
+	/* Errors */
+	ret +=3D scnprintf(buf + ret, size - ret,
+		"  \"errors\": [\n");
+
+	for (i =3D 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+		const struct kapi_error_spec *error =3D &spec->errors[i];
+
+		ret +=3D scnprintf(buf + ret, size - ret,
+			"    {\n"
+			"      \"code\": %d,\n"
+			"      \"name\": \"%s\",\n"
+			"      \"condition\": \"%s\",\n"
+			"      \"description\": \"%s\"\n"
+			"    }%s\n",
+			error->error_code,
+			error->name,
+			error->condition,
+			error->description,
+			(i < spec->error_count - 1) ? "," : "");
+	}
+
+	ret +=3D scnprintf(buf + ret, size - ret, "  ],\n");
+
+	/* Locks */
+	ret +=3D scnprintf(buf + ret, size - ret,
+		"  \"locks\": [\n");
+
+	for (i =3D 0; i < spec->lock_count && i < KAPI_MAX_CONSTRAINTS; i++) {
+		const struct kapi_lock_spec *lock =3D &spec->locks[i];
+
+		ret +=3D scnprintf(buf + ret, size - ret,
+			"    {\n"
+			"      \"name\": \"%s\",\n"
+			"      \"type\": \"%s\",\n"
+			"      \"scope\": \"%s\",\n"
+			"      \"description\": \"%s\"\n"
+			"    }%s\n",
+			lock->lock_name,
+			lock_type_to_string(lock->lock_type),
+			lock_scope_to_string(lock->scope),
+			lock->description,
+			(i < spec->lock_count - 1) ? "," : "");
+	}
+
+	ret +=3D scnprintf(buf + ret, size - ret, "  ],\n");
+
+	/* Capabilities */
+	ret +=3D scnprintf(buf + ret, size - ret,
+		"  \"capabilities\": [\n");
+
+	for (i =3D 0; i < spec->capability_count && i < KAPI_MAX_CAPABILITIES; i+=
+) {
+		const struct kapi_capability_spec *cap =3D &spec->capabilities[i];
+
+		ret +=3D scnprintf(buf + ret, size - ret,
+			"    {\n"
+			"      \"capability\": %d,\n"
+			"      \"name\": \"%s\",\n"
+			"      \"action\": \"%s\",\n"
+			"      \"allows\": \"%s\",\n"
+			"      \"without_cap\": \"%s\",\n"
+			"      \"check_condition\": \"%s\",\n"
+			"      \"priority\": %u",
+			cap->capability,
+			cap->cap_name,
+			capability_action_to_string(cap->action),
+			cap->allows,
+			cap->without_cap,
+			cap->check_condition,
+			cap->priority);
+
+		if (cap->alternative_count > 0) {
+			int j;
+			ret +=3D scnprintf(buf + ret, size - ret,
+				",\n      \"alternatives\": [");
+			for (j =3D 0; j < cap->alternative_count; j++) {
+				ret +=3D scnprintf(buf + ret, size - ret,
+					"%d%s", cap->alternative[j],
+					(j < cap->alternative_count - 1) ? ", " : "");
+			}
+			ret +=3D scnprintf(buf + ret, size - ret, "]");
+		}
+
+		ret +=3D scnprintf(buf + ret, size - ret,
+			"\n    }%s\n",
+			(i < spec->capability_count - 1) ? "," : "");
+	}
+
+	ret +=3D scnprintf(buf + ret, size - ret, "  ],\n");
+
+	/* Additional info */
+	ret +=3D scnprintf(buf + ret, size - ret,
+		"  \"since_version\": \"%s\",\n"
+		"  \"examples\": \"%s\",\n"
+		"  \"notes\": \"%s\"\n"
+		"}\n",
+		spec->since_version,
+		spec->examples,
+		spec->notes);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kapi_export_json);
+
+
+/**
+ * kapi_print_spec - Print API specification to kernel log
+ * @spec: API specification to print
+ */
+void kapi_print_spec(const struct kernel_api_spec *spec)
+{
+	int i;
+
+	if (!spec)
+		return;
+
+	pr_info("=3D=3D=3D Kernel API Specification =3D=3D=3D\n");
+	pr_info("Name: %s\n", spec->name);
+	pr_info("Version: %u\n", spec->version);
+	pr_info("Description: %s\n", spec->description);
+
+	if (spec->long_description[0])
+		pr_info("Long Description: %s\n", spec->long_description);
+
+	pr_info("Context Flags: 0x%x\n", spec->context_flags);
+
+	/* Parameters */
+	if (spec->param_count > 0) {
+		pr_info("Parameters:\n");
+		for (i =3D 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+			const struct kapi_param_spec *param =3D &spec->params[i];
+			pr_info("  [%d] %s: %s (flags: 0x%x)\n",
+				i, param->name, param->type_name, param->flags);
+			if (param->description[0])
+				pr_info("      Description: %s\n", param->description);
+		}
+	}
+
+	/* Return value */
+	pr_info("Return: %s\n", spec->return_spec.type_name);
+	if (spec->return_spec.description[0])
+		pr_info("  Description: %s\n", spec->return_spec.description);
+
+	/* Errors */
+	if (spec->error_count > 0) {
+		pr_info("Possible Errors:\n");
+		for (i =3D 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+			const struct kapi_error_spec *error =3D &spec->errors[i];
+			pr_info("  %s (%d): %s\n",
+				error->name, error->error_code, error->condition);
+		}
+	}
+
+	/* Capabilities */
+	if (spec->capability_count > 0) {
+		pr_info("Capabilities:\n");
+		for (i =3D 0; i < spec->capability_count && i < KAPI_MAX_CAPABILITIES; i=
++) {
+			const struct kapi_capability_spec *cap =3D &spec->capabilities[i];
+			pr_info("  %s (%d):\n", cap->cap_name, cap->capability);
+			pr_info("    Action: %s\n", capability_action_to_string(cap->action));
+			pr_info("    Allows: %s\n", cap->allows);
+			pr_info("    Without: %s\n", cap->without_cap);
+			if (cap->check_condition[0])
+				pr_info("    Condition: %s\n", cap->check_condition);
+		}
+	}
+
+	pr_info("=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D\n");
+}
+EXPORT_SYMBOL_GPL(kapi_print_spec);
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+
+/**
+ * kapi_validate_fd - Validate that a file descriptor is valid in current =
context
+ * @fd: File descriptor to validate
+ *
+ * Return: true if fd is valid in current process context, false otherwise
+ */
+static bool kapi_validate_fd(int fd)
+{
+	struct fd f;
+
+	/* Special case: AT_FDCWD is always valid */
+	if (fd =3D=3D AT_FDCWD)
+		return true;
+
+	/* Check basic range */
+	if (fd < 0)
+		return false;
+
+	/* Check if fd is valid in current process context */
+	f =3D fdget(fd);
+	if (fd_empty(f)) {
+		return false;
+	}
+
+	/* fd is valid, release reference */
+	fdput(f);
+	return true;
+}
+
+/**
+ * kapi_validate_user_ptr - Validate that a user pointer is accessible
+ * @ptr: User pointer to validate
+ * @size: Size in bytes to validate
+ *
+ * Return: true if user memory is accessible, false otherwise
+ */
+static bool kapi_validate_user_ptr(const void __user *ptr, size_t size)
+{
+	/* NULL pointers are not valid; caller handles optional case */
+	if (!ptr)
+		return false;
+
+	return access_ok(ptr, size);
+}
+
+/**
+ * kapi_validate_user_ptr_with_params - Validate user pointer with dynamic=
 size
+ * @param_spec: Parameter specification
+ * @ptr: User pointer to validate
+ * @all_params: Array of all parameter values
+ * @param_count: Number of parameters
+ *
+ * Return: true if user memory is accessible, false otherwise
+ */
+static bool kapi_validate_user_ptr_with_params(const struct kapi_param_spe=
c *param_spec,
+						const void __user *ptr,
+						const s64 *all_params,
+						int param_count)
+{
+	size_t actual_size;
+
+	/* NULL is allowed for optional parameters */
+	if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+		return true;
+
+	/* Calculate actual size based on related parameter */
+	if (param_spec->size_param_idx >=3D 0 &&
+	    param_spec->size_param_idx < param_count) {
+		s64 count =3D all_params[param_spec->size_param_idx];
+
+		/* Validate count is positive */
+		if (count <=3D 0) {
+			pr_warn("Parameter %s: size determinant is non-positive (%lld)\n",
+				param_spec->name, count);
+			return false;
+		}
+
+		/* Check for multiplication overflow */
+		if (param_spec->size_multiplier > 0 &&
+		    count > SIZE_MAX / param_spec->size_multiplier) {
+			pr_warn("Parameter %s: size calculation overflow\n",
+				param_spec->name);
+			return false;
+		}
+
+		actual_size =3D count * param_spec->size_multiplier;
+	} else {
+		/* Use fixed size */
+		actual_size =3D param_spec->size;
+	}
+
+	return kapi_validate_user_ptr(ptr, actual_size);
+}
+
+/**
+ * kapi_validate_path - Validate that a pathname is accessible and within =
limits
+ * @path: User pointer to pathname
+ * @param_spec: Parameter specification
+ *
+ * Return: true if path is valid, false otherwise
+ */
+static bool kapi_validate_path(const char __user *path,
+				const struct kapi_param_spec *param_spec)
+{
+	size_t len;
+
+	/* NULL is allowed for optional parameters */
+	if (!path && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+		return true;
+
+	if (!path) {
+		pr_warn("Parameter %s: NULL path not allowed\n", param_spec->name);
+		return false;
+	}
+
+	/* Check if the path is accessible */
+	if (!access_ok(path, 1)) {
+		pr_warn("Parameter %s: path pointer %p not accessible\n",
+			param_spec->name, path);
+		return false;
+	}
+
+	/* Use strnlen_user to get the length and validate accessibility */
+	len =3D strnlen_user(path, PATH_MAX + 1);
+	if (len =3D=3D 0) {
+		pr_warn("Parameter %s: invalid path pointer %p\n",
+			param_spec->name, path);
+		return false;
+	}
+
+	/* Check path length limit */
+	if (len > PATH_MAX) {
+		pr_warn("Parameter %s: path too long (exceeds PATH_MAX)\n",
+			param_spec->name);
+		return false;
+	}
+
+	return true;
+}
+
+/**
+ * kapi_validate_user_string - Validate a userspace null-terminated string
+ * @str: User pointer to string
+ * @param_spec: Parameter specification containing length constraints
+ *
+ * Validates that the userspace string pointer is accessible and that the
+ * string length (excluding null terminator) is within the range specified
+ * by min_value and max_value in the parameter specification.
+ *
+ * Return: true if string is valid, false otherwise
+ */
+static bool kapi_validate_user_string(const char __user *str,
+				       const struct kapi_param_spec *param_spec)
+{
+	size_t len;
+	size_t max_check_len;
+
+	/* NULL is allowed for optional parameters */
+	if (!str && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+		return true;
+
+	if (!str) {
+		pr_warn("Parameter %s: NULL string not allowed\n", param_spec->name);
+		return false;
+	}
+
+	/* Check if the string pointer is accessible */
+	if (!access_ok(str, 1)) {
+		pr_warn("Parameter %s: string pointer %p not accessible\n",
+			param_spec->name, str);
+		return false;
+	}
+
+	/*
+	 * Use strnlen_user to get the string length and validate accessibility.
+	 * Check up to max_value + 1 to detect strings that are too long.
+	 * If max_value is 0 or unset, use PATH_MAX as a reasonable default.
+	 */
+	max_check_len =3D param_spec->max_value > 0 ?
+			(size_t)param_spec->max_value + 1 : PATH_MAX + 1;
+	len =3D strnlen_user(str, max_check_len);
+
+	if (len =3D=3D 0) {
+		pr_warn("Parameter %s: invalid string pointer %p\n",
+			param_spec->name, str);
+		return false;
+	}
+
+	/*
+	 * strnlen_user returns the length including the null terminator.
+	 * Convert to string length (excluding terminator) for range check.
+	 */
+	len--;
+
+	/* Check minimum length constraint */
+	if (param_spec->min_value > 0 && len < (size_t)param_spec->min_value) {
+		pr_warn("Parameter %s: string too short (%zu < %lld)\n",
+			param_spec->name, len, param_spec->min_value);
+		return false;
+	}
+
+	/* Check maximum length constraint */
+	if (param_spec->max_value > 0 && len > (size_t)param_spec->max_value) {
+		pr_warn("Parameter %s: string too long (%zu > %lld)\n",
+			param_spec->name, len, param_spec->max_value);
+		return false;
+	}
+
+	return true;
+}
+
+/**
+ * kapi_validate_user_ptr_constraint - Validate a userspace pointer with s=
ize
+ * @ptr: User pointer to validate
+ * @param_spec: Parameter specification containing size
+ *
+ * Validates that the userspace pointer is accessible and that the memory
+ * region of the specified size can be accessed. The size is taken from
+ * the param_spec->size field.
+ *
+ * Return: true if pointer is valid, false otherwise
+ */
+static bool kapi_validate_user_ptr_constraint(const void __user *ptr,
+					       const struct kapi_param_spec *param_spec)
+{
+	/* NULL is allowed for optional parameters */
+	if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+		return true;
+
+	if (!ptr) {
+		pr_warn("Parameter %s: NULL pointer not allowed\n", param_spec->name);
+		return false;
+	}
+
+	/* Validate size is specified */
+	if (param_spec->size =3D=3D 0) {
+		pr_warn("Parameter %s: size not specified for user pointer validation\n",
+			param_spec->name);
+		return false;
+	}
+
+	/* Check if the memory region is accessible */
+	if (!access_ok(ptr, param_spec->size)) {
+		pr_warn("Parameter %s: user pointer %p not accessible for %zu bytes\n",
+			param_spec->name, ptr, param_spec->size);
+		return false;
+	}
+
+	return true;
+}
+
+/**
+ * kapi_validate_param - Validate a parameter against its specification
+ * @param_spec: Parameter specification
+ * @value: Parameter value to validate
+ *
+ * Return: true if valid, false otherwise
+ */
+bool kapi_validate_param(const struct kapi_param_spec *param_spec, s64 val=
ue)
+{
+	int i;
+
+	/* Special handling for file descriptor type */
+	if (param_spec->type =3D=3D KAPI_TYPE_FD) {
+		if (!kapi_validate_fd((int)value)) {
+			pr_warn("Parameter %s: invalid file descriptor %lld\n",
+				param_spec->name, value);
+			return false;
+		}
+		/* Continue with additional constraint checks if needed */
+	}
+
+	/* Special handling for user pointer type */
+	if (param_spec->type =3D=3D KAPI_TYPE_USER_PTR) {
+		const void __user *ptr =3D (const void __user *)value;
+
+		/* NULL is allowed for optional parameters */
+		if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+			return true;
+
+		if (!kapi_validate_user_ptr(ptr, param_spec->size)) {
+			pr_warn("Parameter %s: invalid user pointer %p (size: %zu)\n",
+				param_spec->name, ptr, param_spec->size);
+			return false;
+		}
+		/* Continue with additional constraint checks if needed */
+	}
+
+	/* Special handling for path type */
+	if (param_spec->type =3D=3D KAPI_TYPE_PATH) {
+		const char __user *path =3D (const char __user *)value;
+
+		if (!kapi_validate_path(path, param_spec)) {
+			return false;
+		}
+		/* Continue with additional constraint checks if needed */
+	}
+
+	switch (param_spec->constraint_type) {
+	case KAPI_CONSTRAINT_NONE:
+		return true;
+
+	case KAPI_CONSTRAINT_RANGE:
+		if (value < param_spec->min_value || value > param_spec->max_value) {
+			pr_warn("Parameter %s value %lld out of range [%lld, %lld]\n",
+				param_spec->name, value,
+				param_spec->min_value, param_spec->max_value);
+			return false;
+		}
+		return true;
+
+	case KAPI_CONSTRAINT_MASK:
+		if (value & ~param_spec->valid_mask) {
+			pr_warn("Parameter %s value 0x%llx contains invalid bits (valid mask: 0=
x%llx)\n",
+				param_spec->name, value, param_spec->valid_mask);
+			return false;
+		}
+		return true;
+
+	case KAPI_CONSTRAINT_ENUM:
+		if (!param_spec->enum_values || param_spec->enum_count =3D=3D 0)
+			return true;
+
+		for (i =3D 0; i < param_spec->enum_count; i++) {
+			if (value =3D=3D param_spec->enum_values[i])
+				return true;
+		}
+		pr_warn("Parameter %s value %lld not in valid enumeration\n",
+			param_spec->name, value);
+		return false;
+
+	case KAPI_CONSTRAINT_ALIGNMENT:
+		if (param_spec->alignment =3D=3D 0) {
+			pr_warn("Parameter %s: alignment constraint specified but alignment is =
0\n",
+				param_spec->name);
+			return false;
+		}
+		if (value & (param_spec->alignment - 1)) {
+			pr_warn("Parameter %s value 0x%llx not aligned to %zu boundary\n",
+				param_spec->name, value, param_spec->alignment);
+			return false;
+		}
+		return true;
+
+	case KAPI_CONSTRAINT_POWER_OF_TWO:
+		if (value =3D=3D 0 || (value & (value - 1))) {
+			pr_warn("Parameter %s value %lld is not a power of two\n",
+				param_spec->name, value);
+			return false;
+		}
+		return true;
+
+	case KAPI_CONSTRAINT_PAGE_ALIGNED:
+		if (value & (PAGE_SIZE - 1)) {
+			pr_warn("Parameter %s value 0x%llx not page-aligned (PAGE_SIZE=3D%ld)\n=
",
+				param_spec->name, value, PAGE_SIZE);
+			return false;
+		}
+		return true;
+
+	case KAPI_CONSTRAINT_NONZERO:
+		if (value =3D=3D 0) {
+			pr_warn("Parameter %s must be non-zero\n", param_spec->name);
+			return false;
+		}
+		return true;
+
+	case KAPI_CONSTRAINT_USER_STRING:
+		return kapi_validate_user_string((const char __user *)value, param_spec);
+
+	case KAPI_CONSTRAINT_USER_PATH:
+		return kapi_validate_path((const char __user *)value, param_spec);
+
+	case KAPI_CONSTRAINT_USER_PTR:
+		return kapi_validate_user_ptr_constraint((const void __user *)value, par=
am_spec);
+
+	case KAPI_CONSTRAINT_CUSTOM:
+		if (param_spec->validate)
+			return param_spec->validate(value);
+		return true;
+
+	default:
+		return true;
+	}
+}
+EXPORT_SYMBOL_GPL(kapi_validate_param);
+
+/**
+ * kapi_validate_param_with_context - Validate parameter with access to al=
l params
+ * @param_spec: Parameter specification
+ * @value: Parameter value to validate
+ * @all_params: Array of all parameter values
+ * @param_count: Number of parameters
+ *
+ * Return: true if valid, false otherwise
+ */
+bool kapi_validate_param_with_context(const struct kapi_param_spec *param_=
spec,
+				       s64 value, const s64 *all_params, int param_count)
+{
+	/* Special handling for user pointer type with dynamic sizing */
+	if (param_spec->type =3D=3D KAPI_TYPE_USER_PTR) {
+		const void __user *ptr =3D (const void __user *)value;
+
+		/* NULL is allowed for optional parameters */
+		if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL))
+			return true;
+
+		if (!kapi_validate_user_ptr_with_params(param_spec, ptr, all_params, par=
am_count)) {
+			pr_warn("Parameter %s: invalid user pointer %p\n",
+				param_spec->name, ptr);
+			return false;
+		}
+		/* Continue with additional constraint checks if needed */
+	}
+
+	/* For other types, fall back to regular validation */
+	return kapi_validate_param(param_spec, value);
+}
+EXPORT_SYMBOL_GPL(kapi_validate_param_with_context);
+
+/**
+ * kapi_validate_syscall_param - Validate syscall parameter with enforceme=
nt
+ * @spec: API specification
+ * @param_idx: Parameter index
+ * @value: Parameter value
+ *
+ * Return: -EINVAL if invalid, 0 if valid
+ */
+int kapi_validate_syscall_param(const struct kernel_api_spec *spec,
+				 int param_idx, s64 value)
+{
+	const struct kapi_param_spec *param_spec;
+
+	if (!spec || param_idx >=3D spec->param_count)
+		return 0;
+
+	param_spec =3D &spec->params[param_idx];
+
+	if (!kapi_validate_param(param_spec, value)) {
+		if (strncmp(spec->name, "sys_", 4) =3D=3D 0) {
+			/* For syscalls, we can return EINVAL to userspace */
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_syscall_param);
+
+/**
+ * kapi_validate_syscall_params - Validate all syscall parameters together
+ * @spec: API specification
+ * @params: Array of parameter values
+ * @param_count: Number of parameters
+ *
+ * Return: -EINVAL if any parameter is invalid, 0 if all valid
+ */
+int kapi_validate_syscall_params(const struct kernel_api_spec *spec,
+				 const s64 *params, int param_count)
+{
+	int i;
+
+	if (!spec || !params)
+		return 0;
+
+	/* Validate that we have the expected number of parameters */
+	if (param_count !=3D spec->param_count) {
+		pr_warn("API %s: parameter count mismatch (expected %u, got %d)\n",
+			spec->name, spec->param_count, param_count);
+		return -EINVAL;
+	}
+
+	/* Validate each parameter with context */
+	for (i =3D 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+		const struct kapi_param_spec *param_spec =3D &spec->params[i];
+
+		if (!kapi_validate_param_with_context(param_spec, params[i], params, par=
am_count)) {
+			if (strncmp(spec->name, "sys_", 4) =3D=3D 0) {
+				/* For syscalls, we can return EINVAL to userspace */
+				return -EINVAL;
+			}
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_syscall_params);
+
+/**
+ * kapi_check_return_success - Check if return value indicates success
+ * @return_spec: Return specification
+ * @retval: Return value to check
+ *
+ * Returns true if the return value indicates success according to the spe=
c.
+ */
+bool kapi_check_return_success(const struct kapi_return_spec *return_spec,=
 s64 retval)
+{
+	u32 i;
+
+	if (!return_spec)
+		return true; /* No spec means we can't validate */
+
+	switch (return_spec->check_type) {
+	case KAPI_RETURN_EXACT:
+		return retval =3D=3D return_spec->success_value;
+
+	case KAPI_RETURN_RANGE:
+		return retval >=3D return_spec->success_min &&
+		       retval <=3D return_spec->success_max;
+
+	case KAPI_RETURN_ERROR_CHECK:
+		/* Success if NOT in error list */
+		if (return_spec->error_values) {
+			for (i =3D 0; i < return_spec->error_count; i++) {
+				if (retval =3D=3D return_spec->error_values[i])
+					return false; /* Found in error list */
+			}
+		}
+		return true; /* Not in error list =3D success */
+
+	case KAPI_RETURN_FD:
+		/* File descriptors: >=3D 0 is success, < 0 is error */
+		return retval >=3D 0;
+
+	case KAPI_RETURN_CUSTOM:
+		if (return_spec->is_success)
+			return return_spec->is_success(retval);
+		fallthrough;
+
+	default:
+		return true; /* Unknown check type, assume success */
+	}
+}
+EXPORT_SYMBOL_GPL(kapi_check_return_success);
+
+/**
+ * kapi_validate_return_value - Validate that return value matches spec
+ * @spec: API specification
+ * @retval: Return value to validate
+ *
+ * Return: true if return value is valid according to spec, false otherwis=
e.
+ *
+ * This function checks:
+ * 1. If the value indicates success, it must match the success criteria
+ * 2. If the value indicates error, it must be one of the specified error =
codes
+ */
+bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 re=
tval)
+{
+	int i;
+	bool is_success;
+
+	if (!spec)
+		return true; /* No spec means we can't validate */
+
+	/* First check if this is a success return */
+	is_success =3D kapi_check_return_success(&spec->return_spec, retval);
+
+	if (is_success) {
+		/* Special validation for file descriptor returns */
+		if (spec->return_spec.check_type =3D=3D KAPI_RETURN_FD) {
+			/* For successful FD returns, validate it's a valid FD */
+			if (!kapi_validate_fd((int)retval)) {
+				pr_warn("API %s returned invalid file descriptor %lld\n",
+					spec->name, retval);
+				return false;
+			}
+		}
+		return true;
+	}
+
+	/* Error case - check if it's one of the specified errors */
+	if (spec->error_count =3D=3D 0) {
+		/* No errors specified, so any error is potentially valid */
+		pr_debug("API %s returned unspecified error %lld\n",
+			 spec->name, retval);
+		return true;
+	}
+
+	/* Check if the error is in our list of specified errors */
+	for (i =3D 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+		if (retval =3D=3D spec->errors[i].error_code)
+			return true;
+	}
+
+	/* Error not in spec */
+	pr_warn("API %s returned unspecified error code %lld. Valid errors are:\n=
",
+		spec->name, retval);
+	for (i =3D 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+		pr_warn("  %s (%d): %s\n",
+			spec->errors[i].name,
+			spec->errors[i].error_code,
+			spec->errors[i].condition);
+	}
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_return_value);
+
+/**
+ * kapi_validate_syscall_return - Validate syscall return value with enfor=
cement
+ * @spec: API specification
+ * @retval: Return value
+ *
+ * Return: 0 if valid, -EINVAL if the return value doesn't match spec
+ *
+ * For syscalls, this can help detect kernel bugs where unspecified error
+ * codes are returned to userspace.
+ */
+int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 r=
etval)
+{
+	if (!spec)
+		return 0;
+
+	if (!kapi_validate_return_value(spec, retval)) {
+		/* Log the violation but don't change the return value */
+		WARN_ONCE(1, "Syscall %s returned unspecified value %lld\n",
+			  spec->name, retval);
+		/* Could return -EINVAL here to enforce, but that might break userspace =
*/
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kapi_validate_syscall_return);
+
+/**
+ * kapi_check_context - Check if current context matches API requirements
+ * @spec: API specification to check against
+ */
+void kapi_check_context(const struct kernel_api_spec *spec)
+{
+	u32 ctx =3D spec->context_flags;
+	bool valid =3D false;
+
+	if (!ctx)
+		return;
+
+	/* Check if we're in an allowed context */
+	if ((ctx & KAPI_CTX_PROCESS) && !in_interrupt())
+		valid =3D true;
+
+	if ((ctx & KAPI_CTX_SOFTIRQ) && in_softirq())
+		valid =3D true;
+
+	if ((ctx & KAPI_CTX_HARDIRQ) && in_hardirq())
+		valid =3D true;
+
+	if ((ctx & KAPI_CTX_NMI) && in_nmi())
+		valid =3D true;
+
+	if (!valid) {
+		WARN_ONCE(1, "API %s called from invalid context\n", spec->name);
+	}
+
+	/* Check specific requirements */
+	if ((ctx & KAPI_CTX_ATOMIC) && preemptible()) {
+		WARN_ONCE(1, "API %s requires atomic context\n", spec->name);
+	}
+
+	if ((ctx & KAPI_CTX_SLEEPABLE) && !preemptible()) {
+		WARN_ONCE(1, "API %s requires sleepable context\n", spec->name);
+	}
+}
+EXPORT_SYMBOL_GPL(kapi_check_context);
+
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
diff --git a/scripts/generate_api_specs.sh b/scripts/generate_api_specs.sh
new file mode 100755
index 0000000000000..2c3078a508fef
--- /dev/null
+++ b/scripts/generate_api_specs.sh
@@ -0,0 +1,18 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Stub script for generating API specifications collector
+# This is a placeholder until the full implementation is available
+#
+
+cat << 'EOF'
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Auto-generated API specifications collector (stub)
+ * Generated by scripts/generate_api_specs.sh
+ */
+
+#include <linux/kernel_api_spec.h>
+
+/* No API specifications collected yet */
+EOF
--=20
2.51.0

From nobody Sat Feb  7 10:08:18 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1AB222D97BD;
	Thu, 18 Dec 2025 20:42:48 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1766090568; cv=none;
 b=Rb9zk8ijzkzRAa1sdSOW7HcbBxa7+g5Yvggx2J0NN979xYBbHDir4qkmKRoAtkqnnXRlKa4TTuUMDSbqtAnNLDzr6S8AdYhap2NZIhtxNmeHP8u2ZH+zn5N9m8zBzvixLwi2aie3X4Ak0D/x5EM2zx8YHpzE2srYuBrBiKR7sM8=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1766090568; c=relaxed/simple;
	bh=lDfn1hgkc+g2P4PzDyhasO3+1sCtVJHnt7XbBNuByqE=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=CGamYOpf80ArdbRm+k8+fgbYZ9vJXw4XSpFbvh8AC9+NXkBPef9bMEvh9zj+PALjVFcCXDzV1yNb2YxYTYxQkrSntfWyMZ8+5uu6bwD8zkiXqRcbXIWy0WueJ9m/C6nSXXyvwZ+wZ8k6cHwp6FSogyYFFHD1PpG6uCTYaog6Yrg=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=lhxttUnh; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="lhxttUnh"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 56770C19421;
	Thu, 18 Dec 2025 20:42:47 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1766090568;
	bh=lDfn1hgkc+g2P4PzDyhasO3+1sCtVJHnt7XbBNuByqE=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=lhxttUnhw8JzdeH7QFLreeAdhesw7gaymacZllocrsn8XRKgc9sfuRCgoUeZBXv8X
	 NERLbSA4sBCFQWOoId7zBT8ub2LnH2I1UscFA8WPgfcvIteeIiWvjg2Q5gnorQcTqx
	 GJ7w6dEssOoKXEMZIqGKKJ2gDdU6PVU7nSwJKEcaPvLss7EmOZ2yjNz22s62ZoO9Ly
	 C7r1hol+LVoI/mg2YD03lMZixbSuyJOjLt4SmNbpxbDpYz0VcvhzKE367oezj+gcKv
	 881WOlv9pl33ticJ/FgZMp4bh7wID46pKudRXarRazbJ6nYxO7xh4im49NcujfE/M+
	 Ccj4pItP9o+ug==
From: Sasha Levin <sashal@kernel.org>
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	tools@kernel.org,
	gpaoloni@redhat.com,
	Sasha Levin <sashal@kernel.org>
Subject: [RFC PATCH v5 02/15] kernel/api: enable kerneldoc-based API
 specifications
Date: Thu, 18 Dec 2025 15:42:24 -0500
Message-ID: <20251218204239.4159453-3-sashal@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

This patch adds support for extracting API specifications from
kernel-doc comments and generating C macro invocations for the
kernel API specification framework.

Changes include:
- New kdoc_apispec.py module for generating API spec macros
- Updates to kernel-doc.py to support -apispec output format
- Build system integration in Makefile.build
- Generator script for collecting all API specifications
- Support for API-specific sections in kernel-doc comments

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 scripts/Makefile.build                |  28 +
 scripts/Makefile.clean                |   3 +
 scripts/generate_api_specs.sh         |  83 ++-
 scripts/kernel-doc.py                 |   5 +
 tools/lib/python/kdoc/kdoc_apispec.py | 755 ++++++++++++++++++++++++++
 tools/lib/python/kdoc/kdoc_output.py  |   9 +-
 tools/lib/python/kdoc/kdoc_parser.py  |  86 ++-
 7 files changed, 957 insertions(+), 12 deletions(-)
 create mode 100644 tools/lib/python/kdoc/kdoc_apispec.py

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 52c08c4eb0b9a..7a192d29a01f6 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -172,6 +172,34 @@ ifneq ($(KBUILD_EXTRA_WARN),)
         $<
 endif
=20
+# Generate API spec headers from kernel-doc comments
+ifeq ($(CONFIG_KAPI_SPEC),y)
+# Function to check if a file has API specifications
+has-apispec =3D $(shell grep -qE '^\s*\*\s*(long-desc|context-flags|state-=
trans):' $(src)/$(1) 2>/dev/null && echo $(1))
+
+# Get base names without directory prefix
+c-objs-base :=3D $(notdir $(real-obj-y) $(real-obj-m))
+# Filter to only .o files with corresponding .c source files
+c-files :=3D $(foreach o,$(c-objs-base),$(if $(wildcard $(src)/$(o:.o=3D.c=
)),$(o:.o=3D.c)))
+# Also check for any additional .c files that contain API specs but are in=
cluded
+extra-c-files :=3D $(shell find $(src) -maxdepth 1 -name "*.c" -exec grep =
-l '^\s*\*\s*\(long-desc\|context-flags\|state-trans\):' {} \; 2>/dev/null =
| xargs -r basename -a)
+# Combine both lists and remove duplicates
+all-c-files :=3D $(sort $(c-files) $(extra-c-files))
+# Only include files that actually have API specifications
+apispec-files :=3D $(foreach f,$(all-c-files),$(call has-apispec,$(f)))
+# Generate apispec targets with proper directory prefix
+apispec-y :=3D $(addprefix $(obj)/,$(apispec-files:.c=3D.apispec.h))
+always-y +=3D $(apispec-y)
+targets +=3D $(apispec-y)
+
+quiet_cmd_apispec =3D APISPEC $@
+      cmd_apispec =3D PYTHONDONTWRITEBYTECODE=3D1 $(KERNELDOC) -apispec \
+                    $(KDOCFLAGS) $< > $@ || rm -f $@
+
+$(obj)/%.apispec.h: $(src)/%.c FORCE
+	$(call if_changed,apispec)
+endif
+
 # Compile C sources (.c)
 # ------------------------------------------------------------------------=
---
=20
diff --git a/scripts/Makefile.clean b/scripts/Makefile.clean
index 6ead00ec7313b..f78dbbe637f27 100644
--- a/scripts/Makefile.clean
+++ b/scripts/Makefile.clean
@@ -35,6 +35,9 @@ __clean-files   :=3D $(filter-out $(no-clean-files), $(__=
clean-files))
=20
 __clean-files   :=3D $(wildcard $(addprefix $(obj)/, $(__clean-files)))
=20
+# Also clean generated apispec headers (computed dynamically in Makefile.b=
uild)
+__clean-files   +=3D $(wildcard $(obj)/*.apispec.h)
+
 # =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
=20
 # To make this rule robust against "Argument list too long" error,
diff --git a/scripts/generate_api_specs.sh b/scripts/generate_api_specs.sh
index 2c3078a508fef..3ac6be9b4fe98 100755
--- a/scripts/generate_api_specs.sh
+++ b/scripts/generate_api_specs.sh
@@ -1,18 +1,87 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 #
-# Stub script for generating API specifications collector
-# This is a placeholder until the full implementation is available
+# generate_api_specs.sh - Generate C file that includes all API specificat=
ion headers
 #
+# Usage: generate_api_specs.sh <srctree> <objtree>
=20
-cat << 'EOF'
-// SPDX-License-Identifier: GPL-2.0
+SRCTREE=3D"$1"
+OBJTREE=3D"$2"
+
+if [ -z "$SRCTREE" ] || [ -z "$OBJTREE" ]; then
+    echo "Usage: $0 <srctree> <objtree>" >&2
+    exit 1
+fi
+
+# Generate header
+cat <<EOF
+/* SPDX-License-Identifier: GPL-2.0 */
 /*
- * Auto-generated API specifications collector (stub)
- * Generated by scripts/generate_api_specs.sh
+ * Auto-generated file - DO NOT EDIT
+ * Generated by: scripts/generate_api_specs.sh
+ *
+ * This file includes all kernel API specification headers
  */
=20
+#include <linux/kernel.h>
 #include <linux/kernel_api_spec.h>
+#include <linux/errno.h>
+#include <linux/capability.h>
+#include <linux/fcntl.h>
+#include <uapi/linux/aio_abi.h>
+#include <uapi/linux/sched/types.h>
+#include <uapi/linux/xattr.h>
+#include <linux/eventfd.h>
+#include <linux/eventpoll.h>
+#include <linux/wait.h>
+#include <linux/inotify.h>
+#include <linux/splice.h>
+#include <linux/timerfd.h>
+#include <linux/kexec.h>
+#include <uapi/linux/mount.h>
+#include <uapi/linux/signalfd.h>
+#include <linux/fs.h>
+#include <linux/signal.h>
+#include <linux/ptrace.h>
+#include <linux/quota.h>
+#include <linux/uio.h>
+#include <linux/time.h>
+
+#ifdef CONFIG_KAPI_SPEC
+
+EOF
=20
-/* No API specifications collected yet */
+# Find all .apispec.h files and generate includes
+# Look in both source tree and object tree
+(find "$SRCTREE" -name "*.apispec.h" -type f 2>/dev/null; \
+ find "$OBJTREE" -name "*.apispec.h" -type f 2>/dev/null) | \
+    grep -v "/generated_api_specs.c" | \
+    sort -u | \
+    while read -r apispec_file; do
+        # Get relative path from srctree or objtree
+        case "$apispec_file" in
+            "$SRCTREE"*)
+                rel_path=3D"${apispec_file#$SRCTREE/}"
+                ;;
+            *)
+                rel_path=3D"${apispec_file#$OBJTREE/}"
+                ;;
+        esac
+
+        # Skip if file is empty
+        if [ ! -s "$apispec_file" ]; then
+            continue
+        fi
+
+        # Generate include statement with relative path from kernel/api/
+        # The generated file is always at kernel/api/generated_api_specs.c,
+        # so we need to go up two directories to reach the root
+        echo "#include \"../../${rel_path}\""
+    done
+
+# Close the ifdef
+cat <<EOF
+
+#endif /* CONFIG_KAPI_SPEC */
 EOF
+
diff --git a/scripts/kernel-doc.py b/scripts/kernel-doc.py
index 7a1eaf986bcd4..8404a9e9e3fed 100755
--- a/scripts/kernel-doc.py
+++ b/scripts/kernel-doc.py
@@ -231,6 +231,8 @@ def main():
                          help=3D"Output reStructuredText format (default).=
")
     out_fmt.add_argument("-N", "-none", "--none", action=3D"store_true",
                          help=3D"Do not output documentation, only warning=
s.")
+    out_fmt.add_argument("-apispec", "--apispec", action=3D"store_true",
+                         help=3D"Output C macro invocations for kernel API=
 specifications.")
=20
     # Output selection mutually-exclusive group
=20
@@ -294,11 +296,14 @@ def main():
     # Import kernel-doc libraries only after checking Python version
     from kdoc.kdoc_files import KernelFiles             # pylint: disable=
=3DC0415
     from kdoc.kdoc_output import RestFormat, ManFormat  # pylint: disable=
=3DC0415
+    from kdoc.kdoc_apispec import ApiSpecFormat         # pylint: disable=
=3DC0415
=20
     if args.man:
         out_style =3D ManFormat(modulename=3Dargs.modulename)
     elif args.none:
         out_style =3D None
+    elif args.apispec:
+        out_style =3D ApiSpecFormat()
     else:
         out_style =3D RestFormat()
=20
diff --git a/tools/lib/python/kdoc/kdoc_apispec.py b/tools/lib/python/kdoc/=
kdoc_apispec.py
new file mode 100644
index 0000000000000..ae03af6d10685
--- /dev/null
+++ b/tools/lib/python/kdoc/kdoc_apispec.py
@@ -0,0 +1,755 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Generate C macro invocations for kernel API specifications from kernel-doc=
 comments.
+
+This module creates C header files with API specification macros that match
+the kernel API specification framework introduced in commit 9688de5c25bed.
+"""
+
+from kdoc.kdoc_output import OutputFormat
+import re
+
+
+# Maximum string lengths (from kernel_api_spec.h)
+KAPI_MAX_DESC_LEN =3D 512
+KAPI_MAX_NAME_LEN =3D 128
+KAPI_MAX_SIGNAL_NAME_LEN =3D 32
+
+# Valid KAPI effect types
+VALID_EFFECT_TYPES =3D {
+    'KAPI_EFFECT_NONE', 'KAPI_EFFECT_MODIFY_STATE', 'KAPI_EFFECT_PROCESS_S=
TATE',
+    'KAPI_EFFECT_IRREVERSIBLE', 'KAPI_EFFECT_SCHEDULE', 'KAPI_EFFECT_FILES=
YSTEM',
+    'KAPI_EFFECT_HARDWARE', 'KAPI_EFFECT_ALLOC_MEMORY', 'KAPI_EFFECT_FREE_=
MEMORY',
+    'KAPI_EFFECT_SIGNAL_SEND', 'KAPI_EFFECT_FILE_POSITION', 'KAPI_EFFECT_L=
OCK_ACQUIRE',
+    'KAPI_EFFECT_LOCK_RELEASE', 'KAPI_EFFECT_RESOURCE_CREATE', 'KAPI_EFFEC=
T_RESOURCE_DESTROY',
+    'KAPI_EFFECT_NETWORK'
+}
+
+
+class ApiSpecFormat(OutputFormat):
+    """Generate C macro invocations for kernel API specifications"""
+
+    def __init__(self):
+        super().__init__()
+        self.header_written =3D False
+
+    def msg(self, fname, name, args):
+        """Handles a single entry from kernel-doc parser"""
+        if not self.header_written:
+            self.data =3D self._generate_header()
+            self.header_written =3D True
+        else:
+            self.data =3D ""
+
+        result =3D super().msg(fname, name, args)
+        return result if result else self.data
+
+    def _generate_header(self):
+        """Generate the file header"""
+        return (
+            "/* SPDX-License-Identifier: GPL-2.0 */\n"
+            "/* Auto-generated from kerneldoc annotations - DO NOT EDIT */=
\n\n"
+            "#include <linux/kernel_api_spec.h>\n"
+            "#include <linux/errno.h>\n\n"
+        )
+
+    def _format_macro_param(self, value, max_len=3DKAPI_MAX_DESC_LEN):
+        """Format a value for use in C macro parameter, truncating if need=
ed"""
+        if value is None:
+            return '""'
+        value =3D str(value).replace('\\', '\\\\').replace('"', '\\"')
+        value =3D value.replace('\n', ' ')
+        # Truncate to fit within max_len, accounting for escaping overhead
+        if len(value) > max_len - 10:
+            value =3D value[:max_len - 13] + '...'
+        return f'"{value}"'
+
+    def _get_section(self, sections, key):
+        """Get first line from sections, checking with and without @ prefi=
x"""
+        for prefix in ['', '@']:
+            full_key =3D prefix + key
+            if full_key in sections:
+                content =3D sections[full_key].strip()
+                # Return only first line to avoid mixing sections
+                return content.split('\n')[0].strip() if content else ''
+        return None
+
+    def _get_raw_section(self, sections, key):
+        """Get full section content, checking with and without @ prefix"""
+        for prefix in ['', '@']:
+            full_key =3D prefix + key
+            if full_key in sections:
+                return sections[full_key]
+        return ''
+
+    def _parse_indented_items(self, section_content, item_parser):
+        """Generic parser for indented items.
+
+        Args:
+            section_content: Raw section content
+            item_parser: Function that takes (lines, start_index) and retu=
rns (item, next_index)
+
+        Returns:
+            List of parsed items
+        """
+        if not section_content:
+            return []
+
+        items =3D []
+        lines =3D section_content.strip().split('\n')
+        i =3D 0
+
+        while i < len(lines):
+            if not lines[i].strip():
+                i +=3D 1
+                continue
+
+            # Check if this is a main item (not indented)
+            if not lines[i].startswith((' ', '\t')):
+                item, i =3D item_parser(lines, i)
+                if item:
+                    items.append(item)
+            else:
+                i +=3D 1
+
+        return items
+
+    def _parse_subfields(self, lines, start_idx):
+        """Parse indented subfields starting from start_idx+1.
+
+        Returns: (dict of subfields, next index)
+        """
+        subfields =3D {}
+        i =3D start_idx + 1
+
+        while i < len(lines) and (lines[i].startswith((' ', '\t'))):
+            line =3D lines[i].strip()
+            if ':' in line:
+                key, value =3D line.split(':', 1)
+                subfields[key.strip()] =3D value.strip()
+            i +=3D 1
+
+        return subfields, i
+
+    def _parse_signal_item(self, lines, i):
+        """Parse a single signal specification"""
+        signal =3D {'name': lines[i].strip()}
+        subfields, next_i =3D self._parse_subfields(lines, i)
+
+        # Map subfields to signal attributes
+        signal.update({
+            'direction': subfields.get('direction', 'KAPI_SIGNAL_RECEIVE'),
+            'action': subfields.get('action', 'KAPI_SIGNAL_ACTION_RETURN'),
+            'condition': subfields.get('condition'),
+            'desc': subfields.get('desc'),
+            'error': subfields.get('error'),
+            'timing': subfields.get('timing'),
+            'priority': subfields.get('priority'),
+            'interruptible': subfields.get('interruptible', '').lower() =
=3D=3D 'yes',
+            'number': subfields.get('number', '0'),
+        })
+
+        return signal, next_i
+
+    def _parse_error_item(self, lines, i):
+        """Parse a single error specification"""
+        line =3D lines[i].strip()
+
+        # Skip desc: lines
+        if line.startswith('desc:'):
+            return None, i + 1
+
+        # Check for error pattern
+        if not re.match(r'^[A-Z][A-Z0-9_]+,', line):
+            return None, i + 1
+
+        error =3D {'line': line, 'desc': ''}
+
+        # Look for desc: continuation
+        i +=3D 1
+        desc_lines =3D []
+        while i < len(lines):
+            next_line =3D lines[i].strip()
+            if next_line.startswith('desc:'):
+                desc_lines.append(next_line[5:].strip())
+                i +=3D 1
+            elif not next_line or re.match(r'^[A-Z][A-Z0-9_]+,', next_line=
):
+                break
+            else:
+                desc_lines.append(next_line)
+                i +=3D 1
+
+        if desc_lines:
+            error['desc'] =3D ' '.join(desc_lines)
+
+        return error, i
+
+    def _parse_lock_item(self, lines, i):
+        """Parse a single lock specification"""
+        line =3D lines[i].strip()
+        if ':' not in line:
+            return None, i + 1
+
+        parts =3D line.split(':', 1)[1].strip().split(',', 1)
+        if len(parts) < 2:
+            return None, i + 1
+
+        lock =3D {
+            'name': parts[0].strip(),
+            'type': parts[1].strip()
+        }
+
+        subfields, next_i =3D self._parse_subfields(lines, i)
+
+        # Map boolean fields
+        for field in ['acquired', 'released', 'held-on-entry', 'held-on-ex=
it']:
+            if subfields.get(field, '').lower() =3D=3D 'true':
+                lock[field] =3D True
+
+        lock['desc'] =3D subfields.get('desc', '')
+
+        return lock, next_i
+
+    def _parse_constraint_item(self, lines, i):
+        """Parse a single constraint specification"""
+        line =3D lines[i].strip()
+
+        # Check for old format with comma
+        if ',' in line:
+            parts =3D line.split(',', 1)
+            constraint =3D {
+                'name': parts[0].strip(),
+                'desc': parts[1].strip() if len(parts) > 1 else '',
+                'expr': None
+            }
+        else:
+            constraint =3D {'name': line, 'desc': '', 'expr': None}
+
+        subfields, next_i =3D self._parse_subfields(lines, i)
+
+        if 'desc' in subfields:
+            constraint['desc'] =3D (constraint['desc'] + ' ' + subfields['=
desc']).strip()
+        constraint['expr'] =3D subfields.get('expr')
+
+        return constraint, next_i
+
+    def _parse_side_effect_item(self, lines, i):
+        """Parse a single side effect specification"""
+        line =3D lines[i].strip()
+
+        # Default to new format
+        effect =3D {
+            'type': line,
+            'target': '',
+            'desc': '',
+            'condition': None,
+            'reversible': False
+        }
+
+        # Check for old format with commas
+        if ',' in line:
+            # Handle condition and reversible flags
+            cond_match =3D re.search(r',\s*condition=3D([^,]+?)(?:\s*,\s*r=
eversible=3D(yes|no)\s*)?$', line)
+            if cond_match:
+                effect['condition'] =3D cond_match.group(1).strip()
+                effect['reversible'] =3D cond_match.group(2) =3D=3D 'yes'
+                line =3D line[:cond_match.start()]
+            elif ', reversible=3Dyes' in line:
+                effect['reversible'] =3D True
+                line =3D line.replace(', reversible=3Dyes', '')
+            elif ', reversible=3Dno' in line:
+                line =3D line.replace(', reversible=3Dno', '')
+
+            parts =3D line.split(',', 2)
+            if len(parts) >=3D 1:
+                effect['type'] =3D parts[0].strip()
+            if len(parts) >=3D 2:
+                effect['target'] =3D parts[1].strip()
+            if len(parts) >=3D 3:
+                effect['desc'] =3D parts[2].strip()
+        else:
+            # Multi-line format with subfields
+            subfields, next_i =3D self._parse_subfields(lines, i)
+            effect.update({
+                'target': subfields.get('target', ''),
+                'desc': subfields.get('desc', ''),
+                'condition': subfields.get('condition'),
+                'reversible': subfields.get('reversible', '').lower() =3D=
=3D 'yes'
+            })
+            return effect, next_i
+
+        return effect, i + 1
+
+    def _parse_state_trans_item(self, lines, i):
+        """Parse a single state transition specification"""
+        line =3D lines[i].strip()
+
+        trans =3D {
+            'target': line,
+            'from': '',
+            'to': '',
+            'condition': '',
+            'desc': ''
+        }
+
+        # Check for old format with commas
+        if ',' in line:
+            parts =3D line.split(',', 3)
+            if len(parts) >=3D 1:
+                trans['target'] =3D parts[0].strip()
+            if len(parts) >=3D 2:
+                trans['from'] =3D parts[1].strip()
+            if len(parts) >=3D 3:
+                trans['to'] =3D parts[2].strip()
+            if len(parts) >=3D 4:
+                desc_part =3D parts[3].strip()
+                desc_parts =3D desc_part.split(',', 1)
+                if len(desc_parts) > 1:
+                    trans['condition'] =3D desc_parts[0].strip()
+                    trans['desc'] =3D desc_parts[1].strip()
+                else:
+                    trans['desc'] =3D desc_part
+            return trans, i + 1
+        else:
+            # Multi-line format with subfields
+            subfields, next_i =3D self._parse_subfields(lines, i)
+            trans.update({
+                'from': subfields.get('from', ''),
+                'to': subfields.get('to', ''),
+                'condition': subfields.get('condition', ''),
+                'desc': subfields.get('desc', '')
+            })
+            return trans, next_i
+
+    def _process_parameters(self, sections, parameterlist, parameterdescs,=
 parametertypes):
+        """Process and output parameter specifications"""
+        param_count =3D len(parameterlist)
+        if param_count > 0:
+            self.data +=3D f"\n\tKAPI_PARAM_COUNT({param_count})\n"
+
+        for param_idx, param in enumerate(parameterlist):
+            param_name =3D param.strip()
+            param_desc =3D parameterdescs.get(param_name, '')
+            param_ctype =3D parametertypes.get(param_name, '')
+
+            # Parse parameter specifications
+            param_section =3D self._get_raw_section(sections, 'param')
+            param_specs =3D {}
+            if param_section:
+                param_specs =3D self._parse_param_spec(param_section, para=
m_name)
+
+            self.data +=3D f"\n\tKAPI_PARAM({param_idx}, {self._format_mac=
ro_param(param_name)}, "
+            self.data +=3D f"{self._format_macro_param(param_ctype)}, {sel=
f._format_macro_param(param_desc)})\n"
+
+            # Add parameter attributes
+            for key, macro in [
+                ('param-type', 'KAPI_PARAM_TYPE'),
+                ('param-flags', 'KAPI_PARAM_FLAGS'),
+                ('param-size', 'KAPI_PARAM_SIZE'),
+                ('param-alignment', 'KAPI_PARAM_ALIGNMENT'),
+            ]:
+                if key in param_specs:
+                    self.data +=3D f"\t\t{macro}({param_specs[key]})\n"
+
+            # Handle constraint type
+            if 'param-constraint-type' in param_specs:
+                ctype =3D param_specs['param-constraint-type']
+                if ctype =3D=3D 'KAPI_CONSTRAINT_BITMASK':
+                    ctype =3D 'KAPI_CONSTRAINT_MASK'
+                self.data +=3D f"\t\tKAPI_PARAM_CONSTRAINT_TYPE({ctype})\n"
+
+            # Handle range
+            if 'param-range' in param_specs and ',' in param_specs['param-=
range']:
+                min_val, max_val =3D param_specs['param-range'].split(',',=
 1)
+                self.data +=3D f"\t\tKAPI_PARAM_RANGE({min_val.strip()}, {=
max_val.strip()})\n"
+
+            # Handle mask
+            if 'param-mask' in param_specs:
+                self.data +=3D f"\t\tKAPI_PARAM_VALID_MASK({param_specs['p=
aram-mask']})\n"
+
+            # Handle enum values
+            if 'param-enum-values' in param_specs:
+                self.data +=3D f"\t\tKAPI_PARAM_ENUM_VALUES({param_specs['=
param-enum-values']})\n"
+
+            # Handle constraint description
+            if 'param-constraint' in param_specs:
+                self.data +=3D f"\t\tKAPI_PARAM_CONSTRAINT({self._format_m=
acro_param(param_specs['param-constraint'])})\n"
+
+            self.data +=3D "\tKAPI_PARAM_END\n"
+
+    def _parse_param_spec(self, section_content, param_name):
+        """Parse parameter specifications from indented format"""
+        specs =3D {}
+        lines =3D section_content.strip().split('\n')
+        current_item =3D None
+
+        # Map to expected keys
+        field_map =3D {
+            'flags': 'param-flags',
+            'size': 'param-size',
+            'constraint-type': 'param-constraint-type',
+            'constraint': 'param-constraint',
+            'range': 'param-range',
+            'mask': 'param-mask',
+            'valid-mask': 'param-mask',
+            'valid-values': 'param-enum-values',
+            'alignment': 'param-alignment',
+            'struct-type': 'param-struct-type',
+        }
+
+        i =3D 0
+        while i < len(lines):
+            line =3D lines[i]
+            if not line.strip():
+                i +=3D 1
+                continue
+
+            # Check if this is our parameter (non-indented line)
+            if not line.startswith((' ', '\t')):
+                parts =3D line.strip().split(',', 1)
+                current_item =3D param_name if parts[0].strip() =3D=3D par=
am_name else None
+                if current_item and len(parts) > 1:
+                    specs['param-type'] =3D parts[1].strip()
+                i +=3D 1
+            elif current_item =3D=3D param_name:
+                # Parse subfield
+                stripped =3D line.strip()
+                if ':' in stripped:
+                    key, value =3D stripped.split(':', 1)
+                    key =3D key.strip()
+                    value =3D value.strip()
+
+                    # Collect continuation lines (indented lines without a=
 colon that
+                    # defines a new key, i.e., lines that are pure continu=
ations)
+                    i +=3D 1
+                    while i < len(lines):
+                        next_line =3D lines[i]
+                        # Stop if we hit a non-indented line (new param)
+                        if next_line.strip() and not next_line.startswith(=
(' ', '\t')):
+                            break
+                        next_stripped =3D next_line.strip()
+                        # Stop if we hit a new key (contains colon with kn=
own key prefix)
+                        if next_stripped and ':' in next_stripped:
+                            potential_key =3D next_stripped.split(':', 1)[=
0].strip()
+                            if potential_key in field_map or potential_key=
 in ['type', 'desc']:
+                                break
+                        # This is a continuation line
+                        if next_stripped:
+                            value =3D value + ' ' + next_stripped
+                        i +=3D 1
+
+                    if key in field_map:
+                        # Clean up the value - remove excessive whitespace
+                        value =3D ' '.join(value.split())
+                        specs[field_map[key]] =3D value
+            else:
+                i +=3D 1
+
+        return specs
+
+    def _validate_effect_type(self, effect_type):
+        """Validate and normalize effect type"""
+        if 'KAPI_EFFECT_SCHEDULER' in effect_type:
+            return effect_type.replace('KAPI_EFFECT_SCHEDULER', 'KAPI_EFFE=
CT_SCHEDULE')
+
+        if 'KAPI_EFFECT_' in effect_type and effect_type not in VALID_EFFE=
CT_TYPES:
+            if '|' in effect_type:
+                parts =3D [p.strip() for p in effect_type.split('|')]
+                valid_parts =3D [p if p in VALID_EFFECT_TYPES else 'KAPI_E=
FFECT_MODIFY_STATE' for p in parts]
+                return ' | '.join(valid_parts)
+            return 'KAPI_EFFECT_MODIFY_STATE'
+
+        return effect_type
+
+    def _has_api_spec(self, sections):
+        """Check if this function has an API specification.
+
+        Returns True if at least 2 KAPI-specific section indicators are pr=
esent.
+        We require 2+ indicators (not just 1) to avoid false positives from
+        regular kernel-doc comments that happen to use a common section na=
me
+        like 'return' or 'error'. Having multiple KAPI sections strongly
+        suggests intentional API specification rather than coincidence.
+        """
+        indicators =3D [
+            'api-type', 'context-flags', 'param-type', 'error-code',
+            'capability', 'signal', 'lock', 'state-trans', 'constraint',
+            'return', 'error', 'side-effects', 'struct'
+        ]
+
+        count =3D sum(1 for ind in indicators
+                   if any(key.lower().startswith(ind.lower()) or
+                         key.lower().startswith('@' + ind.lower())
+                         for key in sections.keys()))
+
+        # Require 2+ indicators to distinguish from regular kernel-doc
+        return count >=3D 2
+
+    def out_function(self, fname, name, args):
+        """Generate API spec for a function"""
+        function_name =3D args.get('function', name)
+        sections =3D args.sections if hasattr(args, 'sections') else args.=
get('sections', {})
+
+        if not self._has_api_spec(sections):
+            return
+
+        parameterlist =3D args.parameterlist if hasattr(args, 'parameterli=
st') else args.get('parameterlist', [])
+        parameterdescs =3D args.parameterdescs if hasattr(args, 'parameter=
descs') else args.get('parameterdescs', {})
+        parametertypes =3D args.parametertypes if hasattr(args, 'parameter=
types') else args.get('parametertypes', {})
+        purpose =3D args.get('purpose', '')
+
+        # Start macro invocation
+        self.data +=3D f"DEFINE_KERNEL_API_SPEC({function_name})\n"
+
+        # Basic info
+        if purpose:
+            self.data +=3D f"\tKAPI_DESCRIPTION({self._format_macro_param(=
purpose)})\n"
+
+        long_desc =3D self._get_section(sections, 'long-desc')
+        if long_desc:
+            self.data +=3D f"\tKAPI_LONG_DESC({self._format_macro_param(lo=
ng_desc)})\n"
+
+        # Context flags
+        context =3D self._get_section(sections, 'context-flags') or self._=
get_section(sections, 'context')
+        if context:
+            self.data +=3D f"\tKAPI_CONTEXT({context})\n"
+
+        # Process parameters
+        self._process_parameters(sections, parameterlist, parameterdescs, =
parametertypes)
+
+        # Process errors
+        errors =3D self._parse_indented_items(
+            self._get_raw_section(sections, 'error'),
+            self._parse_error_item
+        )
+
+        if errors:
+            self.data +=3D f"\n\tKAPI_RETURN_ERROR_COUNT({len(errors)})\n"
+            self.data +=3D f"\n\tKAPI_ERROR_COUNT({len(errors)})\n"
+
+            for idx, error in enumerate(errors):
+                self._output_error(idx, error)
+
+        # Process signals
+        signals =3D self._parse_indented_items(
+            self._get_raw_section(sections, 'signal'),
+            self._parse_signal_item
+        )
+
+        if signals:
+            self.data +=3D f"\n\tKAPI_SIGNAL_COUNT({len(signals)})\n"
+
+            for idx, signal in enumerate(signals):
+                self._output_signal(idx, signal)
+
+        # Process other specifications
+        self._process_locks(sections)
+        self._process_constraints(sections)
+        self._process_side_effects(sections)
+        self._process_state_transitions(sections)
+        self._process_capabilities(sections)
+
+        # Add examples and notes
+        for key, macro in [('examples', 'KAPI_EXAMPLES'), ('notes', 'KAPI_=
NOTES')]:
+            value =3D self._get_section(sections, key)
+            if value:
+                self.data +=3D f"\n\t{macro}({self._format_macro_param(val=
ue)})\n"
+
+        self.data +=3D "\nKAPI_END_SPEC;\n\n"
+
+    def _output_error(self, idx, error):
+        """Output a single error specification"""
+        line =3D error['line']
+        if line.startswith('-'):
+            line =3D line[1:].strip()
+
+        parts =3D line.split(',', 2)
+        if len(parts) =3D=3D 2:
+            # Format: NAME, description
+            name =3D parts[0].strip()
+            short_desc =3D parts[1].strip()
+            code =3D f"-{name}"
+        elif len(parts) >=3D 3:
+            # Format: code, name, description
+            code =3D parts[0].strip()
+            name =3D parts[1].strip()
+            short_desc =3D parts[2].strip()
+            if not code.startswith('-'):
+                code =3D f"-{code}"
+        else:
+            return
+
+        long_desc =3D error.get('desc', '') or short_desc
+
+        self.data +=3D f"\n\tKAPI_ERROR({idx}, {code}, {self._format_macro=
_param(name)}, "
+        self.data +=3D f"{self._format_macro_param(short_desc)},\n\t\t   {=
self._format_macro_param(long_desc)})\n"
+
+    def _output_signal(self, idx, signal):
+        """Output a single signal specification"""
+        self.data +=3D f"\n\tKAPI_SIGNAL({idx}, {signal['number']}, "
+        self.data +=3D f"{self._format_macro_param(signal['name'], KAPI_MA=
X_SIGNAL_NAME_LEN)}, "
+        self.data +=3D f"{signal['direction']}, {signal['action']})\n"
+
+        for key, macro in [
+            ('condition', 'KAPI_SIGNAL_CONDITION'),
+            ('desc', 'KAPI_SIGNAL_DESC'),
+            ('error', 'KAPI_SIGNAL_ERROR'),
+            ('timing', 'KAPI_SIGNAL_TIMING'),
+            ('priority', 'KAPI_SIGNAL_PRIORITY'),
+        ]:
+            if signal.get(key):
+                # Priority field is numeric
+                if key =3D=3D 'priority':
+                    self.data +=3D f"\t\t{macro}({signal[key]})\n"
+                else:
+                    self.data +=3D f"\t\t{macro}({self._format_macro_param=
(signal[key])})\n"
+
+        if signal.get('interruptible'):
+            self.data +=3D "\t\tKAPI_SIGNAL_INTERRUPTIBLE\n"
+
+        self.data +=3D "\tKAPI_SIGNAL_END\n"
+
+    def _process_locks(self, sections):
+        """Process lock specifications"""
+        locks =3D self._parse_indented_items(
+            self._get_raw_section(sections, 'lock'),
+            self._parse_lock_item
+        )
+
+        if locks:
+            self.data +=3D f"\n\tKAPI_LOCK_COUNT({len(locks)})\n"
+
+            for idx, lock in enumerate(locks):
+                self.data +=3D f"\n\tKAPI_LOCK({idx}, {self._format_macro_=
param(lock['name'])}, {lock['type']})\n"
+
+                for flag in ['acquired', 'released']:
+                    if lock.get(flag):
+                        self.data +=3D f"\t\tKAPI_LOCK_{flag.upper()}\n"
+
+                if lock.get('desc'):
+                    self.data +=3D f"\t\tKAPI_LOCK_DESC({self._format_macr=
o_param(lock['desc'])})\n"
+
+                self.data +=3D "\tKAPI_LOCK_END\n"
+
+    def _process_constraints(self, sections):
+        """Process constraint specifications"""
+        constraints =3D self._parse_indented_items(
+            self._get_raw_section(sections, 'constraint'),
+            self._parse_constraint_item
+        )
+
+        if constraints:
+            self.data +=3D f"\n\tKAPI_CONSTRAINT_COUNT({len(constraints)})=
\n"
+
+            for idx, constraint in enumerate(constraints):
+                self.data +=3D f"\n\tKAPI_CONSTRAINT({idx}, {self._format_=
macro_param(constraint['name'])},\n"
+                self.data +=3D f"\t\t\t{self._format_macro_param(constrain=
t['desc'])})\n"
+
+                if constraint.get('expr'):
+                    self.data +=3D f"\t\tKAPI_CONSTRAINT_EXPR({self._forma=
t_macro_param(constraint['expr'])})\n"
+
+                self.data +=3D "\tKAPI_CONSTRAINT_END\n"
+
+    def _process_side_effects(self, sections):
+        """Process side effect specifications"""
+        effects =3D self._parse_indented_items(
+            self._get_raw_section(sections, 'side-effect'),
+            self._parse_side_effect_item
+        )
+
+        if effects:
+            self.data +=3D f"\n\tKAPI_SIDE_EFFECT_COUNT({len(effects)})\n"
+
+            for idx, effect in enumerate(effects):
+                effect_type =3D self._validate_effect_type(effect['type'])
+
+                self.data +=3D f"\n\tKAPI_SIDE_EFFECT({idx}, {effect_type}=
,\n"
+                self.data +=3D f"\t\t\t {self._format_macro_param(effect['=
target'])},\n"
+                self.data +=3D f"\t\t\t {self._format_macro_param(effect['=
desc'])})\n"
+
+                if effect.get('condition'):
+                    self.data +=3D f"\t\tKAPI_EFFECT_CONDITION({self._form=
at_macro_param(effect['condition'])})\n"
+
+                if effect.get('reversible'):
+                    self.data +=3D "\t\tKAPI_EFFECT_REVERSIBLE\n"
+
+                self.data +=3D "\tKAPI_SIDE_EFFECT_END\n"
+
+    def _process_state_transitions(self, sections):
+        """Process state transition specifications"""
+        transitions =3D self._parse_indented_items(
+            self._get_raw_section(sections, 'state-trans'),
+            self._parse_state_trans_item
+        )
+
+        if transitions:
+            self.data +=3D f"\n\tKAPI_STATE_TRANS_COUNT({len(transitions)}=
)\n"
+
+            for idx, trans in enumerate(transitions):
+                desc =3D trans['desc']
+                if trans.get('condition'):
+                    desc =3D trans['condition'] + (', ' + desc if desc els=
e '')
+
+                self.data +=3D f"\n\tKAPI_STATE_TRANS({idx}, {self._format=
_macro_param(trans['target'])}, "
+                self.data +=3D f"{self._format_macro_param(trans['from'])}=
, {self._format_macro_param(trans['to'])},\n"
+                self.data +=3D f"\t\t\t {self._format_macro_param(desc)})\=
n"
+                self.data +=3D "\tKAPI_STATE_TRANS_END\n"
+
+    def _process_capabilities(self, sections):
+        """Process capability specifications"""
+        cap_section =3D self._get_raw_section(sections, 'capability')
+        if not cap_section:
+            return
+
+        lines =3D cap_section.strip().split('\n')
+        capabilities =3D []
+        i =3D 0
+
+        while i < len(lines):
+            line =3D lines[i].strip()
+            if not line or line.startswith(('allows:', 'without:', 'condit=
ion:', 'priority:')):
+                i +=3D 1
+                continue
+
+            cap_info =3D {'line': line}
+
+            # Parse subfields
+            subfields, next_i =3D self._parse_subfields(lines, i)
+            cap_info.update(subfields)
+            capabilities.append(cap_info)
+            i =3D next_i
+
+        if capabilities:
+            self.data +=3D f"\n\tKAPI_CAPABILITY_COUNT({len(capabilities)}=
)\n"
+
+            for idx, cap in enumerate(capabilities):
+                parts =3D cap['line'].split(',', 2)
+                if len(parts) >=3D 2:
+                    cap_name =3D parts[0].strip()
+                    cap_type =3D parts[1].strip()
+                    cap_desc =3D parts[2].strip() if len(parts) > 2 else c=
ap_name
+
+                    # Fix common type issues
+                    if 'BYPASS' in cap_type and cap_type !=3D 'KAPI_CAP_BY=
PASS_CHECK':
+                        cap_type =3D 'KAPI_CAP_BYPASS_CHECK'
+
+                    self.data +=3D f"\n\tKAPI_CAPABILITY({idx}, {cap_name}=
, {self._format_macro_param(cap_desc)}, {cap_type})\n"
+
+                    for key, macro in [
+                        ('allows', 'KAPI_CAP_ALLOWS'),
+                        ('without', 'KAPI_CAP_WITHOUT'),
+                        ('condition', 'KAPI_CAP_CONDITION'),
+                        ('priority', 'KAPI_CAP_PRIORITY'),
+                    ]:
+                        if cap.get(key):
+                            value =3D self._format_macro_param(cap[key]) i=
f key !=3D 'priority' else cap[key]
+                            self.data +=3D f"\t\t{macro}({value})\n"
+
+                    self.data +=3D "\tKAPI_CAPABILITY_END\n"
+
+    # Skip output methods for non-function types
+    def out_enum(self, fname, name, args): pass
+    def out_typedef(self, fname, name, args): pass
+    def out_struct(self, fname, name, args): pass
+    def out_doc(self, fname, name, args): pass
diff --git a/tools/lib/python/kdoc/kdoc_output.py b/tools/lib/python/kdoc/k=
doc_output.py
index b1aaa7fc36041..cc5752cd76a8d 100644
--- a/tools/lib/python/kdoc/kdoc_output.py
+++ b/tools/lib/python/kdoc/kdoc_output.py
@@ -124,8 +124,13 @@ class OutputFormat:
         Output warnings for identifiers that will be displayed.
         """
=20
-        for log_msg in args.warnings:
-            self.config.warning(log_msg)
+        warnings =3D getattr(args, 'warnings', [])
+
+        for log_msg in warnings:
+            # Skip numeric warnings (line numbers) which are false positiv=
es
+            # from parameter-specific sections like "param-constraint: nam=
e, value"
+            if not isinstance(log_msg, int):
+                self.config.warning(log_msg)
=20
     def check_doc(self, name, args):
         """Check if DOC should be output"""
diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/k=
doc_parser.py
index 500aafc500322..ecd218e762a34 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -31,6 +31,23 @@ from kdoc.kdoc_item import KdocItem
 # Allow whitespace at end of comment start.
 doc_start =3D KernRe(r'^/\*\*\s*$', cache=3DFalse)
=20
+# Sections that are allowed to be duplicated for API specifications
+# These represent lists of items (multiple errors, signals, etc.)
+ALLOWED_DUPLICATE_SECTIONS =3D {
+    'param', '@param',
+    'error', '@error',
+    'signal', '@signal',
+    'lock', '@lock',
+    'side-effect', '@side-effect',
+    'state-trans', '@state-trans',
+    'capability', '@capability',
+    'constraint', '@constraint',
+    'validation-group', '@validation-group',
+    'validation-rule', '@validation-rule',
+    'validation-flag', '@validation-flag',
+    'struct-field', '@struct-field',
+}
+
 doc_end =3D KernRe(r'\*/', cache=3DFalse)
 doc_com =3D KernRe(r'\s*\*\s*', cache=3DFalse)
 doc_com_body =3D KernRe(r'\s*\* ?', cache=3DFalse)
@@ -43,10 +60,71 @@ doc_decl =3D doc_com + KernRe(r'(\w+)', cache=3DFalse)
 #         @{section-name}:
 # while trying to not match literal block starts like "example::"
 #
+# Base kernel-doc section names
 known_section_names =3D 'description|context|returns?|notes?|examples?'
-known_sections =3D KernRe(known_section_names, flags =3D re.I)
+
+# API specification section names (for KAPI spec framework)
+# Format: (base_name, has_count_variant, has_other_variants)
+# Sections with has_count_variant=3DTrue need negative lookahead in doc_se=
ct
+# to avoid matching 'error' when 'error-count' is intended
+_kapi_base_sections =3D [
+    # (name, needs_lookahead, additional_variants)
+    ('api-type', False, []),
+    ('api-version', False, []),
+    ('param', True, []),  # has param-count
+    ('struct', True, ['struct-type', 'struct-field', 'struct-field-[a-z\\-=
]+']),
+    ('validation-group', False, []),
+    ('validation-policy', False, []),
+    ('validation-flag', False, []),
+    ('validation-rule', False, []),
+    ('error', True, ['error-code', 'error-condition']),
+    ('capability', True, []),
+    ('signal', True, []),
+    ('lock', True, []),
+    ('since', False, ['since-version']),
+    ('context-flags', False, []),
+    ('return', True, ['return-type', 'return-check', 'return-check-type',
+                      'return-success', 'return-desc']),
+    ('long-desc', False, []),
+    ('constraint', True, []),
+    ('side-effect', True, []),
+    ('state-trans', True, []),
+]
+
+def _build_kapi_patterns():
+    """Build KAPI section patterns from the base definitions."""
+    validation_parts =3D []  # For known_sections (simple validation)
+    parsing_parts =3D []     # For doc_sect (with negative lookaheads)
+
+    for name, has_count, variants in _kapi_base_sections:
+        # Add base name (with optional @ prefix)
+        validation_parts.append(f'@?{name}')
+        if has_count:
+            # Need negative lookahead to not match 'name-count' or 'name-*'
+            parsing_parts.append(f'@?{name}(?!-)')
+            validation_parts.append(f'@?{name}-count')
+            parsing_parts.append(f'@?{name}-count')
+        else:
+            parsing_parts.append(f'@?{name}')
+
+        # Add variants
+        for variant in variants:
+            validation_parts.append(f'@?{variant}')
+            parsing_parts.append(f'@?{variant}')
+
+    # Add catch-all for kapi-* extensions
+    validation_parts.append(r'@?kapi-.*')
+    parsing_parts.append(r'@?kapi-.*')
+
+    return '|'.join(validation_parts), '|'.join(parsing_parts)
+
+_kapi_validation_pattern, _kapi_parsing_pattern =3D _build_kapi_patterns()
+
+known_sections =3D KernRe(known_section_names + '|' + _kapi_validation_pat=
tern,
+                        flags=3Dre.I)
 doc_sect =3D doc_com + \
-    KernRe(r'\s*(@[.\w]+|@\.\.\.|' + known_section_names + r')\s*:([^:].*)=
?$',
+    KernRe(r'\s*(@[.\w\-]+|@\.\.\.|' + known_section_names + '|' +
+           _kapi_parsing_pattern + r')\s*:([^:].*)?$',
            flags=3Dre.I, cache=3DFalse)
=20
 doc_content =3D doc_com_body + KernRe(r'(.*)', cache=3DFalse)
@@ -342,7 +420,9 @@ class KernelEntry:
         else:
             if name in self.sections and self.sections[name] !=3D "":
                 # Only warn on user-specified duplicate section names
-                if name !=3D SECTION_DEFAULT:
+                # Skip warning for sections that are expected to have dupl=
icates
+                # (like error, param, signal, etc. for API specifications)
+                if name !=3D SECTION_DEFAULT and name not in ALLOWED_DUPLI=
CATE_SECTIONS:
                     self.emit_msg(self.new_start_line,
                                   f"duplicate section name '{name}'")
                 # Treat as a new paragraph - add a blank line
--=20
2.51.0
From nobody Sat Feb  7 10:08:18 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 031E12DECC5;
	Thu, 18 Dec 2025 20:42:49 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1766090569; cv=none;
 b=jD4i7vNWCz8m80HNhW7oEyyEG7eouRratkSc/qif4B4URZ7bOedPUZelmJ8EcO7PH2I+R1TAA/caGxtdZVC3W6o5ZZ3T5pf2LeYq6cqMcFeiY/++Zc50uBTE4xLANukaZjS9cHAKmjX9598i5xn5yk2J9fpsXNPJ5J0Pd3+i+cM=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1766090569; c=relaxed/simple;
	bh=4njXAcsXQgxzf4FI+Pl/TwMQWMtJkYo+TBa/CAdD/RQ=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=ZALtr1RWIYB8bx4u9YEL++SdIu1yiPn8/tOmPFjwi8Yu7b0piYjAMDUvAQXiPq1KqLOO9ymxRiCXAaf0Pj4uiHbb1C4ZqVdpx7cdwdGTcH4VvqMNonhYPGBiIqArTKXk95mAFSYQPfj+z3fMmFiCDIz8DX7WDbEkIXhswG3Uc64=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=ogzR9M+Y; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="ogzR9M+Y"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3DD3EC16AAE;
	Thu, 18 Dec 2025 20:42:48 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1766090568;
	bh=4njXAcsXQgxzf4FI+Pl/TwMQWMtJkYo+TBa/CAdD/RQ=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=ogzR9M+Y/t6RFbp3aBNkqojGdsWkWS4Hio91QkeoAErCWQXy6ZO38CuHD+oL7ssBo
	 HwN74EuIoVBpI1ty7xBqke1c+xlWTVTOJLsL91g/EEXe7XND5c1UWY04LsE9E4u54M
	 7XXgw+F2aBHwmEkMGzaCWx0S4Mo2Wjbl8Su1LiiOdMsmveKkNZbZuZXXhI+x7qsHyt
	 xBeKwIsEu1IEO8DrwLftJ+/XktfDI+zbu5CsiB+OrF+qXWnC8sxx0e9ebA0NRAEHYN
	 xPn+h3oq+q9PbNGIcXZ5d4lg2uS6kYqW02yweYkP1chr+59OV1MJlciHikqodiev/d
	 lamxhscxAzxVQ==
From: Sasha Levin <sashal@kernel.org>
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	tools@kernel.org,
	gpaoloni@redhat.com,
	Sasha Levin <sashal@kernel.org>
Subject: [RFC PATCH v5 03/15] kernel/api: add debugfs interface for kernel API
 specifications
Date: Thu, 18 Dec 2025 15:42:25 -0500
Message-ID: <20251218204239.4159453-4-sashal@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Add a debugfs interface to expose kernel API specifications at runtime.
This allows tools and users to query the complete API specifications
through the debugfs filesystem.

The interface provides:
- /sys/kernel/debug/kapi/list - lists all available API specifications
- /sys/kernel/debug/kapi/specs/<name> - detailed info for each API

Each specification file includes:
- Function name, version, and descriptions
- Execution context requirements and flags
- Parameter details with types, flags, and constraints
- Return value specifications and success conditions
- Error codes with descriptions and conditions
- Locking requirements and constraints
- Signal handling specifications
- Examples, notes, and deprecation status

This enables runtime introspection of kernel APIs for documentation
tools, static analyzers, and debugging purposes.

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/api/Kconfig        |  20 +++
 kernel/api/Makefile       |   3 +
 kernel/api/kapi_debugfs.c | 358 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 381 insertions(+)
 create mode 100644 kernel/api/kapi_debugfs.c

diff --git a/kernel/api/Kconfig b/kernel/api/Kconfig
index fde25ec70e134..d2754b21acc43 100644
--- a/kernel/api/Kconfig
+++ b/kernel/api/Kconfig
@@ -33,3 +33,23 @@ config KAPI_RUNTIME_CHECKS
 	  development. The checks use WARN_ONCE to report violations.
=20
 	  If unsure, say N.
+
+config KAPI_SPEC_DEBUGFS
+	bool "Export kernel API specifications via debugfs"
+	depends on KAPI_SPEC
+	depends on DEBUG_FS
+	help
+	  This option enables exporting kernel API specifications through
+	  the debugfs filesystem. When enabled, specifications can be
+	  accessed at /sys/kernel/debug/kapi/.
+
+	  The debugfs interface provides:
+	  - A list of all available API specifications
+	  - Detailed information for each API including parameters,
+	    return values, errors, locking requirements, and constraints
+	  - Complete machine-readable representation of the specs
+
+	  This is useful for documentation tools, static analyzers, and
+	  runtime introspection of kernel APIs.
+
+	  If unsure, say N.
diff --git a/kernel/api/Makefile b/kernel/api/Makefile
index acab17c78afa3..716f128eea71d 100644
--- a/kernel/api/Makefile
+++ b/kernel/api/Makefile
@@ -10,6 +10,9 @@ obj-$(CONFIG_KAPI_SPEC)		+=3D kernel_api_spec.o
 ifeq ($(CONFIG_KAPI_SPEC),y)
 obj-$(CONFIG_KAPI_SPEC)		+=3D generated_api_specs.o
=20
+# Debugfs interface for kernel API specs
+obj-$(CONFIG_KAPI_SPEC_DEBUGFS) +=3D kapi_debugfs.o
+
 # Find all potential apispec files (this is evaluated at make time)
 apispec-files :=3D $(shell find $(objtree) -name "*.apispec.h" -type f 2>/=
dev/null)
=20
diff --git a/kernel/api/kapi_debugfs.c b/kernel/api/kapi_debugfs.c
new file mode 100644
index 0000000000000..84d5446d93916
--- /dev/null
+++ b/kernel/api/kapi_debugfs.c
@@ -0,0 +1,358 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Kernel API specification debugfs interface
+ *
+ * This provides a debugfs interface to expose kernel API specifications
+ * at runtime, allowing tools and users to query the complete API specs.
+ */
+
+#include <linux/debugfs.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/seq_file.h>
+#include <linux/kernel_api_spec.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+/* External symbols for kernel API spec section */
+extern struct kernel_api_spec __start_kapi_specs[];
+extern struct kernel_api_spec __stop_kapi_specs[];
+
+static struct dentry *kapi_debugfs_root;
+
+/* Helper function to print parameter type as string */
+static const char *param_type_str(enum kapi_param_type type)
+{
+	switch (type) {
+	case KAPI_TYPE_INT: return "int";
+	case KAPI_TYPE_UINT: return "uint";
+	case KAPI_TYPE_PTR: return "ptr";
+	case KAPI_TYPE_STRUCT: return "struct";
+	case KAPI_TYPE_UNION: return "union";
+	case KAPI_TYPE_ARRAY: return "array";
+	case KAPI_TYPE_FD: return "fd";
+	case KAPI_TYPE_ENUM: return "enum";
+	case KAPI_TYPE_USER_PTR: return "user_ptr";
+	case KAPI_TYPE_PATH: return "path";
+	case KAPI_TYPE_FUNC_PTR: return "func_ptr";
+	case KAPI_TYPE_CUSTOM: return "custom";
+	default: return "unknown";
+	}
+}
+
+/* Helper to print parameter flags */
+static void print_param_flags(struct seq_file *m, u32 flags)
+{
+	seq_printf(m, "    flags: ");
+	if (flags & KAPI_PARAM_IN) seq_printf(m, "IN ");
+	if (flags & KAPI_PARAM_OUT) seq_printf(m, "OUT ");
+	if (flags & KAPI_PARAM_INOUT) seq_printf(m, "INOUT ");
+	if (flags & KAPI_PARAM_OPTIONAL) seq_printf(m, "OPTIONAL ");
+	if (flags & KAPI_PARAM_CONST) seq_printf(m, "CONST ");
+	if (flags & KAPI_PARAM_USER) seq_printf(m, "USER ");
+	if (flags & KAPI_PARAM_VOLATILE) seq_printf(m, "VOLATILE ");
+	if (flags & KAPI_PARAM_DMA) seq_printf(m, "DMA ");
+	if (flags & KAPI_PARAM_ALIGNED) seq_printf(m, "ALIGNED ");
+	seq_printf(m, "\n");
+}
+
+/* Helper to print context flags */
+static void print_context_flags(struct seq_file *m, u32 flags)
+{
+	seq_printf(m, "Context flags: ");
+	if (flags & KAPI_CTX_PROCESS) seq_printf(m, "PROCESS ");
+	if (flags & KAPI_CTX_HARDIRQ) seq_printf(m, "HARDIRQ ");
+	if (flags & KAPI_CTX_SOFTIRQ) seq_printf(m, "SOFTIRQ ");
+	if (flags & KAPI_CTX_NMI) seq_printf(m, "NMI ");
+	if (flags & KAPI_CTX_SLEEPABLE) seq_printf(m, "SLEEPABLE ");
+	if (flags & KAPI_CTX_ATOMIC) seq_printf(m, "ATOMIC ");
+	if (flags & KAPI_CTX_PREEMPT_DISABLED) seq_printf(m, "PREEMPT_DISABLED ");
+	if (flags & KAPI_CTX_IRQ_DISABLED) seq_printf(m, "IRQ_DISABLED ");
+	seq_printf(m, "\n");
+}
+
+/* Show function for individual API spec */
+static int kapi_spec_show(struct seq_file *m, void *v)
+{
+	struct kernel_api_spec *spec =3D m->private;
+	int i;
+
+	seq_printf(m, "Kernel API Specification\n");
+	seq_printf(m, "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D\n\n");
+
+	/* Basic info */
+	seq_printf(m, "Name: %s\n", spec->name);
+	seq_printf(m, "Version: %u\n", spec->version);
+	seq_printf(m, "Description: %s\n", spec->description);
+	if (strlen(spec->long_description) > 0)
+		seq_printf(m, "Long description: %s\n", spec->long_description);
+
+	/* Context */
+	print_context_flags(m, spec->context_flags);
+	seq_printf(m, "\n");
+
+	/* Parameters */
+	if (spec->param_count > 0) {
+		seq_printf(m, "Parameters (%u):\n", spec->param_count);
+		for (i =3D 0; i < spec->param_count; i++) {
+			struct kapi_param_spec *param =3D &spec->params[i];
+			seq_printf(m, "  [%d] %s:\n", i, param->name);
+			seq_printf(m, "    type: %s (%s)\n",
+				   param_type_str(param->type), param->type_name);
+			print_param_flags(m, param->flags);
+			if (strlen(param->description) > 0)
+				seq_printf(m, "    description: %s\n", param->description);
+			if (param->size > 0)
+				seq_printf(m, "    size: %zu\n", param->size);
+			if (param->alignment > 0)
+				seq_printf(m, "    alignment: %zu\n", param->alignment);
+
+			/* Print constraints if any */
+			if (param->constraint_type !=3D KAPI_CONSTRAINT_NONE) {
+				seq_printf(m, "    constraints:\n");
+				switch (param->constraint_type) {
+				case KAPI_CONSTRAINT_RANGE:
+					seq_printf(m, "      type: range\n");
+					seq_printf(m, "      min: %lld\n", param->min_value);
+					seq_printf(m, "      max: %lld\n", param->max_value);
+					break;
+				case KAPI_CONSTRAINT_MASK:
+					seq_printf(m, "      type: mask\n");
+					seq_printf(m, "      valid_bits: 0x%llx\n", param->valid_mask);
+					break;
+				case KAPI_CONSTRAINT_ENUM:
+					seq_printf(m, "      type: enum\n");
+					seq_printf(m, "      count: %u\n", param->enum_count);
+					break;
+				case KAPI_CONSTRAINT_USER_STRING:
+					seq_printf(m, "      type: user_string\n");
+					seq_printf(m, "      min_len: %lld\n", param->min_value);
+					seq_printf(m, "      max_len: %lld\n", param->max_value);
+					break;
+				case KAPI_CONSTRAINT_USER_PATH:
+					seq_printf(m, "      type: user_path\n");
+					seq_printf(m, "      max_len: PATH_MAX (4096)\n");
+					break;
+				case KAPI_CONSTRAINT_USER_PTR:
+					seq_printf(m, "      type: user_ptr\n");
+					seq_printf(m, "      size: %zu bytes\n", param->size);
+					break;
+				case KAPI_CONSTRAINT_CUSTOM:
+					seq_printf(m, "      type: custom\n");
+					if (strlen(param->constraints) > 0)
+						seq_printf(m, "      description: %s\n",
+							   param->constraints);
+					break;
+				default:
+					break;
+				}
+			}
+			seq_printf(m, "\n");
+		}
+	}
+
+	/* Return value */
+	seq_printf(m, "Return value:\n");
+	seq_printf(m, "  type: %s\n", spec->return_spec.type_name);
+	if (strlen(spec->return_spec.description) > 0)
+		seq_printf(m, "  description: %s\n", spec->return_spec.description);
+
+	switch (spec->return_spec.check_type) {
+	case KAPI_RETURN_EXACT:
+		seq_printf(m, "  success: =3D=3D %lld\n", spec->return_spec.success_valu=
e);
+		break;
+	case KAPI_RETURN_RANGE:
+		seq_printf(m, "  success: [%lld, %lld]\n",
+			   spec->return_spec.success_min,
+			   spec->return_spec.success_max);
+		break;
+	case KAPI_RETURN_FD:
+		seq_printf(m, "  success: valid file descriptor (>=3D 0)\n");
+		break;
+	case KAPI_RETURN_ERROR_CHECK:
+		seq_printf(m, "  success: error check\n");
+		break;
+	case KAPI_RETURN_CUSTOM:
+		seq_printf(m, "  success: custom check\n");
+		break;
+	default:
+		break;
+	}
+	seq_printf(m, "\n");
+
+	/* Errors */
+	if (spec->error_count > 0) {
+		seq_printf(m, "Errors (%u):\n", spec->error_count);
+		for (i =3D 0; i < spec->error_count; i++) {
+			struct kapi_error_spec *err =3D &spec->errors[i];
+			seq_printf(m, "  %s (%d): %s\n",
+				   err->name, err->error_code, err->description);
+			if (strlen(err->condition) > 0)
+				seq_printf(m, "    condition: %s\n", err->condition);
+		}
+		seq_printf(m, "\n");
+	}
+
+	/* Locks */
+	if (spec->lock_count > 0) {
+		seq_printf(m, "Locks (%u):\n", spec->lock_count);
+		for (i =3D 0; i < spec->lock_count; i++) {
+			struct kapi_lock_spec *lock =3D &spec->locks[i];
+			const char *type_str, *scope_str;
+			switch (lock->lock_type) {
+			case KAPI_LOCK_MUTEX: type_str =3D "mutex"; break;
+			case KAPI_LOCK_SPINLOCK: type_str =3D "spinlock"; break;
+			case KAPI_LOCK_RWLOCK: type_str =3D "rwlock"; break;
+			case KAPI_LOCK_SEMAPHORE: type_str =3D "semaphore"; break;
+			case KAPI_LOCK_RCU: type_str =3D "rcu"; break;
+			case KAPI_LOCK_SEQLOCK: type_str =3D "seqlock"; break;
+			default: type_str =3D "unknown"; break;
+			}
+			switch (lock->scope) {
+			case KAPI_LOCK_INTERNAL: scope_str =3D "acquired and released"; break;
+			case KAPI_LOCK_ACQUIRES: scope_str =3D "acquired (not released)"; break;
+			case KAPI_LOCK_RELEASES: scope_str =3D "released (held on entry)"; brea=
k;
+			case KAPI_LOCK_CALLER_HELD: scope_str =3D "held by caller"; break;
+			default: scope_str =3D "unknown"; break;
+			}
+			seq_printf(m, "  %s (%s): %s\n",
+				   lock->lock_name, type_str, lock->description);
+			seq_printf(m, "    scope: %s\n", scope_str);
+		}
+		seq_printf(m, "\n");
+	}
+
+	/* Constraints */
+	if (spec->constraint_count > 0) {
+		seq_printf(m, "Additional constraints (%u):\n", spec->constraint_count);
+		for (i =3D 0; i < spec->constraint_count; i++) {
+			struct kapi_constraint_spec *cons =3D &spec->constraints[i];
+
+			seq_printf(m, "  - %s", cons->name);
+			if (cons->description[0])
+				seq_printf(m, ": %s", cons->description);
+			seq_printf(m, "\n");
+			if (cons->expression[0])
+				seq_printf(m, "    expression: %s\n", cons->expression);
+		}
+		seq_printf(m, "\n");
+	}
+
+	/* Signals */
+	if (spec->signal_count > 0) {
+		seq_printf(m, "Signal handling (%u):\n", spec->signal_count);
+		for (i =3D 0; i < spec->signal_count; i++) {
+			struct kapi_signal_spec *sig =3D &spec->signals[i];
+			seq_printf(m, "  %s (%d):\n", sig->signal_name, sig->signal_num);
+			seq_printf(m, "    direction: ");
+			if (sig->direction & KAPI_SIGNAL_SEND) seq_printf(m, "send ");
+			if (sig->direction & KAPI_SIGNAL_RECEIVE) seq_printf(m, "receive ");
+			if (sig->direction & KAPI_SIGNAL_HANDLE) seq_printf(m, "handle ");
+			if (sig->direction & KAPI_SIGNAL_BLOCK) seq_printf(m, "block ");
+			if (sig->direction & KAPI_SIGNAL_IGNORE) seq_printf(m, "ignore ");
+			seq_printf(m, "\n");
+			seq_printf(m, "    action: ");
+			switch (sig->action) {
+			case KAPI_SIGNAL_ACTION_DEFAULT: seq_printf(m, "default"); break;
+			case KAPI_SIGNAL_ACTION_TERMINATE: seq_printf(m, "terminate"); break;
+			case KAPI_SIGNAL_ACTION_COREDUMP: seq_printf(m, "coredump"); break;
+			case KAPI_SIGNAL_ACTION_STOP: seq_printf(m, "stop"); break;
+			case KAPI_SIGNAL_ACTION_CONTINUE: seq_printf(m, "continue"); break;
+			case KAPI_SIGNAL_ACTION_CUSTOM: seq_printf(m, "custom"); break;
+			case KAPI_SIGNAL_ACTION_RETURN: seq_printf(m, "return"); break;
+			case KAPI_SIGNAL_ACTION_RESTART: seq_printf(m, "restart"); break;
+			default: seq_printf(m, "unknown"); break;
+			}
+			seq_printf(m, "\n");
+			if (strlen(sig->description) > 0)
+				seq_printf(m, "    description: %s\n", sig->description);
+		}
+		seq_printf(m, "\n");
+	}
+
+	/* Additional info */
+	if (strlen(spec->examples) > 0) {
+		seq_printf(m, "Examples:\n%s\n\n", spec->examples);
+	}
+	if (strlen(spec->notes) > 0) {
+		seq_printf(m, "Notes:\n%s\n\n", spec->notes);
+	}
+	if (strlen(spec->since_version) > 0) {
+		seq_printf(m, "Since: %s\n", spec->since_version);
+	}
+
+	return 0;
+}
+
+static int kapi_spec_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, kapi_spec_show, inode->i_private);
+}
+
+static const struct file_operations kapi_spec_fops =3D {
+	.open =3D kapi_spec_open,
+	.read =3D seq_read,
+	.llseek =3D seq_lseek,
+	.release =3D single_release,
+};
+
+/* Show all available API specs */
+static int kapi_list_show(struct seq_file *m, void *v)
+{
+	struct kernel_api_spec *spec;
+	int count =3D 0;
+
+	seq_printf(m, "Available Kernel API Specifications\n");
+	seq_printf(m, "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D\n\n");
+
+	for (spec =3D __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+		seq_printf(m, "%s - %s\n", spec->name, spec->description);
+		count++;
+	}
+
+	seq_printf(m, "\nTotal: %d specifications\n", count);
+	return 0;
+}
+
+static int kapi_list_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, kapi_list_show, NULL);
+}
+
+static const struct file_operations kapi_list_fops =3D {
+	.open =3D kapi_list_open,
+	.read =3D seq_read,
+	.llseek =3D seq_lseek,
+	.release =3D single_release,
+};
+
+static int __init kapi_debugfs_init(void)
+{
+	struct kernel_api_spec *spec;
+	struct dentry *spec_dir;
+
+	/* Create main directory */
+	kapi_debugfs_root =3D debugfs_create_dir("kapi", NULL);
+
+	/* Create list file */
+	debugfs_create_file("list", 0444, kapi_debugfs_root, NULL, &kapi_list_fop=
s);
+
+	/* Create specs subdirectory */
+	spec_dir =3D debugfs_create_dir("specs", kapi_debugfs_root);
+
+	/* Create a file for each API spec */
+	for (spec =3D __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+		debugfs_create_file(spec->name, 0444, spec_dir, spec, &kapi_spec_fops);
+	}
+
+	pr_info("Kernel API debugfs interface initialized\n");
+	return 0;
+}
+
+static void __exit kapi_debugfs_exit(void)
+{
+	debugfs_remove_recursive(kapi_debugfs_root);
+}
+
+/* Initialize as part of kernel, not as a module */
+fs_initcall(kapi_debugfs_init);
\ No newline at end of file
--=20
2.51.0
From nobody Sat Feb  7 10:08:18 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 03BD32E266C;
	Thu, 18 Dec 2025 20:42:50 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1766090570; cv=none;
 b=YcVkuUqWiJmGSEq2anlyOHzfAmZILs7tWzeq9uYQKWS+UjQEPKLPlGeJ03cFx45BDViC8agWzGql1IUU+XVnsKBm9ImglMaQPX4gpmUURagk4GHeQPambZ1XwwhCns0CX44Y2ZAjmIv3vSPZAkPDUSi4s0cdgb/epC4WYyAYz1A=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1766090570; c=relaxed/simple;
	bh=UqsaxVMM35ucTyEU6DqnkaL2FZGYvIUE6vZ09Qp+Ne0=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
 b=rCfOuXzyn28jity3Z71mLCitZeatkM5kEhVbPPzeOCPQVU0bOtpEt5ltrGW29451vdtmOi1mVcHeDqYoDcEPypD5g79LCVglNaoPRgQvVtjc6JXk5ukpt6z/vCXhWJTw1Se1I6cNEjY5Enc+EEIFlIUHScrPDW0yHRUgLZ+8GMo=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=qRU4ny9j; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="qRU4ny9j"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 26C8CC116D0;
	Thu, 18 Dec 2025 20:42:49 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1766090569;
	bh=UqsaxVMM35ucTyEU6DqnkaL2FZGYvIUE6vZ09Qp+Ne0=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=qRU4ny9jvgtITbD7oN/6SnBhNtJUTUTPjqvCk/olIW44g+WsNl42uzy7llwYoJIUM
	 ynRcu0TGSVu6e1RRccC/sMW31lhAks533b2V65txhfsZAPkiRu1yPceox796DFeSgd
	 ueIJAkQ1ln/CeTYbItHNCNvfow7aLjYq9I9hVWHRVmXewOa7RGFwwZuBrmQTgeXMBR
	 92SbSNZ6cnuc0c3mn/HBpLZxKnUzd07610eRJpNjbBNYsHA6NMXy1szsb6UPyOvczV
	 5W0xPMPyAbCs7ehx3YEfLi44vTnxDE1NClmV/3aagtY7HO/Dj164AbQb7NGJIrC8GT
	 V1Z6x220xLrng==
From: Sasha Levin <sashal@kernel.org>
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	tools@kernel.org,
	gpaoloni@redhat.com,
	Sasha Levin <sashal@kernel.org>
Subject: [RFC PATCH v5 04/15] tools/kapi: Add kernel API specification
 extraction tool
Date: Thu, 18 Dec 2025 15:42:26 -0500
Message-ID: <20251218204239.4159453-5-sashal@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

The kapi tool extracts and displays kernel API specifications.

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 Documentation/dev-tools/kernel-api-spec.rst   | 198 +++-
 tools/kapi/.gitignore                         |   4 +
 tools/kapi/Cargo.toml                         |  19 +
 tools/kapi/src/extractor/debugfs.rs           | 442 +++++++++
 tools/kapi/src/extractor/kerneldoc_parser.rs  | 692 ++++++++++++++
 tools/kapi/src/extractor/mod.rs               | 464 ++++++++++
 tools/kapi/src/extractor/source_parser.rs     | 213 +++++
 .../src/extractor/vmlinux/binary_utils.rs     | 180 ++++
 .../src/extractor/vmlinux/magic_finder.rs     | 102 +++
 tools/kapi/src/extractor/vmlinux/mod.rs       | 864 ++++++++++++++++++
 tools/kapi/src/formatter/json.rs              | 468 ++++++++++
 tools/kapi/src/formatter/mod.rs               | 140 +++
 tools/kapi/src/formatter/plain.rs             | 549 +++++++++++
 tools/kapi/src/formatter/rst.rs               | 621 +++++++++++++
 tools/kapi/src/main.rs                        | 116 +++
 15 files changed, 5069 insertions(+), 3 deletions(-)
 create mode 100644 tools/kapi/.gitignore
 create mode 100644 tools/kapi/Cargo.toml
 create mode 100644 tools/kapi/src/extractor/debugfs.rs
 create mode 100644 tools/kapi/src/extractor/kerneldoc_parser.rs
 create mode 100644 tools/kapi/src/extractor/mod.rs
 create mode 100644 tools/kapi/src/extractor/source_parser.rs
 create mode 100644 tools/kapi/src/extractor/vmlinux/binary_utils.rs
 create mode 100644 tools/kapi/src/extractor/vmlinux/magic_finder.rs
 create mode 100644 tools/kapi/src/extractor/vmlinux/mod.rs
 create mode 100644 tools/kapi/src/formatter/json.rs
 create mode 100644 tools/kapi/src/formatter/mod.rs
 create mode 100644 tools/kapi/src/formatter/plain.rs
 create mode 100644 tools/kapi/src/formatter/rst.rs
 create mode 100644 tools/kapi/src/main.rs

diff --git a/Documentation/dev-tools/kernel-api-spec.rst b/Documentation/de=
v-tools/kernel-api-spec.rst
index 3a63f6711e27b..9b452753111ad 100644
--- a/Documentation/dev-tools/kernel-api-spec.rst
+++ b/Documentation/dev-tools/kernel-api-spec.rst
@@ -31,7 +31,9 @@ The framework aims to:
    common programming errors during development and testing.
=20
 3. **Support Tooling**: Export API specifications in machine-readable form=
ats for
-   use by static analyzers, documentation generators, and development tool=
s.
+   use by static analyzers, documentation generators, and development tool=
s. The
+   ``kapi`` tool (see `The kapi Tool`_) provides comprehensive extraction =
and
+   formatting capabilities.
=20
 4. **Enhance Debugging**: Provide detailed API information at runtime thro=
ugh debugfs
    for debugging and introspection.
@@ -71,6 +73,13 @@ The framework consists of several key components:
    - Type-safe parameter specifications
    - Context and constraint definitions
=20
+5. **kapi Tool** (``tools/kapi/``)
+
+   - Userspace utility for extracting specifications
+   - Multiple input sources (source, binary, debugfs)
+   - Multiple output formats (plain, JSON, RST)
+   - Testing and validation utilities
+
 Data Model
 ----------
=20
@@ -344,8 +353,177 @@ Documentation Generation
 ------------------------
=20
 The framework exports specifications via debugfs that can be used
-to generate documentation. Tools for automatic documentation generation
-from specifications are planned for future development.
+to generate documentation. The ``kapi`` tool provides comprehensive
+extraction and formatting capabilities for kernel API specifications.
+
+The kapi Tool
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+
+Overview
+--------
+
+The ``kapi`` tool is a userspace utility that extracts and displays kernel=
 API
+specifications from multiple sources. It provides a unified interface to a=
ccess
+API documentation whether from compiled kernels, source code, or runtime s=
ystems.
+
+Installation
+------------
+
+Build the tool from the kernel source tree::
+
+    $ cd tools/kapi
+    $ cargo build --release
+
+    # Optional: Install system-wide
+    $ cargo install --path .
+
+The tool requires Rust and Cargo to build. The binary will be available at
+``tools/kapi/target/release/kapi``.
+
+Command-Line Usage
+------------------
+
+Basic syntax::
+
+    kapi [OPTIONS] [API_NAME]
+
+Options:
+
+- ``--vmlinux <PATH>``: Extract from compiled kernel binary
+- ``--source <PATH>``: Extract from kernel source code
+- ``--debugfs <PATH>``: Extract from debugfs (default: /sys/kernel/debug)
+- ``-f, --format <FORMAT>``: Output format (plain, json, rst)
+- ``-h, --help``: Display help information
+- ``-V, --version``: Display version information
+
+Input Modes
+-----------
+
+**1. Source Code Mode**
+
+Extract specifications directly from kernel source::
+
+    # Scan entire kernel source tree
+    $ kapi --source /path/to/linux
+
+    # Extract from specific file
+    $ kapi --source kernel/sched/core.c
+
+    # Get details for specific API
+    $ kapi --source /path/to/linux sys_sched_yield
+
+**2. Vmlinux Mode**
+
+Extract from compiled kernel with debug symbols::
+
+    # List all APIs in vmlinux
+    $ kapi --vmlinux /boot/vmlinux-5.15.0
+
+    # Get specific syscall details
+    $ kapi --vmlinux ./vmlinux sys_read
+
+**3. Debugfs Mode**
+
+Extract from running kernel via debugfs::
+
+    # Use default debugfs path
+    $ kapi
+
+    # Use custom debugfs mount
+    $ kapi --debugfs /mnt/debugfs
+
+    # Get specific API from running kernel
+    $ kapi sys_write
+
+Output Formats
+--------------
+
+**Plain Text Format** (default)::
+
+    $ kapi sys_read
+
+    Detailed information for sys_read:
+    =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+    Description: Read from a file descriptor
+
+    Detailed Description:
+    Reads up to count bytes from file descriptor fd into the buffer starti=
ng at buf.
+
+    Execution Context:
+      - KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+
+    Parameters (3):
+
+    Available since: 1.0
+
+**JSON Format**::
+
+    $ kapi --format json sys_read
+    {
+      "api_details": {
+        "name": "sys_read",
+        "description": "Read from a file descriptor",
+        "long_description": "Reads up to count bytes...",
+        "context_flags": ["KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE"],
+        "since_version": "1.0"
+      }
+    }
+
+**ReStructuredText Format**::
+
+    $ kapi --format rst sys_read
+
+    sys_read
+    =3D=3D=3D=3D=3D=3D=3D=3D
+
+    **Read from a file descriptor**
+
+    Reads up to count bytes from file descriptor fd into the buffer...
+
+Usage Examples
+--------------
+
+**Generate complete API documentation**::
+
+    # Export all kernel APIs to JSON
+    $ kapi --source /path/to/linux --format json > kernel-apis.json
+
+    # Generate RST documentation for all syscalls
+    $ kapi --vmlinux ./vmlinux --format rst > syscalls.rst
+
+    # List APIs from specific subsystem
+    $ kapi --source drivers/gpu/drm/
+
+**Integration with other tools**::
+
+    # Find all APIs that can sleep
+    $ kapi --format json | jq '.apis[] | select(.context_flags[] | contain=
s("SLEEPABLE"))'
+
+    # Generate markdown documentation
+    $ kapi --format rst sys_mmap | pandoc -f rst -t markdown
+
+**Debugging and analysis**::
+
+    # Compare API between kernel versions
+    $ diff <(kapi --vmlinux vmlinux-5.10) <(kapi --vmlinux vmlinux-5.15)
+
+    # Check if specific API exists
+    $ kapi --source . my_custom_api || echo "API not found"
+
+Implementation Details
+----------------------
+
+The tool extracts API specifications from three sources:
+
+1. **Source Code**: Parses KAPI specification macros using regular express=
ions
+2. **Vmlinux**: Reads the ``.kapi_specs`` ELF section from compiled kernels
+3. **Debugfs**: Reads from ``/sys/kernel/debug/kapi/`` filesystem interface
+
+The tool supports all KAPI specification types:
+
+- System calls (``DEFINE_KERNEL_API_SPEC``)
+- IOCTLs (``DEFINE_IOCTL_API_SPEC``)
+- Kernel functions (``KAPI_DEFINE_SPEC``)
=20
 IDE Integration
 ---------------
@@ -357,6 +535,11 @@ Modern IDEs can use the JSON export for:
 - Context validation
 - Error code documentation
=20
+Example IDE integration::
+
+    # Generate IDE completion data
+    $ kapi --format json > .vscode/kernel-apis.json
+
 Testing Framework
 -----------------
=20
@@ -367,6 +550,15 @@ The framework includes test helpers::
     kapi_test_api("kmalloc", test_cases);
     #endif
=20
+The kapi tool can verify specifications against implementations::
+
+    # Run consistency tests
+    $ cd tools/kapi
+    $ ./test_consistency.sh
+
+    # Compare source vs binary specifications
+    $ ./compare_all_syscalls.sh
+
 Best Practices
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
=20
diff --git a/tools/kapi/.gitignore b/tools/kapi/.gitignore
new file mode 100644
index 0000000000000..1390bfc12686c
--- /dev/null
+++ b/tools/kapi/.gitignore
@@ -0,0 +1,4 @@
+# Rust build artifacts
+/target/
+**/*.rs.bk
+
diff --git a/tools/kapi/Cargo.toml b/tools/kapi/Cargo.toml
new file mode 100644
index 0000000000000..4e6bcb10d132f
--- /dev/null
+++ b/tools/kapi/Cargo.toml
@@ -0,0 +1,19 @@
+[package]
+name =3D "kapi"
+version =3D "0.1.0"
+edition =3D "2024"
+authors =3D ["Sasha Levin <sashal@kernel.org>"]
+description =3D "Tool for extracting and displaying kernel API specificati=
ons"
+license =3D "GPL-2.0"
+
+[dependencies]
+goblin =3D "0.10"
+clap =3D { version =3D "4.4", features =3D ["derive"] }
+anyhow =3D "1.0"
+serde =3D { version =3D "1.0", features =3D ["derive"] }
+serde_json =3D "1.0"
+regex =3D "1.10"
+walkdir =3D "2.4"
+
+[dev-dependencies]
+tempfile =3D "3.8"
diff --git a/tools/kapi/src/extractor/debugfs.rs b/tools/kapi/src/extractor=
/debugfs.rs
new file mode 100644
index 0000000000000..698c51e50438f
--- /dev/null
+++ b/tools/kapi/src/extractor/debugfs.rs
@@ -0,0 +1,442 @@
+use crate::formatter::OutputFormatter;
+use anyhow::{Context, Result, bail};
+use serde::Deserialize;
+use std::fs;
+use std::io::Write;
+use std::path::PathBuf;
+
+use super::{ApiExtractor, ApiSpec, CapabilitySpec, display_api_spec};
+
+#[derive(Deserialize)]
+struct KernelApiJson {
+    name: String,
+    api_type: Option<String>,
+    version: Option<u32>,
+    description: Option<String>,
+    long_description: Option<String>,
+    context_flags: Option<u32>,
+    since_version: Option<String>,
+    examples: Option<String>,
+    notes: Option<String>,
+    capabilities: Option<Vec<KernelCapabilityJson>>,
+}
+
+#[derive(Deserialize)]
+struct KernelCapabilityJson {
+    capability: i32,
+    name: String,
+    action: String,
+    allows: String,
+    without_cap: String,
+    check_condition: Option<String>,
+    priority: Option<u8>,
+    alternatives: Option<Vec<i32>>,
+}
+
+/// Extractor for kernel API specifications from debugfs
+pub struct DebugfsExtractor {
+    debugfs_path: PathBuf,
+}
+
+impl DebugfsExtractor {
+    /// Create a new debugfs extractor with the specified debugfs path
+    pub fn new(debugfs_path: Option<String>) -> Result<Self> {
+        let path =3D match debugfs_path {
+            Some(p) =3D> PathBuf::from(p),
+            None =3D> PathBuf::from("/sys/kernel/debug"),
+        };
+
+        // Check if the debugfs path exists
+        if !path.exists() {
+            bail!("Debugfs path does not exist: {}", path.display());
+        }
+
+        // Check if kapi directory exists
+        let kapi_path =3D path.join("kapi");
+        if !kapi_path.exists() {
+            bail!(
+                "Kernel API debugfs interface not found at: {}",
+                kapi_path.display()
+            );
+        }
+
+        Ok(Self { debugfs_path: path })
+    }
+
+    /// Parse the list file to get all available API names
+    fn parse_list_file(&self) -> Result<Vec<String>> {
+        let list_path =3D self.debugfs_path.join("kapi/list");
+        let content =3D fs::read_to_string(&list_path)
+            .with_context(|| format!("Failed to read {}", list_path.displa=
y()))?;
+
+        let mut apis =3D Vec::new();
+        let mut in_list =3D false;
+
+        for line in content.lines() {
+            if line.contains("=3D=3D=3D") {
+                in_list =3D true;
+                continue;
+            }
+
+            if in_list && line.starts_with("Total:") {
+                break;
+            }
+
+            if in_list && !line.trim().is_empty() {
+                // Extract API name from lines like "sys_read - Read from =
a file descriptor"
+                if let Some(name) =3D line.split(" - ").next() {
+                    apis.push(name.trim().to_string());
+                }
+            }
+        }
+
+        Ok(apis)
+    }
+
+    /// Try to parse JSON content, convert context flags from u32 to strin=
g representations
+    fn parse_context_flags(flags: u32) -> Vec<String> {
+        let mut result =3D Vec::new();
+
+        // These values should match KAPI_CTX_* flags from kernel
+        if flags & (1 << 0) !=3D 0 {
+            result.push("PROCESS".to_string());
+        }
+        if flags & (1 << 1) !=3D 0 {
+            result.push("SOFTIRQ".to_string());
+        }
+        if flags & (1 << 2) !=3D 0 {
+            result.push("HARDIRQ".to_string());
+        }
+        if flags & (1 << 3) !=3D 0 {
+            result.push("NMI".to_string());
+        }
+        if flags & (1 << 4) !=3D 0 {
+            result.push("ATOMIC".to_string());
+        }
+        if flags & (1 << 5) !=3D 0 {
+            result.push("SLEEPABLE".to_string());
+        }
+        if flags & (1 << 6) !=3D 0 {
+            result.push("PREEMPT_DISABLED".to_string());
+        }
+        if flags & (1 << 7) !=3D 0 {
+            result.push("IRQ_DISABLED".to_string());
+        }
+
+        result
+    }
+
+    /// Convert capability action from kernel representation
+    fn parse_capability_action(action: &str) -> String {
+        match action {
+            "bypass_check" =3D> "Bypasses check".to_string(),
+            "increase_limit" =3D> "Increases limit".to_string(),
+            "override_restriction" =3D> "Overrides restriction".to_string(=
),
+            "grant_permission" =3D> "Grants permission".to_string(),
+            "modify_behavior" =3D> "Modifies behavior".to_string(),
+            "access_resource" =3D> "Allows resource access".to_string(),
+            "perform_operation" =3D> "Allows operation".to_string(),
+            _ =3D> action.to_string(),
+        }
+    }
+
+    /// Try to parse as JSON first
+    fn try_parse_json(&self, content: &str) -> Option<ApiSpec> {
+        let json_data: KernelApiJson =3D serde_json::from_str(content).ok(=
)?;
+
+        let mut spec =3D ApiSpec {
+            name: json_data.name,
+            api_type: json_data.api_type.unwrap_or_else(|| "unknown".to_st=
ring()),
+            description: json_data.description,
+            long_description: json_data.long_description,
+            version: json_data.version.map(|v| v.to_string()),
+            context_flags: json_data
+                .context_flags
+                .map_or_else(Vec::new, Self::parse_context_flags),
+            param_count: None,
+            error_count: None,
+            examples: json_data.examples,
+            notes: json_data.notes,
+            since_version: json_data.since_version,
+            subsystem: None,   // Not in current JSON format
+            sysfs_path: None,  // Not in current JSON format
+            permissions: None, // Not in current JSON format
+            socket_state: None,
+            protocol_behaviors: vec![],
+            addr_families: vec![],
+            buffer_spec: None,
+            async_spec: None,
+            net_data_transfer: None,
+            capabilities: vec![],
+            parameters: vec![],
+            return_spec: None,
+            errors: vec![],
+            signals: vec![],
+            signal_masks: vec![],
+            side_effects: vec![],
+            state_transitions: vec![],
+            constraints: vec![],
+            locks: vec![],
+            struct_specs: vec![],
+        };
+
+        // Convert capabilities
+        if let Some(caps) =3D json_data.capabilities {
+            for cap in caps {
+                spec.capabilities.push(CapabilitySpec {
+                    capability: cap.capability,
+                    name: cap.name,
+                    action: Self::parse_capability_action(&cap.action),
+                    allows: cap.allows,
+                    without_cap: cap.without_cap,
+                    check_condition: cap.check_condition,
+                    priority: cap.priority,
+                    alternatives: cap.alternatives.unwrap_or_default(),
+                });
+            }
+        }
+
+        Some(spec)
+    }
+
+    /// Parse a single API specification file
+    fn parse_spec_file(&self, api_name: &str) -> Result<ApiSpec> {
+        let spec_path =3D self.debugfs_path.join(format!("kapi/specs/{}", =
api_name));
+        let content =3D fs::read_to_string(&spec_path)
+            .with_context(|| format!("Failed to read {}", spec_path.displa=
y()))?;
+
+        // Try JSON parsing first
+        if let Some(spec) =3D self.try_parse_json(&content) {
+            return Ok(spec);
+        }
+
+        // Fall back to plain text parsing
+        let mut spec =3D ApiSpec {
+            name: api_name.to_string(),
+            api_type: "unknown".to_string(),
+            description: None,
+            long_description: None,
+            version: None,
+            context_flags: Vec::new(),
+            param_count: None,
+            error_count: None,
+            examples: None,
+            notes: None,
+            since_version: None,
+            subsystem: None,
+            sysfs_path: None,
+            permissions: None,
+            socket_state: None,
+            protocol_behaviors: vec![],
+            addr_families: vec![],
+            buffer_spec: None,
+            async_spec: None,
+            net_data_transfer: None,
+            capabilities: vec![],
+            parameters: vec![],
+            return_spec: None,
+            errors: vec![],
+            signals: vec![],
+            signal_masks: vec![],
+            side_effects: vec![],
+            state_transitions: vec![],
+            constraints: vec![],
+            locks: vec![],
+            struct_specs: vec![],
+        };
+
+        // Parse the content
+        let mut collecting_multiline =3D false;
+        let mut multiline_buffer =3D String::new();
+        let mut multiline_field =3D "";
+        let mut parsing_capability =3D false;
+        let mut current_capability: Option<CapabilitySpec> =3D None;
+
+        for line in content.lines() {
+            // Handle capability sections
+            if line.starts_with("Capabilities (") {
+                continue; // Skip the header
+            }
+            if line.starts_with("  ") && line.contains(" (") && line.ends_=
with("):") {
+                // Start of a capability entry like "  CAP_IPC_LOCK (14):"
+                if let Some(cap) =3D current_capability.take() {
+                    spec.capabilities.push(cap);
+                }
+
+                let parts: Vec<&str> =3D line.trim().split(" (").collect();
+                if parts.len() =3D=3D 2 {
+                    let cap_name =3D parts[0].to_string();
+                    let cap_id =3D parts[1].trim_end_matches("):").parse()=
.unwrap_or(0);
+                    current_capability =3D Some(CapabilitySpec {
+                        capability: cap_id,
+                        name: cap_name,
+                        action: String::new(),
+                        allows: String::new(),
+                        without_cap: String::new(),
+                        check_condition: None,
+                        priority: None,
+                        alternatives: Vec::new(),
+                    });
+                    parsing_capability =3D true;
+                }
+                continue;
+            }
+            if parsing_capability && line.starts_with("    ") {
+                // Parse capability fields
+                if let Some(ref mut cap) =3D current_capability {
+                    if let Some(action) =3D line.strip_prefix("    Action:=
 ") {
+                        cap.action =3D action.to_string();
+                    } else if let Some(allows) =3D line.strip_prefix("    =
Allows: ") {
+                        cap.allows =3D allows.to_string();
+                    } else if let Some(without) =3D line.strip_prefix("   =
 Without: ") {
+                        cap.without_cap =3D without.to_string();
+                    } else if let Some(cond) =3D line.strip_prefix("    Co=
ndition: ") {
+                        cap.check_condition =3D Some(cond.to_string());
+                    } else if let Some(prio) =3D line.strip_prefix("    Pr=
iority: ") {
+                        cap.priority =3D prio.parse().ok();
+                    } else if let Some(alts) =3D line.strip_prefix("    Al=
ternatives: ") {
+                        cap.alternatives =3D
+                            alts.split(", ").filter_map(|s| s.parse().ok()=
).collect();
+                    }
+                }
+                continue;
+            }
+            if parsing_capability && !line.starts_with("  ") {
+                // End of capabilities section
+                if let Some(cap) =3D current_capability.take() {
+                    spec.capabilities.push(cap);
+                }
+                parsing_capability =3D false;
+            }
+
+            // Handle section headers
+            if line.starts_with("Parameters (") {
+                if let Some(count_str) =3D line
+                    .strip_prefix("Parameters (")
+                    .and_then(|s| s.strip_suffix("):"))
+                {
+                    spec.param_count =3D count_str.parse().ok();
+                }
+                continue;
+            } else if line.starts_with("Errors (") {
+                if let Some(count_str) =3D line
+                    .strip_prefix("Errors (")
+                    .and_then(|s| s.strip_suffix("):"))
+                {
+                    spec.error_count =3D count_str.parse().ok();
+                }
+                continue;
+            } else if line.starts_with("Examples:") {
+                collecting_multiline =3D true;
+                multiline_field =3D "examples";
+                multiline_buffer.clear();
+                continue;
+            } else if line.starts_with("Notes:") {
+                collecting_multiline =3D true;
+                multiline_field =3D "notes";
+                multiline_buffer.clear();
+                continue;
+            }
+
+            // Handle multiline sections
+            if collecting_multiline {
+                if line.trim().is_empty() && multiline_buffer.ends_with("\=
n\n") {
+                    collecting_multiline =3D false;
+                    match multiline_field {
+                        "examples" =3D> spec.examples =3D Some(multiline_b=
uffer.trim().to_string()),
+                        "notes" =3D> spec.notes =3D Some(multiline_buffer.=
trim().to_string()),
+                        _ =3D> {}
+                    }
+                    multiline_buffer.clear();
+                } else {
+                    if !multiline_buffer.is_empty() {
+                        multiline_buffer.push('\n');
+                    }
+                    multiline_buffer.push_str(line);
+                }
+                continue;
+            }
+
+            // Parse regular fields
+            if let Some(desc) =3D line.strip_prefix("Description: ") {
+                spec.description =3D Some(desc.to_string());
+            } else if let Some(long_desc) =3D line.strip_prefix("Long desc=
ription: ") {
+                spec.long_description =3D Some(long_desc.to_string());
+            } else if let Some(version) =3D line.strip_prefix("Version: ")=
 {
+                spec.version =3D Some(version.to_string());
+            } else if let Some(since) =3D line.strip_prefix("Since: ") {
+                spec.since_version =3D Some(since.to_string());
+            } else if let Some(flags) =3D line.strip_prefix("Context flags=
: ") {
+                spec.context_flags =3D flags.split_whitespace().map(str::t=
o_string).collect();
+            } else if let Some(subsys) =3D line.strip_prefix("Subsystem: "=
) {
+                spec.subsystem =3D Some(subsys.to_string());
+            } else if let Some(path) =3D line.strip_prefix("Sysfs Path: ")=
 {
+                spec.sysfs_path =3D Some(path.to_string());
+            } else if let Some(perms) =3D line.strip_prefix("Permissions: =
") {
+                spec.permissions =3D Some(perms.to_string());
+            }
+        }
+
+        // Handle any remaining capability
+        if let Some(cap) =3D current_capability.take() {
+            spec.capabilities.push(cap);
+        }
+
+        // Determine API type based on name
+        if api_name.starts_with("sys_") {
+            spec.api_type =3D "syscall".to_string();
+        } else if api_name.contains("_ioctl") || api_name.starts_with("ioc=
tl_") {
+            spec.api_type =3D "ioctl".to_string();
+        } else if api_name.contains("sysfs")
+            || api_name.ends_with("_show")
+            || api_name.ends_with("_store")
+        {
+            spec.api_type =3D "sysfs".to_string();
+        } else {
+            spec.api_type =3D "function".to_string();
+        }
+
+        Ok(spec)
+    }
+}
+
+impl ApiExtractor for DebugfsExtractor {
+    fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+        let api_names =3D self.parse_list_file()?;
+        let mut specs =3D Vec::new();
+
+        for name in api_names {
+            match self.parse_spec_file(&name) {
+                Ok(spec) =3D> specs.push(spec),
+                Err(_e) =3D> {} // Silently skip files that fail to parse
+            }
+        }
+
+        Ok(specs)
+    }
+
+    fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>> {
+        let api_names =3D self.parse_list_file()?;
+
+        if api_names.contains(&name.to_string()) {
+            Ok(Some(self.parse_spec_file(name)?))
+        } else {
+            Ok(None)
+        }
+    }
+
+    fn display_api_details(
+        &self,
+        api_name: &str,
+        formatter: &mut dyn OutputFormatter,
+        writer: &mut dyn Write,
+    ) -> Result<()> {
+        if let Some(spec) =3D self.extract_by_name(api_name)? {
+            display_api_spec(&spec, formatter, writer)?;
+        } else {
+            writeln!(writer, "API '{api_name}' not found in debugfs")?;
+        }
+
+        Ok(())
+    }
+}
diff --git a/tools/kapi/src/extractor/kerneldoc_parser.rs b/tools/kapi/src/=
extractor/kerneldoc_parser.rs
new file mode 100644
index 0000000000000..1c6924a0b5291
--- /dev/null
+++ b/tools/kapi/src/extractor/kerneldoc_parser.rs
@@ -0,0 +1,692 @@
+use super::{
+    ApiSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec, ParamSpe=
c,
+    ReturnSpec, SideEffectSpec, SignalSpec, StateTransitionSpec, StructSpe=
c,
+    StructFieldSpec,
+};
+use anyhow::Result;
+use std::collections::HashMap;
+
+/// Real kerneldoc parser that extracts KAPI annotations
+pub struct KerneldocParserImpl;
+
+impl KerneldocParserImpl {
+    pub fn new() -> Self {
+        KerneldocParserImpl
+    }
+
+    pub fn parse_kerneldoc(
+        &self,
+        doc: &str,
+        name: &str,
+        api_type: &str,
+        _signature: Option<&str>,
+    ) -> Result<ApiSpec> {
+        let mut spec =3D ApiSpec {
+            name: name.to_string(),
+            api_type: api_type.to_string(),
+            description: None,
+            long_description: None,
+            version: None,
+            context_flags: vec![],
+            param_count: None,
+            error_count: None,
+            examples: None,
+            notes: None,
+            since_version: None,
+            subsystem: None,
+            sysfs_path: None,
+            permissions: None,
+            socket_state: None,
+            protocol_behaviors: vec![],
+            addr_families: vec![],
+            buffer_spec: None,
+            async_spec: None,
+            net_data_transfer: None,
+            capabilities: vec![],
+            parameters: vec![],
+            return_spec: None,
+            errors: vec![],
+            signals: vec![],
+            signal_masks: vec![],
+            side_effects: vec![],
+            state_transitions: vec![],
+            constraints: vec![],
+            locks: vec![],
+            struct_specs: vec![],
+        };
+
+        // Parse line by line
+        let lines: Vec<&str> =3D doc.lines().collect();
+        let mut i =3D 0;
+
+        // Extract main description from function name line
+        if let Some(first_line) =3D lines.first() {
+            if let Some((_, desc)) =3D first_line.split_once(" - ") {
+                spec.description =3D Some(desc.trim().to_string());
+            }
+        }
+
+        // Keep track of parameters we've seen
+        let mut param_map: HashMap<String, ParamSpec> =3D HashMap::new();
+        let mut struct_fields: Vec<StructFieldSpec> =3D Vec::new();
+        let mut current_lock: Option<LockSpec> =3D None;
+        let mut current_signal: Option<SignalSpec> =3D None;
+        let mut current_capability: Option<CapabilitySpec> =3D None;
+
+        while i < lines.len() {
+            let line =3D lines[i].trim();
+
+            // Skip empty lines
+            if line.is_empty() {
+                i +=3D 1;
+                continue;
+            }
+
+            // Parse @param lines
+            if let Some(rest) =3D line.strip_prefix("@") {
+                if let Some((param_name, desc)) =3D rest.split_once(':') {
+                    let param_name =3D param_name.trim();
+                    let desc =3D desc.trim();
+                    if !param_name.contains('-') {
+                        // This is a basic parameter description - add to =
map
+                        param_map.insert(param_name.to_string(), ParamSpec=
 {
+                            index: param_map.len() as u32,
+                            name: param_name.to_string(),
+                            type_name: String::new(),
+                            description: desc.to_string(),
+                            flags: 0,
+                            param_type: 0,
+                            constraint_type: 0,
+                            constraint: None,
+                            min_value: None,
+                            max_value: None,
+                            valid_mask: None,
+                            enum_values: vec![],
+                            size: None,
+                            alignment: None,
+                        });
+                    }
+                }
+            }
+            // Parse long-desc
+            else if let Some(rest) =3D line.strip_prefix("long-desc:") {
+                spec.long_description =3D Some(self.collect_multiline_valu=
e(&lines, i, rest));
+            }
+            // Parse context-flags
+            else if let Some(rest) =3D line.strip_prefix("context-flags:")=
 {
+                spec.context_flags =3D self.parse_context_flags(rest.trim(=
));
+            }
+            // Parse param-count
+            else if let Some(rest) =3D line.strip_prefix("param-count:") {
+                spec.param_count =3D rest.trim().parse().ok();
+            }
+            // Parse param-type
+            else if let Some(rest) =3D line.strip_prefix("param-type:") {
+                let parts: Vec<&str> =3D rest.split(',').map(|s| s.trim())=
.collect();
+                if parts.len() >=3D 2 {
+                    if let Some(param) =3D param_map.get_mut(parts[0]) {
+                        param.param_type =3D self.parse_param_type(parts[1=
]);
+                    }
+                }
+            }
+            // Parse param-flags
+            else if let Some(rest) =3D line.strip_prefix("param-flags:") {
+                let parts: Vec<&str> =3D rest.split(',').map(|s| s.trim())=
.collect();
+                if parts.len() >=3D 2 {
+                    if let Some(param) =3D param_map.get_mut(parts[0]) {
+                        param.flags =3D self.parse_param_flags(parts[1]);
+                    }
+                }
+            }
+            // Parse param-range
+            else if let Some(rest) =3D line.strip_prefix("param-range:") {
+                let parts: Vec<&str> =3D rest.split(',').map(|s| s.trim())=
.collect();
+                if parts.len() >=3D 3 {
+                    if let Some(param) =3D param_map.get_mut(parts[0]) {
+                        param.min_value =3D parts[1].parse().ok();
+                        param.max_value =3D parts[2].parse().ok();
+                        param.constraint_type =3D 1; // KAPI_CONSTRAINT_RA=
NGE
+                    }
+                }
+            }
+            // Parse param-constraint
+            else if let Some(rest) =3D line.strip_prefix("param-constraint=
:") {
+                let parts: Vec<&str> =3D rest.splitn(2, ',').map(|s| s.tri=
m()).collect();
+                if parts.len() >=3D 2 {
+                    if let Some(param) =3D param_map.get_mut(parts[0]) {
+                        param.constraint =3D Some(parts[1].to_string());
+                    }
+                }
+            }
+            // Parse error
+            else if let Some(rest) =3D line.strip_prefix("error:") {
+                // Parse error in format: "ERROR_CODE, description"
+                let parts: Vec<&str> =3D rest.splitn(2, ',').map(|s| s.tri=
m()).collect();
+                if parts.len() >=3D 2 {
+                    let error_name =3D parts[0].to_string();
+                    let description =3D parts[1].to_string();
+
+                    // Look for desc: line on the next line
+                    let mut full_description =3D description;
+                    if i + 1 < lines.len() {
+                        if let Some(desc_line) =3D lines[i + 1].strip_pref=
ix("*   desc:") {
+                            full_description =3D desc_line.trim().to_strin=
g();
+                        } else if let Some(desc_line) =3D lines[i + 1].str=
ip_prefix("* desc:") {
+                            full_description =3D desc_line.trim().to_strin=
g();
+                        }
+                    }
+
+                    // Map common error names to codes
+                    let error_code =3D match error_name.as_str() {
+                        "E2BIG" =3D> -7,
+                        "EACCES" =3D> -13,
+                        "EAGAIN" =3D> -11,
+                        "EBADF" =3D> -9,
+                        "EBUSY" =3D> -16,
+                        "EFAULT" =3D> -14,
+                        "EINTR" =3D> -4,
+                        "EINVAL" =3D> -22,
+                        "EIO" =3D> -5,
+                        "EISDIR" =3D> -21,
+                        "ELIBBAD" =3D> -80,
+                        "ELOOP" =3D> -40,
+                        "EMFILE" =3D> -24,
+                        "ENAMETOOLONG" =3D> -36,
+                        "ENFILE" =3D> -23,
+                        "ENOENT" =3D> -2,
+                        "ENOEXEC" =3D> -8,
+                        "ENOMEM" =3D> -12,
+                        "ENOTDIR" =3D> -20,
+                        "EOPNOTSUPP" =3D> -95,
+                        "EPERM" =3D> -1,
+                        "ESRCH" =3D> -3,
+                        "ETXTBSY" =3D> -26,
+                        _ =3D> 0,
+                    };
+
+                    spec.errors.push(ErrorSpec {
+                        error_code,
+                        name: error_name,
+                        condition: String::new(),
+                        description: full_description,
+                    });
+                }
+            }
+            // Parse lock
+            else if let Some(rest) =3D line.strip_prefix("lock:") {
+                // Save previous lock if any
+                if let Some(lock) =3D current_lock.take() {
+                    spec.locks.push(lock);
+                }
+
+                let parts: Vec<&str> =3D rest.split(',').map(|s| s.trim())=
.collect();
+                if parts.len() >=3D 2 {
+                    current_lock =3D Some(LockSpec {
+                        lock_name: parts[0].to_string(),
+                        lock_type: self.parse_lock_type(parts[1]),
+                        scope: super::KAPI_LOCK_INTERNAL, // default: acqu=
ired and released
+                        description: String::new(),
+                    });
+                }
+            }
+            // Parse lock scope
+            else if let Some(rest) =3D line.strip_prefix("lock-scope:") {
+                if let Some(lock) =3D current_lock.as_mut() {
+                    lock.scope =3D match rest.trim() {
+                        "internal" =3D> super::KAPI_LOCK_INTERNAL,
+                        "acquires" =3D> super::KAPI_LOCK_ACQUIRES,
+                        "releases" =3D> super::KAPI_LOCK_RELEASES,
+                        "caller_held" =3D> super::KAPI_LOCK_CALLER_HELD,
+                        _ =3D> super::KAPI_LOCK_INTERNAL,
+                    };
+                }
+            }
+            else if let Some(rest) =3D line.strip_prefix("lock-desc:") {
+                if let Some(lock) =3D current_lock.as_mut() {
+                    lock.description =3D self.collect_multiline_value(&lin=
es, i, rest);
+                }
+            }
+            // Parse signal
+            else if let Some(rest) =3D line.strip_prefix("signal:") {
+                // Save previous signal if any
+                if let Some(signal) =3D current_signal.take() {
+                    spec.signals.push(signal);
+                }
+
+                let signal_name =3D rest.trim().to_string();
+                current_signal =3D Some(SignalSpec {
+                    signal_num: 0,
+                    signal_name,
+                    direction: 1,
+                    action: 0,
+                    target: None,
+                    condition: None,
+                    description: None,
+                    restartable: false,
+                    timing: 0,
+                    priority: 0,
+                    interruptible: false,
+                    queue: None,
+                    sa_flags: 0,
+                    sa_flags_required: 0,
+                    sa_flags_forbidden: 0,
+                    state_required: 0,
+                    state_forbidden: 0,
+                    error_on_signal: None,
+                });
+            }
+            // Parse signal attributes
+            else if let Some(rest) =3D line.strip_prefix("signal-direction=
:") {
+                if let Some(signal) =3D current_signal.as_mut() {
+                    signal.direction =3D self.parse_signal_direction(rest.=
trim());
+                }
+            }
+            else if let Some(rest) =3D line.strip_prefix("signal-action:")=
 {
+                if let Some(signal) =3D current_signal.as_mut() {
+                    signal.action =3D self.parse_signal_action(rest.trim()=
);
+                }
+            }
+            else if let Some(rest) =3D line.strip_prefix("signal-condition=
:") {
+                if let Some(signal) =3D current_signal.as_mut() {
+                    signal.condition =3D Some(self.collect_multiline_value=
(&lines, i, rest));
+                }
+            }
+            else if let Some(rest) =3D line.strip_prefix("signal-desc:") {
+                if let Some(signal) =3D current_signal.as_mut() {
+                    signal.description =3D Some(self.collect_multiline_val=
ue(&lines, i, rest));
+                }
+            }
+            else if let Some(rest) =3D line.strip_prefix("signal-timing:")=
 {
+                if let Some(signal) =3D current_signal.as_mut() {
+                    signal.timing =3D self.parse_signal_timing(rest.trim()=
);
+                }
+            }
+            else if let Some(rest) =3D line.strip_prefix("signal-priority:=
") {
+                if let Some(signal) =3D current_signal.as_mut() {
+                    signal.priority =3D rest.trim().parse().unwrap_or(0);
+                }
+            }
+            else if line.strip_prefix("signal-interruptible:").is_some() {
+                if let Some(signal) =3D current_signal.as_mut() {
+                    signal.interruptible =3D true;
+                }
+            }
+            else if let Some(rest) =3D line.strip_prefix("signal-state-req=
:") {
+                if let Some(signal) =3D current_signal.as_mut() {
+                    signal.state_required =3D self.parse_signal_state(rest=
.trim());
+                }
+            }
+            // Parse side-effect
+            else if let Some(rest) =3D line.strip_prefix("side-effect:") {
+                let full_effect =3D self.collect_multiline_value(&lines, i=
, rest);
+                let parts: Vec<&str> =3D full_effect.splitn(3, ',').map(|s=
| s.trim()).collect();
+                if parts.len() >=3D 3 {
+                    let mut effect =3D SideEffectSpec {
+                        effect_type: self.parse_effect_type(parts[0]),
+                        target: parts[1].to_string(),
+                        condition: None,
+                        description: parts[2].to_string(),
+                        reversible: false,
+                    };
+
+                    // Check for additional attributes
+                    if let Some(pos) =3D parts[2].find("condition=3D") {
+                        let cond_str =3D &parts[2][pos + 10..];
+                        if let Some(end) =3D cond_str.find(',') {
+                            effect.condition =3D Some(cond_str[..end].to_s=
tring());
+                        } else {
+                            effect.condition =3D Some(cond_str.to_string()=
);
+                        }
+                    }
+
+                    if parts[2].contains("reversible=3Dyes") {
+                        effect.reversible =3D true;
+                    }
+
+                    spec.side_effects.push(effect);
+                }
+            }
+            // Parse state-trans
+            else if let Some(rest) =3D line.strip_prefix("state-trans:") {
+                let parts: Vec<&str> =3D rest.split(',').map(|s| s.trim())=
.collect();
+                if parts.len() >=3D 4 {
+                    spec.state_transitions.push(StateTransitionSpec {
+                        object: parts[0].to_string(),
+                        from_state: parts[1].to_string(),
+                        to_state: parts[2].to_string(),
+                        condition: None,
+                        description: parts[3].to_string(),
+                    });
+                }
+            }
+            // Parse capability
+            else if let Some(rest) =3D line.strip_prefix("capability:") {
+                // Save previous capability if any
+                if let Some(cap) =3D current_capability.take() {
+                    spec.capabilities.push(cap);
+                }
+
+                let parts: Vec<&str> =3D rest.split(',').map(|s| s.trim())=
.collect();
+                if parts.len() >=3D 3 {
+                    current_capability =3D Some(CapabilitySpec {
+                        capability: self.parse_capability_value(parts[0]),
+                        action: parts[1].to_string(),
+                        name: parts[2].to_string(),
+                        allows: String::new(),
+                        without_cap: String::new(),
+                        check_condition: None,
+                        priority: Some(0),
+                        alternatives: vec![],
+                    });
+                }
+            }
+            // Parse capability attributes
+            else if let Some(rest) =3D line.strip_prefix("capability-allow=
s:") {
+                if let Some(cap) =3D current_capability.as_mut() {
+                    cap.allows =3D self.collect_multiline_value(&lines, i,=
 rest);
+                }
+            }
+            else if let Some(rest) =3D line.strip_prefix("capability-witho=
ut:") {
+                if let Some(cap) =3D current_capability.as_mut() {
+                    cap.without_cap =3D self.collect_multiline_value(&line=
s, i, rest);
+                }
+            }
+            else if let Some(rest) =3D line.strip_prefix("capability-condi=
tion:") {
+                if let Some(cap) =3D current_capability.as_mut() {
+                    cap.check_condition =3D Some(self.collect_multiline_va=
lue(&lines, i, rest));
+                }
+            }
+            else if let Some(rest) =3D line.strip_prefix("capability-prior=
ity:") {
+                if let Some(cap) =3D current_capability.as_mut() {
+                    cap.priority =3D rest.trim().parse().ok();
+                }
+            }
+            // Parse constraint
+            else if let Some(rest) =3D line.strip_prefix("constraint:") {
+                let parts: Vec<&str> =3D rest.splitn(2, ',').map(|s| s.tri=
m()).collect();
+                if parts.len() >=3D 2 {
+                    spec.constraints.push(ConstraintSpec {
+                        name: parts[0].to_string(),
+                        description: parts[1].to_string(),
+                        expression: None,
+                    });
+                }
+            }
+            // Parse constraint-expr
+            else if let Some(rest) =3D line.strip_prefix("constraint-expr:=
") {
+                let parts: Vec<&str> =3D rest.splitn(2, ',').map(|s| s.tri=
m()).collect();
+                if parts.len() >=3D 2 {
+                    // Find matching constraint and update it
+                    if let Some(constraint) =3D spec.constraints.iter_mut(=
).find(|c| c.name =3D=3D parts[0]) {
+                        constraint.expression =3D Some(parts[1].to_string(=
));
+                    }
+                }
+            }
+            // Parse struct-field
+            else if let Some(rest) =3D line.strip_prefix("struct-field:") {
+                let parts: Vec<&str> =3D rest.split(',').map(|s| s.trim())=
.collect();
+                if parts.len() >=3D 3 {
+                    struct_fields.push(StructFieldSpec {
+                        name: parts[0].to_string(),
+                        field_type: self.parse_field_type(parts[1]),
+                        type_name: parts[1].to_string(),
+                        offset: 0,
+                        size: 0,
+                        flags: 0,
+                        constraint_type: 0,
+                        min_value: 0,
+                        max_value: 0,
+                        valid_mask: 0,
+                        description: parts[2].to_string(),
+                    });
+                }
+            }
+            // Parse struct-field-range
+            else if let Some(rest) =3D line.strip_prefix("struct-field-ran=
ge:") {
+                let parts: Vec<&str> =3D rest.split(',').map(|s| s.trim())=
.collect();
+                if parts.len() >=3D 3 {
+                    // Update the field with range
+                    if let Some(field) =3D struct_fields.iter_mut().find(|=
f| f.name =3D=3D parts[0]) {
+                        field.min_value =3D parts[1].parse().unwrap_or(0);
+                        field.max_value =3D parts[2].parse().unwrap_or(0);
+                        field.constraint_type =3D 1; // KAPI_CONSTRAINT_RA=
NGE
+                    }
+                }
+            }
+            // Parse examples
+            else if let Some(rest) =3D line.strip_prefix("examples:") {
+                spec.examples =3D Some(self.collect_multiline_value(&lines=
, i, rest));
+            }
+            // Parse notes
+            else if let Some(rest) =3D line.strip_prefix("notes:") {
+                spec.notes =3D Some(self.collect_multiline_value(&lines, i=
, rest));
+            }
+            // Parse since-version
+            else if let Some(rest) =3D line.strip_prefix("since-version:")=
 {
+                spec.since_version =3D Some(rest.trim().to_string());
+            }
+            // Parse return-type
+            else if let Some(rest) =3D line.strip_prefix("return-type:") {
+                if spec.return_spec.is_none() {
+                    spec.return_spec =3D Some(ReturnSpec {
+                        type_name: rest.trim().to_string(),
+                        description: String::new(),
+                        return_type: self.parse_param_type(rest.trim()),
+                        check_type: 0,
+                        success_value: None,
+                        success_min: None,
+                        success_max: None,
+                        error_values: vec![],
+                    });
+                }
+            }
+            // Parse return-check-type
+            else if let Some(rest) =3D line.strip_prefix("return-check-typ=
e:") {
+                if let Some(ret) =3D spec.return_spec.as_mut() {
+                    ret.check_type =3D self.parse_return_check_type(rest.t=
rim());
+                }
+            }
+            // Parse return-success
+            else if let Some(rest) =3D line.strip_prefix("return-success:"=
) {
+                if let Some(ret) =3D spec.return_spec.as_mut() {
+                    ret.success_value =3D rest.trim().parse().ok();
+                }
+            }
+
+            i +=3D 1;
+        }
+
+        // Save any remaining items
+        if let Some(lock) =3D current_lock {
+            spec.locks.push(lock);
+        }
+        if let Some(signal) =3D current_signal {
+            spec.signals.push(signal);
+        }
+        if let Some(cap) =3D current_capability {
+            spec.capabilities.push(cap);
+        }
+
+        // Convert param_map to vec preserving order
+        let mut params: Vec<ParamSpec> =3D param_map.into_values().collect=
();
+        params.sort_by_key(|p| p.index);
+        spec.parameters =3D params;
+
+        // Create struct spec if we have fields
+        if !struct_fields.is_empty() {
+            spec.struct_specs.push(StructSpec {
+                name: "struct sched_attr".to_string(),
+                size: 120, // Default for sched_attr
+                alignment: 8,
+                field_count: struct_fields.len() as u32,
+                fields: struct_fields,
+                description: "Structure specification".to_string(),
+            });
+        }
+
+        Ok(spec)
+    }
+
+    fn collect_multiline_value(&self, lines: &[&str], start_idx: usize, fi=
rst_part: &str) -> String {
+        let mut result =3D String::from(first_part.trim());
+        let mut i =3D start_idx + 1;
+
+        // Continue collecting lines until we hit another annotation or end
+        while i < lines.len() {
+            let line =3D lines[i];
+
+            // Stop if we hit another annotation (contains ':' and starts =
with valid keyword)
+            if self.is_annotation_line(line) {
+                break;
+            }
+
+            // Add continuation lines
+            if !line.trim().is_empty() && line.starts_with("  ") {
+                if !result.is_empty() {
+                    result.push(' ');
+                }
+                result.push_str(line.trim());
+            } else if line.trim().is_empty() {
+                // Empty line might be part of multiline
+                i +=3D 1;
+                continue;
+            } else {
+                // Non-continuation line, stop
+                break;
+            }
+
+            i +=3D 1;
+        }
+
+        result
+    }
+
+    fn is_annotation_line(&self, line: &str) -> bool {
+        let annotations =3D [
+            "param-", "error-", "lock", "signal", "side-effect:",
+            "state-trans:", "capability", "constraint", "struct-",
+            "return-", "examples:", "notes:", "since-", "context-",
+            "long-desc:"
+        ];
+
+        for ann in &annotations {
+            if line.trim_start().starts_with(ann) {
+                return true;
+            }
+        }
+        false
+    }
+
+    fn parse_context_flags(&self, flags: &str) -> Vec<String> {
+        flags.split('|')
+            .map(|f| f.trim().to_string())
+            .collect()
+    }
+
+    fn parse_param_type(&self, type_str: &str) -> u32 {
+        match type_str {
+            "KAPI_TYPE_INT" =3D> 1,
+            "KAPI_TYPE_UINT" =3D> 2,
+            "KAPI_TYPE_LONG" =3D> 3,
+            "KAPI_TYPE_ULONG" =3D> 4,
+            "KAPI_TYPE_STRING" =3D> 5,
+            "KAPI_TYPE_USER_PTR" =3D> 6,
+            _ =3D> 0,
+        }
+    }
+
+    fn parse_field_type(&self, type_str: &str) -> u32 {
+        match type_str {
+            "__s32" | "int" =3D> 1,
+            "__u32" | "unsigned int" =3D> 2,
+            "__s64" | "long" =3D> 3,
+            "__u64" | "unsigned long" =3D> 4,
+            _ =3D> 0,
+        }
+    }
+
+    fn parse_param_flags(&self, flags: &str) -> u32 {
+        let mut result =3D 0;
+        for flag in flags.split('|') {
+            match flag.trim() {
+                "KAPI_PARAM_IN" =3D> result |=3D 1,
+                "KAPI_PARAM_OUT" =3D> result |=3D 2,
+                "KAPI_PARAM_INOUT" =3D> result |=3D 3,
+                "KAPI_PARAM_USER" =3D> result |=3D 4,
+                _ =3D> {}
+            }
+        }
+        result
+    }
+
+    fn parse_lock_type(&self, type_str: &str) -> u32 {
+        match type_str {
+            "KAPI_LOCK_SPINLOCK" =3D> 0,
+            "KAPI_LOCK_MUTEX" =3D> 1,
+            "KAPI_LOCK_RWLOCK" =3D> 2,
+            _ =3D> 3,
+        }
+    }
+
+    fn parse_signal_direction(&self, dir: &str) -> u32 {
+        match dir {
+            "KAPI_SIGNAL_SEND" =3D> 1,
+            "KAPI_SIGNAL_RECEIVE" =3D> 2,
+            _ =3D> 0,
+        }
+    }
+
+    fn parse_signal_action(&self, action: &str) -> u32 {
+        match action {
+            "KAPI_SIGNAL_ACTION_DEFAULT" =3D> 0,
+            "KAPI_SIGNAL_ACTION_IGNORE" =3D> 1,
+            "KAPI_SIGNAL_ACTION_CUSTOM" =3D> 2,
+            _ =3D> 0,
+        }
+    }
+
+    fn parse_signal_timing(&self, timing: &str) -> u32 {
+        match timing {
+            "KAPI_SIGNAL_TIME_BEFORE" =3D> 0,
+            "KAPI_SIGNAL_TIME_DURING" =3D> 1,
+            "KAPI_SIGNAL_TIME_AFTER" =3D> 2,
+            _ =3D> 0,
+        }
+    }
+
+    fn parse_signal_state(&self, state: &str) -> u32 {
+        match state {
+            "KAPI_SIGNAL_STATE_RUNNING" =3D> 1,
+            "KAPI_SIGNAL_STATE_SLEEPING" =3D> 2,
+            _ =3D> 0,
+        }
+    }
+
+    fn parse_effect_type(&self, type_str: &str) -> u32 {
+        let mut result =3D 0;
+        for flag in type_str.split('|') {
+            match flag.trim() {
+                "KAPI_EFFECT_MODIFY_STATE" =3D> result |=3D 1,
+                "KAPI_EFFECT_PROCESS_STATE" =3D> result |=3D 2,
+                "KAPI_EFFECT_SCHEDULE" =3D> result |=3D 4,
+                _ =3D> {}
+            }
+        }
+        result
+    }
+
+    fn parse_capability_value(&self, cap: &str) -> i32 {
+        match cap {
+            "CAP_SYS_NICE" =3D> 23,
+            _ =3D> 0,
+        }
+    }
+
+    fn parse_return_check_type(&self, check: &str) -> u32 {
+        match check {
+            "KAPI_RETURN_ERROR_CHECK" =3D> 1,
+            "KAPI_RETURN_SUCCESS_CHECK" =3D> 2,
+            _ =3D> 0,
+        }
+    }
+}
\ No newline at end of file
diff --git a/tools/kapi/src/extractor/mod.rs b/tools/kapi/src/extractor/mod=
.rs
new file mode 100644
index 0000000000000..4eeb03b9a4ca3
--- /dev/null
+++ b/tools/kapi/src/extractor/mod.rs
@@ -0,0 +1,464 @@
+use crate::formatter::OutputFormatter;
+use anyhow::Result;
+use std::convert::TryInto;
+use std::io::Write;
+
+pub mod debugfs;
+pub mod kerneldoc_parser;
+pub mod source_parser;
+pub mod vmlinux;
+
+pub use debugfs::DebugfsExtractor;
+pub use source_parser::SourceExtractor;
+pub use vmlinux::VmlinuxExtractor;
+
+/// Socket state specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct SocketStateSpec {
+    pub required_states: Vec<String>,
+    pub forbidden_states: Vec<String>,
+    pub resulting_state: Option<String>,
+    pub condition: Option<String>,
+    pub applicable_protocols: Option<String>,
+}
+
+/// Protocol behavior specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ProtocolBehaviorSpec {
+    pub applicable_protocols: String,
+    pub behavior: String,
+    pub protocol_flags: Option<String>,
+    pub flag_description: Option<String>,
+}
+
+/// Address family specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct AddrFamilySpec {
+    pub family: i32,
+    pub family_name: String,
+    pub addr_struct_size: usize,
+    pub min_addr_len: usize,
+    pub max_addr_len: usize,
+    pub addr_format: Option<String>,
+    pub supports_wildcard: bool,
+    pub supports_multicast: bool,
+    pub supports_broadcast: bool,
+    pub special_addresses: Option<String>,
+    pub port_range_min: u32,
+    pub port_range_max: u32,
+}
+
+/// Buffer specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct BufferSpec {
+    pub buffer_behaviors: Option<String>,
+    pub min_buffer_size: Option<usize>,
+    pub max_buffer_size: Option<usize>,
+    pub optimal_buffer_size: Option<usize>,
+}
+
+/// Async specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct AsyncSpec {
+    pub supported_modes: Option<String>,
+    pub nonblock_errno: Option<i32>,
+}
+
+/// Capability specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct CapabilitySpec {
+    pub capability: i32,
+    pub name: String,
+    pub action: String,
+    pub allows: String,
+    pub without_cap: String,
+    pub check_condition: Option<String>,
+    pub priority: Option<u8>,
+    pub alternatives: Vec<i32>,
+}
+
+/// Parameter specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ParamSpec {
+    pub index: u32,
+    pub name: String,
+    pub type_name: String,
+    pub description: String,
+    pub flags: u32,
+    pub param_type: u32,
+    pub constraint_type: u32,
+    pub constraint: Option<String>,
+    pub min_value: Option<i64>,
+    pub max_value: Option<i64>,
+    pub valid_mask: Option<u64>,
+    pub enum_values: Vec<String>,
+    pub size: Option<u32>,
+    pub alignment: Option<u32>,
+}
+
+/// Return value specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ReturnSpec {
+    pub type_name: String,
+    pub description: String,
+    pub return_type: u32,
+    pub check_type: u32,
+    pub success_value: Option<i64>,
+    pub success_min: Option<i64>,
+    pub success_max: Option<i64>,
+    pub error_values: Vec<i32>,
+}
+
+/// Error specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ErrorSpec {
+    pub error_code: i32,
+    pub name: String,
+    pub condition: String,
+    pub description: String,
+}
+
+/// Signal specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct SignalSpec {
+    pub signal_num: i32,
+    pub signal_name: String,
+    pub direction: u32,
+    pub action: u32,
+    pub target: Option<String>,
+    pub condition: Option<String>,
+    pub description: Option<String>,
+    pub timing: u32,
+    pub priority: u32,
+    pub restartable: bool,
+    pub interruptible: bool,
+    pub queue: Option<String>,
+    pub sa_flags: u32,
+    pub sa_flags_required: u32,
+    pub sa_flags_forbidden: u32,
+    pub state_required: u32,
+    pub state_forbidden: u32,
+    pub error_on_signal: Option<i32>,
+}
+
+/// Signal mask specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct SignalMaskSpec {
+    pub name: String,
+    pub description: String,
+}
+
+/// Side effect specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct SideEffectSpec {
+    pub effect_type: u32,
+    pub target: String,
+    pub condition: Option<String>,
+    pub description: String,
+    pub reversible: bool,
+}
+
+/// State transition specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct StateTransitionSpec {
+    pub object: String,
+    pub from_state: String,
+    pub to_state: String,
+    pub condition: Option<String>,
+    pub description: String,
+}
+
+/// Constraint specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct ConstraintSpec {
+    pub name: String,
+    pub description: String,
+    pub expression: Option<String>,
+}
+
+/// Lock scope enum values matching kernel enum kapi_lock_scope
+pub const KAPI_LOCK_INTERNAL: u32 =3D 0;
+pub const KAPI_LOCK_ACQUIRES: u32 =3D 1;
+pub const KAPI_LOCK_RELEASES: u32 =3D 2;
+pub const KAPI_LOCK_CALLER_HELD: u32 =3D 3;
+
+/// Lock specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct LockSpec {
+    pub lock_name: String,
+    pub lock_type: u32,
+    pub scope: u32,
+    pub description: String,
+}
+
+/// Struct field specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct StructFieldSpec {
+    pub name: String,
+    pub field_type: u32,
+    pub type_name: String,
+    pub offset: usize,
+    pub size: usize,
+    pub flags: u32,
+    pub constraint_type: u32,
+    pub min_value: i64,
+    pub max_value: i64,
+    pub valid_mask: u64,
+    pub description: String,
+}
+
+/// Struct specification
+#[derive(Debug, Clone, serde::Serialize)]
+pub struct StructSpec {
+    pub name: String,
+    pub size: usize,
+    pub alignment: usize,
+    pub field_count: u32,
+    pub fields: Vec<StructFieldSpec>,
+    pub description: String,
+}
+
+/// Common API specification information that all extractors should provide
+#[derive(Debug, Clone)]
+pub struct ApiSpec {
+    pub name: String,
+    pub api_type: String,
+    pub description: Option<String>,
+    pub long_description: Option<String>,
+    pub version: Option<String>,
+    pub context_flags: Vec<String>,
+    pub param_count: Option<u32>,
+    pub error_count: Option<u32>,
+    pub examples: Option<String>,
+    pub notes: Option<String>,
+    pub since_version: Option<String>,
+    // Sysfs-specific fields
+    pub subsystem: Option<String>,
+    pub sysfs_path: Option<String>,
+    pub permissions: Option<String>,
+    // Networking-specific fields
+    pub socket_state: Option<SocketStateSpec>,
+    pub protocol_behaviors: Vec<ProtocolBehaviorSpec>,
+    pub addr_families: Vec<AddrFamilySpec>,
+    pub buffer_spec: Option<BufferSpec>,
+    pub async_spec: Option<AsyncSpec>,
+    pub net_data_transfer: Option<String>,
+    pub capabilities: Vec<CapabilitySpec>,
+    pub parameters: Vec<ParamSpec>,
+    pub return_spec: Option<ReturnSpec>,
+    pub errors: Vec<ErrorSpec>,
+    pub signals: Vec<SignalSpec>,
+    pub signal_masks: Vec<SignalMaskSpec>,
+    pub side_effects: Vec<SideEffectSpec>,
+    pub state_transitions: Vec<StateTransitionSpec>,
+    pub constraints: Vec<ConstraintSpec>,
+    pub locks: Vec<LockSpec>,
+    pub struct_specs: Vec<StructSpec>,
+}
+
+/// Trait for extracting API specifications from different sources
+pub trait ApiExtractor {
+    /// Extract all API specifications from the source
+    fn extract_all(&self) -> Result<Vec<ApiSpec>>;
+
+    /// Extract a specific API specification by name
+    fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>>;
+
+    /// Display detailed information about a specific API
+    fn display_api_details(
+        &self,
+        api_name: &str,
+        formatter: &mut dyn OutputFormatter,
+        writer: &mut dyn Write,
+    ) -> Result<()>;
+}
+
+/// Helper function to display an ApiSpec using a formatter
+pub fn display_api_spec(
+    spec: &ApiSpec,
+    formatter: &mut dyn OutputFormatter,
+    writer: &mut dyn Write,
+) -> Result<()> {
+    formatter.begin_api_details(writer, &spec.name)?;
+
+    if let Some(desc) =3D &spec.description {
+        formatter.description(writer, desc)?;
+    }
+
+    if let Some(long_desc) =3D &spec.long_description {
+        formatter.long_description(writer, long_desc)?;
+    }
+
+    if let Some(version) =3D &spec.since_version {
+        formatter.since_version(writer, version)?;
+    }
+
+    if !spec.context_flags.is_empty() {
+        formatter.begin_context_flags(writer)?;
+        for flag in &spec.context_flags {
+            formatter.context_flag(writer, flag)?;
+        }
+        formatter.end_context_flags(writer)?;
+    }
+
+    if !spec.parameters.is_empty() {
+        formatter.begin_parameters(writer, spec.parameters.len().try_into(=
).unwrap_or(u32::MAX))?;
+        for param in &spec.parameters {
+            formatter.parameter(writer, param)?;
+        }
+        formatter.end_parameters(writer)?;
+    }
+
+    if let Some(ret) =3D &spec.return_spec {
+        formatter.return_spec(writer, ret)?;
+    }
+
+    if !spec.errors.is_empty() {
+        formatter.begin_errors(writer, spec.errors.len().try_into().unwrap=
_or(u32::MAX))?;
+        for error in &spec.errors {
+            formatter.error(writer, error)?;
+        }
+        formatter.end_errors(writer)?;
+    }
+
+    if let Some(notes) =3D &spec.notes {
+        formatter.notes(writer, notes)?;
+    }
+
+    if let Some(examples) =3D &spec.examples {
+        formatter.examples(writer, examples)?;
+    }
+
+    // Display sysfs-specific fields
+    if spec.api_type =3D=3D "sysfs" {
+        if let Some(subsystem) =3D &spec.subsystem {
+            formatter.sysfs_subsystem(writer, subsystem)?;
+        }
+        if let Some(path) =3D &spec.sysfs_path {
+            formatter.sysfs_path(writer, path)?;
+        }
+        if let Some(perms) =3D &spec.permissions {
+            formatter.sysfs_permissions(writer, perms)?;
+        }
+    }
+
+    // Display networking-specific fields
+    if let Some(socket_state) =3D &spec.socket_state {
+        formatter.socket_state(writer, socket_state)?;
+    }
+
+    if !spec.protocol_behaviors.is_empty() {
+        formatter.begin_protocol_behaviors(writer)?;
+        for behavior in &spec.protocol_behaviors {
+            formatter.protocol_behavior(writer, behavior)?;
+        }
+        formatter.end_protocol_behaviors(writer)?;
+    }
+
+    if !spec.addr_families.is_empty() {
+        formatter.begin_addr_families(writer)?;
+        for family in &spec.addr_families {
+            formatter.addr_family(writer, family)?;
+        }
+        formatter.end_addr_families(writer)?;
+    }
+
+    if let Some(buffer_spec) =3D &spec.buffer_spec {
+        formatter.buffer_spec(writer, buffer_spec)?;
+    }
+
+    if let Some(async_spec) =3D &spec.async_spec {
+        formatter.async_spec(writer, async_spec)?;
+    }
+
+    if let Some(net_data_transfer) =3D &spec.net_data_transfer {
+        formatter.net_data_transfer(writer, net_data_transfer)?;
+    }
+
+    if !spec.capabilities.is_empty() {
+        formatter.begin_capabilities(writer)?;
+        for cap in &spec.capabilities {
+            formatter.capability(writer, cap)?;
+        }
+        formatter.end_capabilities(writer)?;
+    }
+
+    // Display signals
+    if !spec.signals.is_empty() {
+        formatter.begin_signals(writer, spec.signals.len().try_into().unwr=
ap_or(u32::MAX))?;
+        for signal in &spec.signals {
+            formatter.signal(writer, signal)?;
+        }
+        formatter.end_signals(writer)?;
+    }
+
+    // Display signal masks
+    if !spec.signal_masks.is_empty() {
+        formatter.begin_signal_masks(
+            writer,
+            spec.signal_masks.len().try_into().unwrap_or(u32::MAX),
+        )?;
+        for mask in &spec.signal_masks {
+            formatter.signal_mask(writer, mask)?;
+        }
+        formatter.end_signal_masks(writer)?;
+    }
+
+    // Display side effects
+    if !spec.side_effects.is_empty() {
+        formatter.begin_side_effects(
+            writer,
+            spec.side_effects.len().try_into().unwrap_or(u32::MAX),
+        )?;
+        for effect in &spec.side_effects {
+            formatter.side_effect(writer, effect)?;
+        }
+        formatter.end_side_effects(writer)?;
+    }
+
+    // Display state transitions
+    if !spec.state_transitions.is_empty() {
+        formatter.begin_state_transitions(
+            writer,
+            spec.state_transitions.len().try_into().unwrap_or(u32::MAX),
+        )?;
+        for trans in &spec.state_transitions {
+            formatter.state_transition(writer, trans)?;
+        }
+        formatter.end_state_transitions(writer)?;
+    }
+
+    // Display constraints
+    if !spec.constraints.is_empty() {
+        formatter.begin_constraints(
+            writer,
+            spec.constraints.len().try_into().unwrap_or(u32::MAX),
+        )?;
+        for constraint in &spec.constraints {
+            formatter.constraint(writer, constraint)?;
+        }
+        formatter.end_constraints(writer)?;
+    }
+
+    // Display locks
+    if !spec.locks.is_empty() {
+        formatter.begin_locks(writer, spec.locks.len().try_into().unwrap_o=
r(u32::MAX))?;
+        for lock in &spec.locks {
+            formatter.lock(writer, lock)?;
+        }
+        formatter.end_locks(writer)?;
+    }
+
+    // Display struct specs
+    if !spec.struct_specs.is_empty() {
+        formatter.begin_struct_specs(writer, spec.struct_specs.len().try_i=
nto().unwrap_or(u32::MAX))?;
+        for struct_spec in &spec.struct_specs {
+            formatter.struct_spec(writer, struct_spec)?;
+        }
+        formatter.end_struct_specs(writer)?;
+    }
+
+    formatter.end_api_details(writer)?;
+
+    Ok(())
+}
diff --git a/tools/kapi/src/extractor/source_parser.rs b/tools/kapi/src/ext=
ractor/source_parser.rs
new file mode 100644
index 0000000000000..7a72b85a83bea
--- /dev/null
+++ b/tools/kapi/src/extractor/source_parser.rs
@@ -0,0 +1,213 @@
+use super::{
+    ApiExtractor, ApiSpec, display_api_spec,
+};
+use super::kerneldoc_parser::KerneldocParserImpl;
+use crate::formatter::OutputFormatter;
+use anyhow::{Context, Result};
+use regex::Regex;
+use std::fs;
+use std::io::Write;
+use std::path::Path;
+use walkdir::WalkDir;
+
+/// Extractor for kernel source files with KAPI-annotated kerneldoc
+pub struct SourceExtractor {
+    path: String,
+    parser: KerneldocParserImpl,
+    syscall_regex: Regex,
+    ioctl_regex: Regex,
+    function_regex: Regex,
+}
+
+impl SourceExtractor {
+    pub fn new(path: &str) -> Result<Self> {
+        Ok(SourceExtractor {
+            path: path.to_string(),
+            parser: KerneldocParserImpl::new(),
+            syscall_regex: Regex::new(r"SYSCALL_DEFINE\d+\((\w+)")?,
+            ioctl_regex: Regex::new(r"(?:static\s+)?long\s+(\w+_ioctl)\s*\=
(")?,
+            function_regex: Regex::new(
+                r"(?m)^(?:static\s+)?(?:inline\s+)?(?:(?:unsigned\s+)?(?:l=
ong|int|void|char|short|struct\s+\w+\s*\*?|[\w_]+_t)\s*\*?\s+)?(\w+)\s*\([^=
)]*\)",
+            )?,
+        })
+    }
+
+    fn extract_from_file(&self, path: &Path) -> Result<Vec<ApiSpec>> {
+        let content =3D fs::read_to_string(path)
+            .with_context(|| format!("Failed to read file: {}", path.displ=
ay()))?;
+
+        self.extract_from_content(&content)
+    }
+
+    fn extract_from_content(&self, content: &str) -> Result<Vec<ApiSpec>> {
+        let mut specs =3D Vec::new();
+        let mut in_kerneldoc =3D false;
+        let mut current_doc =3D String::new();
+        let lines: Vec<&str> =3D content.lines().collect();
+        let mut i =3D 0;
+
+        while i < lines.len() {
+            let line =3D lines[i];
+
+            // Start of kerneldoc comment
+            if line.trim_start().starts_with("/**") {
+                in_kerneldoc =3D true;
+                current_doc.clear();
+                i +=3D 1;
+                continue;
+            }
+
+            // Inside kerneldoc comment
+            if in_kerneldoc {
+                if line.contains("*/") {
+                    in_kerneldoc =3D false;
+
+                    // Check if this kerneldoc has KAPI annotations
+                    if current_doc.contains("context-flags:") ||
+                       current_doc.contains("param-count:") ||
+                       current_doc.contains("side-effect:") ||
+                       current_doc.contains("state-trans:") ||
+                       current_doc.contains("error-code:") {
+
+                        // Look ahead for the function declaration
+                        if let Some((name, api_type, signature)) =3D self.=
find_function_after(&lines, i + 1) {
+                            if let Ok(spec) =3D self.parser.parse_kerneldo=
c(&current_doc, &name, &api_type, Some(&signature)) {
+                                specs.push(spec);
+                            }
+                        }
+                    }
+                } else {
+                    // Remove leading asterisk and preserve content
+                    let cleaned =3D if let Some(stripped) =3D line.trim_st=
art().strip_prefix("*") {
+                        if let Some(no_space) =3D stripped.strip_prefix(' =
') {
+                            no_space
+                        } else {
+                            stripped
+                        }
+                    } else {
+                        line.trim_start()
+                    };
+                    current_doc.push_str(cleaned);
+                    current_doc.push('\n');
+                }
+            }
+
+            i +=3D 1;
+        }
+
+        Ok(specs)
+    }
+
+    fn find_function_after(&self, lines: &[&str], start: usize) -> Option<=
(String, String, String)> {
+        for i in start..lines.len().min(start + 10) {
+            let line =3D lines[i];
+
+            // Skip empty lines
+            if line.trim().is_empty() {
+                continue;
+            }
+
+            // Check for SYSCALL_DEFINE
+            if let Some(caps) =3D self.syscall_regex.captures(line) {
+                let name =3D format!("sys_{}", caps.get(1).unwrap().as_str=
());
+                let signature =3D self.extract_syscall_signature(lines, i);
+                return Some((name, "syscall".to_string(), signature));
+            }
+
+            // Check for ioctl function
+            if let Some(caps) =3D self.ioctl_regex.captures(line) {
+                let name =3D caps.get(1).unwrap().as_str().to_string();
+                return Some((name, "ioctl".to_string(), line.to_string()));
+            }
+
+            // Check for regular function
+            if let Some(caps) =3D self.function_regex.captures(line) {
+                let name =3D caps.get(1).unwrap().as_str().to_string();
+                return Some((name, "function".to_string(), line.to_string(=
)));
+            }
+
+            // Stop if we hit something that's clearly not part of the fun=
ction declaration
+            if !line.starts_with(' ') && !line.starts_with('\t') && !line.=
trim().is_empty() {
+                break;
+            }
+        }
+
+        None
+    }
+
+    fn extract_syscall_signature(&self, lines: &[&str], start: usize) -> S=
tring {
+        // Extract the full SYSCALL_DEFINE signature
+        let mut sig =3D String::new();
+        let mut in_paren =3D false;
+        let mut paren_count =3D 0;
+
+        for line in lines.iter().skip(start).take(20) {
+            let line =3D *line;
+
+            // Start of SYSCALL_DEFINE
+            if line.contains("SYSCALL_DEFINE") {
+                if let Some(pos) =3D line.find('(') {
+                    sig.push_str(&line[pos..]);
+                    in_paren =3D true;
+                    paren_count =3D line[pos..].chars().filter(|&c| c =3D=
=3D '(').count() -
+                                  line[pos..].chars().filter(|&c| c =3D=3D=
 ')').count();
+                }
+            } else if in_paren {
+                sig.push(' ');
+                sig.push_str(line.trim());
+                paren_count +=3D line.chars().filter(|&c| c =3D=3D '(').co=
unt();
+                paren_count -=3D line.chars().filter(|&c| c =3D=3D ')').co=
unt();
+
+                if paren_count =3D=3D 0 {
+                    break;
+                }
+            }
+        }
+
+        sig
+    }
+}
+
+impl ApiExtractor for SourceExtractor {
+    fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+        let path =3D Path::new(&self.path);
+        let mut all_specs =3D Vec::new();
+
+        if path.is_file() {
+            // Single file
+            all_specs.extend(self.extract_from_file(path)?);
+        } else if path.is_dir() {
+            // Directory - walk all .c files
+            for entry in WalkDir::new(path)
+                .into_iter()
+                .filter_map(|e| e.ok())
+                .filter(|e| e.path().extension().is_some_and(|ext| ext =3D=
=3D "c"))
+            {
+                if let Ok(specs) =3D self.extract_from_file(entry.path()) {
+                    all_specs.extend(specs);
+                }
+            }
+        }
+
+        Ok(all_specs)
+    }
+
+    fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>> {
+        let all_specs =3D self.extract_all()?;
+        Ok(all_specs.into_iter().find(|s| s.name =3D=3D name))
+    }
+
+    fn display_api_details(
+        &self,
+        api_name: &str,
+        formatter: &mut dyn OutputFormatter,
+        output: &mut dyn Write,
+    ) -> Result<()> {
+        if let Some(spec) =3D self.extract_by_name(api_name)? {
+            display_api_spec(&spec, formatter, output)?;
+        } else {
+            writeln!(output, "API '{}' not found", api_name)?;
+        }
+        Ok(())
+    }
+}
\ No newline at end of file
diff --git a/tools/kapi/src/extractor/vmlinux/binary_utils.rs b/tools/kapi/=
src/extractor/vmlinux/binary_utils.rs
new file mode 100644
index 0000000000000..0a51943e1c027
--- /dev/null
+++ b/tools/kapi/src/extractor/vmlinux/binary_utils.rs
@@ -0,0 +1,180 @@
+// Constants for all structure field sizes
+pub mod sizes {
+    pub const NAME: usize =3D 128;
+    pub const DESC: usize =3D 512;
+    pub const MAX_PARAMS: usize =3D 16;
+    pub const MAX_ERRORS: usize =3D 32;
+    pub const MAX_CONSTRAINTS: usize =3D 16;
+    pub const MAX_CAPABILITIES: usize =3D 8;
+    pub const MAX_SIGNALS: usize =3D 16;
+    pub const MAX_STRUCT_SPECS: usize =3D 8;
+    pub const MAX_SIDE_EFFECTS: usize =3D 32;
+    pub const MAX_STATE_TRANS: usize =3D 16;
+    pub const MAX_PROTOCOL_BEHAVIORS: usize =3D 8;
+    pub const MAX_ADDR_FAMILIES: usize =3D 8;
+}
+
+// Helper for reading data at specific offsets
+pub struct DataReader<'a> {
+    pub data: &'a [u8],
+    pub pos: usize,
+}
+
+impl<'a> DataReader<'a> {
+    pub fn new(data: &'a [u8], offset: usize) -> Self {
+        Self { data, pos: offset }
+    }
+
+    pub fn read_bytes(&mut self, len: usize) -> Option<&'a [u8]> {
+        if self.pos + len <=3D self.data.len() {
+            let bytes =3D &self.data[self.pos..self.pos + len];
+            self.pos +=3D len;
+            Some(bytes)
+        } else {
+            None
+        }
+    }
+
+    pub fn read_cstring(&mut self, max_len: usize) -> Option<String> {
+        let bytes =3D self.read_bytes(max_len)?;
+        if let Some(null_pos) =3D bytes.iter().position(|&b| b =3D=3D 0) {
+            if null_pos > 0 {
+                if let Ok(s) =3D std::str::from_utf8(&bytes[..null_pos]) {
+                    return Some(s.to_string());
+                }
+            }
+        }
+        None
+    }
+
+    pub fn read_u32(&mut self) -> Option<u32> {
+        self.read_bytes(4).map(|b| u32::from_le_bytes(b.try_into().unwrap(=
)))
+    }
+
+    pub fn read_u8(&mut self) -> Option<u8> {
+        self.read_bytes(1).map(|b| b[0])
+    }
+
+    pub fn read_i32(&mut self) -> Option<i32> {
+        self.read_bytes(4).map(|b| i32::from_le_bytes(b.try_into().unwrap(=
)))
+    }
+
+    pub fn read_u64(&mut self) -> Option<u64> {
+        self.read_bytes(8).map(|b| u64::from_le_bytes(b.try_into().unwrap(=
)))
+    }
+
+    pub fn read_i64(&mut self) -> Option<i64> {
+        self.read_bytes(8).map(|b| i64::from_le_bytes(b.try_into().unwrap(=
)))
+    }
+
+    pub fn read_usize(&mut self) -> Option<usize> {
+        self.read_u64().map(|v| v as usize)
+    }
+
+    pub fn skip(&mut self, len: usize) {
+        self.pos =3D (self.pos + len).min(self.data.len());
+    }
+
+    // Helper methods for common patterns
+    pub fn read_bool(&mut self) -> Option<bool> {
+        self.read_u8().map(|v| v !=3D 0)
+    }
+
+    pub fn read_optional_string(&mut self, max_len: usize) -> Option<Strin=
g> {
+        self.read_cstring(max_len).filter(|s| !s.is_empty())
+    }
+
+    pub fn read_string_or_default(&mut self, max_len: usize) -> String {
+        self.read_cstring(max_len).unwrap_or_default()
+    }
+
+    // Skip and discard - advances position by reading and discarding
+    pub fn discard_cstring(&mut self, max_len: usize) {
+        let _ =3D self.read_cstring(max_len);
+    }
+
+    // Read multiple booleans at once
+    pub fn read_bools<const N: usize>(&mut self) -> Option<[bool; N]> {
+        let mut result =3D [false; N];
+        for item in &mut result {
+            *item =3D self.read_bool()?;
+        }
+        Some(result)
+    }
+
+
+}
+
+// Structure layout definitions for calculating sizes
+pub fn signal_mask_spec_layout_size() -> usize {
+    // Packed structure from struct kapi_signal_mask_spec
+    sizes::NAME + // mask_name
+    4 * sizes::MAX_SIGNALS + // signals array
+    4 + // signal_count
+    sizes::DESC // description
+}
+
+pub fn struct_field_layout_size() -> usize {
+    // Packed structure from struct kapi_struct_field
+    sizes::NAME + // name
+    4 + // type (enum)
+    sizes::NAME + // type_name
+    8 + // offset (size_t)
+    8 + // size (size_t)
+    4 + // flags
+    4 + // constraint_type (enum)
+    8 + // min_value (s64)
+    8 + // max_value (s64)
+    8 + // valid_mask (u64)
+    sizes::DESC + // enum_values
+    sizes::DESC // description
+}
+
+pub fn socket_state_spec_layout_size() -> usize {
+    // struct kapi_socket_state_spec
+    sizes::NAME * sizes::MAX_CONSTRAINTS + // required_states array
+    sizes::NAME * sizes::MAX_CONSTRAINTS + // forbidden_states array
+    sizes::NAME + // resulting_state
+    sizes::DESC + // condition
+    sizes::NAME + // applicable_protocols
+    4 + // required_count
+    4 // forbidden_count
+}
+
+pub fn protocol_behavior_spec_layout_size() -> usize {
+    // struct kapi_protocol_behavior
+    sizes::NAME + // applicable_protocols
+    sizes::DESC + // behavior
+    sizes::NAME + // protocol_flags
+    sizes::DESC // flag_description
+}
+
+pub fn buffer_spec_layout_size() -> usize {
+    // struct kapi_buffer_spec
+    sizes::DESC + // buffer_behaviors
+    8 + // min_buffer_size (size_t)
+    8 + // max_buffer_size (size_t)
+    8 // optimal_buffer_size (size_t)
+}
+
+pub fn async_spec_layout_size() -> usize {
+    // struct kapi_async_spec
+    sizes::NAME + // supported_modes
+    4 // nonblock_errno (int)
+}
+
+pub fn addr_family_spec_layout_size() -> usize {
+    // struct kapi_addr_family_spec
+    4 + // family (int)
+    sizes::NAME + // family_name
+    8 + // addr_struct_size (size_t)
+    8 + // min_addr_len (size_t)
+    8 + // max_addr_len (size_t)
+    sizes::DESC + // addr_format
+    1 + // supports_wildcard (bool)
+    1 + // supports_multicast (bool)
+    1 + // supports_broadcast (bool)
+    sizes::DESC + // special_addresses
+    4 + // port_range_min (u32)
+    4 // port_range_max (u32)
+}
diff --git a/tools/kapi/src/extractor/vmlinux/magic_finder.rs b/tools/kapi/=
src/extractor/vmlinux/magic_finder.rs
new file mode 100644
index 0000000000000..cb7dc535801a0
--- /dev/null
+++ b/tools/kapi/src/extractor/vmlinux/magic_finder.rs
@@ -0,0 +1,102 @@
+// Magic markers for each section
+pub const MAGIC_PARAM: u32 =3D 0x4B415031;    // 'KAP1'
+pub const MAGIC_RETURN: u32 =3D 0x4B415232;   // 'KAR2'
+pub const MAGIC_ERROR: u32 =3D 0x4B414533;    // 'KAE3'
+pub const MAGIC_LOCK: u32 =3D 0x4B414C34;     // 'KAL4'
+pub const MAGIC_CONSTRAINT: u32 =3D 0x4B414335; // 'KAC5'
+pub const MAGIC_INFO: u32 =3D 0x4B414936;     // 'KAI6'
+pub const MAGIC_SIGNAL: u32 =3D 0x4B415337;   // 'KAS7'
+pub const MAGIC_SIGMASK: u32 =3D 0x4B414D38;  // 'KAM8'
+pub const MAGIC_STRUCT: u32 =3D 0x4B415439;   // 'KAT9'
+pub const MAGIC_EFFECT: u32 =3D 0x4B414641;   // 'KAFA'
+pub const MAGIC_TRANS: u32 =3D 0x4B415442;    // 'KATB'
+pub const MAGIC_CAP: u32 =3D 0x4B414343;      // 'KACC'
+
+pub struct MagicOffsets {
+    pub param_offset: Option<usize>,
+    pub return_offset: Option<usize>,
+    pub error_offset: Option<usize>,
+    pub lock_offset: Option<usize>,
+    pub constraint_offset: Option<usize>,
+    pub info_offset: Option<usize>,
+    pub signal_offset: Option<usize>,
+    pub sigmask_offset: Option<usize>,
+    pub struct_offset: Option<usize>,
+    pub effect_offset: Option<usize>,
+    pub trans_offset: Option<usize>,
+    pub cap_offset: Option<usize>,
+}
+
+impl MagicOffsets {
+    /// Find magic markers in the provided data slice
+    /// data: slice of data to search (typically one spec's worth)
+    /// base_offset: absolute offset where this slice starts in the full b=
uffer
+    pub fn find_in_data(data: &[u8], base_offset: usize) -> Self {
+        let mut offsets =3D MagicOffsets {
+            param_offset: None,
+            return_offset: None,
+            error_offset: None,
+            lock_offset: None,
+            constraint_offset: None,
+            info_offset: None,
+            signal_offset: None,
+            sigmask_offset: None,
+            struct_offset: None,
+            effect_offset: None,
+            trans_offset: None,
+            cap_offset: None,
+        };
+
+        // Scan through data looking for magic markers
+        // Only find the first occurrence of each magic to avoid cross-spe=
c contamination
+        let mut i =3D 0;
+        while i + 4 <=3D data.len() {
+            let bytes =3D &data[i..i + 4];
+            let value =3D u32::from_le_bytes([bytes[0], bytes[1], bytes[2]=
, bytes[3]]);
+
+            match value {
+                MAGIC_PARAM if offsets.param_offset.is_none() =3D> {
+                    offsets.param_offset =3D Some(base_offset + i);
+                },
+                MAGIC_RETURN if offsets.return_offset.is_none() =3D> {
+                    offsets.return_offset =3D Some(base_offset + i);
+                },
+                MAGIC_ERROR if offsets.error_offset.is_none() =3D> {
+                    offsets.error_offset =3D Some(base_offset + i);
+                },
+                MAGIC_LOCK if offsets.lock_offset.is_none() =3D> {
+                    offsets.lock_offset =3D Some(base_offset + i);
+                },
+                MAGIC_CONSTRAINT if offsets.constraint_offset.is_none() =
=3D> {
+                    offsets.constraint_offset =3D Some(base_offset + i);
+                },
+                MAGIC_INFO if offsets.info_offset.is_none() =3D> {
+                    offsets.info_offset =3D Some(base_offset + i);
+                },
+                MAGIC_SIGNAL if offsets.signal_offset.is_none() =3D> {
+                    offsets.signal_offset =3D Some(base_offset + i);
+                },
+                MAGIC_SIGMASK if offsets.sigmask_offset.is_none() =3D> {
+                    offsets.sigmask_offset =3D Some(base_offset + i);
+                },
+                MAGIC_STRUCT if offsets.struct_offset.is_none() =3D> {
+                    offsets.struct_offset =3D Some(base_offset + i);
+                },
+                MAGIC_EFFECT if offsets.effect_offset.is_none() =3D> {
+                    offsets.effect_offset =3D Some(base_offset + i);
+                },
+                MAGIC_TRANS if offsets.trans_offset.is_none() =3D> {
+                    offsets.trans_offset =3D Some(base_offset + i);
+                },
+                MAGIC_CAP if offsets.cap_offset.is_none() =3D> {
+                    offsets.cap_offset =3D Some(base_offset + i);
+                },
+                _ =3D> {}
+            }
+
+            i +=3D 1;
+        }
+
+        offsets
+    }
+}
\ No newline at end of file
diff --git a/tools/kapi/src/extractor/vmlinux/mod.rs b/tools/kapi/src/extra=
ctor/vmlinux/mod.rs
new file mode 100644
index 0000000000000..bf3da4df6e66a
--- /dev/null
+++ b/tools/kapi/src/extractor/vmlinux/mod.rs
@@ -0,0 +1,864 @@
+use super::{
+    ApiExtractor, ApiSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, Lock=
Spec, ParamSpec,
+    ReturnSpec, SideEffectSpec, SignalMaskSpec, SignalSpec, StateTransitio=
nSpec, StructSpec,
+    StructFieldSpec,
+};
+use crate::formatter::OutputFormatter;
+use anyhow::{Context, Result};
+use goblin::elf::Elf;
+use std::convert::TryInto;
+use std::fs;
+use std::io::Write;
+
+mod binary_utils;
+mod magic_finder;
+use binary_utils::{
+    DataReader, addr_family_spec_layout_size, async_spec_layout_size, buff=
er_spec_layout_size,
+    protocol_behavior_spec_layout_size, signal_mask_spec_layout_size,
+    sizes, socket_state_spec_layout_size, struct_field_layout_size,
+};
+
+// Helper to convert empty strings to None
+fn opt_string(s: String) -> Option<String> {
+    if s.is_empty() { None } else { Some(s) }
+}
+
+pub struct VmlinuxExtractor {
+    kapi_data: Vec<u8>,
+    specs: Vec<KapiSpec>,
+}
+
+#[derive(Debug)]
+struct KapiSpec {
+    name: String,
+    api_type: String,
+    offset: usize,
+}
+
+impl VmlinuxExtractor {
+    pub fn new(vmlinux_path: &str) -> Result<Self> {
+        let vmlinux_data =3D fs::read(vmlinux_path)
+            .with_context(|| format!("Failed to read vmlinux file: {vmlinu=
x_path}"))?;
+
+        let elf =3D Elf::parse(&vmlinux_data).context("Failed to parse ELF=
 file")?;
+
+        // Find __start_kapi_specs and __stop_kapi_specs symbols first
+        let mut start_addr =3D None;
+        let mut stop_addr =3D None;
+
+        for sym in &elf.syms {
+            if let Some(name) =3D elf.strtab.get_at(sym.st_name) {
+                match name {
+                    "__start_kapi_specs" =3D> start_addr =3D Some(sym.st_v=
alue),
+                    "__stop_kapi_specs" =3D> stop_addr =3D Some(sym.st_val=
ue),
+                    _ =3D> {}
+                }
+            }
+        }
+
+        let start =3D start_addr.context("Could not find __start_kapi_spec=
s symbol")?;
+        let stop =3D stop_addr.context("Could not find __stop_kapi_specs s=
ymbol")?;
+
+        if stop <=3D start {
+            anyhow::bail!("No kernel API specifications found in vmlinux");
+        }
+
+        // Find the section containing the kapi specs data
+        // The specs may be in .kapi_specs (standalone) or .rodata (embedd=
ed in RO_DATA)
+        let containing_section =3D elf
+            .section_headers
+            .iter()
+            .find(|sh| {
+                // Check if this section contains the start address
+                start >=3D sh.sh_addr && start < sh.sh_addr + sh.sh_size
+            })
+            .context("Could not find section containing kapi_specs data")?;
+
+        // Calculate the offset within the file
+        let section_vaddr =3D containing_section.sh_addr;
+        let file_offset =3D containing_section.sh_offset + (start - sectio=
n_vaddr);
+        let data_size: usize =3D (stop - start)
+            .try_into()
+            .context("Data size too large for platform")?;
+
+        let file_offset_usize: usize =3D file_offset
+            .try_into()
+            .context("File offset too large for platform")?;
+
+        if file_offset_usize + data_size > vmlinux_data.len() {
+            anyhow::bail!("Invalid offset/size for kapi_specs data");
+        }
+
+        // Extract the raw data
+        let kapi_data =3D vmlinux_data[file_offset_usize..(file_offset_usi=
ze + data_size)].to_vec();
+
+        // Parse the specifications
+        let specs =3D parse_kapi_specs(&kapi_data)?;
+
+        Ok(VmlinuxExtractor { kapi_data, specs })
+    }
+}
+
+fn parse_kapi_specs(data: &[u8]) -> Result<Vec<KapiSpec>> {
+    let mut specs =3D Vec::new();
+    let mut offset =3D 0;
+    let mut last_found_offset =3D None;
+
+    // Expected offset from struct start to param_magic based on struct la=
yout
+    let param_magic_offset =3D sizes::NAME + 4 + sizes::DESC + (sizes::DES=
C * 4) + 4;
+
+    // Find specs by validating API name and magic marker pairs
+    while offset + param_magic_offset + 4 <=3D data.len() {
+        // Read potential API name
+        let name_bytes =3D &data[offset..offset + sizes::NAME.min(data.len=
() - offset)];
+
+        // Find null terminator
+        let name_len =3D name_bytes.iter().position(|&b| b =3D=3D 0).unwra=
p_or(0);
+
+        if name_len > 0 && name_len < 100 {
+            let name =3D String::from_utf8_lossy(&name_bytes[..name_len]).=
to_string();
+
+            // Validate API name format
+            if is_valid_api_name(&name) {
+                // Verify magic marker at expected position
+                let magic_offset =3D offset + param_magic_offset;
+                if magic_offset + 4 <=3D data.len() {
+                    let magic_bytes =3D &data[magic_offset..magic_offset +=
 4];
+                    let magic_value =3D u32::from_le_bytes([magic_bytes[0]=
, magic_bytes[1], magic_bytes[2], magic_bytes[3]]);
+
+                    if magic_value =3D=3D magic_finder::MAGIC_PARAM {
+                        // Avoid duplicate detection of the same spec
+                        if last_found_offset.is_none() || offset >=3D last=
_found_offset.unwrap() + param_magic_offset {
+                            let api_type =3D if name.starts_with("sys_") {
+                                "syscall"
+                            } else if name.ends_with("_ioctl") {
+                                "ioctl"
+                            } else if name.contains("sysfs") {
+                                "sysfs"
+                            } else {
+                                "function"
+                            }
+                            .to_string();
+
+                            specs.push(KapiSpec {
+                                name: name.clone(),
+                                api_type,
+                                offset,
+                            });
+
+                            last_found_offset =3D Some(offset);
+                        }
+                    }
+                }
+            }
+        }
+
+        // Scan byte by byte to find all specs
+        offset +=3D 1;
+    }
+
+    Ok(specs)
+}
+
+
+
+
+fn is_valid_api_name(name: &str) -> bool {
+    // Validate API name format and length
+    if name.is_empty() || name.len() < 3 || name.len() > 100 {
+        return false;
+    }
+
+    // Alphanumeric and underscore characters only
+    if !name.chars().all(|c| c.is_ascii_alphanumeric() || c =3D=3D '_') {
+        return false;
+    }
+
+    // Must start with letter or underscore
+    let first_char =3D name.chars().next().unwrap();
+    if !first_char.is_ascii_alphabetic() && first_char !=3D '_' {
+        return false;
+    }
+
+    // Match common kernel API patterns
+    name.starts_with("sys_") ||
+    name.starts_with("__") ||
+    name.ends_with("_ioctl") ||
+    name.contains("_") ||
+    name.len() > 6
+}
+
+impl ApiExtractor for VmlinuxExtractor {
+    fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+        Ok(self
+            .specs
+            .iter()
+            .map(|spec| {
+                // Parse the full spec for listing
+                parse_binary_to_api_spec(&self.kapi_data, spec.offset)
+                    .unwrap_or_else(|_| ApiSpec {
+                        name: spec.name.clone(),
+                        api_type: spec.api_type.clone(),
+                        description: None,
+                        long_description: None,
+                        version: None,
+                        context_flags: vec![],
+                        param_count: None,
+                        error_count: None,
+                        examples: None,
+                        notes: None,
+                        since_version: None,
+                        subsystem: None,
+                        sysfs_path: None,
+                        permissions: None,
+                        socket_state: None,
+                        protocol_behaviors: vec![],
+                        addr_families: vec![],
+                        buffer_spec: None,
+                        async_spec: None,
+                        net_data_transfer: None,
+                        capabilities: vec![],
+                        parameters: vec![],
+                        return_spec: None,
+                        errors: vec![],
+                        signals: vec![],
+                        signal_masks: vec![],
+                        side_effects: vec![],
+                        state_transitions: vec![],
+                        constraints: vec![],
+                        locks: vec![],
+                        struct_specs: vec![],
+                    })
+            })
+            .collect())
+    }
+
+    fn extract_by_name(&self, api_name: &str) -> Result<Option<ApiSpec>> {
+        if let Some(spec) =3D self.specs.iter().find(|s| s.name =3D=3D api=
_name) {
+            Ok(Some(parse_binary_to_api_spec(&self.kapi_data, spec.offset)=
?))
+        } else {
+            Ok(None)
+        }
+    }
+
+    fn display_api_details(
+        &self,
+        api_name: &str,
+        formatter: &mut dyn OutputFormatter,
+        writer: &mut dyn Write,
+    ) -> Result<()> {
+        if let Some(spec) =3D self.specs.iter().find(|s| s.name =3D=3D api=
_name) {
+            let api_spec =3D parse_binary_to_api_spec(&self.kapi_data, spe=
c.offset)?;
+            super::display_api_spec(&api_spec, formatter, writer)?;
+        }
+        Ok(())
+    }
+}
+
+/// Helper to read count and parse array items with optional magic offset
+fn parse_array_with_magic<T, F>(
+    reader: &mut DataReader,
+    magic_offset: Option<usize>,
+    max_items: u32,
+    parse_fn: F,
+) -> Vec<T>
+where
+    F: Fn(&mut DataReader) -> Option<T>,
+{
+    // Read count - position at magic+4 if magic offset exists
+    let count =3D if let Some(offset) =3D magic_offset {
+        reader.pos =3D offset + 4;
+        reader.read_u32()
+    } else {
+        reader.read_u32()
+    };
+
+    let mut items =3D Vec::new();
+    if let Some(count) =3D count {
+        // Position at start of array data if magic offset exists
+        if let Some(offset) =3D magic_offset {
+            reader.pos =3D offset + 8; // +4 for magic, +4 for count
+        }
+        // Parse items up to max_items
+        for _ in 0..count.min(max_items) as usize {
+            if let Some(item) =3D parse_fn(reader) {
+                items.push(item);
+            }
+        }
+    }
+    items
+}
+
+fn parse_binary_to_api_spec(data: &[u8], offset: usize) -> Result<ApiSpec>=
 {
+    let mut reader =3D DataReader::new(data, offset);
+
+    // Search for magic markers in the entire spec data
+    let search_end =3D (offset + 0x70000).min(data.len()); // Search full =
spec size
+    let spec_data =3D &data[offset..search_end];
+
+    // Find magic markers relative to the spec start
+    let magic_offsets =3D magic_finder::MagicOffsets::find_in_data(spec_da=
ta, offset);
+
+    // Read fields in exact order of struct kernel_api_spec
+
+    // Read name (128 bytes)
+    let name =3D reader
+        .read_cstring(sizes::NAME)
+        .ok_or_else(|| anyhow::anyhow!("Failed to read API name"))?;
+
+    // Determine API type
+    let api_type =3D if name.starts_with("sys_") {
+        "syscall"
+    } else if name.ends_with("_ioctl") {
+        "ioctl"
+    } else if name.contains("sysfs") {
+        "sysfs"
+    } else {
+        "function"
+    }
+    .to_string();
+
+    // Read version (u32)
+    let version =3D reader.read_u32().map(|v| v.to_string());
+
+    // Read description (512 bytes)
+    let description =3D reader.read_cstring(sizes::DESC).filter(|s| !s.is_=
empty());
+
+    // Read long_description (2048 bytes)
+    let long_description =3D reader
+        .read_cstring(sizes::DESC * 4)
+        .filter(|s| !s.is_empty());
+
+    // Read context_flags (u32)
+    let context_flags =3D parse_context_flags(&mut reader);
+
+    // Parse params array
+    let parameters =3D parse_array_with_magic(
+        &mut reader,
+        magic_offsets.param_offset,
+        sizes::MAX_PARAMS as u32,
+        |r| parse_param(r, 0),  // Index doesn't seem to be used in parse_=
param
+    );
+
+    // Read return_spec
+    let return_spec =3D parse_return_spec(&mut reader);
+
+    // Parse errors array
+    let errors =3D parse_array_with_magic(
+        &mut reader,
+        magic_offsets.error_offset,
+        sizes::MAX_ERRORS as u32,
+        parse_error,
+    );
+
+    // Parse locks array
+    let locks =3D parse_array_with_magic(
+        &mut reader,
+        magic_offsets.lock_offset,
+        sizes::MAX_CONSTRAINTS as u32,
+        parse_lock,
+    );
+
+    // Parse constraints array
+    let constraints =3D parse_array_with_magic(
+        &mut reader,
+        magic_offsets.constraint_offset,
+        sizes::MAX_CONSTRAINTS as u32,
+        parse_constraint,
+    );
+
+    // Read examples and notes - position reader at info section if magic =
found
+    let (examples, notes) =3D if let Some(info_offset) =3D magic_offsets.i=
nfo_offset {
+        reader.pos =3D info_offset + 4; // +4 to skip magic
+        let examples =3D reader.read_cstring(sizes::DESC * 2).filter(|s| !=
s.is_empty());
+        let notes =3D reader.read_cstring(sizes::DESC * 2).filter(|s| !s.i=
s_empty());
+        (examples, notes)
+    } else {
+        let examples =3D reader.read_cstring(sizes::DESC * 2).filter(|s| !=
s.is_empty());
+        let notes =3D reader.read_cstring(sizes::DESC * 2).filter(|s| !s.i=
s_empty());
+        (examples, notes)
+    };
+
+    // Read since_version (32 bytes)
+    let since_version =3D reader.read_cstring(32).filter(|s| !s.is_empty()=
);
+
+    // Skip deprecated (bool =3D 1 byte + 3 bytes padding) and replacement=
 (128 bytes)
+    // These fields were removed from kernel but we need to skip them for =
binary compatibility
+    reader.skip(4); // deprecated + padding
+    reader.discard_cstring(sizes::NAME); // replacement
+
+    // Parse signals array
+    let signals =3D parse_array_with_magic(
+        &mut reader,
+        magic_offsets.signal_offset,
+        sizes::MAX_SIGNALS as u32,
+        parse_signal,
+    );
+
+    // Read signal_mask_count (u32)
+    let signal_mask_count =3D reader.read_u32();
+
+    // Parse signal_masks array
+    let mut signal_masks =3D Vec::new();
+    if let Some(count) =3D signal_mask_count {
+        for i in 0..sizes::MAX_SIGNALS {
+            if i < count as usize {
+                if let Some(mask) =3D parse_signal_mask(&mut reader) {
+                    signal_masks.push(mask);
+                }
+            } else {
+                reader.skip(signal_mask_spec_layout_size());
+            }
+        }
+    } else {
+        reader.skip(signal_mask_spec_layout_size() * sizes::MAX_SIGNALS);
+    }
+
+    // Parse struct_specs array
+    let struct_specs =3D parse_array_with_magic(
+        &mut reader,
+        magic_offsets.struct_offset,
+        sizes::MAX_STRUCT_SPECS as u32,
+        parse_struct_spec,
+    );
+
+    // According to the C struct, the order is:
+    // side_effect_count, side_effects array, state_trans_count, state_tra=
nsitions array,
+    // capability_count, capabilities array
+
+    // Parse side_effects array
+    let side_effects =3D parse_array_with_magic(
+        &mut reader,
+        magic_offsets.effect_offset,
+        sizes::MAX_SIDE_EFFECTS as u32,
+        parse_side_effect,
+    );
+
+    // Parse state_transitions array
+    let state_transitions =3D parse_array_with_magic(
+        &mut reader,
+        magic_offsets.trans_offset,
+        sizes::MAX_STATE_TRANS as u32,
+        parse_state_transition,
+    );
+
+    // Parse capabilities array
+    let capabilities =3D parse_array_with_magic(
+        &mut reader,
+        magic_offsets.cap_offset,
+        sizes::MAX_CAPABILITIES as u32,
+        parse_capability,
+    );
+
+    // Skip remaining network/socket fields
+    reader.skip(
+        socket_state_spec_layout_size() +
+        protocol_behavior_spec_layout_size() * sizes::MAX_PROTOCOL_BEHAVIO=
RS +
+        4 + // protocol_behavior_count
+        buffer_spec_layout_size() +
+        async_spec_layout_size() +
+        addr_family_spec_layout_size() * sizes::MAX_ADDR_FAMILIES +
+        4 + // addr_family_count
+        6 + 2 + // 6 bool flags + padding
+        sizes::DESC * 3 // 3 semantic descriptions
+    );
+
+    Ok(ApiSpec {
+        name,
+        api_type,
+        description,
+        long_description,
+        version,
+        context_flags,
+        param_count: if parameters.is_empty() { None } else { Some(paramet=
ers.len() as u32) },
+        error_count: if errors.is_empty() { None } else { Some(errors.len(=
) as u32) },
+        examples,
+        notes,
+        since_version,
+        subsystem: None,
+        sysfs_path: None,
+        permissions: None,
+        socket_state: None,
+        protocol_behaviors: vec![],
+        addr_families: vec![],
+        buffer_spec: None,
+        async_spec: None,
+        net_data_transfer: None,
+        capabilities,
+        parameters,
+        return_spec,
+        errors,
+        signals,
+        signal_masks,
+        side_effects,
+        state_transitions,
+        constraints,
+        locks,
+        struct_specs,
+    })
+}
+
+// Helper parsing functions
+
+fn parse_context_flags(reader: &mut DataReader) -> Vec<String> {
+    const KAPI_CTX_PROCESS: u32 =3D 1 << 0;
+    const KAPI_CTX_SOFTIRQ: u32 =3D 1 << 1;
+    const KAPI_CTX_HARDIRQ: u32 =3D 1 << 2;
+    const KAPI_CTX_NMI: u32 =3D 1 << 3;
+    const KAPI_CTX_ATOMIC: u32 =3D 1 << 4;
+    const KAPI_CTX_SLEEPABLE: u32 =3D 1 << 5;
+    const KAPI_CTX_PREEMPT_DISABLED: u32 =3D 1 << 6;
+    const KAPI_CTX_IRQ_DISABLED: u32 =3D 1 << 7;
+
+    if let Some(flags) =3D reader.read_u32() {
+        let mut parts =3D Vec::new();
+
+        if flags & KAPI_CTX_PROCESS !=3D 0 {
+            parts.push("KAPI_CTX_PROCESS");
+        }
+        if flags & KAPI_CTX_SOFTIRQ !=3D 0 {
+            parts.push("KAPI_CTX_SOFTIRQ");
+        }
+        if flags & KAPI_CTX_HARDIRQ !=3D 0 {
+            parts.push("KAPI_CTX_HARDIRQ");
+        }
+        if flags & KAPI_CTX_NMI !=3D 0 {
+            parts.push("KAPI_CTX_NMI");
+        }
+        if flags & KAPI_CTX_ATOMIC !=3D 0 {
+            parts.push("KAPI_CTX_ATOMIC");
+        }
+        if flags & KAPI_CTX_SLEEPABLE !=3D 0 {
+            parts.push("KAPI_CTX_SLEEPABLE");
+        }
+        if flags & KAPI_CTX_PREEMPT_DISABLED !=3D 0 {
+            parts.push("KAPI_CTX_PREEMPT_DISABLED");
+        }
+        if flags & KAPI_CTX_IRQ_DISABLED !=3D 0 {
+            parts.push("KAPI_CTX_IRQ_DISABLED");
+        }
+
+        if !parts.is_empty() {
+            vec![parts.join(" | ")]
+        } else {
+            vec![]
+        }
+    } else {
+        vec![]
+    }
+}
+
+fn parse_param(reader: &mut DataReader, index: usize) -> Option<ParamSpec>=
 {
+    let name =3D reader.read_cstring(sizes::NAME)?;
+    let type_name =3D reader.read_cstring(sizes::NAME)?;
+    let param_type =3D reader.read_u32()?;
+    let flags =3D reader.read_u32()?;
+    let size =3D reader.read_usize()?;
+    let alignment =3D reader.read_usize()?;
+    let min_value =3D reader.read_i64()?;
+    let max_value =3D reader.read_i64()?;
+    let valid_mask =3D reader.read_u64()?;
+
+    // Skip enum_values pointer (8 bytes)
+    reader.skip(8);
+    let _enum_count =3D reader.read_u32()?; // Must use ? to propagate err=
ors
+    let constraint_type =3D reader.read_u32()?;
+    // Skip validate function pointer (8 bytes)
+    reader.skip(8);
+
+    let description =3D reader.read_string_or_default(sizes::DESC);
+    let constraint =3D reader.read_optional_string(sizes::DESC);
+    let _size_param_idx =3D reader.read_i32()?; // Must use ? to propagate=
 errors
+    let _size_multiplier =3D reader.read_usize()?; // Must use ? to propag=
ate errors
+
+    Some(ParamSpec {
+        index: index as u32,
+        name,
+        type_name,
+        description,
+        flags,
+        param_type,
+        constraint_type,
+        constraint,
+        min_value: Some(min_value),
+        max_value: Some(max_value),
+        valid_mask: Some(valid_mask),
+        enum_values: vec![],
+        size: Some(size as u32),
+        alignment: Some(alignment as u32),
+    })
+}
+
+fn parse_return_spec(reader: &mut DataReader) -> Option<ReturnSpec> {
+    // Read type_name, but treat empty as valid (will be empty string)
+    let type_name =3D reader.read_string_or_default(sizes::NAME);
+
+    // Read return_type and check_type
+    let return_type =3D reader.read_u32().unwrap_or(0);
+    let check_type =3D reader.read_u32().unwrap_or(0);
+    let success_value =3D reader.read_i64().unwrap_or(0);
+    let success_min =3D reader.read_i64().unwrap_or(0);
+    let success_max =3D reader.read_i64().unwrap_or(0);
+
+    // Skip error_values pointer (8 bytes)
+    reader.skip(8);
+    let _error_count =3D reader.read_u32().unwrap_or(0); // Don't fail on =
return spec
+    // Skip is_success function pointer (8 bytes)
+    reader.skip(8);
+
+    let description =3D reader.read_string_or_default(sizes::DESC);
+
+    // Return a spec even if type_name is empty, as long as we have some d=
ata
+    // The type_name might be a string like "KAPI_TYPE_INT" that gets stor=
ed literally
+    if type_name.is_empty() && return_type =3D=3D 0 && check_type =3D=3D 0=
 && success_value =3D=3D 0 {
+        // No return spec at all
+        return None;
+    }
+
+    Some(ReturnSpec {
+        type_name,
+        description,
+        return_type,
+        check_type,
+        success_value: Some(success_value),
+        success_min: Some(success_min),
+        success_max: Some(success_max),
+        error_values: vec![],
+    })
+}
+
+fn parse_error(reader: &mut DataReader) -> Option<ErrorSpec> {
+    let error_code =3D reader.read_i32()?;
+    let name =3D reader.read_cstring(sizes::NAME)?;
+    let condition =3D reader.read_string_or_default(sizes::DESC);
+    let description =3D reader.read_string_or_default(sizes::DESC);
+
+    Some(ErrorSpec {
+        error_code,
+        name,
+        condition,
+        description,
+    })
+}
+
+fn parse_lock(reader: &mut DataReader) -> Option<LockSpec> {
+    let lock_name =3D reader.read_cstring(sizes::NAME)?;
+    let lock_type =3D reader.read_u32()?;
+    let scope =3D reader.read_u32()?;
+    let description =3D reader.read_string_or_default(sizes::DESC);
+
+    Some(LockSpec {
+        lock_name,
+        lock_type,
+        scope,
+        description,
+    })
+}
+
+fn parse_constraint(reader: &mut DataReader) -> Option<ConstraintSpec> {
+    let name =3D reader.read_cstring(sizes::NAME)?;
+    let description =3D reader.read_string_or_default(sizes::DESC);
+    let expression =3D reader.read_string_or_default(sizes::DESC);
+
+    // No function pointer in packed struct
+
+    Some(ConstraintSpec {
+        name,
+        description,
+        expression: opt_string(expression),
+    })
+}
+
+fn parse_signal(reader: &mut DataReader) -> Option<SignalSpec> {
+    let signal_num =3D reader.read_i32()?;
+    let signal_name =3D reader.read_cstring(32)?; // signal_name[32]
+    let direction =3D reader.read_u32()?;
+    let action =3D reader.read_u32()?;
+    let target =3D reader.read_optional_string(sizes::DESC); // target[512]
+    let condition =3D reader.read_optional_string(sizes::DESC); // conditi=
on[512]
+    let description =3D reader.read_optional_string(sizes::DESC); // descr=
iption[512]
+    let restartable =3D reader.read_bool()?;
+    let sa_flags_required =3D reader.read_u32()?;
+    let sa_flags_forbidden =3D reader.read_u32()?;
+    let error_on_signal =3D reader.read_i32()?;
+    let _transform_to =3D reader.read_i32()?; // transform_to
+    let timing_bytes =3D reader.read_bytes(32)?; // timing[32]
+    let timing =3D if let Some(end) =3D timing_bytes.iter().position(|&b| =
b =3D=3D 0) {
+        String::from_utf8_lossy(&timing_bytes[..end]).parse().unwrap_or(0)
+    } else {
+        0
+    };
+    let priority =3D reader.read_u8()?;
+    let interruptible =3D reader.read_bool()?;
+    let _queue_behavior =3D reader.read_bytes(128)?; // queue_behavior[128]
+    let state_required =3D reader.read_u32()?;
+    let state_forbidden =3D reader.read_u32()?;
+
+    Some(SignalSpec {
+        signal_num,
+        signal_name,
+        direction,
+        action,
+        target,
+        condition,
+        description,
+        timing,
+        priority: priority as u32,
+        restartable,
+        interruptible,
+        queue: None, // queue_behavior not exposed in SignalSpec
+        sa_flags: 0, // Not directly available
+        sa_flags_required,
+        sa_flags_forbidden,
+        state_required,
+        state_forbidden,
+        error_on_signal: Some(error_on_signal),
+    })
+}
+
+fn parse_signal_mask(reader: &mut DataReader) -> Option<SignalMaskSpec> {
+    let name =3D reader.read_cstring(sizes::NAME)?;
+    let description =3D reader.read_string_or_default(sizes::DESC);
+
+    // Skip signals array
+    for _ in 0..sizes::MAX_SIGNALS {
+        reader.read_i32();
+    }
+
+    let _signal_count =3D reader.read_u32()?;
+
+    Some(SignalMaskSpec {
+        name,
+        description,
+    })
+}
+
+fn parse_struct_field(reader: &mut DataReader) -> Option<StructFieldSpec> {
+    let name =3D reader.read_cstring(sizes::NAME)?;
+    let field_type =3D reader.read_u32()?;
+    let type_name =3D reader.read_cstring(sizes::NAME)?;
+    let offset =3D reader.read_usize()?;
+    let size =3D reader.read_usize()?;
+    let flags =3D reader.read_u32()?;
+    let constraint_type =3D reader.read_u32()?;
+    let min_value =3D reader.read_i64()?;
+    let max_value =3D reader.read_i64()?;
+    let valid_mask =3D reader.read_u64()?;
+    // Skip enum_values field (512 bytes)
+    let _enum_values =3D reader.read_cstring(sizes::DESC); // Don't fail o=
n optional field
+    let description =3D reader.read_string_or_default(sizes::DESC);
+
+    Some(StructFieldSpec {
+        name,
+        field_type,
+        type_name,
+        offset,
+        size,
+        flags,
+        constraint_type,
+        min_value,
+        max_value,
+        valid_mask,
+        description,
+    })
+}
+
+fn parse_struct_spec(reader: &mut DataReader) -> Option<StructSpec> {
+    let name =3D reader.read_cstring(sizes::NAME)?;
+    let size =3D reader.read_usize()?;
+    let alignment =3D reader.read_usize()?;
+    let field_count =3D reader.read_u32()?;
+
+    // Parse fields array
+    let mut fields =3D Vec::new();
+    for _ in 0..field_count.min(sizes::MAX_PARAMS as u32) {
+        if let Some(field) =3D parse_struct_field(reader) {
+            fields.push(field);
+        } else {
+            // Skip this field if we can't parse it
+            reader.skip(struct_field_layout_size());
+        }
+    }
+
+    // Skip remaining fields if any
+    let remaining =3D sizes::MAX_PARAMS as u32 - field_count.min(sizes::MA=
X_PARAMS as u32);
+    for _ in 0..remaining {
+        reader.skip(struct_field_layout_size());
+    }
+
+    let description =3D reader.read_string_or_default(sizes::DESC);
+
+    Some(StructSpec {
+        name,
+        size,
+        alignment,
+        field_count,
+        fields,
+        description,
+    })
+}
+
+fn parse_side_effect(reader: &mut DataReader) -> Option<SideEffectSpec> {
+    let effect_type =3D reader.read_u32()?;
+    let target =3D reader.read_cstring(sizes::NAME)?;
+    let condition =3D reader.read_string_or_default(sizes::DESC);
+    let description =3D reader.read_string_or_default(sizes::DESC);
+    let reversible =3D reader.read_bool()?;
+    // No padding needed for packed struct
+
+    Some(SideEffectSpec {
+        effect_type,
+        target,
+        condition: opt_string(condition),
+        description,
+        reversible,
+    })
+}
+
+fn parse_state_transition(reader: &mut DataReader) -> Option<StateTransiti=
onSpec> {
+    let from_state =3D reader.read_cstring(sizes::NAME)?;
+    let to_state =3D reader.read_cstring(sizes::NAME)?;
+    let condition =3D reader.read_string_or_default(sizes::DESC);
+    let object =3D reader.read_cstring(sizes::NAME)?;
+    let description =3D reader.read_string_or_default(sizes::DESC);
+
+    Some(StateTransitionSpec {
+        object,
+        from_state,
+        to_state,
+        condition: opt_string(condition),
+        description,
+    })
+}
+
+fn parse_capability(reader: &mut DataReader) -> Option<CapabilitySpec> {
+    let capability =3D reader.read_i32()?;
+    let cap_name =3D reader.read_cstring(sizes::NAME)?;
+    let action =3D reader.read_u32()?;
+    let allows =3D reader.read_string_or_default(sizes::DESC);
+    let without_cap =3D reader.read_string_or_default(sizes::DESC);
+    let check_condition =3D reader.read_optional_string(sizes::DESC);
+    let priority =3D reader.read_u32()?;
+
+    let mut alternatives =3D Vec::new();
+    for _ in 0..sizes::MAX_CAPABILITIES {
+        if let Some(alt) =3D reader.read_i32() {
+            if alt !=3D 0 {
+                alternatives.push(alt);
+            }
+        }
+    }
+
+    let _alternative_count =3D reader.read_u32()?; // alternative_count
+
+    Some(CapabilitySpec {
+        capability,
+        name: cap_name,
+        action: action.to_string(),
+        allows,
+        without_cap,
+        check_condition,
+        priority: Some(priority as u8),
+        alternatives,
+    })
+}
\ No newline at end of file
diff --git a/tools/kapi/src/formatter/json.rs b/tools/kapi/src/formatter/js=
on.rs
new file mode 100644
index 0000000000000..8025467409d64
--- /dev/null
+++ b/tools/kapi/src/formatter/json.rs
@@ -0,0 +1,468 @@
+use super::OutputFormatter;
+use crate::extractor::{
+    AddrFamilySpec, AsyncSpec, BufferSpec, CapabilitySpec, ConstraintSpec,=
 ErrorSpec, LockSpec,
+    ParamSpec, ProtocolBehaviorSpec, ReturnSpec, SideEffectSpec, SignalMas=
kSpec, SignalSpec,
+    SocketStateSpec, StateTransitionSpec, StructSpec,
+};
+use serde::Serialize;
+use std::io::Write;
+
+pub struct JsonFormatter {
+    data: JsonData,
+}
+
+#[derive(Serialize)]
+struct JsonData {
+    #[serde(skip_serializing_if =3D "Option::is_none")]
+    apis: Option<Vec<JsonApi>>,
+    #[serde(skip_serializing_if =3D "Option::is_none")]
+    api_details: Option<JsonApiDetails>,
+}
+
+#[derive(Serialize)]
+struct JsonApi {
+    name: String,
+    api_type: String,
+}
+
+#[derive(Serialize)]
+struct JsonApiDetails {
+    name: String,
+    #[serde(skip_serializing_if =3D "Option::is_none")]
+    description: Option<String>,
+    #[serde(skip_serializing_if =3D "Option::is_none")]
+    long_description: Option<String>,
+    #[serde(skip_serializing_if =3D "Vec::is_empty")]
+    context_flags: Vec<String>,
+    #[serde(skip_serializing_if =3D "Option::is_none")]
+    examples: Option<String>,
+    #[serde(skip_serializing_if =3D "Option::is_none")]
+    notes: Option<String>,
+    #[serde(skip_serializing_if =3D "Option::is_none")]
+    since_version: Option<String>,
+    // Sysfs-specific fields
+    #[serde(skip_serializing_if =3D "Option::is_none")]
+    subsystem: Option<String>,
+    #[serde(skip_serializing_if =3D "Option::is_none")]
+    sysfs_path: Option<String>,
+    #[serde(skip_serializing_if =3D "Option::is_none")]
+    permissions: Option<String>,
+    // Networking-specific fields
+    #[serde(skip_serializing_if =3D "Option::is_none")]
+    socket_state: Option<SocketStateSpec>,
+    #[serde(skip_serializing_if =3D "Vec::is_empty")]
+    protocol_behaviors: Vec<ProtocolBehaviorSpec>,
+    #[serde(skip_serializing_if =3D "Vec::is_empty")]
+    addr_families: Vec<AddrFamilySpec>,
+    #[serde(skip_serializing_if =3D "Option::is_none")]
+    buffer_spec: Option<BufferSpec>,
+    #[serde(skip_serializing_if =3D "Option::is_none")]
+    async_spec: Option<AsyncSpec>,
+    #[serde(skip_serializing_if =3D "Option::is_none")]
+    net_data_transfer: Option<String>,
+    #[serde(skip_serializing_if =3D "Vec::is_empty")]
+    capabilities: Vec<CapabilitySpec>,
+    #[serde(skip_serializing_if =3D "Vec::is_empty")]
+    state_transitions: Vec<StateTransitionSpec>,
+    #[serde(skip_serializing_if =3D "Vec::is_empty")]
+    side_effects: Vec<SideEffectSpec>,
+    #[serde(skip_serializing_if =3D "Vec::is_empty")]
+    parameters: Vec<ParamSpec>,
+    #[serde(skip_serializing_if =3D "Option::is_none")]
+    return_spec: Option<ReturnSpec>,
+    #[serde(skip_serializing_if =3D "Vec::is_empty")]
+    errors: Vec<ErrorSpec>,
+    #[serde(skip_serializing_if =3D "Vec::is_empty")]
+    locks: Vec<LockSpec>,
+    #[serde(skip_serializing_if =3D "Vec::is_empty")]
+    struct_specs: Vec<StructSpec>,
+    #[serde(skip_serializing_if =3D "Vec::is_empty")]
+    signals: Vec<SignalSpec>,
+    #[serde(skip_serializing_if =3D "Vec::is_empty")]
+    signal_masks: Vec<SignalMaskSpec>,
+    #[serde(skip_serializing_if =3D "Vec::is_empty")]
+    constraints: Vec<ConstraintSpec>,
+}
+
+impl JsonFormatter {
+    pub fn new() -> Self {
+        JsonFormatter {
+            data: JsonData {
+                apis: None,
+                api_details: None,
+            },
+        }
+    }
+}
+
+impl OutputFormatter for JsonFormatter {
+    fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()=
> {
+        Ok(())
+    }
+
+    fn end_document(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        let json =3D serde_json::to_string_pretty(&self.data)?;
+        writeln!(w, "{json}")?;
+        Ok(())
+    }
+
+    fn begin_api_list(&mut self, _w: &mut dyn Write, _title: &str) -> std:=
:io::Result<()> {
+        self.data.apis =3D Some(Vec::new());
+        Ok(())
+    }
+
+    fn api_item(&mut self, _w: &mut dyn Write, name: &str, api_type: &str)=
 -> std::io::Result<()> {
+        if let Some(apis) =3D &mut self.data.apis {
+            apis.push(JsonApi {
+                name: name.to_string(),
+                api_type: api_type.to_string(),
+            });
+        }
+        Ok(())
+    }
+
+    fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn total_specs(&mut self, _w: &mut dyn Write, _count: usize) -> std::i=
o::Result<()> {
+        Ok(())
+    }
+
+    fn begin_api_details(&mut self, _w: &mut dyn Write, name: &str) -> std=
::io::Result<()> {
+        self.data.api_details =3D Some(JsonApiDetails {
+            name: name.to_string(),
+            description: None,
+            long_description: None,
+            context_flags: Vec::new(),
+            examples: None,
+            notes: None,
+            since_version: None,
+            subsystem: None,
+            sysfs_path: None,
+            permissions: None,
+            socket_state: None,
+            protocol_behaviors: Vec::new(),
+            addr_families: Vec::new(),
+            buffer_spec: None,
+            async_spec: None,
+            net_data_transfer: None,
+            capabilities: Vec::new(),
+            state_transitions: Vec::new(),
+            side_effects: Vec::new(),
+            parameters: Vec::new(),
+            return_spec: None,
+            errors: Vec::new(),
+            locks: Vec::new(),
+            struct_specs: Vec::new(),
+            signals: Vec::new(),
+            signal_masks: Vec::new(),
+            constraints: Vec::new(),
+        });
+        Ok(())
+    }
+
+    fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<(=
)> {
+        Ok(())
+    }
+
+    fn description(&mut self, _w: &mut dyn Write, desc: &str) -> std::io::=
Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.description =3D Some(desc.to_string());
+        }
+        Ok(())
+    }
+
+    fn long_description(&mut self, _w: &mut dyn Write, desc: &str) -> std:=
:io::Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.long_description =3D Some(desc.to_string());
+        }
+        Ok(())
+    }
+
+    fn begin_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Resu=
lt<()> {
+        Ok(())
+    }
+
+    fn context_flag(&mut self, _w: &mut dyn Write, flag: &str) -> std::io:=
:Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.context_flags.push(flag.to_string());
+        }
+        Ok(())
+    }
+
+    fn end_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result=
<()> {
+        Ok(())
+    }
+
+    fn begin_parameters(&mut self, _w: &mut dyn Write, _count: u32) -> std=
::io::Result<()> {
+        Ok(())
+    }
+
+    fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()=
> {
+        Ok(())
+    }
+
+    fn begin_errors(&mut self, _w: &mut dyn Write, _count: u32) -> std::io=
::Result<()> {
+        Ok(())
+    }
+
+    fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn examples(&mut self, _w: &mut dyn Write, examples: &str) -> std::io:=
:Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.examples =3D Some(examples.to_string());
+        }
+        Ok(())
+    }
+
+    fn notes(&mut self, _w: &mut dyn Write, notes: &str) -> std::io::Resul=
t<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.notes =3D Some(notes.to_string());
+        }
+        Ok(())
+    }
+
+    fn since_version(&mut self, _w: &mut dyn Write, version: &str) -> std:=
:io::Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.since_version =3D Some(version.to_string());
+        }
+        Ok(())
+    }
+
+    fn sysfs_subsystem(&mut self, _w: &mut dyn Write, subsystem: &str) -> =
std::io::Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.subsystem =3D Some(subsystem.to_string());
+        }
+        Ok(())
+    }
+
+    fn sysfs_path(&mut self, _w: &mut dyn Write, path: &str) -> std::io::R=
esult<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.sysfs_path =3D Some(path.to_string());
+        }
+        Ok(())
+    }
+
+    fn sysfs_permissions(&mut self, _w: &mut dyn Write, perms: &str) -> st=
d::io::Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.permissions =3D Some(perms.to_string());
+        }
+        Ok(())
+    }
+
+    // Networking-specific methods
+    fn socket_state(&mut self, _w: &mut dyn Write, state: &SocketStateSpec=
) -> std::io::Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.socket_state =3D Some(state.clone());
+        }
+        Ok(())
+    }
+
+    fn begin_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io:=
:Result<()> {
+        Ok(())
+    }
+
+    fn protocol_behavior(
+        &mut self,
+        _w: &mut dyn Write,
+        behavior: &ProtocolBehaviorSpec,
+    ) -> std::io::Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.protocol_behaviors.push(behavior.clone());
+        }
+        Ok(())
+    }
+
+    fn end_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::R=
esult<()> {
+        Ok(())
+    }
+
+    fn begin_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Resu=
lt<()> {
+        Ok(())
+    }
+
+    fn addr_family(&mut self, _w: &mut dyn Write, family: &AddrFamilySpec)=
 -> std::io::Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.addr_families.push(family.clone());
+        }
+        Ok(())
+    }
+
+    fn end_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result=
<()> {
+        Ok(())
+    }
+
+    fn buffer_spec(&mut self, _w: &mut dyn Write, spec: &BufferSpec) -> st=
d::io::Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.buffer_spec =3D Some(spec.clone());
+        }
+        Ok(())
+    }
+
+    fn async_spec(&mut self, _w: &mut dyn Write, spec: &AsyncSpec) -> std:=
:io::Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.async_spec =3D Some(spec.clone());
+        }
+        Ok(())
+    }
+
+    fn net_data_transfer(&mut self, _w: &mut dyn Write, desc: &str) -> std=
::io::Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.net_data_transfer =3D Some(desc.to_string());
+        }
+        Ok(())
+    }
+
+    fn begin_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Resul=
t<()> {
+        Ok(())
+    }
+
+    fn capability(&mut self, _w: &mut dyn Write, cap: &CapabilitySpec) -> =
std::io::Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.capabilities.push(cap.clone());
+        }
+        Ok(())
+    }
+
+    fn end_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<=
()> {
+        Ok(())
+    }
+
+    // Stub implementations for new methods
+    fn parameter(&mut self, _w: &mut dyn Write, param: &ParamSpec) -> std:=
:io::Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.parameters.push(param.clone());
+        }
+        Ok(())
+    }
+
+    fn return_spec(&mut self, _w: &mut dyn Write, ret: &ReturnSpec) -> std=
::io::Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.return_spec =3D Some(ret.clone());
+        }
+        Ok(())
+    }
+
+    fn error(&mut self, _w: &mut dyn Write, error: &ErrorSpec) -> std::io:=
:Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.errors.push(error.clone());
+        }
+        Ok(())
+    }
+
+    fn begin_signals(&mut self, _w: &mut dyn Write, _count: u32) -> std::i=
o::Result<()> {
+        Ok(())
+    }
+
+    fn signal(&mut self, _w: &mut dyn Write, signal: &SignalSpec) -> std::=
io::Result<()> {
+        if let Some(api_details) =3D &mut self.data.api_details {
+            api_details.signals.push(signal.clone());
+        }
+        Ok(())
+    }
+
+    fn end_signals(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_signal_masks(&mut self, _w: &mut dyn Write, _count: u32) -> s=
td::io::Result<()> {
+        Ok(())
+    }
+
+    fn signal_mask(&mut self, _w: &mut dyn Write, mask: &SignalMaskSpec) -=
> std::io::Result<()> {
+        if let Some(api_details) =3D &mut self.data.api_details {
+            api_details.signal_masks.push(mask.clone());
+        }
+        Ok(())
+    }
+
+    fn end_signal_masks(&mut self, _w: &mut dyn Write) -> std::io::Result<=
()> {
+        Ok(())
+    }
+
+    fn begin_side_effects(&mut self, _w: &mut dyn Write, _count: u32) -> s=
td::io::Result<()> {
+        Ok(())
+    }
+
+    fn side_effect(&mut self, _w: &mut dyn Write, effect: &SideEffectSpec)=
 -> std::io::Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.side_effects.push(effect.clone());
+        }
+        Ok(())
+    }
+
+    fn end_side_effects(&mut self, _w: &mut dyn Write) -> std::io::Result<=
()> {
+        Ok(())
+    }
+
+    fn begin_state_transitions(&mut self, _w: &mut dyn Write, _count: u32)=
 -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn state_transition(
+        &mut self,
+        _w: &mut dyn Write,
+        trans: &StateTransitionSpec,
+    ) -> std::io::Result<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.state_transitions.push(trans.clone());
+        }
+        Ok(())
+    }
+
+    fn end_state_transitions(&mut self, _w: &mut dyn Write) -> std::io::Re=
sult<()> {
+        Ok(())
+    }
+
+    fn begin_constraints(&mut self, _w: &mut dyn Write, _count: u32) -> st=
d::io::Result<()> {
+        Ok(())
+    }
+
+    fn constraint(
+        &mut self,
+        _w: &mut dyn Write,
+        constraint: &ConstraintSpec,
+    ) -> std::io::Result<()> {
+        if let Some(api_details) =3D &mut self.data.api_details {
+            api_details.constraints.push(constraint.clone());
+        }
+        Ok(())
+    }
+
+    fn end_constraints(&mut self, _w: &mut dyn Write) -> std::io::Result<(=
)> {
+        Ok(())
+    }
+
+    fn begin_locks(&mut self, _w: &mut dyn Write, _count: u32) -> std::io:=
:Result<()> {
+        Ok(())
+    }
+
+    fn lock(&mut self, _w: &mut dyn Write, lock: &LockSpec) -> std::io::Re=
sult<()> {
+        if let Some(details) =3D &mut self.data.api_details {
+            details.locks.push(lock.clone());
+        }
+        Ok(())
+    }
+
+    fn end_locks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_struct_specs(&mut self, _w: &mut dyn Write, _count: u32) -> s=
td::io::Result<()> {
+        Ok(())
+    }
+
+    fn struct_spec(&mut self, _w: &mut dyn Write, spec: &StructSpec) -> st=
d::io::Result<()> {
+        if let Some(ref mut details) =3D self.data.api_details {
+            details.struct_specs.push(spec.clone());
+        }
+        Ok(())
+    }
+
+    fn end_struct_specs(&mut self, _w: &mut dyn Write) -> std::io::Result<=
()> {
+        Ok(())
+    }
+}
diff --git a/tools/kapi/src/formatter/mod.rs b/tools/kapi/src/formatter/mod=
.rs
new file mode 100644
index 0000000000000..3de8bf23bc29a
--- /dev/null
+++ b/tools/kapi/src/formatter/mod.rs
@@ -0,0 +1,140 @@
+use crate::extractor::{
+    AddrFamilySpec, AsyncSpec, BufferSpec, CapabilitySpec, ConstraintSpec,=
 ErrorSpec, LockSpec,
+    ParamSpec, ProtocolBehaviorSpec, ReturnSpec, SideEffectSpec, SignalMas=
kSpec, SignalSpec,
+    SocketStateSpec, StateTransitionSpec, StructSpec,
+};
+use std::io::Write;
+
+mod json;
+mod plain;
+mod rst;
+
+pub use json::JsonFormatter;
+pub use plain::PlainFormatter;
+pub use rst::RstFormatter;
+
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub enum OutputFormat {
+    Plain,
+    Json,
+    Rst,
+}
+
+impl std::str::FromStr for OutputFormat {
+    type Err =3D String;
+
+    fn from_str(s: &str) -> Result<Self, Self::Err> {
+        match s.to_lowercase().as_str() {
+            "plain" =3D> Ok(OutputFormat::Plain),
+            "json" =3D> Ok(OutputFormat::Json),
+            "rst" =3D> Ok(OutputFormat::Rst),
+            _ =3D> Err(format!("Unknown output format: {}", s)),
+        }
+    }
+}
+
+pub trait OutputFormatter {
+    fn begin_document(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+    fn end_document(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::i=
o::Result<()>;
+    fn api_item(&mut self, w: &mut dyn Write, name: &str, api_type: &str) =
-> std::io::Result<()>;
+    fn end_api_list(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io:=
:Result<()>;
+
+    fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std:=
:io::Result<()>;
+    fn end_api_details(&mut self, w: &mut dyn Write) -> std::io::Result<()=
>;
+    fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::R=
esult<()>;
+    fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::=
io::Result<()>;
+
+    fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Resul=
t<()>;
+    fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::=
Result<()>;
+    fn end_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<=
()>;
+
+    fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::=
io::Result<()>;
+    fn parameter(&mut self, w: &mut dyn Write, param: &ParamSpec) -> std::=
io::Result<()>;
+    fn end_parameters(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn return_spec(&mut self, w: &mut dyn Write, ret: &ReturnSpec) -> std:=
:io::Result<()>;
+
+    fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::=
Result<()>;
+    fn error(&mut self, w: &mut dyn Write, error: &ErrorSpec) -> std::io::=
Result<()>;
+    fn end_errors(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::=
Result<()>;
+    fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result=
<()>;
+    fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::=
io::Result<()>;
+
+    // Sysfs-specific methods
+    fn sysfs_subsystem(&mut self, w: &mut dyn Write, subsystem: &str) -> s=
td::io::Result<()>;
+    fn sysfs_path(&mut self, w: &mut dyn Write, path: &str) -> std::io::Re=
sult<()>;
+    fn sysfs_permissions(&mut self, w: &mut dyn Write, perms: &str) -> std=
::io::Result<()>;
+
+    // Networking-specific methods
+    fn socket_state(&mut self, w: &mut dyn Write, state: &SocketStateSpec)=
 -> std::io::Result<()>;
+
+    fn begin_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::=
Result<()>;
+    fn protocol_behavior(
+        &mut self,
+        w: &mut dyn Write,
+        behavior: &ProtocolBehaviorSpec,
+    ) -> std::io::Result<()>;
+    fn end_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Re=
sult<()>;
+
+    fn begin_addr_families(&mut self, w: &mut dyn Write) -> std::io::Resul=
t<()>;
+    fn addr_family(&mut self, w: &mut dyn Write, family: &AddrFamilySpec) =
-> std::io::Result<()>;
+    fn end_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<=
()>;
+
+    fn buffer_spec(&mut self, w: &mut dyn Write, spec: &BufferSpec) -> std=
::io::Result<()>;
+    fn async_spec(&mut self, w: &mut dyn Write, spec: &AsyncSpec) -> std::=
io::Result<()>;
+    fn net_data_transfer(&mut self, w: &mut dyn Write, desc: &str) -> std:=
:io::Result<()>;
+
+    fn begin_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result=
<()>;
+    fn capability(&mut self, w: &mut dyn Write, cap: &CapabilitySpec) -> s=
td::io::Result<()>;
+    fn end_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<(=
)>;
+
+    // Signal-related methods
+    fn begin_signals(&mut self, w: &mut dyn Write, count: u32) -> std::io:=
:Result<()>;
+    fn signal(&mut self, w: &mut dyn Write, signal: &SignalSpec) -> std::i=
o::Result<()>;
+    fn end_signals(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn begin_signal_masks(&mut self, w: &mut dyn Write, count: u32) -> std=
::io::Result<()>;
+    fn signal_mask(&mut self, w: &mut dyn Write, mask: &SignalMaskSpec) ->=
 std::io::Result<()>;
+    fn end_signal_masks(&mut self, w: &mut dyn Write) -> std::io::Result<(=
)>;
+
+    // Side effects and state transitions
+    fn begin_side_effects(&mut self, w: &mut dyn Write, count: u32) -> std=
::io::Result<()>;
+    fn side_effect(&mut self, w: &mut dyn Write, effect: &SideEffectSpec) =
-> std::io::Result<()>;
+    fn end_side_effects(&mut self, w: &mut dyn Write) -> std::io::Result<(=
)>;
+
+    fn begin_state_transitions(&mut self, w: &mut dyn Write, count: u32) -=
> std::io::Result<()>;
+    fn state_transition(
+        &mut self,
+        w: &mut dyn Write,
+        trans: &StateTransitionSpec,
+    ) -> std::io::Result<()>;
+    fn end_state_transitions(&mut self, w: &mut dyn Write) -> std::io::Res=
ult<()>;
+
+    // Constraints and locks
+    fn begin_constraints(&mut self, w: &mut dyn Write, count: u32) -> std:=
:io::Result<()>;
+    fn constraint(&mut self, w: &mut dyn Write, constraint: &ConstraintSpe=
c)
+    -> std::io::Result<()>;
+    fn end_constraints(&mut self, w: &mut dyn Write) -> std::io::Result<()=
>;
+
+    fn begin_locks(&mut self, w: &mut dyn Write, count: u32) -> std::io::R=
esult<()>;
+    fn lock(&mut self, w: &mut dyn Write, lock: &LockSpec) -> std::io::Res=
ult<()>;
+    fn end_locks(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+
+    fn begin_struct_specs(&mut self, w: &mut dyn Write, count: u32) -> std=
::io::Result<()>;
+    fn struct_spec(&mut self, w: &mut dyn Write, spec: &StructSpec) -> std=
::io::Result<()>;
+    fn end_struct_specs(&mut self, w: &mut dyn Write) -> std::io::Result<(=
)>;
+}
+
+pub fn create_formatter(format: OutputFormat) -> Box<dyn OutputFormatter> {
+    match format {
+        OutputFormat::Plain =3D> Box::new(PlainFormatter::new()),
+        OutputFormat::Json =3D> Box::new(JsonFormatter::new()),
+        OutputFormat::Rst =3D> Box::new(RstFormatter::new()),
+    }
+}
diff --git a/tools/kapi/src/formatter/plain.rs b/tools/kapi/src/formatter/p=
lain.rs
new file mode 100644
index 0000000000000..569af9fd7b09b
--- /dev/null
+++ b/tools/kapi/src/formatter/plain.rs
@@ -0,0 +1,549 @@
+use super::OutputFormatter;
+use crate::extractor::{
+    AddrFamilySpec, AsyncSpec, BufferSpec, CapabilitySpec, ConstraintSpec,=
 ErrorSpec, LockSpec,
+    ParamSpec, ProtocolBehaviorSpec, ReturnSpec, SideEffectSpec, SignalMas=
kSpec, SignalSpec,
+    SocketStateSpec, StateTransitionSpec,
+};
+use std::io::Write;
+
+pub struct PlainFormatter;
+
+impl PlainFormatter {
+    pub fn new() -> Self {
+        PlainFormatter
+    }
+}
+
+impl OutputFormatter for PlainFormatter {
+    fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()=
> {
+        Ok(())
+    }
+
+    fn end_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::i=
o::Result<()> {
+        writeln!(w, "\n{title}:")?;
+        writeln!(w, "{}", "-".repeat(title.len() + 1))
+    }
+
+    fn api_item(&mut self, w: &mut dyn Write, name: &str, _api_type: &str)=
 -> std::io::Result<()> {
+        writeln!(w, "  {name}")
+    }
+
+    fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io:=
:Result<()> {
+        writeln!(w, "\nTotal specifications found: {count}")
+    }
+
+    fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std:=
:io::Result<()> {
+        writeln!(w, "\nDetailed information for {name}:")?;
+        writeln!(w, "{}=3D", "=3D".repeat(25 + name.len()))
+    }
+
+    fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<(=
)> {
+        Ok(())
+    }
+
+    fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::R=
esult<()> {
+        writeln!(w, "Description: {desc}")
+    }
+
+    fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::=
io::Result<()> {
+        writeln!(w, "\nDetailed Description:")?;
+        writeln!(w, "{desc}")
+    }
+
+    fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Resul=
t<()> {
+        writeln!(w, "\nExecution Context:")
+    }
+
+    fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::=
Result<()> {
+        writeln!(w, "  - {flag}")
+    }
+
+    fn end_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result=
<()> {
+        Ok(())
+    }
+
+    fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::=
io::Result<()> {
+        writeln!(w, "\nParameters ({count}):")
+    }
+
+    fn parameter(&mut self, w: &mut dyn Write, param: &ParamSpec) -> std::=
io::Result<()> {
+        writeln!(
+            w,
+            "  [{}] {} ({})",
+            param.index, param.name, param.type_name
+        )?;
+        if !param.description.is_empty() {
+            writeln!(w, "      {}", param.description)?;
+        }
+
+        // Display flags
+        let mut flags =3D Vec::new();
+        if param.flags & 0x01 !=3D 0 {
+            flags.push("IN");
+        }
+        if param.flags & 0x02 !=3D 0 {
+            flags.push("OUT");
+        }
+        if param.flags & 0x04 !=3D 0 {
+            flags.push("INOUT");
+        }
+        if param.flags & 0x08 !=3D 0 {
+            flags.push("USER");
+        }
+        if param.flags & 0x10 !=3D 0 {
+            flags.push("OPTIONAL");
+        }
+        if !flags.is_empty() {
+            writeln!(w, "      Flags: {}", flags.join(" | "))?;
+        }
+
+        // Display constraints
+        if let Some(constraint) =3D &param.constraint {
+            writeln!(w, "      Constraint: {constraint}")?;
+        }
+        if let (Some(min), Some(max)) =3D (param.min_value, param.max_valu=
e) {
+            writeln!(w, "      Range: {min} to {max}")?;
+        }
+        if let Some(mask) =3D param.valid_mask {
+            writeln!(w, "      Valid mask: 0x{mask:x}")?;
+        }
+        Ok(())
+    }
+
+    fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()=
> {
+        Ok(())
+    }
+
+    fn return_spec(&mut self, w: &mut dyn Write, ret: &ReturnSpec) -> std:=
:io::Result<()> {
+        writeln!(w, "\nReturn Value:")?;
+        writeln!(w, "  Type: {}", ret.type_name)?;
+        writeln!(w, "  {}", ret.description)?;
+        if let Some(val) =3D ret.success_value {
+            writeln!(w, "  Success value: {val}")?;
+        }
+        if let (Some(min), Some(max)) =3D (ret.success_min, ret.success_ma=
x) {
+            writeln!(w, "  Success range: {min} to {max}")?;
+        }
+        Ok(())
+    }
+
+    fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::=
Result<()> {
+        writeln!(w, "\nPossible Errors ({count}):")
+    }
+
+    fn error(&mut self, w: &mut dyn Write, error: &ErrorSpec) -> std::io::=
Result<()> {
+        writeln!(w, "  {} ({})", error.name, error.error_code)?;
+        if !error.condition.is_empty() {
+            writeln!(w, "      Condition: {}", error.condition)?;
+        }
+        if !error.description.is_empty() {
+            writeln!(w, "      {}", error.description)?;
+        }
+        Ok(())
+    }
+
+    fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::=
Result<()> {
+        writeln!(w, "\nExamples:")?;
+        writeln!(w, "{examples}")
+    }
+
+    fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result=
<()> {
+        writeln!(w, "\nNotes:")?;
+        writeln!(w, "{notes}")
+    }
+
+    fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::=
io::Result<()> {
+        writeln!(w, "\nAvailable since: {version}")
+    }
+
+    fn sysfs_subsystem(&mut self, w: &mut dyn Write, subsystem: &str) -> s=
td::io::Result<()> {
+        writeln!(w, "Subsystem: {subsystem}")
+    }
+
+    fn sysfs_path(&mut self, w: &mut dyn Write, path: &str) -> std::io::Re=
sult<()> {
+        writeln!(w, "Sysfs Path: {path}")
+    }
+
+    fn sysfs_permissions(&mut self, w: &mut dyn Write, perms: &str) -> std=
::io::Result<()> {
+        writeln!(w, "Permissions: {perms}")
+    }
+
+    // Networking-specific methods
+    fn socket_state(&mut self, w: &mut dyn Write, state: &SocketStateSpec)=
 -> std::io::Result<()> {
+        writeln!(w, "\nSocket State Requirements:")?;
+        if !state.required_states.is_empty() {
+            writeln!(w, "  Required states: {:?}", state.required_states)?;
+        }
+        if !state.forbidden_states.is_empty() {
+            writeln!(w, "  Forbidden states: {:?}", state.forbidden_states=
)?;
+        }
+        if let Some(result) =3D &state.resulting_state {
+            writeln!(w, "  Resulting state: {result}")?;
+        }
+        if let Some(cond) =3D &state.condition {
+            writeln!(w, "  Condition: {cond}")?;
+        }
+        if let Some(protos) =3D &state.applicable_protocols {
+            writeln!(w, "  Applicable protocols: {protos}")?;
+        }
+        Ok(())
+    }
+
+    fn begin_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::=
Result<()> {
+        writeln!(w, "\nProtocol-Specific Behaviors:")
+    }
+
+    fn protocol_behavior(
+        &mut self,
+        w: &mut dyn Write,
+        behavior: &ProtocolBehaviorSpec,
+    ) -> std::io::Result<()> {
+        writeln!(
+            w,
+            "  {} - {}",
+            behavior.applicable_protocols, behavior.behavior
+        )?;
+        if let Some(flags) =3D &behavior.protocol_flags {
+            writeln!(w, "    Flags: {flags}")?;
+        }
+        Ok(())
+    }
+
+    fn end_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::R=
esult<()> {
+        Ok(())
+    }
+
+    fn begin_addr_families(&mut self, w: &mut dyn Write) -> std::io::Resul=
t<()> {
+        writeln!(w, "\nSupported Address Families:")
+    }
+
+    fn addr_family(&mut self, w: &mut dyn Write, family: &AddrFamilySpec) =
-> std::io::Result<()> {
+        writeln!(w, "  {} ({}):", family.family_name, family.family)?;
+        writeln!(w, "    Struct size: {} bytes", family.addr_struct_size)?;
+        writeln!(
+            w,
+            "    Address length: {}-{} bytes",
+            family.min_addr_len, family.max_addr_len
+        )?;
+        if let Some(format) =3D &family.addr_format {
+            writeln!(w, "    Format: {format}")?;
+        }
+        writeln!(
+            w,
+            "    Features: wildcard=3D{}, multicast=3D{}, broadcast=3D{}",
+            family.supports_wildcard, family.supports_multicast, family.su=
pports_broadcast
+        )?;
+        if let Some(special) =3D &family.special_addresses {
+            writeln!(w, "    Special addresses: {special}")?;
+        }
+        if family.port_range_max > 0 {
+            writeln!(
+                w,
+                "    Port range: {}-{}",
+                family.port_range_min, family.port_range_max
+            )?;
+        }
+        Ok(())
+    }
+
+    fn end_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result=
<()> {
+        Ok(())
+    }
+
+    fn buffer_spec(&mut self, w: &mut dyn Write, spec: &BufferSpec) -> std=
::io::Result<()> {
+        writeln!(w, "\nBuffer Specification:")?;
+        if let Some(behaviors) =3D &spec.buffer_behaviors {
+            writeln!(w, "  Behaviors: {behaviors}")?;
+        }
+        if let Some(min) =3D spec.min_buffer_size {
+            writeln!(w, "  Min size: {min} bytes")?;
+        }
+        if let Some(max) =3D spec.max_buffer_size {
+            writeln!(w, "  Max size: {max} bytes")?;
+        }
+        if let Some(optimal) =3D spec.optimal_buffer_size {
+            writeln!(w, "  Optimal size: {optimal} bytes")?;
+        }
+        Ok(())
+    }
+
+    fn async_spec(&mut self, w: &mut dyn Write, spec: &AsyncSpec) -> std::=
io::Result<()> {
+        writeln!(w, "\nAsynchronous Operation:")?;
+        if let Some(modes) =3D &spec.supported_modes {
+            writeln!(w, "  Supported modes: {modes}")?;
+        }
+        if let Some(errno) =3D spec.nonblock_errno {
+            writeln!(w, "  Non-blocking errno: {errno}")?;
+        }
+        Ok(())
+    }
+
+    fn net_data_transfer(&mut self, w: &mut dyn Write, desc: &str) -> std:=
:io::Result<()> {
+        writeln!(w, "\nNetwork Data Transfer: {desc}")
+    }
+
+    fn begin_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result=
<()> {
+        writeln!(w, "\nRequired Capabilities:")
+    }
+
+    fn capability(&mut self, w: &mut dyn Write, cap: &CapabilitySpec) -> s=
td::io::Result<()> {
+        writeln!(w, "  {} ({}) - {}", cap.name, cap.capability, cap.action=
)?;
+        if !cap.allows.is_empty() {
+            writeln!(w, "    Allows: {}", cap.allows)?;
+        }
+        if !cap.without_cap.is_empty() {
+            writeln!(w, "    Without capability: {}", cap.without_cap)?;
+        }
+        if let Some(cond) =3D &cap.check_condition {
+            writeln!(w, "    Condition: {cond}")?;
+        }
+        Ok(())
+    }
+
+    fn end_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<=
()> {
+        Ok(())
+    }
+
+    // Signal-related methods
+    fn begin_signals(&mut self, w: &mut dyn Write, count: u32) -> std::io:=
:Result<()> {
+        writeln!(w, "\nSignal Specifications ({count}):")
+    }
+
+    fn signal(&mut self, w: &mut dyn Write, signal: &SignalSpec) -> std::i=
o::Result<()> {
+        write!(w, "  {} ({})", signal.signal_name, signal.signal_num)?;
+
+        // Display direction
+        let direction =3D match signal.direction {
+            0 =3D> "SEND",
+            1 =3D> "RECEIVE",
+            2 =3D> "HANDLE",
+            3 =3D> "IGNORE",
+            _ =3D> "UNKNOWN",
+        };
+        write!(w, " - {direction}")?;
+
+        // Display action
+        let action =3D match signal.action {
+            0 =3D> "DEFAULT",
+            1 =3D> "TERMINATE",
+            2 =3D> "COREDUMP",
+            3 =3D> "STOP",
+            4 =3D> "CONTINUE",
+            5 =3D> "IGNORE",
+            6 =3D> "CUSTOM",
+            7 =3D> "DISCARD",
+            _ =3D> "UNKNOWN",
+        };
+        writeln!(w, " - {action}")?;
+
+        if let Some(target) =3D &signal.target {
+            writeln!(w, "      Target: {target}")?;
+        }
+        if let Some(condition) =3D &signal.condition {
+            writeln!(w, "      Condition: {condition}")?;
+        }
+        if let Some(desc) =3D &signal.description {
+            writeln!(w, "      {desc}")?;
+        }
+
+        // Display timing
+        let timing =3D match signal.timing {
+            0 =3D> "BEFORE",
+            1 =3D> "DURING",
+            2 =3D> "AFTER",
+            3 =3D> "EXIT",
+            _ =3D> "UNKNOWN",
+        };
+        writeln!(w, "      Timing: {timing}")?;
+        writeln!(w, "      Priority: {}", signal.priority)?;
+
+        if signal.restartable {
+            writeln!(w, "      Restartable: yes")?;
+        }
+        if signal.interruptible {
+            writeln!(w, "      Interruptible: yes")?;
+        }
+        if let Some(error) =3D signal.error_on_signal {
+            writeln!(w, "      Error on signal: {error}")?;
+        }
+        Ok(())
+    }
+
+    fn end_signals(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_signal_masks(&mut self, w: &mut dyn Write, count: u32) -> std=
::io::Result<()> {
+        writeln!(w, "\nSignal Masks ({count}):")
+    }
+
+    fn signal_mask(&mut self, w: &mut dyn Write, mask: &SignalMaskSpec) ->=
 std::io::Result<()> {
+        writeln!(w, "  {}", mask.name)?;
+        if !mask.description.is_empty() {
+            writeln!(w, "      {}", mask.description)?;
+        }
+        Ok(())
+    }
+
+    fn end_signal_masks(&mut self, _w: &mut dyn Write) -> std::io::Result<=
()> {
+        Ok(())
+    }
+
+    // Side effects and state transitions
+    fn begin_side_effects(&mut self, w: &mut dyn Write, count: u32) -> std=
::io::Result<()> {
+        writeln!(w, "\nSide Effects ({count}):")
+    }
+
+    fn side_effect(&mut self, w: &mut dyn Write, effect: &SideEffectSpec) =
-> std::io::Result<()> {
+        writeln!(w, "  {} - {}", effect.target, effect.description)?;
+        if let Some(condition) =3D &effect.condition {
+            writeln!(w, "      Condition: {condition}")?;
+        }
+        if effect.reversible {
+            writeln!(w, "      Reversible: yes")?;
+        }
+        Ok(())
+    }
+
+    fn end_side_effects(&mut self, _w: &mut dyn Write) -> std::io::Result<=
()> {
+        Ok(())
+    }
+
+    fn begin_state_transitions(&mut self, w: &mut dyn Write, count: u32) -=
> std::io::Result<()> {
+        writeln!(w, "\nState Transitions ({count}):")
+    }
+
+    fn state_transition(
+        &mut self,
+        w: &mut dyn Write,
+        trans: &StateTransitionSpec,
+    ) -> std::io::Result<()> {
+        writeln!(
+            w,
+            "  {} : {} -> {}",
+            trans.object, trans.from_state, trans.to_state
+        )?;
+        if let Some(condition) =3D &trans.condition {
+            writeln!(w, "      Condition: {condition}")?;
+        }
+        if !trans.description.is_empty() {
+            writeln!(w, "      {}", trans.description)?;
+        }
+        Ok(())
+    }
+
+    fn end_state_transitions(&mut self, _w: &mut dyn Write) -> std::io::Re=
sult<()> {
+        Ok(())
+    }
+
+    // Constraints and locks
+    fn begin_constraints(&mut self, w: &mut dyn Write, count: u32) -> std:=
:io::Result<()> {
+        writeln!(w, "\nAdditional Constraints ({count}):")
+    }
+
+    fn constraint(
+        &mut self,
+        w: &mut dyn Write,
+        constraint: &ConstraintSpec,
+    ) -> std::io::Result<()> {
+        writeln!(w, "  {}", constraint.name)?;
+        if !constraint.description.is_empty() {
+            writeln!(w, "      {}", constraint.description)?;
+        }
+        if let Some(expr) =3D &constraint.expression {
+            writeln!(w, "      Expression: {expr}")?;
+        }
+        Ok(())
+    }
+
+    fn end_constraints(&mut self, _w: &mut dyn Write) -> std::io::Result<(=
)> {
+        Ok(())
+    }
+
+    fn begin_locks(&mut self, w: &mut dyn Write, count: u32) -> std::io::R=
esult<()> {
+        writeln!(w, "\nLocking Requirements ({count}):")
+    }
+
+    fn lock(&mut self, w: &mut dyn Write, lock: &LockSpec) -> std::io::Res=
ult<()> {
+        write!(w, "  {}", lock.lock_name)?;
+
+        // Display lock type
+        let lock_type =3D match lock.lock_type {
+            0 =3D> "NONE",
+            1 =3D> "MUTEX",
+            2 =3D> "SPINLOCK",
+            3 =3D> "RWLOCK",
+            4 =3D> "SEQLOCK",
+            5 =3D> "RCU",
+            6 =3D> "SEMAPHORE",
+            7 =3D> "CUSTOM",
+            _ =3D> "UNKNOWN",
+        };
+        writeln!(w, " ({lock_type})")?;
+
+        let scope_str =3D match lock.scope {
+            0 =3D> "acquired and released",
+            1 =3D> "acquired (not released)",
+            2 =3D> "released (held on entry)",
+            3 =3D> "held by caller",
+            _ =3D> "unknown",
+        };
+        writeln!(w, "      Scope: {scope_str}")?;
+
+        if !lock.description.is_empty() {
+            writeln!(w, "      {}", lock.description)?;
+        }
+        Ok(())
+    }
+
+    fn end_locks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_struct_specs(&mut self, w: &mut dyn Write, count: u32) -> std=
::io::Result<()> {
+        writeln!(w, "\nStructure Specifications ({count}):")
+    }
+
+    fn struct_spec(&mut self, w: &mut dyn Write, spec: &crate::extractor::=
StructSpec) -> std::io::Result<()> {
+        writeln!(w, "  {} (size=3D{}, align=3D{}):", spec.name, spec.size,=
 spec.alignment)?;
+        if !spec.description.is_empty() {
+            writeln!(w, "      {}", spec.description)?;
+        }
+
+        if !spec.fields.is_empty() {
+            writeln!(w, "      Fields ({}):", spec.field_count)?;
+            for field in &spec.fields {
+                write!(w, "        - {} ({}):", field.name, field.type_nam=
e)?;
+                if !field.description.is_empty() {
+                    write!(w, " {}", field.description)?;
+                }
+                writeln!(w)?;
+
+                // Show constraints if present
+                if field.min_value !=3D 0 || field.max_value !=3D 0 {
+                    writeln!(w, "          Range: [{}, {}]", field.min_val=
ue, field.max_value)?;
+                }
+                if field.valid_mask !=3D 0 {
+                    writeln!(w, "          Mask: {:#x}", field.valid_mask)=
?;
+                }
+            }
+        }
+        Ok(())
+    }
+
+    fn end_struct_specs(&mut self, _w: &mut dyn Write) -> std::io::Result<=
()> {
+        Ok(())
+    }
+}
diff --git a/tools/kapi/src/formatter/rst.rs b/tools/kapi/src/formatter/rst=
.rs
new file mode 100644
index 0000000000000..51d0be911480b
--- /dev/null
+++ b/tools/kapi/src/formatter/rst.rs
@@ -0,0 +1,621 @@
+use super::OutputFormatter;
+use crate::extractor::{
+    AddrFamilySpec, AsyncSpec, BufferSpec, CapabilitySpec, ConstraintSpec,=
 ErrorSpec, LockSpec,
+    ParamSpec, ProtocolBehaviorSpec, ReturnSpec, SideEffectSpec, SignalMas=
kSpec, SignalSpec,
+    SocketStateSpec, StateTransitionSpec,
+};
+use std::io::Write;
+
+pub struct RstFormatter {
+    current_section_level: usize,
+}
+
+impl RstFormatter {
+    pub fn new() -> Self {
+        RstFormatter {
+            current_section_level: 0,
+        }
+    }
+
+    fn section_char(level: usize) -> char {
+        match level {
+            0 =3D> '=3D',
+            1 =3D> '-',
+            2 =3D> '~',
+            3 =3D> '^',
+            _ =3D> '"',
+        }
+    }
+}
+
+impl OutputFormatter for RstFormatter {
+    fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()=
> {
+        Ok(())
+    }
+
+    fn end_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::i=
o::Result<()> {
+        writeln!(w, "\n{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(0).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn api_item(&mut self, w: &mut dyn Write, name: &str, api_type: &str) =
-> std::io::Result<()> {
+        writeln!(w, "* **{name}** (*{api_type}*)")
+    }
+
+    fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io:=
:Result<()> {
+        writeln!(w, "\n**Total specifications found:** {count}")
+    }
+
+    fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std:=
:io::Result<()> {
+        self.current_section_level =3D 0;
+        writeln!(w, "\n{name}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(0).to_string().repeat(name.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<(=
)> {
+        Ok(())
+    }
+
+    fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::R=
esult<()> {
+        writeln!(w, "**{desc}**")?;
+        writeln!(w)
+    }
+
+    fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::=
io::Result<()> {
+        writeln!(w, "{desc}")?;
+        writeln!(w)
+    }
+
+    fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Resul=
t<()> {
+        self.current_section_level =3D 1;
+        let title =3D "Execution Context";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::=
Result<()> {
+        writeln!(w, "* {flag}")
+    }
+
+    fn end_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<=
()> {
+        writeln!(w)
+    }
+
+    fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::=
io::Result<()> {
+        self.current_section_level =3D 1;
+        let title =3D format!("Parameters ({count})");
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()=
> {
+        Ok(())
+    }
+
+    fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::=
Result<()> {
+        self.current_section_level =3D 1;
+        let title =3D format!("Possible Errors ({count})");
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::=
Result<()> {
+        self.current_section_level =3D 1;
+        let title =3D "Examples";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)?;
+        writeln!(w, ".. code-block:: c")?;
+        writeln!(w)?;
+        for line in examples.lines() {
+            writeln!(w, "   {line}")?;
+        }
+        writeln!(w)
+    }
+
+    fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result=
<()> {
+        self.current_section_level =3D 1;
+        let title =3D "Notes";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)?;
+        writeln!(w, "{notes}")?;
+        writeln!(w)
+    }
+
+    fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::=
io::Result<()> {
+        writeln!(w, ":Available since: {version}")?;
+        writeln!(w)
+    }
+
+    fn sysfs_subsystem(&mut self, w: &mut dyn Write, subsystem: &str) -> s=
td::io::Result<()> {
+        writeln!(w, ":Subsystem: {subsystem}")?;
+        writeln!(w)
+    }
+
+    fn sysfs_path(&mut self, w: &mut dyn Write, path: &str) -> std::io::Re=
sult<()> {
+        writeln!(w, ":Sysfs Path: {path}")?;
+        writeln!(w)
+    }
+
+    fn sysfs_permissions(&mut self, w: &mut dyn Write, perms: &str) -> std=
::io::Result<()> {
+        writeln!(w, ":Permissions: {perms}")?;
+        writeln!(w)
+    }
+
+    // Networking-specific methods
+    fn socket_state(&mut self, w: &mut dyn Write, state: &SocketStateSpec)=
 -> std::io::Result<()> {
+        self.current_section_level =3D 1;
+        let title =3D "Socket State Requirements";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)?;
+
+        if !state.required_states.is_empty() {
+            writeln!(
+                w,
+                "**Required states:** {}",
+                state.required_states.join(", ")
+            )?;
+        }
+        if !state.forbidden_states.is_empty() {
+            writeln!(
+                w,
+                "**Forbidden states:** {}",
+                state.forbidden_states.join(", ")
+            )?;
+        }
+        if let Some(result) =3D &state.resulting_state {
+            writeln!(w, "**Resulting state:** {result}")?;
+        }
+        if let Some(cond) =3D &state.condition {
+            writeln!(w, "**Condition:** {cond}")?;
+        }
+        if let Some(protos) =3D &state.applicable_protocols {
+            writeln!(w, "**Applicable protocols:** {protos}")?;
+        }
+        writeln!(w)
+    }
+
+    fn begin_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::=
Result<()> {
+        self.current_section_level =3D 1;
+        let title =3D "Protocol-Specific Behaviors";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn protocol_behavior(
+        &mut self,
+        w: &mut dyn Write,
+        behavior: &ProtocolBehaviorSpec,
+    ) -> std::io::Result<()> {
+        writeln!(w, "**{}**", behavior.applicable_protocols)?;
+        writeln!(w)?;
+        writeln!(w, "{}", behavior.behavior)?;
+        if let Some(flags) =3D &behavior.protocol_flags {
+            writeln!(w)?;
+            writeln!(w, "*Flags:* {flags}")?;
+        }
+        writeln!(w)
+    }
+
+    fn end_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::R=
esult<()> {
+        Ok(())
+    }
+
+    fn begin_addr_families(&mut self, w: &mut dyn Write) -> std::io::Resul=
t<()> {
+        self.current_section_level =3D 1;
+        let title =3D "Supported Address Families";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn addr_family(&mut self, w: &mut dyn Write, family: &AddrFamilySpec) =
-> std::io::Result<()> {
+        writeln!(w, "**{} ({})**", family.family_name, family.family)?;
+        writeln!(w)?;
+        writeln!(w, "* **Struct size:** {} bytes", family.addr_struct_size=
)?;
+        writeln!(
+            w,
+            "* **Address length:** {}-{} bytes",
+            family.min_addr_len, family.max_addr_len
+        )?;
+        if let Some(format) =3D &family.addr_format {
+            writeln!(w, "* **Format:** ``{format}``")?;
+        }
+        writeln!(
+            w,
+            "* **Features:** wildcard=3D{}, multicast=3D{}, broadcast=3D{}=
",
+            family.supports_wildcard, family.supports_multicast, family.su=
pports_broadcast
+        )?;
+        if let Some(special) =3D &family.special_addresses {
+            writeln!(w, "* **Special addresses:** {special}")?;
+        }
+        if family.port_range_max > 0 {
+            writeln!(
+                w,
+                "* **Port range:** {}-{}",
+                family.port_range_min, family.port_range_max
+            )?;
+        }
+        writeln!(w)
+    }
+
+    fn end_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result=
<()> {
+        Ok(())
+    }
+
+    fn buffer_spec(&mut self, w: &mut dyn Write, spec: &BufferSpec) -> std=
::io::Result<()> {
+        self.current_section_level =3D 1;
+        let title =3D "Buffer Specification";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)?;
+
+        if let Some(behaviors) =3D &spec.buffer_behaviors {
+            writeln!(w, "**Behaviors:** {behaviors}")?;
+        }
+        if let Some(min) =3D spec.min_buffer_size {
+            writeln!(w, "**Min size:** {min} bytes")?;
+        }
+        if let Some(max) =3D spec.max_buffer_size {
+            writeln!(w, "**Max size:** {max} bytes")?;
+        }
+        if let Some(optimal) =3D spec.optimal_buffer_size {
+            writeln!(w, "**Optimal size:** {optimal} bytes")?;
+        }
+        writeln!(w)
+    }
+
+    fn async_spec(&mut self, w: &mut dyn Write, spec: &AsyncSpec) -> std::=
io::Result<()> {
+        self.current_section_level =3D 1;
+        let title =3D "Asynchronous Operation";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)?;
+
+        if let Some(modes) =3D &spec.supported_modes {
+            writeln!(w, "**Supported modes:** {modes}")?;
+        }
+        if let Some(errno) =3D spec.nonblock_errno {
+            writeln!(w, "**Non-blocking errno:** {errno}")?;
+        }
+        writeln!(w)
+    }
+
+    fn net_data_transfer(&mut self, w: &mut dyn Write, desc: &str) -> std:=
:io::Result<()> {
+        writeln!(w, "**Network Data Transfer:** {desc}")?;
+        writeln!(w)
+    }
+
+    fn begin_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result=
<()> {
+        self.current_section_level =3D 1;
+        let title =3D "Required Capabilities";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn capability(&mut self, w: &mut dyn Write, cap: &CapabilitySpec) -> s=
td::io::Result<()> {
+        writeln!(w, "**{} ({})** - {}", cap.name, cap.capability, cap.acti=
on)?;
+        writeln!(w)?;
+        if !cap.allows.is_empty() {
+            writeln!(w, "* **Allows:** {}", cap.allows)?;
+        }
+        if !cap.without_cap.is_empty() {
+            writeln!(w, "* **Without capability:** {}", cap.without_cap)?;
+        }
+        if let Some(cond) =3D &cap.check_condition {
+            writeln!(w, "* **Condition:** {}", cond)?;
+        }
+        writeln!(w)
+    }
+
+    fn end_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<=
()> {
+        Ok(())
+    }
+
+    // Stub implementations for new methods
+    fn parameter(&mut self, w: &mut dyn Write, param: &ParamSpec) -> std::=
io::Result<()> {
+        writeln!(
+            w,
+            "**[{}] {}** (*{}*)",
+            param.index, param.name, param.type_name
+        )?;
+        writeln!(w)?;
+        writeln!(w, "  {}", param.description)?;
+
+        // Display flags
+        let mut flags =3D Vec::new();
+        if param.flags & 0x01 !=3D 0 {
+            flags.push("IN");
+        }
+        if param.flags & 0x02 !=3D 0 {
+            flags.push("OUT");
+        }
+        if param.flags & 0x04 !=3D 0 {
+            flags.push("USER");
+        }
+        if param.flags & 0x08 !=3D 0 {
+            flags.push("OPTIONAL");
+        }
+        if !flags.is_empty() {
+            writeln!(w, "  :Flags: {}", flags.join(", "))?;
+        }
+
+        if let Some(constraint) =3D &param.constraint {
+            writeln!(w, "  :Constraint: {}", constraint)?;
+        }
+
+        if let (Some(min), Some(max)) =3D (param.min_value, param.max_valu=
e) {
+            writeln!(w, "  :Range: {} to {}", min, max)?;
+        }
+
+        writeln!(w)
+    }
+
+    fn return_spec(&mut self, w: &mut dyn Write, ret: &ReturnSpec) -> std:=
:io::Result<()> {
+        writeln!(w, "\nReturn Value")?;
+        writeln!(w, "{}\n", Self::section_char(1).to_string().repeat(12))?;
+        writeln!(w)?;
+        writeln!(w, ":Type: {}", ret.type_name)?;
+        writeln!(w, ":Description: {}", ret.description)?;
+        if let Some(success) =3D ret.success_value {
+            writeln!(w, ":Success value: {}", success)?;
+        }
+        writeln!(w)
+    }
+
+    fn error(&mut self, w: &mut dyn Write, error: &ErrorSpec) -> std::io::=
Result<()> {
+        writeln!(w, "**{}** ({})", error.name, error.error_code)?;
+        writeln!(w)?;
+        writeln!(w, "  :Condition: {}", error.condition)?;
+        if !error.description.is_empty() {
+            writeln!(w, "  :Description: {}", error.description)?;
+        }
+        writeln!(w)
+    }
+
+    fn begin_signals(&mut self, _w: &mut dyn Write, _count: u32) -> std::i=
o::Result<()> {
+        Ok(())
+    }
+
+    fn signal(&mut self, _w: &mut dyn Write, _signal: &SignalSpec) -> std:=
:io::Result<()> {
+        Ok(())
+    }
+
+    fn end_signals(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_signal_masks(&mut self, _w: &mut dyn Write, _count: u32) -> s=
td::io::Result<()> {
+        Ok(())
+    }
+
+    fn signal_mask(&mut self, _w: &mut dyn Write, _mask: &SignalMaskSpec) =
-> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn end_signal_masks(&mut self, _w: &mut dyn Write) -> std::io::Result<=
()> {
+        Ok(())
+    }
+
+    fn begin_side_effects(&mut self, w: &mut dyn Write, count: u32) -> std=
::io::Result<()> {
+        self.current_section_level =3D 1;
+        let title =3D format!("Side Effects ({count})");
+        writeln!(w, "{}\n", title)?;
+        writeln!(
+            w,
+            "{}\n",
+            Self::section_char(1).to_string().repeat(title.len())
+        )
+    }
+
+    fn side_effect(&mut self, w: &mut dyn Write, effect: &SideEffectSpec) =
-> std::io::Result<()> {
+        write!(w, "* **{}**", effect.target)?;
+        if effect.reversible {
+            write!(w, " *(reversible)*")?;
+        }
+        writeln!(w)?;
+        writeln!(w, "  {}", effect.description)?;
+        if let Some(cond) =3D &effect.condition {
+            writeln!(w, "  :Condition: {}", cond)?;
+        }
+        writeln!(w)
+    }
+
+    fn end_side_effects(&mut self, _w: &mut dyn Write) -> std::io::Result<=
()> {
+        Ok(())
+    }
+
+    fn begin_state_transitions(&mut self, w: &mut dyn Write, count: u32) -=
> std::io::Result<()> {
+        self.current_section_level =3D 1;
+        let title =3D format!("State Transitions ({count})");
+        writeln!(w, "{}\n", title)?;
+        writeln!(
+            w,
+            "{}\n",
+            Self::section_char(1).to_string().repeat(title.len())
+        )
+    }
+
+    fn state_transition(
+        &mut self,
+        w: &mut dyn Write,
+        trans: &StateTransitionSpec,
+    ) -> std::io::Result<()> {
+        writeln!(
+            w,
+            "* **{}**: {} =E2=86=92 {}",
+            trans.object, trans.from_state, trans.to_state
+        )?;
+        writeln!(w, "  {}", trans.description)?;
+        if let Some(cond) =3D &trans.condition {
+            writeln!(w, "  :Condition: {}", cond)?;
+        }
+        writeln!(w)
+    }
+
+    fn end_state_transitions(&mut self, _w: &mut dyn Write) -> std::io::Re=
sult<()> {
+        Ok(())
+    }
+
+    fn begin_constraints(&mut self, _w: &mut dyn Write, _count: u32) -> st=
d::io::Result<()> {
+        Ok(())
+    }
+
+    fn constraint(
+        &mut self,
+        _w: &mut dyn Write,
+        _constraint: &ConstraintSpec,
+    ) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn end_constraints(&mut self, _w: &mut dyn Write) -> std::io::Result<(=
)> {
+        Ok(())
+    }
+
+    fn begin_locks(&mut self, w: &mut dyn Write, count: u32) -> std::io::R=
esult<()> {
+        self.current_section_level =3D 1;
+        let title =3D format!("Locks ({count})");
+        writeln!(w, "{}\n", title)?;
+        writeln!(
+            w,
+            "{}\n",
+            Self::section_char(1).to_string().repeat(title.len())
+        )
+    }
+
+    fn lock(&mut self, w: &mut dyn Write, lock: &LockSpec) -> std::io::Res=
ult<()> {
+        write!(w, "* **{}**", lock.lock_name)?;
+        let lock_type_str =3D match lock.lock_type {
+            1 =3D> " *(mutex)*",
+            2 =3D> " *(spinlock)*",
+            3 =3D> " *(rwlock)*",
+            4 =3D> " *(semaphore)*",
+            5 =3D> " *(RCU)*",
+            _ =3D> "",
+        };
+        writeln!(w, "{}", lock_type_str)?;
+        if !lock.description.is_empty() {
+            writeln!(w, "  {}", lock.description)?;
+        }
+        writeln!(w)
+    }
+
+    fn end_locks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_struct_specs(&mut self, w: &mut dyn Write, _count: u32) -> st=
d::io::Result<()> {
+        writeln!(w)?;
+        writeln!(w, "Structure Specifications")?;
+        writeln!(w, "~~~~~~~~~~~~~~~~~~~~~~~")?;
+        writeln!(w)
+    }
+
+    fn struct_spec(&mut self, w: &mut dyn Write, spec: &crate::extractor::=
StructSpec) -> std::io::Result<()> {
+        writeln!(w, "**{}**", spec.name)?;
+        writeln!(w)?;
+
+        if !spec.description.is_empty() {
+            writeln!(w, "  {}", spec.description)?;
+            writeln!(w)?;
+        }
+
+        writeln!(w, "  :Size: {} bytes", spec.size)?;
+        writeln!(w, "  :Alignment: {} bytes", spec.alignment)?;
+        writeln!(w, "  :Fields: {}", spec.field_count)?;
+        writeln!(w)?;
+
+        if !spec.fields.is_empty() {
+            for field in &spec.fields {
+                writeln!(w, "  * **{}** ({})", field.name, field.type_name=
)?;
+                if !field.description.is_empty() {
+                    writeln!(w, "    {}", field.description)?;
+                }
+                if field.min_value !=3D 0 || field.max_value !=3D 0 {
+                    writeln!(w, "    Range: [{}, {}]", field.min_value, fi=
eld.max_value)?;
+                }
+            }
+            writeln!(w)?;
+        }
+
+        Ok(())
+    }
+
+    fn end_struct_specs(&mut self, _w: &mut dyn Write) -> std::io::Result<=
()> {
+        Ok(())
+    }
+}
diff --git a/tools/kapi/src/main.rs b/tools/kapi/src/main.rs
new file mode 100644
index 0000000000000..2d219046f3287
--- /dev/null
+++ b/tools/kapi/src/main.rs
@@ -0,0 +1,116 @@
+//! kapi - Kernel API Specification Tool
+//!
+//! This tool extracts and displays kernel API specifications from multipl=
e sources:
+//! - Kernel source code (KAPI macros)
+//! - Compiled vmlinux binaries (`.kapi_specs` ELF section)
+//! - Running kernel via debugfs
+
+use anyhow::Result;
+use clap::Parser;
+use std::io::{self, Write};
+
+mod extractor;
+mod formatter;
+
+use extractor::{ApiExtractor, DebugfsExtractor, SourceExtractor, VmlinuxEx=
tractor};
+use formatter::{OutputFormat, create_formatter};
+
+#[derive(Parser, Debug)]
+#[command(author, version, about, long_about =3D None)]
+struct Args {
+    /// Path to the vmlinux file
+    #[arg(long, value_name =3D "PATH", group =3D "input")]
+    vmlinux: Option<String>,
+
+    /// Path to kernel source directory or file
+    #[arg(long, value_name =3D "PATH", group =3D "input")]
+    source: Option<String>,
+
+    /// Path to debugfs (defaults to /sys/kernel/debug if not specified)
+    #[arg(long, value_name =3D "PATH", group =3D "input")]
+    debugfs: Option<String>,
+
+    /// Optional: Name of specific API to show details for
+    api_name: Option<String>,
+
+    /// Output format
+    #[arg(long, short =3D 'f', default_value =3D "plain")]
+    format: String,
+}
+
+fn main() -> Result<()> {
+    let args =3D Args::parse();
+
+    let output_format: OutputFormat =3D args
+        .format
+        .parse()
+        .map_err(|e: String| anyhow::anyhow!(e))?;
+
+    let extractor: Box<dyn ApiExtractor> =3D match (args.vmlinux, args.sou=
rce, args.debugfs.clone()) {
+        (Some(vmlinux_path), None, None) =3D> Box::new(VmlinuxExtractor::n=
ew(&vmlinux_path)?),
+        (None, Some(source_path), None) =3D> Box::new(SourceExtractor::new=
(&source_path)?),
+        (None, None, Some(_) | None) =3D> {
+            // If debugfs is specified or no input is provided, use debugfs
+            Box::new(DebugfsExtractor::new(args.debugfs)?)
+        }
+        _ =3D> {
+            anyhow::bail!("Please specify only one of --vmlinux, --source,=
 or --debugfs")
+        }
+    };
+
+    display_apis(extractor.as_ref(), args.api_name, output_format)
+}
+
+fn display_apis(
+    extractor: &dyn ApiExtractor,
+    api_name: Option<String>,
+    output_format: OutputFormat,
+) -> Result<()> {
+    let mut formatter =3D create_formatter(output_format);
+    let mut stdout =3D io::stdout();
+
+    formatter.begin_document(&mut stdout)?;
+
+    if let Some(api_name_req) =3D api_name {
+        // Use the extractor to display API details
+        if let Some(_spec) =3D extractor.extract_by_name(&api_name_req)? {
+            extractor.display_api_details(&api_name_req, &mut *formatter, =
&mut stdout)?;
+        } else if output_format =3D=3D OutputFormat::Plain {
+            writeln!(stdout, "\nAPI '{}' not found.", api_name_req)?;
+            writeln!(stdout, "\nAvailable APIs:")?;
+            for spec in extractor.extract_all()? {
+                writeln!(stdout, "  {} ({})", spec.name, spec.api_type)?;
+            }
+        }
+    } else {
+        // Display list of APIs using the extractor
+        let all_specs =3D extractor.extract_all()?;
+
+        // Helper to display API list for a specific type
+        let mut display_api_type =3D |api_type: &str, title: &str| -> Resu=
lt<()> {
+            let filtered: Vec<_> =3D all_specs.iter()
+                .filter(|s| s.api_type =3D=3D api_type)
+                .collect();
+
+            if !filtered.is_empty() {
+                formatter.begin_api_list(&mut stdout, title)?;
+                for spec in filtered {
+                    formatter.api_item(&mut stdout, &spec.name, &spec.api_=
type)?;
+                }
+                formatter.end_api_list(&mut stdout)?;
+            }
+            Ok(())
+        };
+
+        display_api_type("syscall", "System Calls")?;
+        display_api_type("ioctl", "IOCTLs")?;
+        display_api_type("function", "Functions")?;
+        display_api_type("sysfs", "Sysfs Attributes")?;
+
+        formatter.total_specs(&mut stdout, all_specs.len())?;
+    }
+
+    formatter.end_document(&mut stdout)?;
+
+    Ok(())
+}
--=20
2.51.0

From nobody Sat Feb  7 10:08:18 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id DE91A2E62B4;
	Thu, 18 Dec 2025 20:42:50 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1766090571; cv=none;
 b=sRJY1uR2QIFk+4TCQ+7oi6AuN33hF+zm4g1JJGguyh/bUpirZ18ts7dj+Dk7OoK5nSBWo0sgpt4MagEoqYHnNFNQKhlZNVmXiwmaLWCwCLGaifkneBHaAoyloEXAIa2V5+OxwHJjWsQXpDrtnyu3w64adlpbNwGb95swcv0urFA=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1766090571; c=relaxed/simple;
	bh=201PmRgYE1b+3daKnu5U4am8WX5QK9hEVu07e1v1kmc=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=FlNTNwvHiidMPz+z4hEsthK3TjOhiD9uUE3ElU8YrzUZdKTvd/bN40QR6iln5yi/HK8MTCXPb5k53rV3ePBxnm4+GVQr8MKKw1FvWZH1en9QOLbkxfO74AUdGLp6RbLz8rtzNpyXvWaVAIO2pzPBVF2iPgrmOSuzncKk6ZYGkQU=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=j5IIw4FP; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="j5IIw4FP"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 24998C113D0;
	Thu, 18 Dec 2025 20:42:50 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1766090570;
	bh=201PmRgYE1b+3daKnu5U4am8WX5QK9hEVu07e1v1kmc=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=j5IIw4FPtkgCPI1cSTQmj9p+iSdliZmfR/xE2E1Ky0X3S46Uq8fzzwFSrrAd0dvvv
	 JZwydI5K9hjV42sQ2tRohvaKMNS25d3SH5JA8cFxwFxrYGXMgQwxkvz6BqWrg6dgZW
	 Ho1YUGSFgN5U0iDrSxapM7l2awMnaOtDYIsF3mmYPmMN1JrvhjWflW53w9Y5vt4+sb
	 rtqlUlecHoHMDw+AqBBrA3Gkr3fWPP19a/D2USVx++wO5YjscNDcX5tNZytIW4NRoK
	 MA3QKtCQWwtAKpzWIyM+omIXlTh6e8lWrbWPTkVJE5qL6Iboj2JHGBuBqfDaenC10e
	 vaYdebaL1YfGQ==
From: Sasha Levin <sashal@kernel.org>
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	tools@kernel.org,
	gpaoloni@redhat.com,
	Sasha Levin <sashal@kernel.org>
Subject: [RFC PATCH v5 05/15] kernel/api: add API specification for io_setup
Date: Thu, 18 Dec 2025 15:42:27 -0500
Message-ID: <20251218204239.4159453-6-sashal@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/aio.c | 228 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 216 insertions(+), 12 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 0a23a8c0717ff..36556e7a8e2c0 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1366,18 +1366,222 @@ static long read_events(struct kioctx *ctx, long m=
in_nr, long nr,
 	return ret;
 }
=20
-/* sys_io_setup:
- *	Create an aio_context capable of receiving at least nr_events.
- *	ctxp must not point to an aio_context that already exists, and
- *	must be initialized to 0 prior to the call.  On successful
- *	creation of the aio_context, *ctxp is filled in with the resulting=20
- *	handle.  May fail with -EINVAL if *ctxp is not initialized,
- *	if the specified nr_events exceeds internal limits.  May fail=20
- *	with -EAGAIN if the specified nr_events exceeds the user's limit=20
- *	of available events.  May fail with -ENOMEM if insufficient kernel
- *	resources are available.  May fail with -EFAULT if an invalid
- *	pointer is passed for ctxp.  Will fail with -ENOSYS if not
- *	implemented.
+/**
+ * sys_io_setup - Create an asynchronous I/O context
+ * @nr_events: Minimum number of concurrent AIO operations the context sho=
uld support
+ * @ctxp: Pointer to aio_context_t variable to receive the context handle
+ *
+ * long-desc: Creates an asynchronous I/O context capable of receiving at =
least
+ *   nr_events concurrent operations. The context handle is returned via c=
txp,
+ *   which must be initialized to 0 prior to the call. The returned context
+ *   handle is used with subsequent AIO operations (io_submit, io_getevent=
s,
+ *   io_cancel, io_destroy).
+ *
+ *   The AIO context consists of a memory-mapped ring buffer shared between
+ *   kernel and userspace for efficient completion notification. The kernel
+ *   internally allocates more capacity than requested to account for perc=
pu
+ *   batching (approximately nr_events * 2, but at least num_cpus * 8).
+ *
+ *   The context is bound to the calling process and cannot be shared acro=
ss
+ *   processes. Each process can have multiple AIO contexts, limited only =
by
+ *   the system-wide aio-max-nr sysctl.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: nr_events
+ *   type: KAPI_TYPE_UINT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 1, 8388608
+ *   constraint: Must be greater than 0. Internal limit of approximately 8=
M events
+ *     prevents overflow when calculating ring buffer size (0x10000000 / 3=
2 bytes
+ *     per io_event). The kernel may allocate more capacity than requested=
 to
+ *     optimize for percpu batching.
+ *
+ * param: ctxp
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_INOUT | KAPI_PARAM_USER
+ *   size: sizeof(aio_context_t)
+ *   constraint-type: KAPI_CONSTRAINT_USER_PTR
+ *   constraint: Must be a valid userspace pointer to an aio_context_t var=
iable.
+ *     The memory pointed to MUST be initialized to 0 before the call. On =
success,
+ *     receives the context handle (actually the mmap address of the ring =
buffer).
+ *     The context handle is opaque and should not be interpreted by users=
pace
+ *     except to pass to other io_* syscalls.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_ERROR_CHECK
+ *   success: 0
+ *   desc: Returns 0 on success. On success, *ctxp contains the new contex=
t handle.
+ *
+ * error: EFAULT, Invalid pointer
+ *   desc: The ctxp pointer is invalid, not accessible, or points to memor=
y that
+ *     cannot be read or written. Returned from get_user() when reading the
+ *     initial value or from put_user() when storing the context handle.
+ *
+ * error: EINVAL, Invalid parameter
+ *   desc: Either *ctxp is not initialized to 0 (indicating an existing co=
ntext or
+ *     uninitialized memory), or nr_events is 0, or nr_events is too large=
 causing
+ *     internal overflow when calculating ring buffer size. The internal l=
imit is
+ *     approximately 0x10000000 / sizeof(struct io_event) events.
+ *
+ * error: EAGAIN, Resource limit exceeded
+ *   desc: The system-wide limit on AIO contexts would be exceeded. The li=
mit is
+ *     controlled by /proc/sys/fs/aio-max-nr (default 65536). Each context=
 counts
+ *     as nr_events toward this limit. Also returned if nr_events exceeds =
the
+ *     current aio-max-nr value. Unlike ENOMEM, this error indicates a pol=
icy
+ *     limit rather than physical resource exhaustion.
+ *
+ * error: ENOMEM, Insufficient memory
+ *   desc: Kernel could not allocate required memory for the AIO context. =
This
+ *     includes the kioctx structure, percpu data, ring buffer pages, or t=
he
+ *     anonymous file backing the ring buffer. Also returned if the kernel=
 could
+ *     not establish the memory mapping for the ring buffer, or if ioctx_t=
able
+ *     expansion failed.
+ *
+ * error: EINTR, Interrupted by signal
+ *   desc: A fatal signal was received while attempting to acquire the mma=
p_lock
+ *     for the ring buffer memory mapping. The operation was aborted befor=
e any
+ *     state was modified. Only fatal signals (SIGKILL) can cause this err=
or;
+ *     normal signals like SIGINT do not interrupt the operation.
+ *
+ * lock: aio_nr_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   desc: Global spinlock protecting the system-wide aio_nr counter. Held=
 briefly
+ *     to check and update the system-wide AIO context count.
+ *
+ * lock: mm->ioctx_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   desc: Per-mm spinlock protecting the ioctx_table. Held while adding t=
he new
+ *     context to the process's AIO context table.
+ *
+ * lock: ctx->ring_lock
+ *   type: KAPI_LOCK_MUTEX
+ *   desc: Per-context mutex protecting ring buffer setup. Held throughout=
 context
+ *     initialization to prevent page migration during setup, then release=
d once
+ *     the context is fully initialized.
+ *
+ * lock: mmap_lock
+ *   type: KAPI_LOCK_RWLOCK
+ *   desc: Process memory map write lock. Acquired via mmap_write_lock_kil=
lable()
+ *     during ring buffer mmap operation. This is where EINTR can occur.
+ *
+ * signal: SIGKILL
+ *   direction: KAPI_SIGNAL_RECEIVE
+ *   action: KAPI_SIGNAL_ACTION_RETURN
+ *   condition: Fatal signal pending during mmap_write_lock_killable
+ *   desc: Fatal signals can interrupt the context creation during the mma=
p phase.
+ *     The mmap_write_lock_killable() function checks for fatal signals an=
d returns
+ *     -EINTR if one is pending. Non-fatal signals do not interrupt this s=
yscall.
+ *   error: -EINTR
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *   priority: 0
+ *   restartable: no
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ *   target: kioctx structure
+ *   desc: Allocates the main AIO context structure from kioctx_cachep sla=
b cache.
+ *     Contains ring buffer metadata, locks, and request tracking.
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ *   target: percpu kioctx_cpu structures
+ *   desc: Allocates per-CPU structures for request batching via alloc_per=
cpu().
+ *     Used to reduce contention on the global request counter.
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ *   target: ring buffer pages
+ *   desc: Allocates pages for the completion event ring buffer. The ring =
is backed
+ *     by an anonymous file on the internal "aio" filesystem and memory-ma=
pped into
+ *     the process address space.
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_CREATE
+ *   target: anonymous inode and file
+ *   desc: Creates an anonymous inode and file on the internal aio filesys=
tem to
+ *     back the ring buffer mapping. This enables proper page migration su=
pport.
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: process virtual memory
+ *   desc: Creates a new memory mapping (VMA) for the ring buffer in the p=
rocess
+ *     address space. The mapping is marked VM_DONTEXPAND and uses aio_rin=
g_vm_ops.
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: mm->ioctx_table
+ *   desc: Adds the new context to the process's AIO context table. The ta=
ble is
+ *     dynamically expanded if needed (grows by 4x each time).
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: aio_nr (global counter)
+ *   desc: Increments the system-wide AIO context counter by nr_events. Th=
is counter
+ *     is visible via /proc/sys/fs/aio-nr and counts toward the aio-max-nr=
 limit.
+ *   reversible: yes
+ *
+ * state-trans: process AIO state
+ *   from: no AIO context (or fewer contexts)
+ *   to: has AIO context
+ *   condition: successful io_setup
+ *   desc: Process gains an AIO context that can be used for asynchronous =
I/O
+ *     operations. The context remains until explicitly destroyed via io_d=
estroy
+ *     or process exit.
+ *
+ * state-trans: system AIO resources
+ *   from: aio_nr =3D N
+ *   to: aio_nr =3D N + nr_events
+ *   condition: successful io_setup
+ *   desc: System-wide AIO resource counter increases. The counter tracks =
total
+ *     requested AIO capacity across all processes.
+ *
+ * constraint: System-wide AIO limit (aio-max-nr)
+ *   desc: The /proc/sys/fs/aio-max-nr sysctl (default 65536) limits the t=
otal
+ *     number of AIO events system-wide. Each io_setup call adds nr_events=
 to
+ *     the aio_nr counter. If aio_nr + nr_events would exceed aio_max_nr, =
the
+ *     call fails with EAGAIN. Administrators can increase aio-max-nr if n=
eeded.
+ *   expr: aio_nr + nr_events <=3D aio_max_nr
+ *
+ * constraint: Per-process context limit
+ *   desc: Each process can have multiple AIO contexts, limited only by the
+ *     system-wide aio-max-nr limit and available memory. The ioctx_table =
grows
+ *     dynamically to accommodate new contexts.
+ *
+ * constraint: CONFIG_AIO required
+ *   desc: The kernel must be compiled with CONFIG_AIO=3Dy for this syscal=
l to be
+ *     available. If not configured, the syscall returns -ENOSYS. This is =
typically
+ *     enabled by default but may be disabled on embedded systems.
+ *
+ * constraint: Memory for ring buffer
+ *   desc: The kernel must be able to allocate sufficient contiguous pages=
 for the
+ *     ring buffer and establish the memory mapping. Large nr_events value=
s require
+ *     more memory and may fail with ENOMEM on memory-constrained systems.
+ *
+ * examples: aio_context_t ctx =3D 0; io_setup(128, &ctx);  // Create cont=
ext for 128 events
+ *   aio_context_t ctx =3D 0; io_setup(1024, &ctx);  // Create context for=
 1024 events
+ *
+ * notes: The returned context handle is actually the virtual address of t=
he ring
+ *   buffer mapping in the process address space. This allows userspace li=
braries
+ *   to directly access completion events without syscall overhead in some=
 cases.
+ *
+ *   The kernel internally doubles nr_events and ensures a minimum of num_=
cpus * 8
+ *   events for percpu batching efficiency. This means the actual ring cap=
acity may
+ *   be significantly larger than requested.
+ *
+ *   Historical note: A race condition between io_setup and io_destroy was=
 fixed
+ *   in commit 86b62a2cb4fc ("aio: fix io_setup/io_destroy race"). Earlier=
 kernels
+ *   could have the context freed while io_setup was still completing.
+ *
+ *   io_uring (since Linux 5.1) is a more modern alternative that provides=
 better
+ *   performance and more features. Consider using io_uring for new applic=
ations.
+ *
+ *   There is no glibc wrapper for this syscall. Use syscall(SYS_io_setup,=
 ...) or
+ *   the libaio library wrapper (note: libaio has slightly different error=
 semantics,
+ *   returning negative error numbers directly instead of -1 with errno).
+ *
+ * since-version: 2.5
  */
 SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctx=
p)
 {
--=20
2.51.0
From nobody Sat Feb  7 10:08:18 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id C6E9F2DA76F;
	Thu, 18 Dec 2025 20:42:51 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1766090571; cv=none;
 b=KQGZzd6lS6wXELOmEB07ota3rnUkvLMbXzBxJsbRXRtoF5PM7qs7LwQdLehcUcGoS/I1wKvqTdFkreIV256FkOhkrtZz4glcXS1q8mY3oy2yayrpmD12tpaPLGD/M7jvX3ZHlDCsOlWfn2uq61OKdqKk/3WsniXIKruycCpkpBc=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1766090571; c=relaxed/simple;
	bh=oadfCL6p3szPYR2rpUi6FOTD0UWKkS2PI57OHH5y6lA=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=RANKyyTVymGQ14OeywkXkaXpnXO1VDcVl6hJSn22Ev9S30E1X8gMqEiw3HNHbUJdkdFHa9+/3gOyojVRayhrZb9QkQt2yc4Zgo2STnSqfIxP4WaiqwH39W6Pc6O0hHx/TgRZQ+sX4PiCffXNDsI8N6d0gIdEEmJUUmUfkVOEzaE=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=fm6VCal9; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="fm6VCal9"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0B6AFC19423;
	Thu, 18 Dec 2025 20:42:50 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1766090571;
	bh=oadfCL6p3szPYR2rpUi6FOTD0UWKkS2PI57OHH5y6lA=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=fm6VCal9OVCusmvNBANryV9Ku48KmSgUngHs6E9F9poBj/ENW8+xW3bdhfePPfrnc
	 ub9jivnDRBaNuMVq66Sxj7ZAADm5y2EReU2qsber1EMth8JhXo8b/IU6wp3RYHn0Do
	 5p9OXfqEcQ/JhfwXcWJyp8ZSe/tQ0UxloTleGr1adR60QIwFGI3+l6hwyPwnRc0R4A
	 e2G5m/mdx0T3olPLGGocY1v7fd1c6GAS3s5yBDlg8bj8/rKjD1yRtk8VFAb/Lx9VRs
	 wDVr2rVoY742WUaQYU4VvXfHC+phLNtAqFoFv0ZOb24IWtkahCmThWfuxkvi6BUXDO
	 htch9anzS6V6g==
From: Sasha Levin <sashal@kernel.org>
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	tools@kernel.org,
	gpaoloni@redhat.com,
	Sasha Levin <sashal@kernel.org>
Subject: [RFC PATCH v5 06/15] kernel/api: add API specification for io_destroy
Date: Thu, 18 Dec 2025 15:42:28 -0500
Message-ID: <20251218204239.4159453-7-sashal@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/aio.c | 189 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 184 insertions(+), 5 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 36556e7a8e2c0..ff2a8527e1b85 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1646,11 +1646,190 @@ COMPAT_SYSCALL_DEFINE2(io_setup, unsigned, nr_even=
ts, u32 __user *, ctx32p)
 }
 #endif
=20
-/* sys_io_destroy:
- *	Destroy the aio_context specified.  May cancel any outstanding=20
- *	AIOs and block on completion.  Will fail with -ENOSYS if not
- *	implemented.  May fail with -EINVAL if the context pointed to
- *	is invalid.
+/**
+ * sys_io_destroy - Destroy an asynchronous I/O context
+ * @ctx: AIO context handle returned by io_setup
+ *
+ * long-desc: Destroys the asynchronous I/O context identified by ctx. This
+ *   syscall will attempt to cancel all outstanding asynchronous I/O opera=
tions
+ *   against the context and block until all operations have completed. On=
ce
+ *   this syscall returns successfully, the context handle becomes invalid=
 and
+ *   must not be used with any other io_* syscalls.
+ *
+ *   The context's memory-mapped ring buffer is unmapped from the process =
address
+ *   space, and all associated kernel resources are freed. The system-wide=
 AIO
+ *   event counter (aio_nr) is decremented by the original nr_events value=
 that
+ *   was passed to io_setup when creating this context.
+ *
+ *   This syscall blocks until all in-flight I/O operations have completed=
. This
+ *   ensures that userspace buffers passed to io_submit are no longer acce=
ssed
+ *   by the kernel after io_destroy returns. The wait is NOT interruptible=
 by
+ *   signals, so callers cannot cancel this blocking behavior.
+ *
+ *   If two threads call io_destroy on the same context simultaneously, on=
ly the
+ *   first call will succeed; subsequent calls return -EINVAL as the conte=
xt is
+ *   already marked as dead.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: ctx
+ *   type: KAPI_TYPE_UINT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_CUSTOM
+ *   constraint: Must be a valid context handle previously returned by io_=
setup.
+ *     The handle is actually the virtual address of the ring buffer mappi=
ng in
+ *     the calling process's address space. A value of 0 is always invalid.
+ *     The context must not have been previously destroyed.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_ERROR_CHECK
+ *   success: 0
+ *   desc: Returns 0 on success. After successful return, the context hand=
le is
+ *     invalid and all resources have been released. All outstanding I/O
+ *     operations have completed.
+ *
+ * error: EINVAL, Invalid context
+ *   desc: The ctx argument does not refer to a valid AIO context in the c=
alling
+ *     process. This can occur if: (1) ctx was never returned by io_setup,
+ *     (2) ctx was returned by io_setup in a different process, (3) ctx was
+ *     already destroyed by a previous io_destroy call, (4) ctx is 0 or an
+ *     arbitrary invalid value, or (5) the ring buffer at the ctx address =
has
+ *     been corrupted (e.g., the id field no longer matches).
+ *
+ * lock: mm->ioctx_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   desc: Per-mm spinlock protecting the ioctx_table. Held briefly while
+ *     marking the context as dead and removing it from the process's AIO
+ *     context table.
+ *
+ * lock: RCU read lock
+ *   type: KAPI_LOCK_RCU
+ *   desc: RCU read-side critical section held during context lookup in
+ *     lookup_ioctx(). Protects against concurrent modification of the
+ *     ioctx_table.
+ *
+ * lock: ctx->ctx_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   desc: Per-context spinlock held while cancelling outstanding I/O requ=
ests
+ *     in free_ioctx_users(). Protects the active_reqs list.
+ *
+ * lock: mmap_lock
+ *   type: KAPI_LOCK_RWLOCK
+ *   desc: Process memory map write lock acquired during vm_munmap() when
+ *     unmapping the ring buffer. May contend with other memory operations
+ *     in the same process.
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: ctx->dead flag
+ *   desc: Atomically sets the context's dead flag to 1, marking it as bei=
ng
+ *     destroyed. This prevents new I/O submissions and ensures subsequent
+ *     io_destroy calls return -EINVAL.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: mm->ioctx_table
+ *   desc: Removes the context from the process's AIO context table by set=
ting
+ *     the corresponding table entry to NULL. After this, lookup_ioctx will
+ *     no longer find this context.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: aio_nr (global counter)
+ *   desc: Decrements the system-wide AIO context counter by the context's
+ *     max_reqs value (the nr_events originally passed to io_setup). This
+ *     counter is visible via /proc/sys/fs/aio-nr.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: process virtual memory
+ *   desc: Unmaps the ring buffer from the process's address space via
+ *     vm_munmap(). The memory region at ctx becomes invalid.
+ *   condition: ctx->mmap_size > 0
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FREE_MEMORY
+ *   target: kioctx structure and associated resources
+ *   desc: Frees the AIO context structure, percpu data, ring buffer pages=
, and
+ *     the anonymous file backing the ring buffer. Deferred via RCU work q=
ueue
+ *     to ensure safe cleanup after all references are dropped.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_SIGNAL_SEND
+ *   target: outstanding AIO operations
+ *   desc: Cancels all outstanding asynchronous I/O operations by invoking=
 their
+ *     ki_cancel callbacks. The specific effect depends on the operation t=
ype
+ *     (read, write, fsync, poll).
+ *   condition: active_reqs list is not empty
+ *   reversible: no
+ *
+ * state-trans: AIO context state
+ *   from: alive (ctx->dead =3D=3D 0)
+ *   to: dead (ctx->dead =3D=3D 1)
+ *   condition: successful atomic exchange in kill_ioctx
+ *   desc: The context transitions from usable to destroyed. Once dead, the
+ *     context cannot be used for any operations and will be freed after a=
ll
+ *     references are dropped.
+ *
+ * state-trans: process AIO state
+ *   from: has AIO context(s)
+ *   to: context removed (or no contexts)
+ *   condition: successful io_destroy
+ *   desc: The destroyed context is removed from the process's context tab=
le.
+ *     If this was the only context, the process no longer has any active
+ *     AIO contexts.
+ *
+ * state-trans: system AIO resources
+ *   from: aio_nr =3D N
+ *   to: aio_nr =3D N - max_reqs
+ *   condition: successful io_destroy
+ *   desc: System-wide AIO resource counter decreases, making room for oth=
er
+ *     processes to create new AIO contexts.
+ *
+ * constraint: CONFIG_AIO required
+ *   desc: The kernel must be compiled with CONFIG_AIO=3Dy for this syscal=
l to be
+ *     available. If not configured, the syscall returns -ENOSYS. This is
+ *     typically enabled by default but may be disabled on embedded system=
s.
+ *
+ * constraint: Context must belong to calling process
+ *   desc: Each AIO context is bound to a specific process (mm_struct). A =
context
+ *     created by one process cannot be destroyed by another process, even=
 if
+ *     the context handle value is somehow known.
+ *   expr: ctx belongs to current->mm
+ *
+ * examples: io_destroy(ctx);  // Destroy context and wait for completion
+ *   if (io_destroy(ctx) =3D=3D -EINVAL) handle_error();  // Invalid conte=
xt
+ *
+ * notes: The man page documents EFAULT as a possible error, but code anal=
ysis
+ *   shows that EFAULT conditions (e.g., invalid ring buffer pointer) actu=
ally
+ *   result in EINVAL being returned, as lookup_ioctx returns NULL on any
+ *   failure to access the ring buffer header.
+ *
+ *   This syscall blocks in TASK_UNINTERRUPTIBLE state while waiting for
+ *   outstanding I/O operations to complete. This means the process cannot=
 be
+ *   interrupted by signals during this wait. In extreme cases with very s=
low
+ *   I/O devices, this could cause the process to appear hung.
+ *
+ *   Historical note: Before kernel 3.11, io_destroy blocked waiting for I=
/O
+ *   completion. A refactoring in 3.11 accidentally removed this behavior,
+ *   creating a race where userspace buffers could be freed while the kern=
el
+ *   was still using them. This was fixed by commit e02ba72aabfa that bloc=
ks
+ *   io_destroy until all context requests are completed.
+ *
+ *   Race condition handling: A race between io_destroy and io_submit was =
fixed
+ *   by commit 7137c6bd4552. A race between io_setup and io_destroy was fi=
xed
+ *   by commit 86b62a2cb4fc. Both fixes ensure proper synchronization via
+ *   reference counting.
+ *
+ *   io_uring (since Linux 5.1) is a more modern alternative that provides=
 better
+ *   performance and more features. Consider using io_uring for new applic=
ations.
+ *
+ *   There is no glibc wrapper for this syscall. Use syscall(SYS_io_destro=
y, ctx)
+ *   or the libaio library wrapper io_destroy(). Note: libaio has slightly
+ *   different error semantics, returning negative error numbers directly =
instead
+ *   of -1 with errno.
+ *
+ * since-version: 2.5
  */
 SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
 {
--=20
2.51.0
From nobody Sat Feb  7 10:08:18 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id B1EF42E92D4;
	Thu, 18 Dec 2025 20:42:52 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1766090572; cv=none;
 b=OdO/aE7E5I/uIR0IeMpEatCIPnrOWV0Gd9m2hRYtTM3q/JJJa3TlM7FiJ+gn7j6Z4QJB12ZimtHCu4RaKkLT7Kgnq9i5IrhhP1+THPEZq815vX5mp+RMbbByaAniidqcivlKTX9y3dECw4JwrxIcIDPtXysYJdFB7Lebkn33w4I=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1766090572; c=relaxed/simple;
	bh=15QMa2QRWzuHNFPoY2VDNiHPRh2YTSroy0oCN7V30og=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=mg8S+JPgDJdv5vcpGyIabVpxw3HcDnjtzNer9AOrl6s19eJHWlFZNVn2+g4SjDAeeYELPJbRO/AzcFnNi0eE88f83Sa5LtB3jq+3sew28t93xZNBmNJX459+MVAogzaYHU4a8e7nrh4/TmBMRwetDRP86jM6q4MSEwF+GoyQ5ws=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=WaJ3h4kz; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="WaJ3h4kz"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id E831DC113D0;
	Thu, 18 Dec 2025 20:42:51 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1766090572;
	bh=15QMa2QRWzuHNFPoY2VDNiHPRh2YTSroy0oCN7V30og=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=WaJ3h4kzgOxMbfRPwjaKsQf1Q0ly1MlEO2Z6as2aBmSn2YHPh/S3laqZbnBjh0rNg
	 3fiKDg3f6HmZj8UKBTHgfzDR3lvymRX6uOKqyCxVX4Uq0/KKsHxIestUJ5y8NB/q/f
	 hQ+3qrQVFq2sNaTfXvx2Si/kbAz4m/8MZX0uJX5IWrPcfctHTpW7oC8ojb40zIL0Q1
	 wNHuHcRFtWzd4r9hepjKf3zIPrLodONtFc3oYWyEE3Am8WKdprLrnpv8iiQrPDEplk
	 L7lpLmhUz40izRmbA0Z0V8qZnANCmVZMpFua5MSResYufSi4AyPi6CadgERE8Aseb2
	 bqu9oatLRvmDw==
From: Sasha Levin <sashal@kernel.org>
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	tools@kernel.org,
	gpaoloni@redhat.com,
	Sasha Levin <sashal@kernel.org>
Subject: [RFC PATCH v5 07/15] kernel/api: add API specification for io_submit
Date: Thu, 18 Dec 2025 15:42:29 -0500
Message-ID: <20251218204239.4159453-8-sashal@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/aio.c | 319 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 308 insertions(+), 11 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index ff2a8527e1b85..f6f1b3790c88b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2450,17 +2450,314 @@ static int io_submit_one(struct kioctx *ctx, struc=
t iocb __user *user_iocb,
 	return err;
 }
=20
-/* sys_io_submit:
- *	Queue the nr iocbs pointed to by iocbpp for processing.  Returns
- *	the number of iocbs queued.  May return -EINVAL if the aio_context
- *	specified by ctx_id is invalid, if nr is < 0, if the iocb at
- *	*iocbpp[0] is not properly initialized, if the operation specified
- *	is invalid for the file descriptor in the iocb.  May fail with
- *	-EFAULT if any of the data structures point to invalid data.  May
- *	fail with -EBADF if the file descriptor specified in the first
- *	iocb is invalid.  May fail with -EAGAIN if insufficient resources
- *	are available to queue any iocbs.  Will return 0 if nr is 0.  Will
- *	fail with -ENOSYS if not implemented.
+/**
+ * sys_io_submit - Submit asynchronous I/O operations for processing
+ * @ctx_id: AIO context handle returned by io_setup
+ * @nr: Number of I/O control blocks to submit
+ * @iocbpp: Array of pointers to iocb structures describing the operations
+ *
+ * long-desc: Submits one or more asynchronous I/O operations for processi=
ng
+ *   against a previously created AIO context. Each iocb structure describ=
es
+ *   a single I/O operation including the operation type, file descriptor,
+ *   buffer, size, and offset.
+ *
+ *   The syscall processes iocbs sequentially from the array. If an error
+ *   occurs while processing an iocb, submission stops at that point and
+ *   the number of successfully submitted operations is returned. This mea=
ns
+ *   partial submission is possible: if submitting 10 iocbs and the 5th fa=
ils,
+ *   4 is returned and iocbs 0-3 are queued for processing.
+ *
+ *   Supported operations (specified via aio_lio_opcode):
+ *   - IOCB_CMD_PREAD (0): Positioned read from file
+ *   - IOCB_CMD_PWRITE (1): Positioned write to file
+ *   - IOCB_CMD_FSYNC (2): Sync file data and metadata
+ *   - IOCB_CMD_FDSYNC (3): Sync file data only
+ *   - IOCB_CMD_POLL (5): Poll for events on file descriptor
+ *   - IOCB_CMD_NOOP (6): No operation (useful for testing)
+ *   - IOCB_CMD_PREADV (7): Positioned scatter read
+ *   - IOCB_CMD_PWRITEV (8): Positioned gather write
+ *
+ *   The iocb structure fields include:
+ *   - aio_data: User data copied to io_event on completion
+ *   - aio_lio_opcode: Operation type (one of IOCB_CMD_*)
+ *   - aio_fildes: File descriptor for the operation
+ *   - aio_buf: Buffer address (or iovec array for vectored ops)
+ *   - aio_nbytes: Buffer size (or iovec count for vectored ops)
+ *   - aio_offset: File offset for positioned operations
+ *   - aio_flags: Optional flags (IOCB_FLAG_RESFD, IOCB_FLAG_IOPRIO)
+ *   - aio_resfd: eventfd to signal on completion (if IOCB_FLAG_RESFD set)
+ *   - aio_rw_flags: Per-operation RWF_* flags
+ *   - aio_reqprio: I/O priority (if IOCB_FLAG_IOPRIO set)
+ *
+ *   After successful submission, operations complete asynchronously. Resu=
lts
+ *   are delivered to the completion ring buffer and can be retrieved via
+ *   io_getevents(). If aio_resfd specifies a valid eventfd, it is signaled
+ *   when each operation completes.
+ *
+ *   The actual I/O may complete synchronously if the data is cached or if
+ *   the underlying filesystem doesn't support truly asynchronous I/O. In
+ *   such cases, the operation is still reported via the completion ring.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: ctx_id
+ *   type: KAPI_TYPE_UINT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_CUSTOM
+ *   constraint: Must be a valid AIO context handle previously returned by
+ *     io_setup() for the current process. The context must not have been
+ *     destroyed. A value of 0 is always invalid. The handle is actually
+ *     the virtual address of the ring buffer mapping.
+ *
+ * param: nr
+ *   type: KAPI_TYPE_INT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 0, LONG_MAX
+ *   constraint: Must be >=3D 0. If 0, the syscall returns immediately wit=
h 0.
+ *     The actual number processed is capped to ctx->nr_events (the contex=
t's
+ *     capacity). Very large values are effectively limited by the context
+ *     capacity and available ring buffer slots.
+ *
+ * param: iocbpp
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ *   constraint-type: KAPI_CONSTRAINT_CUSTOM
+ *   constraint: Must be a valid userspace pointer to an array of nr point=
ers
+ *     to struct iocb. Each iocb pointer must itself be valid and point to=
 a
+ *     properly initialized iocb structure. The iocb structures must have
+ *     aio_reserved2 set to 0 for forward compatibility.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_RANGE
+ *   success: >=3D 0
+ *   desc: Returns the number of iocbs successfully submitted (0 to nr). If
+ *     partial submission occurs due to an error, returns the count of
+ *     successfully submitted operations. Returns 0 if nr is 0.
+ *
+ * error: EINVAL, Invalid context or parameter
+ *   desc: Returned if ctx_id is invalid, nr is negative, aio_reserved2 is
+ *     non-zero, aio_lio_opcode is invalid, aio_buf/aio_nbytes overflow,
+ *     aio_resfd is not an eventfd, conflicting aio_rw_flags, file lacks
+ *     required operation support, invalid POLL/FSYNC parameters, or
+ *     invalid aio_reqprio class.
+ *
+ * error: EFAULT, Invalid memory access
+ *   desc: Returned if: (1) iocbpp is not a valid userspace pointer, (2) a=
ny
+ *     pointer in the iocbpp array is invalid, (3) the iocb data cannot be
+ *     copied from userspace, (4) aio_buf points to invalid memory, or
+ *     (5) the kernel cannot write the aio_key field back to userspace.
+ *
+ * error: EBADF, Bad file descriptor
+ *   desc: Returned if: (1) aio_fildes in an iocb does not refer to an open
+ *     file, (2) aio_resfd does not refer to a valid file descriptor when
+ *     IOCB_FLAG_RESFD is set, (3) the file is not opened with appropriate
+ *     mode for the operation (e.g., read on write-only file).
+ *
+ * error: EAGAIN, Resource temporarily unavailable
+ *   desc: Returned if insufficient slots are available in the completion
+ *     ring buffer. This typically means too many operations are already
+ *     in flight and the application should call io_getevents() to consume
+ *     completed events before submitting more.
+ *
+ * error: EPERM, Operation not permitted
+ *   desc: Returned if: (1) IOCB_FLAG_IOPRIO is set and aio_reqprio specif=
ies
+ *     IOPRIO_CLASS_RT (real-time I/O priority) but the process lacks
+ *     CAP_SYS_ADMIN or CAP_SYS_NICE capability, or (2) RWF_NOAPPEND is
+ *     specified but the file has the append-only attribute (IS_APPEND).
+ *
+ * error: EOPNOTSUPP, Operation not supported
+ *   desc: Returned if: (1) unsupported aio_rw_flags are specified, (2)
+ *     RWF_NOWAIT is specified but the file doesn't support non-blocking I=
/O
+ *     (FMODE_NOWAIT not set), (3) RWF_ATOMIC is specified for a read or
+ *     the file doesn't support atomic writes, or (4) RWF_DONTCACHE is
+ *     specified but not supported by the filesystem or file is DAX-mapped.
+ *
+ * error: EOVERFLOW, Value too large
+ *   desc: Returned if aio_offset plus aio_nbytes would overflow and the
+ *     file does not support unsigned offsets. This check prevents reading
+ *     or writing past the maximum representable file position.
+ *
+ * error: ENOMEM, Out of memory
+ *   desc: Returned if memory allocation fails when preparing credentials
+ *     for IOCB_CMD_FSYNC operations, or if vectored I/O (preadv/pwritev)
+ *     requires allocating iovec arrays larger than the stack buffer.
+ *
+ * lock: RCU read lock
+ *   type: KAPI_LOCK_RCU
+ *   desc: Acquired during context lookup in lookup_ioctx(). Protects agai=
nst
+ *     concurrent modification of the ioctx_table while looking up the
+ *     context. Released before processing any iocbs.
+ *
+ * lock: ctx->completion_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   desc: Per-context spinlock acquired briefly during request slot alloc=
ation
+ *     via user_refill_reqs_available() if the percpu request counter is e=
mpty.
+ *     Protects the ring buffer tail and completed_events counters.
+ *
+ * lock: ctx->ctx_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   desc: Per-context spinlock acquired when adding cancellable requests =
to
+ *     the active_reqs list. This enables io_cancel() to find and cancel
+ *     in-flight operations.
+ *
+ * lock: blk_plug
+ *   type: KAPI_LOCK_CUSTOM
+ *   desc: Block layer plugging is enabled when nr > 2 (AIO_PLUG_THRESHOLD)
+ *     to batch block I/O requests for better performance. This is not a
+ *     traditional lock but affects I/O scheduling.
+ *
+ * signal: any
+ *   direction: KAPI_SIGNAL_RECEIVE
+ *   action: KAPI_SIGNAL_ACTION_TRANSFORM
+ *   condition: Signal arrives during underlying read/write operation
+ *   desc: If a signal arrives during the underlying file read/write opera=
tion
+ *     and the operation returns ERESTARTSYS/ERESTARTNOINTR/etc., the error
+ *     is transformed to EINTR for the completion event. AIO operations ca=
nnot
+ *     be restarted in the traditional sense because other operations may =
have
+ *     already been submitted. The syscall itself (io_submit) is NOT inter=
rupted
+ *     by signals - only the individual async operations can be.
+ *   error: -EINTR (in io_event.res, not syscall return)
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *   restartable: no
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ *   target: aio_kiocb structures
+ *   desc: Allocates one aio_kiocb structure per submitted operation from =
the
+ *     kiocb_cachep slab cache. These structures track the in-flight opera=
tions
+ *     and are freed after completion is recorded in the ring buffer.
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: AIO context request counters
+ *   desc: Decrements the available request slot counter in the context.
+ *     Slots are reclaimed when completion events are consumed from the ri=
ng
+ *     buffer via io_getevents().
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: ctx->active_reqs list
+ *   desc: Cancellable operations (reads, writes, polls) are added to the
+ *     context's active_reqs list, enabling cancellation via io_cancel().
+ *   condition: Operation supports cancellation
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: iocb->aio_key field
+ *   desc: The kernel writes KIOCB_KEY (0) to the aio_key field of each
+ *     submitted iocb in userspace memory. This marks the iocb as submitted
+ *     and is checked by io_cancel() to validate the iocb.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: file reference count
+ *   desc: Increments the reference count of the file descriptor's struct =
file
+ *     via fget() for each submitted operation. The reference is released
+ *     when the operation completes (via fput() in iocb_destroy()).
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ *   target: target file(s)
+ *   desc: For write operations, the file content may be modified. For fsy=
nc
+ *     operations, dirty data is flushed to storage. The actual I/O may
+ *     complete synchronously or asynchronously depending on the filesyste=
m.
+ *   condition: IOCB_CMD_PWRITE, IOCB_CMD_PWRITEV, IOCB_CMD_FSYNC, IOCB_CM=
D_FDSYNC
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_SCHEDULE
+ *   target: fsync work queue
+ *   desc: FSYNC and FDSYNC operations are scheduled to run on a workqueue
+ *     because vfs_fsync() can block. The operation runs asynchronously and
+ *     completion is signaled via the ring buffer.
+ *   condition: IOCB_CMD_FSYNC or IOCB_CMD_FDSYNC
+ *   reversible: no
+ *
+ * state-trans: iocb state
+ *   from: user-prepared iocb
+ *   to: submitted (aio_key set to KIOCB_KEY)
+ *   condition: successful submission of each iocb
+ *   desc: Each successfully submitted iocb transitions from user-prepared
+ *     state to submitted state, marked by the kernel writing KIOCB_KEY to
+ *     aio_key. The iocb remains in submitted state until completion.
+ *
+ * state-trans: AIO context slot availability
+ *   from: slots_available =3D N
+ *   to: slots_available =3D N - submitted_count
+ *   condition: successful submission
+ *   desc: Available slots in the context decrease by the number of succes=
sfully
+ *     submitted operations. Slots are reclaimed when io_getevents() consu=
mes
+ *     completion events.
+ *
+ * capability: CAP_SYS_ADMIN
+ *   type: KAPI_CAP_GRANT_PERMISSION
+ *   allows: Use of IOPRIO_CLASS_RT (real-time I/O priority class)
+ *   without: Returns EPERM when attempting to use RT I/O priority
+ *   condition: IOCB_FLAG_IOPRIO set and aio_reqprio specifies IOPRIO_CLAS=
S_RT
+ *
+ * capability: CAP_SYS_NICE
+ *   type: KAPI_CAP_GRANT_PERMISSION
+ *   allows: Use of IOPRIO_CLASS_RT (alternative to CAP_SYS_ADMIN)
+ *   without: Returns EPERM when attempting to use RT I/O priority
+ *   condition: IOCB_FLAG_IOPRIO set and aio_reqprio specifies IOPRIO_CLAS=
S_RT
+ *
+ * constraint: Ring buffer slot availability
+ *   desc: There must be available slots in the completion ring buffer for
+ *     each operation to be submitted. If all slots are occupied by pending
+ *     completion events, submission fails with EAGAIN. The number of slots
+ *     is determined by nr_events passed to io_setup(), though internal
+ *     doubling means more slots may be available.
+ *   expr: available_slots >=3D 1 for each submission
+ *
+ * constraint: Valid file descriptor per iocb
+ *   desc: Each iocb must reference a valid, open file descriptor via
+ *     aio_fildes. The file must be opened with appropriate access mode
+ *     for the requested operation (read access for PREAD, write access
+ *     for PWRITE, etc.).
+ *
+ * constraint: File must support operation
+ *   desc: For read/write operations, the underlying file must implement
+ *     read_iter/write_iter file operations. For fsync, the file must
+ *     implement fsync. For poll, the file must support vfs_poll().
+ *
+ * constraint: CONFIG_AIO required
+ *   desc: The kernel must be compiled with CONFIG_AIO=3Dy for this syscall
+ *     to be available. If not configured, returns -ENOSYS.
+ *
+ * examples: struct iocb iocb, *iocbp =3D &iocb; io_submit(ctx, 1, &iocbp);
+ *   struct iocb iocbs[10], *ptrs[10]; io_submit(ctx, 10, ptrs);  // Batch=
 submit
+ *
+ * notes: Unlike traditional synchronous I/O, errors from io_submit() indi=
cate
+ *   submission failures, not I/O errors. Actual I/O errors are reported v=
ia
+ *   the res field of struct io_event when retrieved via io_getevents().
+ *
+ *   The return value indicates how many iocbs were successfully submitted.
+ *   If this is less than nr, the application should check which operation
+ *   failed (it's the one at index =3D return_value) and handle the error.
+ *   Previously submitted operations in the batch are still queued.
+ *
+ *   For vectored operations (PREADV/PWRITEV), aio_buf points to an array
+ *   of struct iovec and aio_nbytes contains the iovec count. The maximum
+ *   iovec count is UIO_MAXIOV (1024).
+ *
+ *   Block layer plugging is automatically enabled for batches larger than
+ *   2 operations, improving I/O merging and reducing per-I/O overhead.
+ *
+ *   The COMPAT_SYSCALL variant handles 32-bit userspace on 64-bit kernels,
+ *   using compat_uptr_t for the iocbpp array elements.
+ *
+ *   Historical note: commit d6b2615f7d31d ("aio: simplify - and fix - fge=
t/fput
+ *   for io_submit()") fixed file descriptor reference counting issues. Ea=
rlier
+ *   kernels could leak file references on certain error paths.
+ *
+ *   io_uring (since Linux 5.1) is a more modern and performant alternativ=
e.
+ *   Consider using io_uring_enter() for new applications requiring async =
I/O.
+ *
+ *   There is no glibc wrapper; use syscall(SYS_io_submit, ...) or the lib=
aio
+ *   library. The libaio wrapper io_submit() returns negative error numbers
+ *   directly rather than returning -1 and setting errno.
+ *
+ * since-version: 2.5
  */
 SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 		struct iocb __user * __user *, iocbpp)
--=20
2.51.0
From nobody Sat Feb  7 10:08:18 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9AD332DC774;
	Thu, 18 Dec 2025 20:42:53 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1766090573; cv=none;
 b=Gl7UNtW5015L0MhbvgV/FJuwsiL1jxH+qRIVljKJz6aiUrkeXVCozhT5Mn0ZHFTykxx6IH1kShvS/kXAVzKPEao7KpvG/inxmq+IEALfcDsuptEW3z+ZFgje87pKsXMAOyZx9DBrRuz2iWXrTMyglpLaxYXYFa8vxbBZ71aOLe4=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1766090573; c=relaxed/simple;
	bh=TWRyLzOK+gmM4eFtawhYOHlyCd0Mmshknky6n5T1bvA=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=YKz7Jy6px1QwN4pjoW4PujLdbIZqRrKFxBFLxBpmAsCr/mU3hvzRMQFAmvu9U4pXf/b/0xQIML2cLUcnDL6Yy2QCS933+Thi4Ey2bwyTJX1YP7qJ2FvPv4zTrDxSBs1bIx2qC0TPkQT9cvAkTMT8+qrS2uyrZGauw077bBhATUE=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=N9x40JpJ; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="N9x40JpJ"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id D51C5C16AAE;
	Thu, 18 Dec 2025 20:42:52 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1766090573;
	bh=TWRyLzOK+gmM4eFtawhYOHlyCd0Mmshknky6n5T1bvA=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=N9x40JpJzNzswbsy0KgfTXDPQ+7AX/0Cb23mAAxbniebGBmJroQoF6ztOd1IbJ05q
	 2QNRSt22Fd1ZXIbpxPMGH3u2RqmR+YAHzwbyVaVVTMj9rvhsjHS1sXPB+R31/0uwFc
	 ik7zr1FK3s8tVRl4bKhflAfh6MUW523SmtWAzQ0HRck+hbiAG+ZXoNMXqTD6tq57AJ
	 90oM0JljpKVftqKgnBGojSs3ga6DMg/h41bz2GUAG46lwu5QiSWJNUXK04cz/ZPp7k
	 mp96toMdHgXqN+9b3WGEJQbs9fRlqXlKAAclomIir0NndPOHsftP52k9HUT8UWFxMp
	 TSssBV800vqCg==
From: Sasha Levin <sashal@kernel.org>
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	tools@kernel.org,
	gpaoloni@redhat.com,
	Sasha Levin <sashal@kernel.org>
Subject: [RFC PATCH v5 08/15] kernel/api: add API specification for io_cancel
Date: Thu, 18 Dec 2025 15:42:30 -0500
Message-ID: <20251218204239.4159453-9-sashal@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/aio.c | 246 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 237 insertions(+), 9 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index f6f1b3790c88b..710517c9a990d 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2843,15 +2843,243 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_conte=
xt_t, ctx_id,
 }
 #endif
=20
-/* sys_io_cancel:
- *	Attempts to cancel an iocb previously passed to io_submit.  If
- *	the operation is successfully cancelled, the resulting event is
- *	copied into the memory pointed to by result without being placed
- *	into the completion queue and 0 is returned.  May fail with
- *	-EFAULT if any of the data structures pointed to are invalid.
- *	May fail with -EINVAL if aio_context specified by ctx_id is
- *	invalid.  May fail with -EAGAIN if the iocb specified was not
- *	cancelled.  Will fail with -ENOSYS if not implemented.
+/**
+ * sys_io_cancel - Attempt to cancel an outstanding asynchronous I/O opera=
tion
+ * @ctx_id: AIO context handle returned by io_setup
+ * @iocb: Pointer to the iocb structure that was previously submitted
+ * @result: Unused parameter (historically for result storage, now ignored)
+ *
+ * long-desc: Attempts to cancel an asynchronous I/O operation that was
+ *   previously submitted via io_submit(). The syscall searches for the
+ *   specified iocb in the context's active request list and invokes the
+ *   operation-specific cancellation callback if found.
+ *
+ *   The cancellation behavior depends on the type of I/O operation:
+ *   - For poll operations (IOCB_CMD_POLL): The request is marked as cance=
lled
+ *     and a work item is scheduled to complete the cancellation.
+ *   - For USB gadget I/O: The USB endpoint dequeue function is called, wh=
ich
+ *     triggers the completion callback with -ECONNRESET status.
+ *   - For most direct I/O operations: Cancellation is typically not suppo=
rted
+ *     as these operations do not register a cancel callback.
+ *
+ *   If the iocb is found and has a registered cancellation callback, that
+ *   callback is invoked and the iocb is removed from the active request l=
ist.
+ *   The completion event is delivered via the ring buffer (not via the re=
sult
+ *   parameter, which is now unused for this purpose).
+ *
+ *   On successful cancellation initiation, the syscall returns -EINPROGRE=
SS
+ *   (not 0) to indicate that cancellation is in progress. This is because
+ *   the actual completion may occur asynchronously via the cancel callbac=
k.
+ *
+ *   Important limitations:
+ *   - Most file I/O operations do not support cancellation
+ *   - The iocb must still be pending (not yet completed)
+ *   - The iocb must have been submitted via io_submit (aio_key =3D=3D KIO=
CB_KEY)
+ *   - Only operations that register a ki_cancel callback can be cancelled
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_ATOMIC
+ *
+ * param: ctx_id
+ *   type: KAPI_TYPE_UINT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_CUSTOM
+ *   constraint: Must be a valid AIO context handle previously returned by
+ *     io_setup() for the current process. The context must not have been
+ *     destroyed via io_destroy(). A value of 0 is always invalid. The han=
dle
+ *     is actually the virtual address of the ring buffer mapping, and must
+ *     belong to the calling process's address space.
+ *
+ * param: iocb
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ *   size: sizeof(struct iocb)
+ *   constraint-type: KAPI_CONSTRAINT_USER_PTR
+ *   constraint: Must be a valid userspace pointer to a struct iocb that w=
as
+ *     previously submitted via io_submit(). The iocb's aio_key field must
+ *     contain KIOCB_KEY (0), which is written by the kernel during io_sub=
mit.
+ *     A NULL pointer will result in EFAULT. The iocb must still be pending
+ *     (present in the context's active_reqs list) for cancellation to suc=
ceed.
+ *
+ * param: result
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL
+ *   constraint-type: KAPI_CONSTRAINT_NONE
+ *   constraint: This parameter is no longer used by the kernel. It was
+ *     historically intended to receive the io_event result on successful
+ *     cancellation, but completion events are now always delivered via the
+ *     ring buffer. May be NULL.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_ERROR_CHECK
+ *   success: -EINPROGRESS
+ *   desc: Returns -EINPROGRESS when the cancellation callback was success=
fully
+ *     invoked and the request is being cancelled. This is the expected re=
turn
+ *     value on successful cancellation initiation. The completion event w=
ill
+ *     be delivered via the ring buffer. Note that this is different from =
the
+ *     man page which claims 0 is returned on success.
+ *
+ * error: EFAULT, Cannot read iocb from userspace
+ *   desc: Returned if the iocb pointer is invalid or points to memory that
+ *     cannot be read. Specifically, the kernel attempts to read the aio_k=
ey
+ *     field from the iocb via get_user() and returns EFAULT if this fails.
+ *     A NULL iocb pointer will trigger this error.
+ *
+ * error: EINVAL, iocb not submitted via io_submit
+ *   desc: Returned if the aio_key field of the iocb does not contain KIOC=
B_KEY
+ *     (which is 0). The kernel sets aio_key to KIOCB_KEY when an iocb is
+ *     successfully submitted via io_submit(). If aio_key contains a diffe=
rent
+ *     value, it indicates the iocb was never successfully submitted, is
+ *     corrupted, or the memory has been reused.
+ *
+ * error: EINVAL, Invalid AIO context
+ *   desc: Returned if ctx_id does not refer to a valid AIO context. This =
can
+ *     occur if: (1) the context was never created, (2) the context was
+ *     destroyed via io_destroy(), (3) the ctx_id is 0, (4) the ring buffer
+ *     header cannot be read from userspace, (5) the context belongs to a
+ *     different process, or (6) the context's internal ID doesn't match.
+ *
+ * error: EINVAL, iocb not found or not cancellable
+ *   desc: Returned if the specified iocb is not present in the context's
+ *     active request list. This occurs when: (1) the operation has already
+ *     completed and the completion event is in the ring buffer, (2) the
+ *     operation was never submitted to this context, (3) the iocb pointer
+ *     does not match any pending operation (comparison is by pointer value
+ *     converted to u64), or (4) the operation did not register a cancella=
tion
+ *     callback (though in this case EINVAL comes from the default ret val=
ue).
+ *     Note: The man page documents EAGAIN for this case, but the actual
+ *     implementation returns EINVAL.
+ *
+ * error: ENOSYS, AIO not implemented
+ *   desc: Returned if the kernel was compiled without CONFIG_AIO support.
+ *     This error is returned by the syscall dispatch mechanism before the
+ *     io_cancel implementation is even reached.
+ *
+ * error: (driver-specific), Cancellation callback failed
+ *   desc: If the iocb is found and its ki_cancel callback is invoked, the
+ *     callback's return value is propagated to userspace if non-zero. For
+ *     USB gadget operations, usb_ep_dequeue() may return various errors
+ *     including EINVAL if the request wasn't queued. The aio_poll_cancel
+ *     callback always returns 0. Driver-specific cancellation functions
+ *     may return other error codes.
+ *
+ * lock: RCU read lock
+ *   type: KAPI_LOCK_RCU
+ *   desc: Acquired in lookup_ioctx() during context lookup. Protects agai=
nst
+ *     concurrent modification of the mm->ioctx_table while searching for =
the
+ *     context. Released before any spinlocks are acquired.
+ *
+ * lock: ctx->ctx_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   desc: Per-context spinlock acquired with interrupts disabled via
+ *     spin_lock_irq(). Held while iterating through the active_reqs list
+ *     searching for the iocb, while invoking the ki_cancel callback, and
+ *     while removing the iocb from the list. The cancel callback is invok=
ed
+ *     with this lock held, so callbacks must not sleep and must be IRQ-sa=
fe.
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: ctx->active_reqs list
+ *   desc: If the iocb is found and its cancellation callback is invoked, =
the
+ *     kiocb is removed from the context's active_reqs list via list_del_i=
nit().
+ *     This prevents the iocb from being found by subsequent io_cancel cal=
ls.
+ *   condition: iocb found and ki_cancel callback invoked
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: Pending I/O operation
+ *   desc: The cancellation callback may modify the state of the underlying
+ *     I/O operation. For poll operations, the cancelled flag is set. For =
USB
+ *     operations, the USB request is dequeued which triggers the completi=
on
+ *     callback. The completion event is delivered via the ring buffer.
+ *   condition: ki_cancel callback is invoked
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_SCHEDULE
+ *   target: aio_poll work queue
+ *   desc: For poll operations (IOCB_CMD_POLL), the aio_poll_cancel callba=
ck
+ *     schedules a work item via schedule_work() to complete the cancellat=
ion
+ *     asynchronously. This work item will eventually deliver the completi=
on
+ *     event to the ring buffer.
+ *   condition: Cancelling a poll operation
+ *   reversible: no
+ *
+ * state-trans: kiocb state
+ *   from: in_flight (in active_reqs list)
+ *   to: cancelling (removed from list, cancel callback invoked)
+ *   condition: iocb found and ki_cancel invoked
+ *   desc: When the iocb is found in the active_reqs list and its cancella=
tion
+ *     callback is invoked, the kiocb transitions from in-flight to cancel=
ling
+ *     state. The kiocb is removed from the active_reqs list, preventing
+ *     duplicate cancellation attempts. Final completion occurs asynchrono=
usly.
+ *
+ * state-trans: poll_iocb cancelled flag
+ *   from: false
+ *   to: true
+ *   condition: aio_poll_cancel is invoked
+ *   desc: For poll operations, the aio_poll_cancel callback sets the canc=
elled
+ *     flag on the poll_iocb structure. This signals to the poll completion
+ *     handler that the operation was cancelled rather than completed norm=
ally.
+ *
+ * constraint: Operation must support cancellation
+ *   desc: Only operations that register a ki_cancel callback can be cance=
lled.
+ *     Operations that don't set this callback (most direct I/O operations)
+ *     will never appear in the active_reqs list and thus cannot be cancel=
led.
+ *     Currently, only IOCB_CMD_POLL operations in the kernel AIO subsystem
+ *     and USB gadget operations support cancellation.
+ *
+ * constraint: Timing window for cancellation
+ *   desc: The iocb must still be pending at the time io_cancel is called.
+ *     There is an inherent race condition: the operation may complete
+ *     naturally between the time the application decides to cancel and wh=
en
+ *     io_cancel is invoked. In this case, EINVAL is returned because the
+ *     iocb is no longer in the active_reqs list.
+ *
+ * constraint: CONFIG_AIO required
+ *   desc: The kernel must be compiled with CONFIG_AIO=3Dy for this syscall
+ *     to be available. If not configured, ENOSYS is returned.
+ *
+ * examples: io_cancel(ctx, &iocb, NULL);  // Cancel with unused result pa=
ram
+ *   if (io_cancel(ctx, &iocb, NULL) =3D=3D -EINPROGRESS) handle_cancellat=
ion();
+ *   ret =3D io_cancel(ctx, &iocb, NULL); if (ret && ret !=3D -EINPROGRESS=
) error();
+ *
+ * notes: The return value semantics are unusual: -EINPROGRESS indicates
+ *   successful cancellation initiation, not an error. This is because the
+ *   actual cancellation may complete asynchronously, with the completion
+ *   event delivered via the ring buffer.
+ *
+ *   The result parameter is completely ignored by current kernels. It was
+ *   historically used to return the io_event directly, but since commit
+ *   28468cbed92e ("Revert 'fs/aio: Make io_cancel() generate completions
+ *   again'"), completion events are always delivered via the ring buffer.
+ *   Applications should use io_getevents() to retrieve the cancelled
+ *   operation's completion event.
+ *
+ *   The man page documents EAGAIN as a possible error when "the iocb spec=
ified
+ *   was not cancelled", but code analysis shows that EINVAL is actually
+ *   returned in this case. The man page is outdated in this regard.
+ *
+ *   The aio_key field must equal KIOCB_KEY (0) because the kernel writes =
this
+ *   value during io_submit. If an application attempts to cancel an iocb
+ *   before submitting it, or after the memory has been reused, this check
+ *   will fail with EINVAL.
+ *
+ *   For poll operations specifically, the cancellation is marked but the
+ *   actual completion may be delayed until a worker processes it. The
+ *   -EINPROGRESS return value reflects this asynchronous completion model.
+ *
+ *   USB gadget operations are an exception: when usb_ep_dequeue() is call=
ed,
+ *   it typically completes the request synchronously with -ECONNRESET sta=
tus
+ *   in the completion callback.
+ *
+ *   There is no glibc wrapper for this syscall. Applications must use
+ *   syscall(SYS_io_cancel, ...) or the libaio library. The libaio wrapper
+ *   returns negative error numbers directly rather than returning -1 and
+ *   setting errno.
+ *
+ *   io_uring (since Linux 5.1) provides a more capable and widely-support=
ed
+ *   async I/O interface with better cancellation support via IORING_OP_AS=
YNC_CANCEL.
+ *
+ * since-version: 2.5
  */
 SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, io=
cb,
 		struct io_event __user *, result)
--=20
2.51.0
From nobody Sat Feb  7 10:08:18 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8408D2FDC55;
	Thu, 18 Dec 2025 20:42:54 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1766090574; cv=none;
 b=UoQ0kQWui1uwxL8pESu5/yyFzO5kt+OH8ZIhXUnkt6VGMditDL7wIpt+v6c/D4MaGdrSEuGIHKnhulOFuXHr+VrJJgmuhZKXHBz/VVoEVO5Ff/QuuhMtfb6Jt6tDCy/E0GK7ObWAE85nK8n94gnYe1wXSlt+oqARfBPvKdO5SpU=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1766090574; c=relaxed/simple;
	bh=xlCNwYq6t1K/Oh4koGEQ7yQiu7YrMCTvErifTUwoWj8=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=gtsU8uNTdpJb9wHWSsaVjWWiY6WADrCq9vWZR2g/hs9YtmLNRfm1Amv6eh81eJjfZe01jW6LOK+HcqtqY8R8/5U1o39SL+LzjYLZ9FJXHbq1a7BTuwrD27mOJEkBWKAm1DJ4wy/N/EgIbFSv1MDm6/1MCEs/gtcv8PHUF0hxtkI=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=q7ihrVzC; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="q7ihrVzC"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id BD622C113D0;
	Thu, 18 Dec 2025 20:42:53 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1766090574;
	bh=xlCNwYq6t1K/Oh4koGEQ7yQiu7YrMCTvErifTUwoWj8=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=q7ihrVzCNwuIW9WK97CFNOhHYvkL9Gg0YHIC2Rzauaq5MUracAm5tT2GrjMczLPEQ
	 sfLBmuT6v0OuH4aQuGyc9IWllluv8hnbo2+xpDuVCmOz79IK98QahlLULMGvFoV299
	 O86BfysuZz8XLvYbwyYLyr6E9rriXJ3mDFzz7EP7t6+YgRT5TjvPmHEanQln6Bjwcr
	 ZvTGfCelyZn+2Vk9qQfMZ6aobXhjQpDNkgS2GgoLjqyVp6SsD4HXyS2ADoEo3gVRHk
	 13/LgvJzHnTm/GzJYwNlXx+crzXNAkSrJsdQdTT8XTb6zRtf9faSr64+0qgqgyBpDA
	 ENwhDilrA9CkQ==
From: Sasha Levin <sashal@kernel.org>
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	tools@kernel.org,
	gpaoloni@redhat.com,
	Sasha Levin <sashal@kernel.org>
Subject: [RFC PATCH v5 09/15] kernel/api: add API specification for setxattr
Date: Thu, 18 Dec 2025 15:42:31 -0500
Message-ID: <20251218204239.4159453-10-sashal@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/xattr.c | 310 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 310 insertions(+)

diff --git a/fs/xattr.c b/fs/xattr.c
index 32d445fb60aaf..02a946227129e 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -740,6 +740,316 @@ SYSCALL_DEFINE6(setxattrat, int, dfd, const char __us=
er *, pathname, unsigned in
 			       args.flags);
 }
=20
+/**
+ * sys_setxattr - Set an extended attribute value on a file
+ * @pathname: Path to the file on which to set the extended attribute
+ * @name: Null-terminated name of the extended attribute (includes namespa=
ce prefix)
+ * @value: Buffer containing the attribute value to set
+ * @size: Size of the value buffer in bytes
+ * @flags: Flags controlling attribute creation/replacement behavior
+ *
+ * long-desc: Sets the value of an extended attribute identified by name on
+ *   the file specified by pathname. Extended attributes are name:value pa=
irs
+ *   associated with inodes (files, directories, symbolic links, etc.) that
+ *   extend the normal attributes (stat data) associated with all inodes.
+ *
+ *   The attribute name must include a namespace prefix. Valid namespaces =
are:
+ *   - "user." - User-defined attributes (regular files and directories on=
ly)
+ *   - "trusted." - Trusted attributes (requires CAP_SYS_ADMIN)
+ *   - "security." - Security module attributes (e.g., SELinux, Smack, cap=
abilities)
+ *   - "system." - System attributes (e.g., POSIX ACLs via system.posix_ac=
l_access)
+ *
+ *   The value can be arbitrary binary data or text. A zero-length value is
+ *   permitted and creates an attribute with an empty value (different from
+ *   removing the attribute).
+ *
+ *   This syscall follows symbolic links. Use lsetxattr() to operate on the
+ *   symbolic link itself, or fsetxattr() to operate on an open file descr=
iptor.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: pathname
+ *   type: KAPI_TYPE_PATH
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ *   constraint-type: KAPI_CONSTRAINT_USER_PATH
+ *   constraint: Must be a valid null-terminated path string in user memor=
y.
+ *     The path is resolved following symbolic links. Maximum path length =
is
+ *     PATH_MAX (4096 bytes). The file must exist and the caller must have
+ *     appropriate permissions.
+ *
+ * param: name
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ *   constraint-type: KAPI_CONSTRAINT_USER_STRING
+ *   range: 1, 255
+ *   constraint: Must be a valid null-terminated string in user memory con=
taining
+ *     the extended attribute name with namespace prefix (e.g., "user.myat=
tr").
+ *     The name (including prefix) must be between 1 and XATTR_NAME_MAX (2=
55)
+ *     characters. An empty name returns ERANGE.
+ *
+ * param: value
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL
+ *   constraint-type: KAPI_CONSTRAINT_CUSTOM
+ *   constraint: Must be a valid pointer to user memory containing the att=
ribute
+ *     value, or NULL if size is 0. When size is non-zero, the pointer mus=
t be
+ *     valid and accessible for size bytes.
+ *
+ * param: size
+ *   type: KAPI_TYPE_UINT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 0, 65536
+ *   constraint: Size of the value in bytes. Must not exceed XATTR_SIZE_MAX
+ *     (65536 bytes). Zero is permitted and creates an attribute with empt=
y value.
+ *     Filesystem-specific limits may be smaller (e.g., ext4 limits total =
xattr
+ *     space to one filesystem block, typically 4KB).
+ *
+ * param: flags
+ *   type: KAPI_TYPE_INT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_MASK
+ *   valid-mask: XATTR_CREATE | XATTR_REPLACE
+ *   constraint: Controls creation/replacement behavior. Valid values are =
0,
+ *     XATTR_CREATE (0x1), or XATTR_REPLACE (0x2). XATTR_CREATE fails if t=
he
+ *     attribute already exists. XATTR_REPLACE fails if the attribute does=
 not
+ *     exist. With flags=3D0, the attribute is created if it doesn't exist=
 or
+ *     replaced if it does. XATTR_CREATE and XATTR_REPLACE are mutually ex=
clusive.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_ERROR_CHECK
+ *   success: 0
+ *   desc: Returns 0 on success. The extended attribute is set with the sp=
ecified
+ *     value. Any previous value for the attribute is replaced.
+ *
+ * error: ENOENT, File not found
+ *   desc: The file specified by pathname does not exist, or a directory c=
omponent
+ *     in the path does not exist. Returned from path lookup (filename_loo=
kup).
+ *
+ * error: EACCES, Permission denied
+ *   desc: Permission denied during path resolution (search permission on =
a directory
+ *     component) or write access to the file is denied based on DAC permi=
ssions.
+ *
+ * error: EPERM, Operation not permitted
+ *   desc: Returned in several cases: (1) The file is marked immutable (ch=
attr +i)
+ *     or append-only (chattr +a). (2) For trusted.* namespace, caller lac=
ks
+ *     CAP_SYS_ADMIN in the filesystem's user namespace. (3) For security.*
+ *     namespace (except security.capability), caller lacks CAP_SYS_ADMIN.
+ *     (4) For user.* namespace on sticky directories, caller is not the o=
wner
+ *     and lacks CAP_FOWNER. (5) The inode has an unmapped ID in an idmapp=
ed mount.
+ *
+ * error: ENODATA, Attribute not found
+ *   desc: XATTR_REPLACE was specified but the named attribute does not ex=
ist on
+ *     the file. Also returned when reading trusted.* without CAP_SYS_ADMI=
N (for
+ *     read operations, but included here for completeness with the flag).
+ *
+ * error: EEXIST, Attribute already exists
+ *   desc: XATTR_CREATE was specified but the named attribute already exis=
ts on
+ *     the file.
+ *
+ * error: ERANGE, Name out of range
+ *   desc: The attribute name is empty (zero length) or exceeds XATTR_NAME=
_MAX
+ *     (255 characters). Returned from import_xattr_name() via strncpy_fro=
m_user().
+ *
+ * error: E2BIG, Value too large
+ *   desc: The size parameter exceeds XATTR_SIZE_MAX (65536 bytes). Return=
ed from
+ *     setxattr_copy() before attempting to copy the value from userspace.
+ *
+ * error: EINVAL, Invalid argument
+ *   desc: The flags parameter contains bits other than XATTR_CREATE and
+ *     XATTR_REPLACE. Also returned for malformed capability values when s=
etting
+ *     security.capability, or when the xattr name doesn't match any handl=
er prefix.
+ *
+ * error: EFAULT, Bad address
+ *   desc: One of the user pointers (pathname, name, or value) is invalid =
or
+ *     points to memory that cannot be accessed. Returned from strncpy_fro=
m_user()
+ *     for pathname/name or vmemdup_user()/copy_from_user() for value.
+ *
+ * error: ENOMEM, Out of memory
+ *   desc: Kernel could not allocate memory to copy the attribute value fr=
om
+ *     userspace (via vmemdup_user), or for namespace capability conversion
+ *     (cap_convert_nscap allocates memory for v3 capability format).
+ *
+ * error: EOPNOTSUPP, Operation not supported
+ *   desc: The filesystem does not support extended attributes (IOP_XATTR =
not set),
+ *     or no xattr handler exists for the given namespace prefix, or the h=
andler
+ *     does not implement the set operation. Also returned for POSIX ACL x=
attrs
+ *     (system.posix_acl_*) when CONFIG_FS_POSIX_ACL is disabled.
+ *
+ * error: EROFS, Read-only filesystem
+ *   desc: The filesystem containing the file is mounted read-only. Return=
ed from
+ *     mnt_want_write() before attempting any modification.
+ *
+ * error: EIO, I/O error
+ *   desc: The inode is marked as bad (is_bad_inode), indicating filesystem
+ *     corruption or I/O failure. Also may be returned by filesystem-speci=
fic
+ *     xattr handler operations.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ *   desc: The user's disk quota for extended attributes has been exceeded.
+ *     Filesystem-specific error returned from the handler's set operation.
+ *
+ * error: ENOSPC, No space left on device
+ *   desc: The filesystem has insufficient space to store the extended att=
ribute.
+ *     Filesystem-specific error from handler's set operation.
+ *
+ * error: ELOOP, Too many symbolic links
+ *   desc: Too many symbolic links were encountered during path resolution
+ *     (more than MAXSYMLINKS, typically 40).
+ *
+ * error: ENAMETOOLONG, Filename too long
+ *   desc: The pathname or a component of the pathname exceeds the system =
limit
+ *     (PATH_MAX or NAME_MAX).
+ *
+ * error: ENOTDIR, Not a directory
+ *   desc: A component of the path prefix is not a directory.
+ *
+ * error: ESTALE, Stale file handle
+ *   desc: The file handle became stale during the operation (NFS). The sy=
scall
+ *     automatically retries with LOOKUP_REVAL in this case.
+ *
+ * lock: inode->i_rwsem
+ *   type: KAPI_LOCK_MUTEX
+ *   desc: The inode's read-write semaphore is acquired exclusively via in=
ode_lock()
+ *     before calling __vfs_setxattr_locked() and released via inode_unloc=
k() after.
+ *     This serializes concurrent xattr modifications on the same inode.
+ *
+ * lock: sb->s_writers (superblock freeze protection)
+ *   type: KAPI_LOCK_SEMAPHORE
+ *   desc: Write access to the mount is acquired via mnt_want_write() whic=
h calls
+ *     sb_start_write(). This prevents filesystem freeze during the operat=
ion.
+ *     Released via mnt_drop_write() after the operation completes.
+ *
+ * lock: file_rwsem (delegation breaking)
+ *   type: KAPI_LOCK_SEMAPHORE
+ *   desc: If the file has NFSv4 delegations, the percpu file_rwsem is acq=
uired
+ *     during delegation breaking in __break_lease(). The syscall may wait=
 for
+ *     delegation holders to acknowledge the break.
+ *
+ * signal: Any
+ *   direction: KAPI_SIGNAL_RECEIVE
+ *   action: KAPI_SIGNAL_ACTION_RESTART
+ *   condition: Signal arrives during interruptible waits (delegation brea=
king)
+ *   desc: The syscall may wait for NFSv4 delegation holders to release th=
eir
+ *     delegations. During this wait, signals can interrupt the operation.=
 If a
+ *     signal is pending, the wait may be interrupted and the operation re=
tried.
+ *     Most blocking points in this syscall use non-interruptible waits.
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *   restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ *   target: Kernel buffer for attribute value
+ *   desc: The attribute value is copied from userspace to a kernel buffer
+ *     allocated via vmemdup_user(). This memory is freed (kvfree) after t=
he
+ *     operation completes, regardless of success or failure.
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ *   target: File's extended attributes
+ *   desc: On success, the specified extended attribute is created or modi=
fied.
+ *     The change is typically persisted to storage synchronously or async=
hronously
+ *     depending on filesystem and mount options.
+ *   reversible: yes
+ *   condition: Operation succeeds
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: Inode flags (S_NOSEC)
+ *   desc: When setting security.* attributes, the S_NOSEC flag is cleared=
 from
+ *     the inode. This flag is an optimization that indicates no security =
xattrs
+ *     exist; clearing it ensures proper security checks on subsequent acc=
esses.
+ *   condition: Setting security.* namespace attribute
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: fsnotify event
+ *   desc: On success, fsnotify_xattr() is called to notify any registered
+ *     watchers (inotify, fanotify) of the extended attribute modification.
+ *     This generates an IN_ATTRIB event.
+ *   condition: Operation succeeds
+ *
+ * state-trans: extended attribute
+ *   from: nonexistent or has old value
+ *   to: has new value
+ *   condition: Operation succeeds with flags=3D0 or appropriate flags
+ *   desc: The extended attribute transitions from not existing (or having=
 its
+ *     previous value) to containing the new value. With XATTR_CREATE, the
+ *     attribute must not exist beforehand. With XATTR_REPLACE, it must ex=
ist.
+ *
+ * capability: CAP_SYS_ADMIN
+ *   type: KAPI_CAP_GRANT_PERMISSION
+ *   allows: Setting trusted.* namespace attributes and most security.* at=
tributes
+ *   without: Setting trusted.* returns EPERM. Setting security.* (except
+ *     security.capability) returns EPERM. The check uses ns_capable() aga=
inst
+ *     the filesystem's user namespace.
+ *   condition: Attribute name starts with "trusted." or "security." (exce=
pt
+ *     security.capability)
+ *
+ * capability: CAP_SETFCAP
+ *   type: KAPI_CAP_GRANT_PERMISSION
+ *   allows: Setting the security.capability extended attribute
+ *   without: Setting security.capability returns EPERM
+ *   condition: Attribute name is "security.capability". Checked via
+ *     capable_wrt_inode_uidgid() which considers the inode's ownership.
+ *
+ * capability: CAP_FOWNER
+ *   type: KAPI_CAP_BYPASS_CHECK
+ *   allows: Bypassing owner check for user.* on sticky directories
+ *   without: Non-owners cannot set user.* attributes on files in sticky
+ *     directories without this capability
+ *   condition: Setting user.* namespace attribute on a file in a sticky d=
irectory
+ *
+ * constraint: Filesystem support
+ *   desc: The filesystem must support extended attributes (have IOP_XATTR=
 flag
+ *     set and provide xattr handlers). Common filesystems supporting xatt=
rs
+ *     include ext4, XFS, Btrfs, and tmpfs. Some filesystems (e.g., FAT, o=
lder
+ *     ext2) do not support extended attributes.
+ *
+ * constraint: Filesystem-specific size limits
+ *   desc: While the VFS limit is 64KB (XATTR_SIZE_MAX), filesystems may i=
mpose
+ *     smaller limits. For example, ext4 limits all xattrs on an inode to =
fit
+ *     in a single filesystem block (typically 4KB). XFS and ReiserFS supp=
ort
+ *     the full 64KB. Exceeding filesystem limits returns ENOSPC or E2BIG.
+ *
+ * constraint: user.* namespace restrictions
+ *   desc: The user.* namespace is only supported on regular files and dir=
ectories.
+ *     Attempting to set user.* attributes on other file types (symlinks, =
devices,
+ *     sockets, FIFOs) returns EPERM (for write) or ENODATA (for read).
+ *
+ * constraint: LSM checks
+ *   desc: Linux Security Modules (SELinux, Smack, AppArmor) may impose ad=
ditional
+ *     restrictions via security_inode_setxattr() hook. These can return v=
arious
+ *     error codes depending on the security policy. The LSM is called aft=
er
+ *     permission checks but before the actual xattr modification.
+ *
+ * examples: setxattr("/path/file", "user.comment", "test", 4, 0);  // Set=
 user attr
+ *   setxattr("/path/file", "user.new", "val", 3, XATTR_CREATE);  // Creat=
e only
+ *   setxattr("/path/file", "user.existing", "new", 3, XATTR_REPLACE);  //=
 Replace
+ *
+ * notes: Extended attributes provide a way to associate arbitrary metadat=
a with
+ *   files beyond the standard stat attributes. They are commonly used for:
+ *   - SELinux security contexts (security.selinux)
+ *   - File capabilities (security.capability)
+ *   - POSIX ACLs (system.posix_acl_access, system.posix_acl_default)
+ *   - User-defined metadata (user.* namespace)
+ *
+ *   The trusted.* namespace is designed for use by privileged processes t=
o store
+ *   data that should not be accessible to unprivileged users (e.g., during
+ *   backup/restore operations).
+ *
+ *   NFSv4 delegation support means this syscall may need to wait for remo=
te
+ *   clients to release their delegations before the operation can complet=
e.
+ *   This can introduce unbounded delays in pathological cases.
+ *
+ *   For security.capability specifically, the kernel may convert between =
v2
+ *   (non-namespaced) and v3 (namespaced) capability formats depending on =
the
+ *   filesystem's user namespace and caller's capabilities.
+ *
+ *   The setxattrat() syscall (added in Linux 6.17) provides more flexibil=
ity
+ *   with AT_FDCWD and AT_* flags for specifying the file location.
+ *
+ * since-version: 2.4
+ */
 SYSCALL_DEFINE5(setxattr, const char __user *, pathname,
 		const char __user *, name, const void __user *, value,
 		size_t, size, int, flags)
--=20
2.51.0
From nobody Sat Feb  7 10:08:18 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6DE182DC76E;
	Thu, 18 Dec 2025 20:42:55 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1766090575; cv=none;
 b=Ctc6PswLq8Nus8JqKE7fQwJDqbnZMRsAgN8Iodj4hwTQ+bJa1f9CDFAjETvO+Au3SrXBoQNAdgXrvITnf0fpRJHw2qYn4cuA12aX7LPnAC80ZbOQM0Tva4ZUNbF+oQiUUfUArDIwawM9ylrcJrbr/NA4oTGGo8fDTev8hhs2R34=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1766090575; c=relaxed/simple;
	bh=zAaMAeb+vma2V6BMJ/mtlrX8goEiJYIzNdlfNY7U8Ww=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=SVfgSq34SL71AFXXIjags5vA2jlD2bkaQf3iIv9RYdGDRaoN+rU86xaBjbVgFvy92g6SWa50NaPEGxQst+aMitVOrnnIxEPfpAe/O6jRBt8FiGDq0egwFiJUQdR0lsLQkE/U10JYZb6Jt0TyYkDIfRY5PNTIdskzOA9X3qjAI/c=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=WTJZ/lw8; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="WTJZ/lw8"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id A5ECCC16AAE;
	Thu, 18 Dec 2025 20:42:54 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1766090575;
	bh=zAaMAeb+vma2V6BMJ/mtlrX8goEiJYIzNdlfNY7U8Ww=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=WTJZ/lw8rYIFpL9b9tkPKX213FV1bCRVLL+jHd9ImLjY2+Ny0T2ZYFHQowKRf6XBp
	 C73uqLOOMexjugPzLwLBJkPBzgTW7z04Ald2jtglKkukLpruvtFYqJUTpFBO2Gvyao
	 SUq4Jk1Z3vZt+fWkvbqQQWoS2hWnsii8Miu7waxREfHIohVLsZ++d+yioq6vqcXl9G
	 aFnbnMyPVh+gbF5705BavUCXfsCkmDfJtkaomBWvoYRTJBpaqPgPB4384jnrQsx6hN
	 ZL6KDFxZ4xF1/p0xUTeIy3zLW2T1rtgDBlIZyVRxppN20iZiiWEoGGMSphDHibIppU
	 TRKMy8GlJfziw==
From: Sasha Levin <sashal@kernel.org>
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	tools@kernel.org,
	gpaoloni@redhat.com,
	Sasha Levin <sashal@kernel.org>
Subject: [RFC PATCH v5 10/15] kernel/api: add API specification for lsetxattr
Date: Thu, 18 Dec 2025 15:42:32 -0500
Message-ID: <20251218204239.4159453-11-sashal@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/xattr.c | 327 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 327 insertions(+)

diff --git a/fs/xattr.c b/fs/xattr.c
index 02a946227129e..466dcaf7ba83e 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -1057,6 +1057,333 @@ SYSCALL_DEFINE5(setxattr, const char __user *, path=
name,
 	return path_setxattrat(AT_FDCWD, pathname, 0, name, value, size, flags);
 }
=20
+/**
+ * sys_lsetxattr - Set an extended attribute value on a symbolic link
+ * @pathname: Path to the file or symbolic link on which to set the attrib=
ute
+ * @name: Null-terminated name of the extended attribute (includes namespa=
ce prefix)
+ * @value: Buffer containing the attribute value to set
+ * @size: Size of the value buffer in bytes
+ * @flags: Flags controlling attribute creation/replacement behavior
+ *
+ * long-desc: Sets the value of an extended attribute identified by name on
+ *   the file specified by pathname. Unlike setxattr(), this syscall does =
not
+ *   follow symbolic links - if pathname refers to a symbolic link, the
+ *   extended attribute is set on the link itself, not on the file it refe=
rs to.
+ *
+ *   Extended attributes are name:value pairs associated with inodes (file=
s,
+ *   directories, symbolic links, etc.) that extend the normal attributes
+ *   (stat data) associated with all inodes.
+ *
+ *   The attribute name must include a namespace prefix. Valid namespaces =
are:
+ *   - "user." - User-defined attributes (regular files and directories on=
ly)
+ *   - "trusted." - Trusted attributes (requires CAP_SYS_ADMIN)
+ *   - "security." - Security module attributes (e.g., SELinux, Smack, cap=
abilities)
+ *   - "system." - System attributes (e.g., POSIX ACLs via system.posix_ac=
l_access)
+ *
+ *   The value can be arbitrary binary data or text. A zero-length value is
+ *   permitted and creates an attribute with an empty value (different from
+ *   removing the attribute).
+ *
+ *   Note that not all filesystems support extended attributes on symbolic=
 links.
+ *   Additionally, the user.* namespace is not available on symbolic links=
 since
+ *   they are not regular files or directories.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: pathname
+ *   type: KAPI_TYPE_PATH
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ *   constraint-type: KAPI_CONSTRAINT_USER_PATH
+ *   constraint: Must be a valid null-terminated path string in user memor=
y.
+ *     The path is resolved WITHOUT following symbolic links - if the final
+ *     component is a symbolic link, the operation applies to the link its=
elf.
+ *     Maximum path length is PATH_MAX (4096 bytes). The file or link must
+ *     exist and the caller must have appropriate permissions.
+ *
+ * param: name
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ *   constraint-type: KAPI_CONSTRAINT_USER_STRING
+ *   range: 1, 255
+ *   constraint: Must be a valid null-terminated string in user memory con=
taining
+ *     the extended attribute name with namespace prefix (e.g., "security.=
selinux").
+ *     The name (including prefix) must be between 1 and XATTR_NAME_MAX (2=
55)
+ *     characters. An empty name returns ERANGE. Note that user.* namespac=
e is
+ *     not supported on symbolic links.
+ *
+ * param: value
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL
+ *   constraint-type: KAPI_CONSTRAINT_CUSTOM
+ *   constraint: Must be a valid pointer to user memory containing the att=
ribute
+ *     value, or NULL if size is 0. When size is non-zero, the pointer mus=
t be
+ *     valid and accessible for size bytes.
+ *
+ * param: size
+ *   type: KAPI_TYPE_UINT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 0, 65536
+ *   constraint: Size of the value in bytes. Must not exceed XATTR_SIZE_MAX
+ *     (65536 bytes). Zero is permitted and creates an attribute with empt=
y value.
+ *     Filesystem-specific limits may be smaller (e.g., ext4 limits total =
xattr
+ *     space to one filesystem block, typically 4KB).
+ *
+ * param: flags
+ *   type: KAPI_TYPE_INT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_MASK
+ *   valid-mask: XATTR_CREATE | XATTR_REPLACE
+ *   constraint: Controls creation/replacement behavior. Valid values are =
0,
+ *     XATTR_CREATE (0x1), or XATTR_REPLACE (0x2). XATTR_CREATE fails if t=
he
+ *     attribute already exists. XATTR_REPLACE fails if the attribute does=
 not
+ *     exist. With flags=3D0, the attribute is created if it doesn't exist=
 or
+ *     replaced if it does. XATTR_CREATE and XATTR_REPLACE are mutually ex=
clusive.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_ERROR_CHECK
+ *   success: 0
+ *   desc: Returns 0 on success. The extended attribute is set with the sp=
ecified
+ *     value on the symbolic link itself. Any previous value for the attri=
bute
+ *     is replaced.
+ *
+ * error: ENOENT, File or symlink not found
+ *   desc: The file or symbolic link specified by pathname does not exist,=
 or a
+ *     directory component in the path does not exist. Returned from path =
lookup.
+ *
+ * error: EACCES, Permission denied
+ *   desc: Permission denied during path resolution (search permission on =
a directory
+ *     component) or write access to the file is denied based on DAC permi=
ssions.
+ *
+ * error: EPERM, Operation not permitted
+ *   desc: Returned in several cases: (1) The file is marked immutable (ch=
attr +i)
+ *     or append-only (chattr +a). (2) For trusted.* namespace, caller lac=
ks
+ *     CAP_SYS_ADMIN in the filesystem's user namespace. (3) For security.*
+ *     namespace (except security.capability), caller lacks CAP_SYS_ADMIN.
+ *     (4) For user.* namespace on sticky directories, caller is not the o=
wner
+ *     and lacks CAP_FOWNER. (5) The inode has an unmapped ID in an idmapp=
ed mount.
+ *     (6) Attempting to set user.* namespace on a symbolic link (not supp=
orted).
+ *
+ * error: ENODATA, Attribute not found
+ *   desc: XATTR_REPLACE was specified but the named attribute does not ex=
ist on
+ *     the symbolic link.
+ *
+ * error: EEXIST, Attribute already exists
+ *   desc: XATTR_CREATE was specified but the named attribute already exis=
ts on
+ *     the symbolic link.
+ *
+ * error: ERANGE, Name out of range
+ *   desc: The attribute name is empty (zero length) or exceeds XATTR_NAME=
_MAX
+ *     (255 characters). Returned from import_xattr_name() via strncpy_fro=
m_user().
+ *
+ * error: E2BIG, Value too large
+ *   desc: The size parameter exceeds XATTR_SIZE_MAX (65536 bytes). Return=
ed from
+ *     setxattr_copy() before attempting to copy the value from userspace.
+ *
+ * error: EINVAL, Invalid argument
+ *   desc: The flags parameter contains bits other than XATTR_CREATE and
+ *     XATTR_REPLACE. Also returned for malformed capability values when s=
etting
+ *     security.capability, or when the xattr name doesn't match any handl=
er prefix.
+ *
+ * error: EFAULT, Bad address
+ *   desc: One of the user pointers (pathname, name, or value) is invalid =
or
+ *     points to memory that cannot be accessed. Returned from strncpy_fro=
m_user()
+ *     for pathname/name or vmemdup_user()/copy_from_user() for value.
+ *
+ * error: ENOMEM, Out of memory
+ *   desc: Kernel could not allocate memory to copy the attribute value fr=
om
+ *     userspace (via vmemdup_user), or for namespace capability conversion
+ *     (cap_convert_nscap allocates memory for v3 capability format).
+ *
+ * error: EOPNOTSUPP, Operation not supported
+ *   desc: The filesystem does not support extended attributes on symbolic=
 links,
+ *     or no xattr handler exists for the given namespace prefix, or the h=
andler
+ *     does not implement the set operation. Many filesystems do not suppo=
rt
+ *     setting xattrs on symbolic links.
+ *
+ * error: EROFS, Read-only filesystem
+ *   desc: The filesystem containing the symbolic link is mounted read-onl=
y.
+ *     Returned from mnt_want_write() before attempting any modification.
+ *
+ * error: EIO, I/O error
+ *   desc: The inode is marked as bad (is_bad_inode), indicating filesystem
+ *     corruption or I/O failure. Also may be returned by filesystem-speci=
fic
+ *     xattr handler operations.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ *   desc: The user's disk quota for extended attributes has been exceeded.
+ *     Filesystem-specific error returned from the handler's set operation.
+ *
+ * error: ENOSPC, No space left on device
+ *   desc: The filesystem has insufficient space to store the extended att=
ribute.
+ *     Filesystem-specific error from handler's set operation.
+ *
+ * error: ELOOP, Too many symbolic links
+ *   desc: Too many symbolic links were encountered during path resolution=
 of
+ *     directory components (more than MAXSYMLINKS, typically 40). Note th=
at the
+ *     final component (the target of the operation) is not followed.
+ *
+ * error: ENAMETOOLONG, Filename too long
+ *   desc: The pathname or a component of the pathname exceeds the system =
limit
+ *     (PATH_MAX or NAME_MAX).
+ *
+ * error: ENOTDIR, Not a directory
+ *   desc: A component of the path prefix is not a directory.
+ *
+ * error: ESTALE, Stale file handle
+ *   desc: The file handle became stale during the operation (NFS). The sy=
scall
+ *     automatically retries with LOOKUP_REVAL in this case.
+ *
+ * lock: inode->i_rwsem
+ *   type: KAPI_LOCK_MUTEX
+ *   acquired: true
+ *   released: true
+ *   desc: The inode's read-write semaphore is acquired exclusively via in=
ode_lock()
+ *     before calling __vfs_setxattr_locked() and released via inode_unloc=
k() after.
+ *     This serializes concurrent xattr modifications on the same inode.
+ *
+ * lock: sb->s_writers (superblock freeze protection)
+ *   type: KAPI_LOCK_SEMAPHORE
+ *   acquired: true
+ *   released: true
+ *   desc: Write access to the mount is acquired via mnt_want_write() whic=
h calls
+ *     sb_start_write(). This prevents filesystem freeze during the operat=
ion.
+ *     Released via mnt_drop_write() after the operation completes.
+ *
+ * lock: file_rwsem (delegation breaking)
+ *   type: KAPI_LOCK_SEMAPHORE
+ *   acquired: true
+ *   released: true
+ *   desc: If the file has NFSv4 delegations, the percpu file_rwsem is acq=
uired
+ *     during delegation breaking in __break_lease(). The syscall may wait=
 for
+ *     delegation holders to acknowledge the break.
+ *
+ * signal: Any
+ *   direction: KAPI_SIGNAL_RECEIVE
+ *   action: KAPI_SIGNAL_ACTION_RESTART
+ *   condition: Signal arrives during interruptible waits (delegation brea=
king)
+ *   desc: The syscall may wait for NFSv4 delegation holders to release th=
eir
+ *     delegations. During this wait, signals can interrupt the operation.=
 If a
+ *     signal is pending, the wait may be interrupted and the operation re=
tried.
+ *     Most blocking points in this syscall use non-interruptible waits.
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *   restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ *   target: Kernel buffer for attribute value
+ *   desc: The attribute value is copied from userspace to a kernel buffer
+ *     allocated via vmemdup_user(). This memory is freed (kvfree) after t=
he
+ *     operation completes, regardless of success or failure.
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ *   target: Symbolic link's extended attributes
+ *   desc: On success, the specified extended attribute is created or modi=
fied
+ *     on the symbolic link itself. The change is typically persisted to s=
torage
+ *     synchronously or asynchronously depending on filesystem and mount o=
ptions.
+ *   reversible: yes
+ *   condition: Operation succeeds
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: Inode flags (S_NOSEC)
+ *   desc: When setting security.* attributes, the S_NOSEC flag is cleared=
 from
+ *     the inode. This flag is an optimization that indicates no security =
xattrs
+ *     exist; clearing it ensures proper security checks on subsequent acc=
esses.
+ *   condition: Setting security.* namespace attribute
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: fsnotify event
+ *   desc: On success, fsnotify_xattr() is called to notify any registered
+ *     watchers (inotify, fanotify) of the extended attribute modification.
+ *     This generates an IN_ATTRIB event.
+ *   condition: Operation succeeds
+ *
+ * state-trans: extended attribute
+ *   from: nonexistent or has old value
+ *   to: has new value
+ *   condition: Operation succeeds with flags=3D0 or appropriate flags
+ *   desc: The extended attribute on the symbolic link transitions from not
+ *     existing (or having its previous value) to containing the new value.
+ *     With XATTR_CREATE, the attribute must not exist beforehand. With
+ *     XATTR_REPLACE, it must exist.
+ *
+ * capability: CAP_SYS_ADMIN
+ *   type: KAPI_CAP_GRANT_PERMISSION
+ *   allows: Setting trusted.* namespace attributes and most security.* at=
tributes
+ *   without: Setting trusted.* returns EPERM. Setting security.* (except
+ *     security.capability) returns EPERM. The check uses ns_capable() aga=
inst
+ *     the filesystem's user namespace.
+ *   condition: Attribute name starts with "trusted." or "security." (exce=
pt
+ *     security.capability)
+ *
+ * capability: CAP_SETFCAP
+ *   type: KAPI_CAP_GRANT_PERMISSION
+ *   allows: Setting the security.capability extended attribute
+ *   without: Setting security.capability returns EPERM
+ *   condition: Attribute name is "security.capability". Checked via
+ *     capable_wrt_inode_uidgid() which considers the inode's ownership.
+ *
+ * capability: CAP_FOWNER
+ *   type: KAPI_CAP_BYPASS_CHECK
+ *   allows: Bypassing owner check for user.* on sticky directories
+ *   without: Non-owners cannot set user.* attributes on files in sticky
+ *     directories without this capability
+ *   condition: Setting user.* namespace attribute on a file in a sticky d=
irectory
+ *
+ * constraint: Filesystem support for symlinks
+ *   desc: Not all filesystems support extended attributes on symbolic lin=
ks.
+ *     Some filesystems (like ext4) may only support certain xattr namespa=
ces
+ *     on symlinks. The user.* namespace is explicitly not supported on sy=
mbolic
+ *     links since they are not regular files or directories.
+ *
+ * constraint: Filesystem-specific size limits
+ *   desc: While the VFS limit is 64KB (XATTR_SIZE_MAX), filesystems may i=
mpose
+ *     smaller limits. For example, ext4 limits all xattrs on an inode to =
fit
+ *     in a single filesystem block (typically 4KB). XFS and ReiserFS supp=
ort
+ *     the full 64KB. Exceeding filesystem limits returns ENOSPC or E2BIG.
+ *
+ * constraint: user.* namespace restrictions on symlinks
+ *   desc: The user.* namespace is only supported on regular files and dir=
ectories.
+ *     Attempting to set user.* attributes on symbolic links returns EPERM.
+ *     This is because user.* xattrs have permission semantics that don't =
apply
+ *     to symbolic links which anyone can follow.
+ *
+ * constraint: LSM checks
+ *   desc: Linux Security Modules (SELinux, Smack, AppArmor) may impose ad=
ditional
+ *     restrictions via security_inode_setxattr() hook. These can return v=
arious
+ *     error codes depending on the security policy. The LSM is called aft=
er
+ *     permission checks but before the actual xattr modification.
+ *
+ * examples: lsetxattr("/path/symlink", "security.selinux", ctx, len, 0); =
 // Set SELinux context on link
+ *   lsetxattr("/path/symlink", "trusted.overlay.opaque", "y", 1, XATTR_CR=
EATE);  // Set overlay attr
+ *
+ * notes: This syscall is primarily used for security labeling of symbolic=
 links
+ *   themselves (as opposed to their targets). Common use cases include:
+ *   - SELinux security contexts on symbolic links (security.selinux)
+ *   - Overlay filesystem metadata (trusted.overlay.*)
+ *   - IMA/EVM integrity metadata (security.ima, security.evm)
+ *
+ *   Unlike regular files and directories, symbolic links do not support t=
he
+ *   user.* xattr namespace. This is because user.* xattrs require ownersh=
ip
+ *   or capability checks that don't make sense for symlinks which can be
+ *   followed by anyone with directory access.
+ *
+ *   The trusted.* namespace on symbolic links requires CAP_SYS_ADMIN and =
is
+ *   commonly used by overlay filesystems to store metadata about redirect=
ed
+ *   or opaque directories.
+ *
+ *   NFSv4 delegation support means this syscall may need to wait for remo=
te
+ *   clients to release their delegations before the operation can complet=
e.
+ *
+ *   This syscall was introduced alongside setxattr(), fsetxattr(), and the
+ *   corresponding get/list/remove variants in Linux 2.4 to provide the
+ *   non-following behavior needed for backup/restore tools and security
+ *   labeling of links.
+ *
+ * since-version: 2.4
+ */
 SYSCALL_DEFINE5(lsetxattr, const char __user *, pathname,
 		const char __user *, name, const void __user *, value,
 		size_t, size, int, flags)
--=20
2.51.0
From nobody Sat Feb  7 10:08:18 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 567AE2DF15C;
	Thu, 18 Dec 2025 20:42:56 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1766090576; cv=none;
 b=EAXcYqbd7TTTSbCD5Vo6cqQ3sHu6m65wZ3sZL4IlbUEBVYDKBZwN01WM/VH9SWcxIQ8H+GMtY4P3TdtYIKtN3ecO0M604AjXEy173Z6cgufyRpVGneKKe1RTck9HyLIS4uiLrnfg6vGPEw1Vi/5KRdw3F2Vc791dubFVJmz2Sik=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1766090576; c=relaxed/simple;
	bh=UressouWWb0CVfcr/8Lw1heICtBbKVFNZZUC5WY7CNk=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=UHV5vGNEfITPxSwJz+1KFWeMWtUITorV81tVWuVTyiOUSPdRax17iCgjKabx9rzApnyzUSyUsDxrJqvJ2miLy2RiG1q64eWDu/RYvBonAdgwWdued/7QnQAII7XbVuAE47STtnSNR++xgc0L/+gvoO2P5nYmCIDaMRUvVEdjPmM=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=Ti1XsvOl; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="Ti1XsvOl"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8FFCDC4CEFB;
	Thu, 18 Dec 2025 20:42:55 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1766090576;
	bh=UressouWWb0CVfcr/8Lw1heICtBbKVFNZZUC5WY7CNk=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=Ti1XsvOlWeDwUMnXGbMLW56pQkKeKOzYCzBZaiHxuN4MeK+7os/YgKuBOm+jHZVfP
	 b3ipD28uEzsWLzSN1qpUuE+SXqG5SU5Y/4ck80cW+uG6ZL2l3IqUM+GfXOdbaaJFY0
	 ded6pi9w7cr1FYRE9LNVVkBXaYxFYWdkcLNk8iP23JqA7h5I7MB1uy5B3NXPxJHnrA
	 QRqDysT3JjDjo8WIvaiO5Y+qcrbGYNZdw1lo1TwNvl0yL66GRYC0n2cqimDV5k7uTC
	 8zQO54JwFjOzUrGrfOi0pvouPlitVCwW3lWJnY8VDClQNUTFLcsBRX5PFShpjq6ScX
	 sXQ0f6xJKY77w==
From: Sasha Levin <sashal@kernel.org>
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	tools@kernel.org,
	gpaoloni@redhat.com,
	Sasha Levin <sashal@kernel.org>
Subject: [RFC PATCH v5 11/15] kernel/api: add API specification for fsetxattr
Date: Thu, 18 Dec 2025 15:42:33 -0500
Message-ID: <20251218204239.4159453-12-sashal@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/xattr.c | 322 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 322 insertions(+)

diff --git a/fs/xattr.c b/fs/xattr.c
index 466dcaf7ba83e..8a27c11905f7e 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -1392,6 +1392,328 @@ SYSCALL_DEFINE5(lsetxattr, const char __user *, pat=
hname,
 			       value, size, flags);
 }
=20
+/**
+ * sys_fsetxattr - Set an extended attribute value on an open file descrip=
tor
+ * @fd: File descriptor of the file on which to set the extended attribute
+ * @name: Null-terminated name of the extended attribute (includes namespa=
ce prefix)
+ * @value: Buffer containing the attribute value to set
+ * @size: Size of the value buffer in bytes
+ * @flags: Flags controlling attribute creation/replacement behavior
+ *
+ * long-desc: Sets the value of an extended attribute identified by name on
+ *   the file referred to by the open file descriptor fd. Extended attribu=
tes
+ *   are name:value pairs associated with inodes (files, directories, symb=
olic
+ *   links, etc.) that extend the normal attributes (stat data) associated=
 with
+ *   all inodes.
+ *
+ *   This syscall is similar to setxattr() but operates on an already-open=
 file
+ *   descriptor rather than a pathname. This is useful when the file is al=
ready
+ *   open, when the caller wants to avoid race conditions between opening =
and
+ *   setting attributes, or when operating on file descriptors that cannot=
 be
+ *   easily reopened.
+ *
+ *   The attribute name must include a namespace prefix. Valid namespaces =
are:
+ *   - "user." - User-defined attributes (regular files and directories on=
ly)
+ *   - "trusted." - Trusted attributes (requires CAP_SYS_ADMIN)
+ *   - "security." - Security module attributes (e.g., SELinux, Smack, cap=
abilities)
+ *   - "system." - System attributes (e.g., POSIX ACLs via system.posix_ac=
l_access)
+ *
+ *   The value can be arbitrary binary data or text. A zero-length value is
+ *   permitted and creates an attribute with an empty value (different from
+ *   removing the attribute).
+ *
+ *   The file descriptor must have been opened for writing to modify exten=
ded
+ *   attributes. The file descriptor cannot be an O_PATH file descriptor.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: fd
+ *   type: KAPI_TYPE_FD
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_CUSTOM
+ *   constraint: Must be a valid file descriptor returned by open(), creat=
(),
+ *     or similar syscalls. The file descriptor cannot be an O_PATH file
+ *     descriptor. The file must be on a filesystem that is not mounted
+ *     read-only. AT_FDCWD (-100) is NOT valid for this syscall as it oper=
ates
+ *     on file descriptors, not directory handles.
+ *
+ * param: name
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ *   constraint-type: KAPI_CONSTRAINT_USER_STRING
+ *   range: 1, 255
+ *   constraint: Must be a valid null-terminated string in user memory con=
taining
+ *     the extended attribute name with namespace prefix (e.g., "user.myat=
tr").
+ *     The name (including prefix) must be between 1 and XATTR_NAME_MAX (2=
55)
+ *     characters. An empty name returns ERANGE.
+ *
+ * param: value
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL
+ *   constraint-type: KAPI_CONSTRAINT_CUSTOM
+ *   constraint: Must be a valid pointer to user memory containing the att=
ribute
+ *     value, or NULL if size is 0. When size is non-zero, the pointer mus=
t be
+ *     valid and accessible for size bytes.
+ *
+ * param: size
+ *   type: KAPI_TYPE_UINT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 0, 65536
+ *   constraint: Size of the value in bytes. Must not exceed XATTR_SIZE_MAX
+ *     (65536 bytes). Zero is permitted and creates an attribute with empt=
y value.
+ *     Filesystem-specific limits may be smaller (e.g., ext4 limits total =
xattr
+ *     space to one filesystem block, typically 4KB).
+ *
+ * param: flags
+ *   type: KAPI_TYPE_INT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_MASK
+ *   valid-mask: XATTR_CREATE | XATTR_REPLACE
+ *   constraint: Controls creation/replacement behavior. Valid values are =
0,
+ *     XATTR_CREATE (0x1), or XATTR_REPLACE (0x2). XATTR_CREATE fails if t=
he
+ *     attribute already exists. XATTR_REPLACE fails if the attribute does=
 not
+ *     exist. With flags=3D0, the attribute is created if it doesn't exist=
 or
+ *     replaced if it does. XATTR_CREATE and XATTR_REPLACE are mutually ex=
clusive.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_ERROR_CHECK
+ *   success: 0
+ *   desc: Returns 0 on success. The extended attribute is set with the sp=
ecified
+ *     value. Any previous value for the attribute is replaced.
+ *
+ * error: EBADF, Bad file descriptor
+ *   desc: The file descriptor fd is not valid or is not open for writing.=
 This
+ *     is returned from the fd class lookup when the file descriptor does =
not
+ *     refer to an open file.
+ *
+ * error: EPERM, Operation not permitted
+ *   desc: Returned when: (1) file is immutable or append-only, (2) truste=
d.*
+ *     without CAP_SYS_ADMIN, (3) security.* (except capability) without
+ *     CAP_SYS_ADMIN, (4) user.* on sticky dir without ownership/CAP_FOWNE=
R,
+ *     (5) unmapped ID in idmapped mount, (6) user.* on non-regular/non-di=
r.
+ *
+ * error: ENODATA, Attribute not found
+ *   desc: XATTR_REPLACE was specified but the named attribute does not ex=
ist on
+ *     the file. Also returned when reading trusted.* without CAP_SYS_ADMI=
N.
+ *
+ * error: EEXIST, Attribute already exists
+ *   desc: XATTR_CREATE was specified but the named attribute already exis=
ts on
+ *     the file.
+ *
+ * error: ERANGE, Name out of range
+ *   desc: The attribute name is empty (zero length) or exceeds XATTR_NAME=
_MAX
+ *     (255 characters). Returned from import_xattr_name() via strncpy_fro=
m_user().
+ *
+ * error: E2BIG, Value too large
+ *   desc: The size parameter exceeds XATTR_SIZE_MAX (65536 bytes). Return=
ed from
+ *     setxattr_copy() before attempting to copy the value from userspace.
+ *
+ * error: EINVAL, Invalid argument
+ *   desc: The flags parameter contains bits other than XATTR_CREATE and
+ *     XATTR_REPLACE. Also returned for malformed capability values when s=
etting
+ *     security.capability (invalid header format, invalid rootid mapping)=
, or
+ *     when the xattr name doesn't match any handler prefix.
+ *
+ * error: EFAULT, Bad address
+ *   desc: One of the user pointers (name or value) is invalid or points to
+ *     memory that cannot be accessed. Returned from strncpy_from_user() f=
or
+ *     name or vmemdup_user()/copy_from_user() for value.
+ *
+ * error: ENOMEM, Out of memory
+ *   desc: Kernel could not allocate memory to copy the attribute value fr=
om
+ *     userspace (via vmemdup_user), or for namespace capability conversion
+ *     (cap_convert_nscap allocates memory for v3 capability format).
+ *
+ * error: EOPNOTSUPP, Operation not supported
+ *   desc: The filesystem does not support extended attributes (IOP_XATTR =
not set),
+ *     or no xattr handler exists for the given namespace prefix, or the h=
andler
+ *     does not implement the set operation. Also returned for POSIX ACL x=
attrs
+ *     (system.posix_acl_*) when CONFIG_FS_POSIX_ACL is disabled.
+ *
+ * error: EROFS, Read-only filesystem
+ *   desc: The filesystem containing the file is mounted read-only. Return=
ed from
+ *     mnt_want_write_file() before attempting any modification.
+ *
+ * error: EIO, I/O error
+ *   desc: The inode is marked as bad (is_bad_inode), indicating filesystem
+ *     corruption or I/O failure. Also may be returned by filesystem-speci=
fic
+ *     xattr handler operations.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ *   desc: The user's disk quota for extended attributes has been exceeded.
+ *     Filesystem-specific error returned from the handler's set operation.
+ *
+ * error: ENOSPC, No space left on device
+ *   desc: The filesystem has insufficient space to store the extended att=
ribute.
+ *     Filesystem-specific error from handler's set operation.
+ *
+ * error: EACCES, Permission denied
+ *   desc: Write access to the file is denied based on DAC permissions. Th=
e caller
+ *     does not have appropriate permission to modify xattrs on this file.
+ *
+ * lock: inode->i_rwsem
+ *   type: KAPI_LOCK_MUTEX
+ *   acquired: true
+ *   released: true
+ *   desc: The inode's read-write semaphore is acquired exclusively via in=
ode_lock()
+ *     before calling __vfs_setxattr_locked() and released via inode_unloc=
k() after.
+ *     This serializes concurrent xattr modifications on the same inode.
+ *
+ * lock: sb->s_writers (superblock freeze protection)
+ *   type: KAPI_LOCK_SEMAPHORE
+ *   acquired: true
+ *   released: true
+ *   desc: Write access to the mount is acquired via mnt_want_write_file()=
 which
+ *     calls sb_start_write(). This prevents filesystem freeze during the =
operation.
+ *     Released via mnt_drop_write_file() after the operation completes.
+ *
+ * lock: file_rwsem (delegation breaking)
+ *   type: KAPI_LOCK_SEMAPHORE
+ *   acquired: true
+ *   released: true
+ *   desc: If the file has NFSv4 delegations, the percpu file_rwsem is acq=
uired
+ *     during delegation breaking in __break_lease(). The syscall may wait=
 for
+ *     delegation holders to acknowledge the break.
+ *
+ * signal: Any
+ *   direction: KAPI_SIGNAL_RECEIVE
+ *   action: KAPI_SIGNAL_ACTION_RESTART
+ *   condition: Signal arrives during interruptible wait for delegation br=
eaking
+ *   desc: The syscall may wait for NFSv4 delegation holders to release th=
eir
+ *     delegations via wait_event_interruptible_timeout() in __break_lease=
().
+ *     During this wait, signals can interrupt the operation. If a signal =
is
+ *     pending, the wait is interrupted and the operation may be retried by
+ *     the kernel automatically if the signal disposition allows (SA_RESTA=
RT).
+ *   error: -ERESTARTSYS
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *   restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ *   target: Kernel buffer for attribute value
+ *   desc: The attribute value is copied from userspace to a kernel buffer
+ *     allocated via vmemdup_user(). This memory is freed (kvfree) after t=
he
+ *     operation completes, regardless of success or failure.
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ *   target: File's extended attributes
+ *   desc: On success, the specified extended attribute is created or modi=
fied.
+ *     The change is typically persisted to storage synchronously or async=
hronously
+ *     depending on filesystem and mount options.
+ *   reversible: yes
+ *   condition: Operation succeeds
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: Inode flags (S_NOSEC)
+ *   desc: When setting security.* attributes, the S_NOSEC flag is cleared=
 from
+ *     the inode. This flag is an optimization that indicates no security =
xattrs
+ *     exist; clearing it ensures proper security checks on subsequent acc=
esses.
+ *   condition: Setting security.* namespace attribute
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: fsnotify event
+ *   desc: On success, fsnotify_xattr() is called to notify any registered
+ *     watchers (inotify, fanotify) of the extended attribute modification.
+ *     This generates an IN_ATTRIB event.
+ *   condition: Operation succeeds
+ *
+ * state-trans: extended attribute
+ *   from: nonexistent or has old value
+ *   to: has new value
+ *   condition: Operation succeeds with flags=3D0 or appropriate flags
+ *   desc: The extended attribute transitions from not existing (or having=
 its
+ *     previous value) to containing the new value. With XATTR_CREATE, the
+ *     attribute must not exist beforehand. With XATTR_REPLACE, it must ex=
ist.
+ *
+ * capability: CAP_SYS_ADMIN
+ *   type: KAPI_CAP_GRANT_PERMISSION
+ *   allows: Setting trusted.* namespace attributes and most security.* at=
tributes
+ *   without: Setting trusted.* returns EPERM. Setting security.* (except
+ *     security.capability) returns EPERM. The check uses ns_capable() aga=
inst
+ *     the filesystem's user namespace.
+ *   condition: Attribute name starts with "trusted." or "security." (exce=
pt
+ *     security.capability)
+ *
+ * capability: CAP_SETFCAP
+ *   type: KAPI_CAP_GRANT_PERMISSION
+ *   allows: Setting the security.capability extended attribute
+ *   without: Setting security.capability returns EPERM
+ *   condition: Attribute name is "security.capability". Checked via
+ *     capable_wrt_inode_uidgid() which considers the inode's ownership.
+ *
+ * capability: CAP_FOWNER
+ *   type: KAPI_CAP_BYPASS_CHECK
+ *   allows: Bypassing owner check for user.* on sticky directories
+ *   without: Non-owners cannot set user.* attributes on files in sticky
+ *     directories without this capability
+ *   condition: Setting user.* namespace attribute on a file in a sticky d=
irectory
+ *
+ * constraint: Filesystem support
+ *   desc: The filesystem must support extended attributes (have IOP_XATTR=
 flag
+ *     set and provide xattr handlers). Common filesystems supporting xatt=
rs
+ *     include ext4, XFS, Btrfs, and tmpfs. Some filesystems (e.g., FAT, o=
lder
+ *     ext2) do not support extended attributes.
+ *
+ * constraint: Filesystem-specific size limits
+ *   desc: While the VFS limit is 64KB (XATTR_SIZE_MAX), filesystems may i=
mpose
+ *     smaller limits. For example, ext4 limits all xattrs on an inode to =
fit
+ *     in a single filesystem block (typically 4KB). XFS and ReiserFS supp=
ort
+ *     the full 64KB. Exceeding filesystem limits returns ENOSPC or E2BIG.
+ *
+ * constraint: user.* namespace restrictions
+ *   desc: The user.* namespace is only supported on regular files and dir=
ectories.
+ *     Attempting to set user.* attributes on other file types (symlinks, =
devices,
+ *     sockets, FIFOs) returns EPERM (for write) or ENODATA (for read).
+ *
+ * constraint: LSM checks
+ *   desc: Linux Security Modules (SELinux, Smack, AppArmor) may impose ad=
ditional
+ *     restrictions via security_inode_setxattr() hook. These can return v=
arious
+ *     error codes depending on the security policy. The LSM is called aft=
er
+ *     permission checks but before the actual xattr modification.
+ *
+ * constraint: File descriptor must not be O_PATH
+ *   desc: The file descriptor must be a regular file descriptor, not one =
opened
+ *     with O_PATH. O_PATH file descriptors do not provide access to the f=
ile
+ *     contents or metadata modification operations.
+ *
+ * examples: fsetxattr(fd, "user.comment", "test", 4, 0);  // Set user attr
+ *   fsetxattr(fd, "user.new", "val", 3, XATTR_CREATE);  // Create only, f=
ail if exists
+ *   fsetxattr(fd, "user.existing", "new", 3, XATTR_REPLACE);  // Replace =
only
+ *   fsetxattr(fd, "user.empty", "", 0, 0);  // Create attribute with empt=
y value
+ *
+ * notes: Extended attributes provide a way to associate arbitrary metadat=
a with
+ *   files beyond the standard stat attributes. They are commonly used for:
+ *   - SELinux security contexts (security.selinux)
+ *   - File capabilities (security.capability)
+ *   - POSIX ACLs (system.posix_acl_access, system.posix_acl_default)
+ *   - User-defined metadata (user.* namespace)
+ *
+ *   Using fsetxattr() with an already-open file descriptor avoids potenti=
al
+ *   TOCTOU (time-of-check-time-of-use) race conditions that can occur when
+ *   using setxattr() with a pathname, where the file might be replaced be=
tween
+ *   opening and setting the attribute.
+ *
+ *   The trusted.* namespace is designed for use by privileged processes t=
o store
+ *   data that should not be accessible to unprivileged users (e.g., during
+ *   backup/restore operations).
+ *
+ *   NFSv4 delegation support means this syscall may need to wait for remo=
te
+ *   clients to release their delegations before the operation can complet=
e.
+ *   This can introduce unbounded delays in pathological cases.
+ *
+ *   For security.capability specifically, the kernel may convert between =
v2
+ *   (non-namespaced) and v3 (namespaced) capability formats depending on =
the
+ *   filesystem's user namespace and caller's capabilities.
+ *
+ *   Unlike setxattr() and lsetxattr(), fsetxattr() does not involve path
+ *   resolution, so errors related to path traversal (ENOENT, ENOTDIR,
+ *   ENAMETOOLONG, ELOOP, ESTALE) are not possible.
+ *
+ * since-version: 2.4
+ */
 SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
 		const void __user *,value, size_t, size, int, flags)
 {
--=20
2.51.0
From nobody Sat Feb  7 10:08:18 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 40C932F8BC3;
	Thu, 18 Dec 2025 20:42:57 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1766090577; cv=none;
 b=Q0l2yfI0AyfXnQx7ro0linkunAGeKs+y/CS5MEUsBGHY25/sXq/2S9Tyw1RkQkoiqWCc9K7F++XnQ9rqhFO59e7/O17B+rlDDck0UlxBgHR7FpM8RHnpfUl28HnH96LYK3nRi97q0vV84fHhdjNeqXjTemKO8kPHC1V3S8VseFE=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1766090577; c=relaxed/simple;
	bh=BM0Tp6/hpETGMj8bjOuQYX6DLQT32oEdSKx7sQoel2s=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=NzQOnAFVMfQu0Iacq/+KfrIXJgx7Hhdy+3qHpzm+Qoib7kFo25mzPmCflo7vD14bWavGvgeSq9jE0h6eZ8B3nOGf5iSuEf5yRWqCmWL6Sfpbbwi94wE6JBvlFhue35al/zL2Fs2e3tmnTBrnTmtEyXRChUq3sGQ+q/ILln3Jr5A=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=Y1hfNbMR; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="Y1hfNbMR"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 797B1C19421;
	Thu, 18 Dec 2025 20:42:56 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1766090577;
	bh=BM0Tp6/hpETGMj8bjOuQYX6DLQT32oEdSKx7sQoel2s=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=Y1hfNbMRRXTJ9MDb15OkxqLB90RWP/BXKiBbTCybYy8o69bnOKc5T+zJbaAE9nKQv
	 FBhmc2IY52Qtpl4yYh4jkIwL486Ushx27gjTLJeHFbGVLbXckXUGFkqOrgc0yl2w/U
	 C6w+ixpfjcg9sxW0WZnZnFfp3yslcBEZflxpZE34quuELTRDN4uxH5MB6a0fSWOSdm
	 MbqtoCIu7NL66WThH8F7NrXgpQHeYI0yMiI9KesttRAIFLoHFFFIzI1y1RsPCjc5dh
	 cECeGN+76Tu1Gz+W26ETkONUtZxdWjjuAm+te+AC6n4288ezMFQAryxhNSrWJ5MYcR
	 mZU6RqnGKFAOw==
From: Sasha Levin <sashal@kernel.org>
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	tools@kernel.org,
	gpaoloni@redhat.com,
	Sasha Levin <sashal@kernel.org>
Subject: [RFC PATCH v5 12/15] kernel/api: add API specification for sys_open
Date: Thu, 18 Dec 2025 15:42:34 -0500
Message-ID: <20251218204239.4159453-13-sashal@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/open.c | 318 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 318 insertions(+)

diff --git a/fs/open.c b/fs/open.c
index f328622061c56..343e6d3798ec3 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1437,6 +1437,324 @@ int do_sys_open(int dfd, const char __user *filenam=
e, int flags, umode_t mode)
 }
=20
=20
+/**
+ * sys_open - Open or create a file
+ * @filename: Pathname of the file to open or create
+ * @flags: File access mode and behavior flags (O_RDONLY, O_WRONLY, O_RDWR=
, etc.)
+ * @mode: File permission bits for newly created files (only with O_CREAT/=
O_TMPFILE)
+ *
+ * long-desc: Opens the file specified by pathname. If O_CREAT or O_TMPFIL=
E is
+ *   specified in flags, the file is created if it does not exist; its mod=
e is
+ *   set according to the mode parameter modified by the process's umask.
+ *
+ *   The flags argument must include one of the following access modes: O_=
RDONLY
+ *   (read-only), O_WRONLY (write-only), or O_RDWR (read/write). These are=
 the
+ *   low-order two bits of flags. In addition, zero or more file creation =
and
+ *   file status flags can be bitwise-ORed in flags.
+ *
+ *   File creation flags: O_CREAT, O_EXCL, O_NOCTTY, O_TRUNC, O_DIRECTORY,
+ *   O_NOFOLLOW, O_CLOEXEC, O_TMPFILE. These flags affect open behavior.
+ *
+ *   File status flags: O_APPEND, O_ASYNC, O_DIRECT, O_DSYNC, O_LARGEFILE,
+ *   O_NOATIME, O_NONBLOCK (O_NDELAY), O_PATH, O_SYNC. These become part o=
f the
+ *   file's open file description and can be retrieved/modified with fcntl=
().
+ *
+ *   The return value is a file descriptor, a small nonnegative integer us=
ed in
+ *   subsequent system calls (read, write, lseek, fcntl, etc.) to refer to=
 the
+ *   open file. The file descriptor returned by a successful open is the l=
owest-
+ *   numbered file descriptor not currently open for the process.
+ *
+ *   On 64-bit systems, O_LARGEFILE is automatically added to the flags. O=
n 32-bit
+ *   systems, files larger than 2GB require O_LARGEFILE to be explicitly s=
et.
+ *
+ *   This syscall is a legacy interface. Modern code should prefer openat(=
) for
+ *   relative path operations and openat2() for additional control via res=
olve
+ *   flags. The open() call is equivalent to openat(AT_FDCWD, pathname, fl=
ags).
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: filename
+ *   type: KAPI_TYPE_PATH
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ *   constraint-type: KAPI_CONSTRAINT_USER_PATH
+ *   constraint: Must be a valid null-terminated path string in user memor=
y.
+ *     Maximum path length is PATH_MAX (4096 bytes) including null termina=
tor.
+ *     For relative paths, resolution starts from current working director=
y.
+ *     The path is followed (symlinks resolved) unless O_NOFOLLOW is speci=
fied.
+ *
+ * param: flags
+ *   type: KAPI_TYPE_INT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_MASK
+ *   valid-mask: O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTT=
Y |
+ *               O_TRUNC | O_APPEND | O_NONBLOCK | O_DSYNC | O_SYNC | FASY=
NC |
+ *               O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | O_NOA=
TIME |
+ *               O_CLOEXEC | O_PATH | O_TMPFILE
+ *   constraint: Must include exactly one of O_RDONLY (0), O_WRONLY (1), or
+ *     O_RDWR (2) as the access mode. Additional flags may be ORed. Invali=
d flag
+ *     combinations (e.g., O_DIRECTORY|O_CREAT, O_PATH with incompatible f=
lags,
+ *     O_TMPFILE without O_DIRECTORY, O_TMPFILE with read-only mode) return
+ *     EINVAL. Unknown flags are silently ignored for backward compatibili=
ty
+ *     (unlike openat2 which rejects them).
+ *
+ * param: mode
+ *   type: KAPI_TYPE_UINT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_MASK
+ *   valid-mask: S_ISUID | S_ISGID | S_ISVTX | S_IRWXU | S_IRWXG | S_IRWXO
+ *   constraint: Only meaningful when O_CREAT or O_TMPFILE is specified in
+ *     flags. Specifies the file mode bits (permissions and setuid/setgid/=
sticky
+ *     bits) for a newly created file. The effective mode is (mode & ~umas=
k).
+ *     When O_CREAT/O_TMPFILE is not set, mode is ignored. Mode values exc=
eeding
+ *     S_IALLUGO (07777) are masked off.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_FD
+ *   success: >=3D 0
+ *   desc: On success, returns a new file descriptor (non-negative integer=
).
+ *     The returned file descriptor is the lowest-numbered descriptor not
+ *     currently open for the process. On error, returns -1 and errno is s=
et.
+ *
+ * error: EACCES, Permission denied
+ *   desc: The requested access to the file is not allowed, or search perm=
ission
+ *     is denied for one of the directories in the path prefix of pathname=
, or
+ *     the file did not exist yet and write access to the parent directory=
 is
+ *     not allowed, or O_TRUNC is specified but write permission is denied=
, or
+ *     the file is on a filesystem mounted with noexec and MAY_EXEC was im=
plied.
+ *
+ * error: EBUSY, Device or resource busy
+ *   desc: O_EXCL was specified in flags and pathname refers to a block de=
vice
+ *     that is in use by the system (e.g., it is mounted).
+ *
+ * error: EDQUOT, Disk quota exceeded
+ *   desc: O_CREAT is specified and the file does not exist, and the user'=
s quota
+ *     of disk blocks or inodes on the filesystem has been exhausted.
+ *
+ * error: EEXIST, File exists
+ *   desc: O_CREAT and O_EXCL were specified in flags, but pathname alread=
y exists.
+ *     This error is atomic with respect to file creation - it prevents ra=
ce
+ *     conditions (TOCTOU) when creating files.
+ *
+ * error: EFAULT, Bad address
+ *   desc: pathname points outside the process's accessible address space.
+ *
+ * error: EINTR, Interrupted system call
+ *   desc: The call was interrupted by a signal handler before completing =
file
+ *     open. This can occur during lock acquisition or when breaking lease=
s.
+ *
+ * error: EINVAL, Invalid argument
+ *   desc: Returned for several conditions: (1) Invalid O_* flag combinati=
ons
+ *     (O_DIRECTORY|O_CREAT, O_TMPFILE without O_DIRECTORY, O_TMPFILE with
+ *     read-only access, O_PATH with flags other than O_DIRECTORY|O_NOFOLL=
OW|
+ *     O_CLOEXEC). (2) mode contains bits outside S_IALLUGO when O_CREAT/O=
_TMPFILE
+ *     is set (openat2 only). (3) O_DIRECT requested but filesystem doesn't
+ *     support it. (4) The filesystem does not support O_SYNC or O_DSYNC.
+ *
+ * error: EISDIR, Is a directory
+ *   desc: pathname refers to a directory and the access requested involved
+ *     writing (O_WRONLY, O_RDWR, or O_TRUNC). Also returned when O_TMPFIL=
E is
+ *     used on a directory that doesn't support tmpfile operations.
+ *
+ * error: ELOOP, Too many symbolic links
+ *   desc: Too many symbolic links were encountered in resolving pathname,=
 or
+ *     O_NOFOLLOW was specified but pathname refers to a symbolic link.
+ *
+ * error: EMFILE, Too many open files
+ *   desc: The per-process limit on the number of open file descriptors ha=
s been
+ *     reached. This limit is RLIMIT_NOFILE (default typically 1024, max s=
et by
+ *     /proc/sys/fs/nr_open).
+ *
+ * error: ENAMETOOLONG, File name too long
+ *   desc: pathname was too long, exceeding PATH_MAX (4096) bytes, or a si=
ngle
+ *     path component exceeded NAME_MAX (usually 255) bytes.
+ *
+ * error: ENFILE, Too many open files in system
+ *   desc: The system-wide limit on the total number of open files has been
+ *     reached (/proc/sys/fs/file-max). Processes with CAP_SYS_ADMIN can e=
xceed
+ *     this limit.
+ *
+ * error: ENODEV, No such device
+ *   desc: pathname refers to a special file that has no corresponding dev=
ice, or
+ *     the file's inode has no file operations assigned.
+ *
+ * error: ENOENT, No such file or directory
+ *   desc: A directory component in pathname does not exist or is a dangli=
ng
+ *     symbolic link, or O_CREAT is not set and the named file does not ex=
ist,
+ *     or pathname is an empty string (unless AT_EMPTY_PATH is used with o=
penat2).
+ *
+ * error: ENOMEM, Out of memory
+ *   desc: The kernel could not allocate sufficient memory for the file st=
ructure,
+ *     path lookup structures, or the filename buffer.
+ *
+ * error: ENOSPC, No space left on device
+ *   desc: O_CREAT was specified and the file does not exist, and the dire=
ctory
+ *     or filesystem containing the file has no room for a new file entry.
+ *
+ * error: ENOTDIR, Not a directory
+ *   desc: A component used as a directory in pathname is not actually a d=
irectory,
+ *     or O_DIRECTORY was specified and pathname was not a directory.
+ *
+ * error: ENXIO, No such device or address
+ *   desc: O_NONBLOCK | O_WRONLY is set and the named file is a FIFO and no
+ *     process has the FIFO open for reading. Also returned when opening a=
 device
+ *     special file that does not exist.
+ *
+ * error: EOPNOTSUPP, Operation not supported
+ *   desc: The filesystem containing pathname does not support O_TMPFILE.
+ *
+ * error: EOVERFLOW, Value too large for defined data type
+ *   desc: pathname refers to a regular file that is too large to be opene=
d.
+ *     This occurs on 32-bit systems without O_LARGEFILE when the file size
+ *     exceeds 2GB (2^31 - 1 bytes).
+ *
+ * error: EPERM, Operation not permitted
+ *   desc: O_NOATIME flag was specified but the effective UID of the calle=
r did
+ *     not match the owner of the file and the caller is not privileged, o=
r the
+ *     file is append-only and O_TRUNC was specified or write mode without
+ *     O_APPEND, or the file is immutable, or a seal prevents the operatio=
n.
+ *
+ * error: EROFS, Read-only file system
+ *   desc: pathname refers to a file on a read-only filesystem and write a=
ccess
+ *     was requested.
+ *
+ * error: ETXTBSY, Text file busy
+ *   desc: pathname refers to an executable image which is currently being
+ *     executed, or to a swap file, and write access or truncation was req=
uested.
+ *
+ * error: EWOULDBLOCK, Resource temporarily unavailable
+ *   desc: O_NONBLOCK was specified and an incompatible lease is held on t=
he file.
+ *
+ * lock: files->file_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   acquired: true
+ *   released: true
+ *   desc: Acquired when allocating a file descriptor slot. Held briefly d=
uring
+ *     fd allocation via alloc_fd() and released before the syscall return=
s.
+ *
+ * lock: inode->i_rwsem (parent directory)
+ *   type: KAPI_LOCK_RWLOCK
+ *   acquired: conditional
+ *   released: true
+ *   desc: Write lock acquired on parent directory inode when creating a n=
ew file
+ *     (O_CREAT). Acquired via inode_lock_nested() in lookup path. May use
+ *     killable variant which can return EINTR on fatal signal.
+ *
+ * lock: RCU read-side
+ *   type: KAPI_LOCK_RCU
+ *   acquired: true
+ *   released: true
+ *   desc: Path lookup uses RCU mode initially for performance. If RCU loo=
kup
+ *     fails (returns -ECHILD), falls back to reference-based lookup.
+ *
+ * signal: Any signal
+ *   direction: KAPI_SIGNAL_RECEIVE
+ *   action: KAPI_SIGNAL_ACTION_RETURN
+ *   condition: When blocked on interruptible or killable operations
+ *   desc: The syscall may be interrupted during path lookup, lock acquisi=
tion,
+ *     or lease breaking. Fatal signals (SIGKILL, etc.) will interrupt kil=
lable
+ *     operations. Non-fatal signals may interrupt interruptible operation=
s.
+ *   error: -EINTR
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *   restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_CREATE | KAPI_EFFECT_ALLOC_MEMORY
+ *   target: file descriptor, file structure, dentry cache
+ *   desc: Allocates a new file descriptor in the process's fd table. Allo=
cates
+ *     a struct file from the filp slab cache. May allocate dentries and i=
nodes
+ *     during path lookup. System-wide file count (nr_files) is incremente=
d.
+ *   reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ *   target: filesystem, inode
+ *   condition: When O_CREAT is specified and file doesn't exist
+ *   desc: Creates a new file on the filesystem. Creates new inode, alloca=
tes
+ *     data blocks as needed, and creates directory entry. Updates parent
+ *     directory mtime and ctime.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ *   target: file content
+ *   condition: When O_TRUNC is specified for existing file
+ *   desc: Truncates the file to zero length, releasing data blocks. Updat=
es
+ *     file mtime and ctime. May trigger notifications to lease holders.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: inode timestamps
+ *   condition: Unless O_NOATIME is specified
+ *   desc: Opens for reading may update inode access time (atime) unless m=
ounted
+ *     with noatime/relatime or O_NOATIME is specified. Opens for writing =
that
+ *     truncate or create update mtime and ctime.
+ *
+ * capability: CAP_DAC_OVERRIDE
+ *   type: KAPI_CAP_BYPASS_CHECK
+ *   allows: Bypass file read, write, and execute permission checks
+ *   without: Standard DAC (discretionary access control) checks are appli=
ed
+ *   condition: Checked when file permission would otherwise deny access
+ *
+ * capability: CAP_DAC_READ_SEARCH
+ *   type: KAPI_CAP_BYPASS_CHECK
+ *   allows: Bypass read permission on files and search permission on dire=
ctories
+ *   without: Must have read permission on file or search permission on di=
rectory
+ *   condition: Checked during path traversal and file open
+ *
+ * capability: CAP_FOWNER
+ *   type: KAPI_CAP_BYPASS_CHECK
+ *   allows: Use O_NOATIME on files not owned by caller
+ *   without: O_NOATIME returns EPERM if caller is not file owner
+ *   condition: Checked when O_NOATIME is specified and caller is not owner
+ *
+ * capability: CAP_SYS_ADMIN
+ *   type: KAPI_CAP_INCREASE_LIMIT
+ *   allows: Exceed the system-wide file limit (file-max)
+ *   without: Returns ENFILE when system limit is reached
+ *   condition: Checked in alloc_empty_file() when nr_files >=3D max_files
+ *
+ * constraint: RLIMIT_NOFILE (per-process fd limit)
+ *   desc: The returned file descriptor must be less than the process's
+ *     RLIMIT_NOFILE limit. Default is typically 1024, maximum is controll=
ed
+ *     by /proc/sys/fs/nr_open (default 1048576). Exceeding returns EMFILE.
+ *   expr: fd < rlimit(RLIMIT_NOFILE)
+ *
+ * constraint: file-max (system-wide limit)
+ *   desc: System-wide limit on open files in /proc/sys/fs/file-max. Proce=
sses
+ *     without CAP_SYS_ADMIN receive ENFILE when this limit is reached. The
+ *     limit is computed based on system memory at boot time.
+ *   expr: nr_files < files_stat.max_files || capable(CAP_SYS_ADMIN)
+ *
+ * constraint: PATH_MAX
+ *   desc: Maximum length of pathname including null terminator is PATH_MAX
+ *     (4096 bytes). Individual path components must not exceed NAME_MAX (=
255).
+ *
+ * examples: fd =3D open("/etc/passwd", O_RDONLY);  // Read existing file
+ *   fd =3D open("/tmp/newfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);  // =
Create/truncate
+ *   fd =3D open("/tmp/lockfile", O_WRONLY | O_CREAT | O_EXCL, 0600);  // =
Exclusive create
+ *   fd =3D open("/dev/null", O_RDWR);  // Open device
+ *   fd =3D open("/tmp", O_RDONLY | O_DIRECTORY);  // Open directory
+ *   fd =3D open("/tmp", O_TMPFILE | O_RDWR, 0600);  // Anonymous temp file
+ *
+ * notes: The distinction between O_RDONLY, O_WRONLY, and O_RDWR is critic=
al.
+ *   O_RDONLY is defined as 0, so (flags & O_RDONLY) will be true for all =
flags.
+ *   Test access mode using (flags & O_ACCMODE) =3D=3D O_RDONLY.
+ *
+ *   When O_CREAT is specified without O_EXCL, there is a race condition b=
etween
+ *   testing for file existence and creating it. Use O_CREAT | O_EXCL for =
atomic
+ *   exclusive file creation.
+ *
+ *   O_CLOEXEC should be used in multithreaded programs to prevent file de=
scriptor
+ *   leaks to child processes between fork() and execve().
+ *
+ *   O_DIRECT has alignment requirements that vary by filesystem. Use stat=
x()
+ *   with STATX_DIOALIGN (Linux 6.1+) to query requirements. Unaligned I/O=
 may
+ *   fail with EINVAL or fall back to buffered I/O.
+ *
+ *   O_PATH opens a file descriptor that can be used only for certain oper=
ations
+ *   (fstat, dup, fcntl, close, fchdir on directories, as dirfd for *at() =
calls).
+ *   I/O operations will fail with EBADF.
+ *
+ * since-version: 1.0
+ */
 SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, =
mode)
 {
 	if (force_o_largefile())
--=20
2.51.0
From nobody Sat Feb  7 10:08:18 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 29D5032573E;
	Thu, 18 Dec 2025 20:42:58 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1766090578; cv=none;
 b=C7aFOdq8aA5AtDfrGZ/z0suvXhE+TZtImsEmcvjji1GzmOmfTuDG10wpm678+c5WryUhEgvnQHWn+HFE+vLqkzL+2/28X3RwdTXMcnindG/mp9EYLWjNJmjrLzkJZJVNfyLgc/oFjaSSdhJZMYn8TKUeSl5U5MS5UkcuCN7Yx5g=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1766090578; c=relaxed/simple;
	bh=6SDsm5fa4pPqv3CmHlvFDJLLRTXdJ+yKSKEOq2Xv97E=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=pUh8iXPzEJFT/HvrI+HK0Wd26L/QfR0XRUwu/ZudOgsGuKd3AwsR/HOrsrRu0+IYyf/OXOWHjBcKclGtIaviNeHoSyusUcMRmjEmH9qWc2O98EqC6cC+TG9/vZ6ebHzq5CVJZKSsUFeVR1U42JQ8a8tP5FvtJi7y82RCyV+smBk=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=FjwuDJ3K; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="FjwuDJ3K"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 61D65C4CEFB;
	Thu, 18 Dec 2025 20:42:57 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1766090578;
	bh=6SDsm5fa4pPqv3CmHlvFDJLLRTXdJ+yKSKEOq2Xv97E=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=FjwuDJ3KjYo/uHWrF0yaqJ1AjEOPb8wLbKcHMM6tzivdxRy0KvHCJOdIWigf3qMdZ
	 JR1bCTdU4a9aLRJVnlCWZwDkkj37UR7NRVwLauSp2yFXE+Z5iDuMhEcLvHZKVXBlxI
	 KQlkS/xLpUCxTASwAKQHisMxTESvySu/mgA2rlQP4cSbMSt43t0+ET3nH3vYJX/5zf
	 NdlsFZF42FIbQ3sO9+TV6G9IhgI/1ZcHOC18eG7ZSop92hfwhYjGI1esuGKxhjKZnU
	 klGm5aPvDJZfpa+7BX3HBONMHRklR9KS8XQmX2gCtwixiiVgNdbrsfHjjG32pRVeys
	 nO7QPEuUDqCgg==
From: Sasha Levin <sashal@kernel.org>
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	tools@kernel.org,
	gpaoloni@redhat.com,
	Sasha Levin <sashal@kernel.org>
Subject: [RFC PATCH v5 13/15] kernel/api: add API specification for sys_close
Date: Thu, 18 Dec 2025 15:42:35 -0500
Message-ID: <20251218204239.4159453-14-sashal@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/open.c | 247 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 243 insertions(+), 4 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 343e6d3798ec3..26d8ee8336405 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1868,10 +1868,249 @@ int filp_close(struct file *filp, fl_owner_t id)
 }
 EXPORT_SYMBOL(filp_close);
=20
-/*
- * Careful here! We test whether the file pointer is NULL before
- * releasing the fd. This ensures that one clone task can't release
- * an fd while another clone is opening it.
+/**
+ * sys_close - Close a file descriptor
+ * @fd: The file descriptor to close
+ *
+ * long-desc: Terminates access to an open file descriptor, releasing the =
file
+ *   descriptor for reuse by subsequent open(), dup(), or similar syscalls=
. Any
+ *   advisory record locks (POSIX locks, OFD locks, and flock locks) held =
on the
+ *   associated file are released. When this is the last file descriptor
+ *   referring to the underlying open file description, associated resourc=
es are
+ *   freed. If the file was previously unlinked, the file itself is delete=
d when
+ *   the last reference is closed.
+ *
+ *   CRITICAL: The file descriptor is ALWAYS closed, even when close() ret=
urns
+ *   an error. This differs from POSIX semantics where the state of the fi=
le
+ *   descriptor is unspecified after EINTR. On Linux, the fd is released e=
arly
+ *   in close() processing before flush operations that may fail. Therefor=
e,
+ *   retrying close() after an error return is DANGEROUS and may close an
+ *   unrelated file descriptor that was assigned to another thread.
+ *
+ *   Errors returned from close() (EIO, ENOSPC, EDQUOT) indicate that the =
final
+ *   flush of buffered data failed. These errors commonly occur on network
+ *   filesystems like NFS when write errors are deferred to close time. A
+ *   successful return from close() does NOT guarantee that data has been
+ *   successfully written to disk; the kernel uses buffer cache to defer w=
rites.
+ *   To ensure data persistence, call fsync() before close().
+ *
+ *   On close, the following cleanup operations are performed: POSIX advis=
ory
+ *   locks are removed, dnotify registrations are cleaned up, the file is
+ *   flushed if the file operations define a flush callback, and the file
+ *   reference is released. If this was the last reference, additional cle=
anup
+ *   includes: fsnotify close notification, epoll cleanup, flock and lease
+ *   removal, FASYNC cleanup, the file's release callback invocation, and
+ *   the file structure deallocation.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: fd
+ *   type: KAPI_TYPE_FD
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 0, INT_MAX
+ *   constraint: Must be a valid, open file descriptor for the current pro=
cess.
+ *     The value 0, 1, or 2 (stdin, stdout, stderr) may be closed like any=
 other
+ *     fd, though this is unusual and may cause issues with libraries that=
 assume
+ *     these descriptors are valid. The parameter is unsigned int to match=
 kernel
+ *     file descriptor table indexing, but values exceeding INT_MAX are ef=
fectively
+ *     invalid due to internal checks.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_EXACT
+ *   success: 0
+ *   desc: Returns 0 on success. On error, returns a negative error code.
+ *     IMPORTANT: Even when an error is returned, the file descriptor is s=
till
+ *     closed and must not be used again. The error indicates a problem wi=
th
+ *     the final flush operation, not that the fd remains open.
+ *
+ * error: EBADF, Bad file descriptor
+ *   desc: The file descriptor fd is not a valid open file descriptor, or =
was
+ *     already closed. This is the only error that indicates the fd was NOT
+ *     closed (because it was never open to begin with). Occurs when fd is=
 out
+ *     of range, has no file assigned, or was already closed.
+ *
+ * error: EINTR, Interrupted system call
+ *   desc: The flush operation was interrupted by a signal before completi=
on.
+ *     This occurs when a file's flush callback (e.g., NFS) performs an
+ *     interruptible wait that receives a signal. IMPORTANT: Despite this =
error,
+ *     the file descriptor IS closed and must not be used again. This error
+ *     is generated by converting kernel-internal restart codes (ERESTARTS=
YS,
+ *     ERESTARTNOINTR, ERESTARTNOHAND, ERESTART_RESTARTBLOCK) to EINTR bec=
ause
+ *     restarting the syscall would be incorrect once the fd is freed.
+ *
+ * error: EIO, I/O error
+ *   desc: An I/O error occurred during the flush of buffered data to the
+ *     underlying storage. This typically indicates a hardware error, netw=
ork
+ *     failure on NFS, or other storage system error. The file descriptor =
is
+ *     still closed. Previously buffered write data may have been lost.
+ *
+ * error: ENOSPC, No space left on device
+ *   desc: There was insufficient space on the storage device to flush buf=
fered
+ *     writes. This is common on NFS when the server runs out of space bet=
ween
+ *     write() and close(). The file descriptor is still closed.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ *   desc: The user's disk quota was exceeded while attempting to flush bu=
ffered
+ *     writes. Common on NFS when quota is exceeded between write() and cl=
ose().
+ *     The file descriptor is still closed.
+ *
+ * lock: files->file_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   acquired: true
+ *   released: true
+ *   desc: Acquired via file_close_fd() to atomically lookup and remove th=
e fd
+ *     from the file descriptor table. Held only during the table manipula=
tion;
+ *     released before flush and final cleanup operations. This ensures th=
at
+ *     another thread cannot allocate the same fd number while close is in
+ *     progress.
+ *
+ * lock: file->f_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   acquired: true
+ *   released: true
+ *   desc: Acquired during epoll cleanup (eventpoll_release_file) and dnot=
ify
+ *     cleanup to safely unlink the file from monitoring structures. May a=
lso
+ *     be acquired during lock context operations.
+ *
+ * lock: ep->mtx
+ *   type: KAPI_LOCK_MUTEX
+ *   acquired: true
+ *   released: true
+ *   desc: Acquired during epoll cleanup if the file was monitored by epol=
l.
+ *     Used to safely remove the file from epoll interest lists.
+ *
+ * lock: flc_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   acquired: true
+ *   released: true
+ *   desc: File lock context spinlock, acquired during locks_remove_file()=
 to
+ *     safely remove POSIX, flock, and lease locks associated with the fil=
e.
+ *
+ * signal: pending_signals
+ *   direction: KAPI_SIGNAL_RECEIVE
+ *   action: KAPI_SIGNAL_ACTION_RETURN
+ *   condition: When flush callback performs interruptible wait
+ *   desc: If the file's flush callback (e.g., nfs_file_flush) performs an
+ *     interruptible wait and a signal is pending, the wait is interrupted.
+ *     Any kernel restart codes are converted to EINTR since close cannot =
be
+ *     restarted after the fd is freed.
+ *   error: -EINTR
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *   restartable: no
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_DESTROY | KAPI_EFFECT_IRREVERSIBLE
+ *   target: File descriptor table entry
+ *   desc: The file descriptor is removed from the process's file descript=
or
+ *     table, making the fd number available for reuse by subsequent open(=
),
+ *     dup(), or similar calls. This occurs BEFORE any flush or cleanup th=
at
+ *     might fail, making the operation irreversible regardless of return =
value.
+ *   condition: Always (when fd is valid)
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_LOCK_RELEASE
+ *   target: POSIX advisory locks, OFD locks, flock locks
+ *   desc: All advisory locks held on the file by this process are removed.
+ *     POSIX locks are removed via locks_remove_posix() during filp_flush(=
).
+ *     All lock types (POSIX, OFD, flock) are removed via locks_remove_fil=
e()
+ *     during __fput() when this is the last reference.
+ *   condition: File has FMODE_OPENED and !(FMODE_PATH)
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_DESTROY
+ *   target: File leases
+ *   desc: Any file leases held on the file are removed during locks_remov=
e_file()
+ *     when this is the last reference to the open file description.
+ *   condition: File had leases and this is the last close
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: dnotify registrations
+ *   desc: Directory notification (dnotify) registrations associated with =
this
+ *     file are cleaned up via dnotify_flush(). This only applies to direc=
tories.
+ *   condition: File is a directory with dnotify registrations
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: epoll interest lists
+ *   desc: If the file was being monitored by epoll instances, it is remov=
ed
+ *     from those interest lists via eventpoll_release().
+ *   condition: File was added to epoll instances
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ *   target: Buffered data
+ *   desc: The file's flush callback is invoked if defined (e.g., NFS calls
+ *     nfs_file_flush). This attempts to write any buffered data to storage
+ *     and may return errors (EIO, ENOSPC, EDQUOT) if the flush fails. The
+ *     success of this flush is NOT guaranteed even with a 0 return; use
+ *     fsync() before close() to ensure data persistence.
+ *   condition: File has a flush callback and was opened for writing
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FREE_MEMORY
+ *   target: struct file and related structures
+ *   desc: When this is the last reference to the file, __fput() is called
+ *     synchronously (fput_close_sync), which frees the file structure, re=
leases
+ *     the dentry and mount references, and invokes the file's release cal=
lback.
+ *   condition: This is the last reference to the file
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ *   target: Unlinked file deletion
+ *   desc: If the file was previously unlinked (deleted) but kept open, cl=
osing
+ *     the last reference causes the actual file data to be removed from t=
he
+ *     filesystem and the inode to be freed.
+ *   condition: File was unlinked and this is the last reference
+ *   reversible: no
+ *
+ * state-trans: file_descriptor
+ *   from: open
+ *   to: closed/free
+ *   condition: Valid fd passed to close
+ *   desc: The file descriptor transitions from open (usable) to closed (i=
nvalid).
+ *     The fd number becomes available for reuse. This transition occurs e=
arly
+ *     in close() processing, before any operations that might fail.
+ *
+ * state-trans: file_reference_count
+ *   from: n
+ *   to: n-1 (or freed if n was 1)
+ *   condition: Always on successful fd lookup
+ *   desc: The file's reference count is decremented. If this was the last
+ *     reference, the file is fully cleaned up and freed.
+ *
+ * constraint: File Descriptor Reuse Race
+ *   desc: Because the fd is freed early in close() processing, another th=
read
+ *     may receive the same fd number from a concurrent open() before clos=
e()
+ *     returns. Applications must not retry close() after an error return,=
 as
+ *     this could close an unrelated file opened by another thread.
+ *   expr: After close(fd) returns (even with error), fd is invalid
+ *
+ * examples: close(fd);  // Basic usage - ignore errors (common but not id=
eal)
+ *   if (close(fd) =3D=3D -1) perror("close");  // Log errors for debugging
+ *   fsync(fd); close(fd);  // Ensure data persistence before closing
+ *
+ * notes: This syscall has subtle non-POSIX semantics: the fd is ALWAYS cl=
osed
+ *   regardless of the return value. POSIX specifies that on EINTR, the st=
ate
+ *   of the fd is unspecified, but Linux always closes it. HP-UX requires
+ *   retrying close() on EINTR, but doing so on Linux may close an unrelat=
ed
+ *   fd that was reassigned by another thread. For portable code, the safe=
st
+ *   approach is to check for errors but never retry close().
+ *
+ *   Error codes from the flush callback (EIO, ENOSPC, EDQUOT) indicate th=
at
+ *   previously written data may have been lost. These errors are particul=
arly
+ *   common on NFS where write errors are often deferred to close time.
+ *
+ *   The driver's release() callback errors are explicitly ignored by the
+ *   kernel, so device driver cleanup errors are not propagated to userspa=
ce.
+ *
+ *   Calling close() on a file descriptor while another thread is using it
+ *   (e.g., in a blocking read() or write()) has implementation-defined
+ *   behavior. On Linux, the blocked operation continues on the underlying
+ *   file and may complete even after close() returns.
+ *
+ * since-version: 1.0
  */
 SYSCALL_DEFINE1(close, unsigned int, fd)
 {
--=20
2.51.0
From nobody Sat Feb  7 10:08:18 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1513933064A;
	Thu, 18 Dec 2025 20:42:59 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1766090579; cv=none;
 b=lRPkZNcyuwnlMnmolwHXWbsLCQ2w4ks1jm8b6elSedaybn/hHvOoKGHvggNOCuKoWZKAt27303QqCda5xSWEAi7CXIvU57LlEacP4uciDP3k0qcU9qmIWxVO/UHMXoHUrhnELU0eqsZsvR3gv1L0PybqljTT2E3QB2Hp1nYKnqc=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1766090579; c=relaxed/simple;
	bh=iJZKXJCaRCjR9e8zmoORtXZj353a0RM7OTM7Pkd37e0=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=PaKG7Yukh0Ol0wPYhfySmGBxdToAcvEzLo+CsktRIC/YRiZSSLkHAf0bhWcIRpnC+jSQ6o03SQkZ/c357w9VGMwMsVGsHDiIHCvsvTtOpv2NH3Z4JeC6Jk/upqQaFPBl6/qZ3icz9jcymbv0JKgm/JVgiVCotqk+ndo7PGdrXJc=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=A41kOROO; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="A41kOROO"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4C9A5C113D0;
	Thu, 18 Dec 2025 20:42:58 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1766090579;
	bh=iJZKXJCaRCjR9e8zmoORtXZj353a0RM7OTM7Pkd37e0=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=A41kOROOnYPnHtXpGMwOokIa6i5Y69O6hfthXj12fcY3I3O4ycsdnDU1D+bXKQRfk
	 I60IaMqOqYigaC6PihwirdKyesBa3zmEPU2aDBxboXeJWpxM5h7JXB/rABMjPoUZyW
	 oE+DbCRDeuOevHs2bt3SRyGM3A4hcZvj+VClGY5R8klElhwOYVa4D3o3zOR+xqlHSe
	 Stu+FsbwhUUB1dqg1+Jq4xseXRaCi4c2EVx2X5at4SMoZECNFEmO8rHhY/w6fSdG32
	 KNPEq8A9usV+2rTwdzq3jWj6Rfo2u4XWsWcWrUUwB+aI0eDABDgq1c0DNo5sngApFC
	 kV43OcoSiBUVw==
From: Sasha Levin <sashal@kernel.org>
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	tools@kernel.org,
	gpaoloni@redhat.com,
	Sasha Levin <sashal@kernel.org>
Subject: [RFC PATCH v5 14/15] kernel/api: add API specification for sys_read
Date: Thu, 18 Dec 2025 15:42:36 -0500
Message-ID: <20251218204239.4159453-15-sashal@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/read_write.c | 287 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 287 insertions(+)

diff --git a/fs/read_write.c b/fs/read_write.c
index 833bae068770a..422046a666b1d 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -719,6 +719,293 @@ ssize_t ksys_read(unsigned int fd, char __user *buf, =
size_t count)
 	return ret;
 }
=20
+/**
+ * sys_read - Read data from a file descriptor
+ * @fd: File descriptor to read from
+ * @buf: User-space buffer to read data into
+ * @count: Maximum number of bytes to read
+ *
+ * long-desc: Attempts to read up to count bytes from file descriptor fd i=
nto
+ *   the buffer starting at buf. For seekable files (regular files, block
+ *   devices), the read begins at the current file offset, and the file of=
fset
+ *   is advanced by the number of bytes read. For non-seekable files (pipe=
s,
+ *   FIFOs, sockets, character devices), the file offset is not used.
+ *
+ *   If count is zero and fd refers to a regular file, read() may detect e=
rrors
+ *   as described below. In the absence of errors, or if read() does not c=
heck
+ *   for errors, a read() with a count of 0 returns zero and has no other =
effects.
+ *
+ *   On success, the number of bytes read is returned (zero indicates end =
of
+ *   file for regular files). It is not an error if this number is smaller=
 than
+ *   the number of bytes requested; this may happen because fewer bytes are
+ *   actually available right now (maybe because we were close to end-of-f=
ile,
+ *   or because we are reading from a pipe, socket, or terminal), or becau=
se
+ *   read() was interrupted by a signal.
+ *
+ *   On Linux, read() transfers at most MAX_RW_COUNT (0x7ffff000, approxim=
ately
+ *   2GB) bytes per call, regardless of whether the filesystem would allow=
 more.
+ *   This is to avoid issues with signed arithmetic overflow on 32-bit sys=
tems.
+ *
+ *   POSIX allows reads that are interrupted after reading some data to ei=
ther
+ *   return -1 (with errno set to EINTR) or return the number of bytes alr=
eady
+ *   read. Linux follows the latter behavior: if data has been read before=
 a
+ *   signal arrives, the call returns the bytes read rather than failing.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: fd
+ *   type: KAPI_TYPE_FD
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 0, INT_MAX
+ *   constraint: Must be a valid, open file descriptor with read permissio=
n.
+ *     The file must have been opened with O_RDONLY or O_RDWR. Special val=
ues
+ *     like AT_FDCWD are not valid. File descriptors for directories return
+ *     EISDIR. Standard file descriptors 0 (stdin), 1 (stdout), 2 (stderr)=
 are
+ *     valid if open and readable.
+ *
+ * param: buf
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_OUT | KAPI_PARAM_USER
+ *   constraint-type: KAPI_CONSTRAINT_CUSTOM
+ *   constraint: Must point to a valid, writable user-space memory region =
of at
+ *     least count bytes. The buffer is validated via access_ok() before a=
ny
+ *     read operation. NULL is invalid and will return EFAULT. The buffer =
may
+ *     be partially written if an error occurs mid-read. For O_DIRECT read=
s,
+ *     the buffer may need to be aligned to the filesystem's block size (v=
aries
+ *     by filesystem, check via statx() with STATX_DIOALIGN).
+ *
+ * param: count
+ *   type: KAPI_TYPE_UINT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 0, SIZE_MAX
+ *   constraint: Maximum number of bytes to read. Clamped internally to
+ *     MAX_RW_COUNT (INT_MAX & PAGE_MASK, approximately 0x7ffff000 bytes) =
to
+ *     prevent signed overflow issues. A count of 0 returns immediately wi=
th 0
+ *     without accessing the file (but may still detect errors). Large val=
ues
+ *     are not errors but will be clamped. Cast to ssize_t must not be neg=
ative.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_RANGE
+ *   success: >=3D 0
+ *   desc: On success, returns the number of bytes read (non-negative). Ze=
ro
+ *     indicates end-of-file (EOF) for regular files, or no data available
+ *     from a device that does not block. The return value may be less than
+ *     count if fewer bytes were available (short read). Partial reads are
+ *     not errors. On error, returns a negative error code.
+ *
+ * error: EBADF, Bad file descriptor
+ *   desc: fd is not a valid file descriptor, or fd was not opened for rea=
ding.
+ *     This includes file descriptors opened with O_WRONLY, O_PATH, or file
+ *     descriptors that have been closed. Also returned if the file struct=
ure
+ *     does not have FMODE_READ set.
+ *
+ * error: EFAULT, Bad address
+ *   desc: buf points outside the accessible address space. The buffer add=
ress
+ *     failed access_ok() validation. Can also occur if a fault happens du=
ring
+ *     copy_to_user() when transferring data to user space after the read
+ *     completes in kernel space.
+ *
+ * error: EINVAL, Invalid argument
+ *   desc: Returned in several cases: (1) The file descriptor refers to an
+ *     object that is not suitable for reading (no read or read_iter metho=
d).
+ *     (2) The file was opened with O_DIRECT and the buffer alignment, off=
set,
+ *     or count does not meet the filesystem's alignment requirements. (3)=
 For
+ *     timerfd file descriptors, the buffer is smaller than 8 bytes. (4) T=
he
+ *     count argument, when cast to ssize_t, is negative.
+ *
+ * error: EISDIR, Is a directory
+ *   desc: fd refers to a directory. Directories cannot be read using read=
();
+ *     use getdents64() instead. This error is returned by the generic_rea=
d_dir()
+ *     handler installed for directory file operations.
+ *
+ * error: EAGAIN, Resource temporarily unavailable
+ *   desc: fd refers to a file (pipe, socket, device) that is marked non-b=
locking
+ *     (O_NONBLOCK) and the read would block. Also returned with IOCB_NOWA=
IT
+ *     when data is not immediately available. Equivalent to EWOULDBLOCK.
+ *     The application should retry the read later or use select/poll/epol=
l.
+ *
+ * error: EINTR, Interrupted system call
+ *   desc: The call was interrupted by a signal before any data was read. =
This
+ *     only occurs if no data has been transferred; if some data was read =
before
+ *     the signal, the call returns the number of bytes read. The caller s=
hould
+ *     typically restart the read.
+ *
+ * error: EIO, Input/output error
+ *   desc: A low-level I/O error occurred. For regular files, this typical=
ly
+ *     indicates a hardware error on the storage device, a filesystem erro=
r,
+ *     or a network filesystem timeout. For terminals, this may indicate t=
he
+ *     controlling terminal has been closed for a background process.
+ *
+ * error: EOVERFLOW, Value too large for defined data type
+ *   desc: The file position plus count would exceed LLONG_MAX. Also retur=
ned
+ *     when reading from certain files (e.g., some /proc files) where the =
file
+ *     position would overflow. For files without FOP_UNSIGNED_OFFSET flag,
+ *     negative file positions are not allowed.
+ *
+ * error: ENOBUFS, No buffer space available
+ *   desc: Returned when reading from pipe-based watch queues (CONFIG_WATC=
H_QUEUE)
+ *     when the buffer is too small to hold a complete notification, or wh=
en
+ *     reading packets from pipes with PIPE_BUF_FLAG_WHOLE set.
+ *
+ * error: ERESTARTSYS, Restart system call (internal)
+ *   desc: Internal error code indicating the syscall should be restarted.=
 This
+ *     is typically translated to EINTR if SA_RESTART is not set on the si=
gnal
+ *     handler, or the syscall is transparently restarted if SA_RESTART is=
 set.
+ *     User space should not see this error code directly.
+ *
+ * error: EACCES, Permission denied
+ *   desc: The security subsystem (LSM such as SELinux or AppArmor) denied
+ *     the read operation via security_file_permission(). This can occur e=
ven
+ *     if the file was successfully opened, as LSM policies may enforce pe=
r-
+ *     operation checks.
+ *
+ * error: EPERM, Operation not permitted
+ *   desc: Returned by fanotify permission events (CONFIG_FANOTIFY_ACCESS_=
PERMISSIONS)
+ *     when a user-space fanotify listener denies the read operation via
+ *     fsnotify_file_area_perm().
+ *
+ * lock: file->f_pos_lock
+ *   type: KAPI_LOCK_MUTEX
+ *   acquired: conditional
+ *   released: true
+ *   desc: For regular files that require atomic position updates (FMODE_A=
TOMIC_POS),
+ *     the f_pos_lock mutex is acquired by fdget_pos() at syscall entry an=
d released
+ *     by fdput_pos() at syscall exit. This serializes concurrent reads th=
at share
+ *     the same file description. Not acquired for files opened with FMODE=
_STREAM
+ *     (pipes, sockets) or when the file is not shared.
+ *
+ * lock: Filesystem-specific locks
+ *   type: KAPI_LOCK_CUSTOM
+ *   acquired: conditional
+ *   released: true
+ *   desc: The filesystem's read_iter or read method may acquire additiona=
l locks.
+ *     For regular files, this typically includes the inode's i_rwsem for =
certain
+ *     operations. For pipes, the pipe->mutex is acquired. For sockets, so=
cket
+ *     lock is acquired. These are internal to the file operation and rele=
ased
+ *     before return.
+ *
+ * lock: RCU read-side
+ *   type: KAPI_LOCK_RCU
+ *   acquired: conditional
+ *   released: true
+ *   desc: Used during file descriptor lookup via fdget(). RCU read lock p=
rotects
+ *     access to the file descriptor table. Released by fdput() at syscall=
 exit.
+ *
+ * signal: Any signal
+ *   direction: KAPI_SIGNAL_RECEIVE
+ *   action: KAPI_SIGNAL_ACTION_RETURN
+ *   condition: When blocked waiting for data on interruptible operations
+ *   desc: The syscall may be interrupted by signals while waiting for dat=
a to
+ *     become available (pipes, sockets, terminals) or waiting for locks. =
If
+ *     interrupted before any data is read, returns -EINTR or -ERESTARTSYS.
+ *     If data has already been read, returns the number of bytes read.
+ *   error: -EINTR
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *   restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_FILE_POSITION
+ *   target: file->f_pos
+ *   condition: For seekable files when read succeeds (returns > 0)
+ *   desc: The file offset (f_pos) is advanced by the number of bytes read.
+ *     For stream files (FMODE_STREAM such as pipes and sockets), the offs=
et
+ *     is not used or modified. The offset update is protected by f_pos_lo=
ck
+ *     when the file is shared between threads/processes.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: inode access time (atime)
+ *   condition: When read succeeds and O_NOATIME is not set
+ *   desc: Updates the file's access time (atime) via touch_atime(). The u=
pdate
+ *     may be suppressed by mount options (noatime, relatime), the O_NOATI=
ME
+ *     flag, or if the filesystem does not support atime. Relatime only up=
dates
+ *     atime if it is older than mtime or ctime, or more than a day old.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: task I/O accounting
+ *   condition: Always
+ *   desc: Updates the current task's I/O accounting statistics. The rchar=
 field
+ *     (read characters) is incremented by bytes read via add_rchar(). The=
 syscr
+ *     field (syscall read count) is incremented via inc_syscr(). These st=
atistics
+ *     are visible in /proc/[pid]/io. Updated regardless of success or fai=
lure.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: fsnotify events
+ *   condition: When read returns > 0
+ *   desc: Generates an FS_ACCESS fsnotify event via fsnotify_access() all=
owing
+ *     inotify, fanotify, and dnotify watchers to be notified of the read.=
 This
+ *     occurs after data transfer completes successfully.
+ *   reversible: no
+ *
+ * capability: CAP_DAC_OVERRIDE
+ *   type: KAPI_CAP_BYPASS_CHECK
+ *   allows: Bypass discretionary access control on read permission
+ *   without: Standard DAC checks are enforced
+ *   condition: Checked via security_file_permission() during rw_verify_ar=
ea()
+ *
+ * capability: CAP_DAC_READ_SEARCH
+ *   type: KAPI_CAP_BYPASS_CHECK
+ *   allows: Bypass read permission checks on regular files
+ *   without: Must have read permission on file
+ *   condition: Checked by LSM hooks during the read operation
+ *
+ * constraint: MAX_RW_COUNT
+ *   desc: The count parameter is silently clamped to MAX_RW_COUNT (INT_MA=
X &
+ *     PAGE_MASK, approximately 2GB minus one page) to prevent integer ove=
rflow
+ *     in internal calculations. This is transparent to the caller; the sy=
scall
+ *     succeeds but reads at most MAX_RW_COUNT bytes.
+ *   expr: actual_count =3D min(count, MAX_RW_COUNT)
+ *
+ * constraint: File must be open for reading
+ *   desc: The file descriptor must have been opened with O_RDONLY or O_RD=
WR.
+ *     Files opened with O_WRONLY or O_PATH cannot be read and return EBAD=
F.
+ *     The file must have both FMODE_READ and FMODE_CAN_READ flags set.
+ *   expr: (file->f_mode & FMODE_READ) && (file->f_mode & FMODE_CAN_READ)
+ *
+ * examples: n =3D read(fd, buf, sizeof(buf));  // Basic read
+ *   n =3D read(STDIN_FILENO, buf, 1024);  // Read from stdin
+ *   while ((n =3D read(fd, buf, 4096)) > 0) { process(buf, n); }  // Read=
 loop
+ *   if (read(fd, buf, count) =3D=3D 0) { handle_eof(); }  // Check for EOF
+ *
+ * notes: The behavior of read() varies significantly depending on the typ=
e of
+ *   file descriptor:
+ *
+ *   - Regular files: Reads from current position, advances position, retu=
rns 0
+ *     at EOF. Short reads are rare but possible near EOF or on signal.
+ *
+ *   - Pipes and FIFOs: Blocking by default. Returns available data (up to=
 count)
+ *     or blocks until data is available. Returns 0 when all writers have =
closed.
+ *     O_NONBLOCK returns EAGAIN when empty instead of blocking.
+ *
+ *   - Sockets: Similar to pipes. Specific behavior depends on socket type=
 and
+ *     protocol. MSG_* flags can be specified via recv() for more control.
+ *
+ *   - Terminals: Line-buffered in canonical mode; read returns when newli=
ne is
+ *     entered or buffer is full. Raw mode returns immediately when data a=
vailable.
+ *     Special handling for signals (SIGINT on Ctrl+C, etc.).
+ *
+ *   - Device special files: Behavior is device-specific. Some devices sup=
port
+ *     seeking, others do not. Read size may be constrained by device.
+ *
+ *   Race condition: Concurrent reads from the same file description (not =
just
+ *   file descriptor) can race on the file position. Linux 3.14+ provides =
atomic
+ *   position updates for regular files via f_pos_lock, but applications s=
hould
+ *   use pread() for concurrent positioned reads.
+ *
+ *   O_DIRECT reads bypass the page cache and typically require aligned bu=
ffers
+ *   and positions. Alignment requirements are filesystem-specific; use st=
atx()
+ *   with STATX_DIOALIGN (Linux 6.1+) to query. Unaligned O_DIRECT reads f=
ail
+ *   with EINVAL on most filesystems.
+ *
+ *   For splice(2)-like zero-copy reads, consider using splice(), sendfile=
(),
+ *   or copy_file_range() instead of read() + write().
+ *
+ * since-version: 1.0
+ */
 SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
 {
 	return ksys_read(fd, buf, count);
--=20
2.51.0
From nobody Sat Feb  7 10:08:18 2026
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id F334C336EEE;
	Thu, 18 Dec 2025 20:42:59 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1766090580; cv=none;
 b=ijXn4NCD+2F4Y6wIphe4sgO6RyPA9KtBTYWn3a0c3Vo0oRrHdelqaEP5i7u5e5Q5M9e9nHGt5cIIvGSWiRGETnAaiTvwBgWrqKVB6eLLzjlUodumbL2snZlK9LiYPlf1STdK2Qjr5JTVBce+i8Vxr2IpUy7QvrAATHxR1b25cas=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1766090580; c=relaxed/simple;
	bh=oL3b1IdHz7HdHacAQkdzJP1OcLLI3zfDFml5NRFS38s=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=MTh8wiAF+0gXyfcUYX03To0JtNlPRequnsbqi/UPRghXzdNdC2r0ZS2d+vwAVy9iMBQ8c0pgXIaV5LlL45yHvEgkcg+KUnhqQYDY99qPkVLDOPWzf4h6TNcHJqdbAeGRpmewSlaI+1JNSWgH4A/+hRpvFcn6Mrg0VfLfkO5gxz8=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=rI1Szzzi; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="rI1Szzzi"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 36279C19421;
	Thu, 18 Dec 2025 20:42:59 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1766090579;
	bh=oL3b1IdHz7HdHacAQkdzJP1OcLLI3zfDFml5NRFS38s=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=rI1Szzzice7aDP+OFJBuhoZ3KxNQ4WkxGN3J9nLZNLSTWQBmiaVB7hRwBPajFFOUf
	 YmsG+f/3uaIq3WZCqQzFgihxRzaDsNmNTBUegYtv/+czHsxU6vh6IgIwNOU9isPLKR
	 CO4E3CuOlOGSsZbn52k4nh7+/NHdVZII59C5MOsNyEix/8K2Vjp4/sVsEBtZWxQ8GA
	 uxK/Ry6vsaDLl+OPcvfMLBERZqTC7VcG+1T7FWQXMHM20r9UPnDYtfvMCOg8gnVRyg
	 Lh7duABLHg9rBVnh1Yb7W9SZLO2EzqGjKCcv+wE41QhNR/Z+PmKUOs+NZqa0sRtE8t
	 egq+peiDyNtjQ==
From: Sasha Levin <sashal@kernel.org>
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	tools@kernel.org,
	gpaoloni@redhat.com,
	Sasha Levin <sashal@kernel.org>
Subject: [RFC PATCH v5 15/15] kernel/api: add API specification for sys_write
Date: Thu, 18 Dec 2025 15:42:37 -0500
Message-ID: <20251218204239.4159453-16-sashal@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/read_write.c | 377 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 377 insertions(+)

diff --git a/fs/read_write.c b/fs/read_write.c
index 422046a666b1d..685bf6b9bd3b1 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1030,6 +1030,383 @@ ssize_t ksys_write(unsigned int fd, const char __us=
er *buf, size_t count)
 	return ret;
 }
=20
+/**
+ * sys_write - Write data to a file descriptor
+ * @fd: File descriptor to write to
+ * @buf: User-space buffer containing data to write
+ * @count: Maximum number of bytes to write
+ *
+ * long-desc: Attempts to write up to count bytes from the buffer starting=
 at
+ *   buf to the file referred to by the file descriptor fd. For seekable f=
iles
+ *   (regular files, block devices), the write begins at the current file =
offset,
+ *   and the file offset is advanced by the number of bytes written. If th=
e file
+ *   was opened with O_APPEND, the file offset is first set to the end of =
the
+ *   file before writing. For non-seekable files (pipes, FIFOs, sockets, c=
haracter
+ *   devices), the file offset is not used and writing occurs at the curre=
nt
+ *   position as defined by the device.
+ *
+ *   The number of bytes written may be less than count if, for example, t=
here is
+ *   insufficient space on the underlying physical medium, or the RLIMIT_F=
SIZE
+ *   resource limit is encountered, or the call was interrupted by a signal
+ *   handler after having written less than count bytes. In the event of a
+ *   successful partial write, the caller should make another write() call=
 to
+ *   transfer the remaining bytes. This behavior is called a "short write."
+ *
+ *   On Linux, write() transfers at most MAX_RW_COUNT (0x7ffff000, approxi=
mately
+ *   2GB minus one page) bytes per call, regardless of whether the file or
+ *   filesystem would allow more. This prevents signed arithmetic overflow.
+ *
+ *   For regular files, a successful write() does not guarantee that data =
has been
+ *   committed to disk. Use fsync(2) or fdatasync(2) if durability is requ=
ired.
+ *   For O_SYNC or O_DSYNC files, the kernel automatically syncs data on w=
rite.
+ *
+ *   POSIX permits writes that are interrupted after partial writes to eit=
her
+ *   return -1 with errno=3DEINTR, or to return the count of bytes already=
 written.
+ *   Linux implements the latter behavior: if some data has been written b=
efore
+ *   a signal arrives, write() returns the number of bytes written rather =
than
+ *   failing with EINTR.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: fd
+ *   type: KAPI_TYPE_FD
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 0, INT_MAX
+ *   constraint: Must be a valid, open file descriptor with write permissi=
on.
+ *     The file must have been opened with O_WRONLY or O_RDWR. File descri=
ptors
+ *     opened with O_RDONLY, O_PATH, or that have been closed return EBADF.
+ *     Standard file descriptors 0 (stdin), 1 (stdout), 2 (stderr) are val=
id if
+ *     open and writable. AT_FDCWD and other special values are not valid.
+ *
+ * param: buf
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ *   constraint-type: KAPI_CONSTRAINT_CUSTOM
+ *   constraint: Must point to a valid, readable user-space memory region =
of at
+ *     least count bytes. The buffer is validated via access_ok() before a=
ny
+ *     write operation. NULL is invalid and returns EFAULT. For O_DIRECT w=
rites,
+ *     the buffer may need to be aligned to the filesystem's block size (v=
aries
+ *     by filesystem; query with statx() using STATX_DIOALIGN on Linux 6.1=
+).
+ *
+ * param: count
+ *   type: KAPI_TYPE_UINT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 0, SIZE_MAX
+ *   constraint: Maximum number of bytes to write. Clamped internally to
+ *     MAX_RW_COUNT (INT_MAX & PAGE_MASK, approximately 0x7ffff000 bytes) =
to
+ *     prevent signed overflow. A count of 0 returns 0 immediately without=
 any
+ *     file operations. Cast to ssize_t must not be negative.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_RANGE
+ *   success: >=3D 0
+ *   desc: On success, returns the number of bytes written (non-negative).=
 Zero
+ *     indicates that nothing was written (count was 0, or no space availa=
ble
+ *     for non-blocking writes). The return value may be less than count d=
ue to
+ *     resource limits, signal interruption, or device constraints (short =
write).
+ *     On error, returns a negative error code.
+ *
+ * error: EBADF, Bad file descriptor
+ *   desc: fd is not a valid file descriptor, or fd was not opened for wri=
ting.
+ *     This includes file descriptors opened with O_RDONLY, O_PATH, or file
+ *     descriptors that have been closed. Also returned if the file struct=
ure
+ *     does not have FMODE_WRITE or FMODE_CAN_WRITE set.
+ *
+ * error: EFAULT, Bad address
+ *   desc: buf points outside the accessible address space. The buffer add=
ress
+ *     failed access_ok() validation. Can also occur if a fault happens du=
ring
+ *     copy_from_user() when reading data from user space.
+ *
+ * error: EINVAL, Invalid argument
+ *   desc: Returned in several cases: (1) The file descriptor refers to an
+ *     object that is not suitable for writing (no write or write_iter met=
hod).
+ *     (2) The file was opened with O_DIRECT and the buffer alignment, off=
set,
+ *     or count does not meet the filesystem's alignment requirements. (3)=
 The
+ *     count argument, when cast to ssize_t, is negative. (4) For IOCB_NOW=
AIT
+ *     operations on non-O_DIRECT files that don't support WASYNC.
+ *
+ * error: EAGAIN, Resource temporarily unavailable
+ *   desc: fd refers to a file (pipe, socket, device) that is marked non-b=
locking
+ *     (O_NONBLOCK) and the write would block because the buffer is full. =
Also
+ *     returned with IOCB_NOWAIT when data cannot be written immediately.
+ *     Equivalent to EWOULDBLOCK. The application should retry later or use
+ *     select/poll/epoll to wait for writability.
+ *
+ * error: EINTR, Interrupted system call
+ *   desc: The call was interrupted by a signal before any data was writte=
n. This
+ *     only occurs if no data has been transferred; if some data was writt=
en
+ *     before the signal, the call returns the number of bytes written. The
+ *     caller should typically restart the write.
+ *
+ * error: EPIPE, Broken pipe
+ *   desc: fd refers to a pipe or socket whose reading end has been closed.
+ *     When this condition occurs, the calling process also receives a SIG=
PIPE
+ *     signal unless MSG_NOSIGNAL is used (for sockets) or IOCB_NOSIGNAL i=
s set.
+ *     If the signal is caught or ignored, EPIPE is still returned.
+ *
+ * error: EFBIG, File too large
+ *   desc: An attempt was made to write a file that exceeds the implementa=
tion-
+ *     defined maximum file size or the file size limit (RLIMIT_FSIZE) of =
the
+ *     process. When RLIMIT_FSIZE is exceeded, the process also receives S=
IGXFSZ.
+ *     For files not opened with O_LARGEFILE on 32-bit systems, the limit =
is 2GB.
+ *
+ * error: ENOSPC, No space left on device
+ *   desc: The device containing the file has no room for the data. This c=
an
+ *     occur mid-write resulting in a short write followed by ENOSPC on re=
try.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ *   desc: The user's quota of disk blocks on the filesystem has been exha=
usted.
+ *     Like ENOSPC, this can result in a short write.
+ *
+ * error: EIO, Input/output error
+ *   desc: A low-level I/O error occurred while modifying the inode or wri=
ting
+ *     data. This typically indicates hardware failure, filesystem corrupt=
ion,
+ *     or network filesystem timeout. Some data may have been written.
+ *
+ * error: EPERM, Operation not permitted
+ *   desc: The operation was prevented: (1) by a file seal (F_SEAL_WRITE or
+ *     F_SEAL_FUTURE_WRITE on memfd/shmem), (2) writing to an immutable in=
ode
+ *     (IS_IMMUTABLE), (3) by an LSM hook denying the operation, or (4) by=
 a
+ *     fanotify permission event denying the write.
+ *
+ * error: EOVERFLOW, Value too large for defined data type
+ *   desc: The file position plus count would exceed LLONG_MAX. Also retur=
ned
+ *     when the offset would exceed filesystem limits after the write.
+ *
+ * error: EDESTADDRREQ, Destination address required
+ *   desc: fd is a datagram socket for which no peer address has been set =
using
+ *     connect(2). Use sendto(2) to specify the destination address.
+ *
+ * error: ETXTBSY, Text file busy
+ *   desc: The file is being used as a swap file (IS_SWAPFILE).
+ *
+ * error: EXDEV, Cross-device link
+ *   desc: When writing to a pipe that has been configured as a watch queue
+ *     (CONFIG_WATCH_QUEUE), direct write() calls are not supported.
+ *
+ * error: ENOMEM, Out of memory
+ *   desc: Insufficient kernel memory was available for the write operatio=
n.
+ *     For pipes, this occurs when allocating pages for the pipe buffer.
+ *
+ * error: ERESTARTSYS, Restart system call (internal)
+ *   desc: Internal error code indicating the syscall should be restarted.=
 This
+ *     is converted to EINTR if SA_RESTART is not set on the signal handle=
r, or
+ *     the syscall is transparently restarted if SA_RESTART is set. User s=
pace
+ *     should not see this error code directly.
+ *
+ * error: EACCES, Permission denied
+ *   desc: The security subsystem (LSM such as SELinux or AppArmor) denied=
 the
+ *     write operation via security_file_permission(). This can occur even=
 if
+ *     the file was successfully opened.
+ *
+ * lock: file->f_pos_lock
+ *   type: KAPI_LOCK_MUTEX
+ *   acquired: conditional
+ *   released: true
+ *   desc: For regular files that require atomic position updates (FMODE_A=
TOMIC_POS),
+ *     the f_pos_lock mutex is acquired by fdget_pos() at syscall entry an=
d released
+ *     by fdput_pos() at syscall exit. This serializes concurrent writes s=
haring
+ *     the same file description. Not acquired for stream files (FMODE_STR=
EAM like
+ *     pipes and sockets) or when the file is not shared.
+ *
+ * lock: sb->s_writers (freeze protection)
+ *   type: KAPI_LOCK_CUSTOM
+ *   acquired: conditional
+ *   released: true
+ *   desc: For regular files, file_start_write() acquires freeze protectio=
n on
+ *     the superblock via sb_start_write() before the write, and file_end_=
write()
+ *     releases it after. This prevents writes during filesystem freeze. N=
ot
+ *     acquired for non-regular files (pipes, sockets, devices).
+ *
+ * lock: inode->i_rwsem
+ *   type: KAPI_LOCK_RWLOCK
+ *   acquired: conditional
+ *   released: true
+ *   desc: For regular files using generic_file_write_iter(), the inode's =
i_rwsem
+ *     is acquired in write mode before modifying file data. This is inter=
nal to
+ *     the filesystem and released before return. Not all filesystems use =
this
+ *     pattern.
+ *
+ * lock: pipe->mutex
+ *   type: KAPI_LOCK_MUTEX
+ *   acquired: conditional
+ *   released: true
+ *   desc: For pipes and FIFOs, the pipe's mutex is held while modifying p=
ipe
+ *     buffers. Released temporarily while waiting for space, then reacqui=
red.
+ *
+ * lock: RCU read-side
+ *   type: KAPI_LOCK_RCU
+ *   acquired: conditional
+ *   released: true
+ *   desc: Used during file descriptor lookup via fdget(). RCU read lock p=
rotects
+ *     access to the file descriptor table. Released by fdput() at syscall=
 exit.
+ *
+ * signal: SIGPIPE
+ *   direction: KAPI_SIGNAL_SEND
+ *   action: KAPI_SIGNAL_ACTION_TERMINATE
+ *   condition: Writing to a pipe or socket with no readers
+ *   desc: When writing to a pipe whose read end is closed, or a socket wh=
ose
+ *     peer has closed, SIGPIPE is sent to the calling process. The default
+ *     action terminates the process. Use signal(SIGPIPE, SIG_IGN) or set
+ *     IOCB_NOSIGNAL/MSG_NOSIGNAL to suppress. EPIPE is returned regardles=
s.
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *
+ * signal: SIGXFSZ
+ *   direction: KAPI_SIGNAL_SEND
+ *   action: KAPI_SIGNAL_ACTION_COREDUMP
+ *   condition: Writing exceeds RLIMIT_FSIZE
+ *   desc: When a write would exceed the soft file size limit (RLIMIT_FSIZ=
E),
+ *     SIGXFSZ is sent. The default action terminates with a core dump. The
+ *     write returns EFBIG. If RLIMIT_FSIZE is RLIM_INFINITY, no signal is=
 sent.
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *
+ * signal: Any signal
+ *   direction: KAPI_SIGNAL_RECEIVE
+ *   action: KAPI_SIGNAL_ACTION_RETURN
+ *   condition: While blocked waiting for space (pipes, sockets)
+ *   desc: The syscall may be interrupted by signals while waiting for buf=
fer
+ *     space to become available. If interrupted before any data is writte=
n,
+ *     returns -EINTR or -ERESTARTSYS. If data was already written, return=
s the
+ *     byte count. Restartable if SA_RESTART is set and no data was writte=
n.
+ *   error: -EINTR
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *   restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_FILE_POSITION
+ *   target: file->f_pos
+ *   condition: For seekable files when write succeeds (returns > 0)
+ *   desc: The file offset (f_pos) is advanced by the number of bytes writ=
ten.
+ *     For files opened with O_APPEND, f_pos is first set to file size. For
+ *     stream files (FMODE_STREAM such as pipes and sockets), the offset i=
s not
+ *     used or modified. Position updates are protected by f_pos_lock when
+ *     shared.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: inode timestamps (mtime, ctime)
+ *   condition: When write succeeds (returns > 0)
+ *   desc: Updates the file's modification time (mtime) and change time (c=
time)
+ *     via file_update_time(). The update precision depends on filesystem =
mount
+ *     options (fine-grained timestamps for multigrain inodes).
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: SUID/SGID bits (mode)
+ *   condition: When writing to a setuid/setgid file
+ *   desc: The SUID bit is cleared when a non-root user writes to a file w=
ith
+ *     the bit set. The SGID bit may also be cleared. This is a security f=
eature
+ *     to prevent privilege escalation via modified setuid binaries. Done =
via
+ *     file_remove_privs().
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: file data
+ *   condition: When write succeeds (returns > 0)
+ *   desc: Modifies the file's data content. For regular files, data is wr=
itten
+ *     to the page cache (buffered I/O) or directly to storage (O_DIRECT).
+ *     Data may not be persistent until fsync() is called or the file is c=
losed.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: task I/O accounting
+ *   condition: Always
+ *   desc: Updates the current task's I/O accounting statistics. The wchar=
 field
+ *     (write characters) is incremented by bytes written via add_wchar().=
 The
+ *     syscw field (syscall write count) is incremented via inc_syscw(). T=
hese
+ *     statistics are visible in /proc/[pid]/io.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: fsnotify events
+ *   condition: When write returns > 0
+ *   desc: Generates an FS_MODIFY fsnotify event via fsnotify_modify(), al=
lowing
+ *     inotify, fanotify, and dnotify watchers to be notified of the write.
+ *
+ * capability: CAP_DAC_OVERRIDE
+ *   type: KAPI_CAP_BYPASS_CHECK
+ *   allows: Bypass discretionary access control on write permission
+ *   without: Standard DAC checks are enforced
+ *   condition: Checked via security_file_permission() during rw_verify_ar=
ea()
+ *
+ * capability: CAP_FOWNER
+ *   type: KAPI_CAP_BYPASS_CHECK
+ *   allows: Bypass ownership checks for SUID/SGID clearing
+ *   without: SUID/SGID bits are cleared on write by non-owner
+ *   condition: Checked during file_remove_privs()
+ *
+ * constraint: MAX_RW_COUNT
+ *   desc: The count parameter is silently clamped to MAX_RW_COUNT (INT_MA=
X &
+ *     PAGE_MASK, approximately 2GB minus one page) to prevent integer ove=
rflow
+ *     in internal calculations. This is transparent to the caller.
+ *   expr: actual_count =3D min(count, MAX_RW_COUNT)
+ *
+ * constraint: File must be open for writing
+ *   desc: The file descriptor must have been opened with O_WRONLY or O_RD=
WR.
+ *     Files opened with O_RDONLY or O_PATH cannot be written and return E=
BADF.
+ *     The file must have both FMODE_WRITE and FMODE_CAN_WRITE flags set.
+ *   expr: (file->f_mode & FMODE_WRITE) && (file->f_mode & FMODE_CAN_WRITE)
+ *
+ * constraint: RLIMIT_FSIZE
+ *   desc: The size of data written is constrained by the RLIMIT_FSIZE res=
ource
+ *     limit. If writing would exceed this limit, SIGXFSZ is sent and EFBI=
G is
+ *     returned. The limit does not apply to files beyond the limit - only=
 to
+ *     writes that would cross it.
+ *   expr: pos + count <=3D rlimit(RLIMIT_FSIZE) || rlimit(RLIMIT_FSIZE) =
=3D=3D RLIM_INFINITY
+ *
+ * constraint: File seals
+ *   desc: For memfd or shmem files with F_SEAL_WRITE or F_SEAL_FUTURE_WRI=
TE
+ *     seals applied, all write operations fail with EPERM. With F_SEAL_GR=
OW,
+ *     writes that would extend file size fail with EPERM.
+ *
+ * examples: n =3D write(fd, buf, sizeof(buf));  // Basic write
+ *   n =3D write(STDOUT_FILENO, msg, strlen(msg));  // Write to stdout
+ *   while (total < len) { n =3D write(fd, buf+total, len-total); if (n<0)=
 break; total +=3D n; }  // Handle short writes
+ *   if (write(pipefd[1], &byte, 1) < 0 && errno =3D=3D EPIPE) { handle_br=
oken_pipe(); }  // Pipe error handling
+ *
+ * notes: The behavior of write() varies significantly depending on the ty=
pe of
+ *   file descriptor:
+ *
+ *   - Regular files: Writes to the page cache (buffered) or directly to s=
torage
+ *     (O_DIRECT). Short writes are rare except near RLIMIT_FSIZE or disk =
full.
+ *     O_APPEND is atomic for determining write position.
+ *
+ *   - Pipes and FIFOs: Blocking by default. Writes up to PIPE_BUF (4096 b=
ytes
+ *     on Linux) are guaranteed atomic. Larger writes may be interleaved w=
ith
+ *     writes from other processes. Blocks if pipe is full; returns EAGAIN=
 with
+ *     O_NONBLOCK. SIGPIPE/EPIPE if no readers.
+ *
+ *   - Sockets: Behavior depends on socket type and protocol. Stream socke=
ts
+ *     (TCP) may return partial writes. Datagram sockets (UDP) typically w=
rite
+ *     complete messages or fail. SIGPIPE/EPIPE for broken connections (un=
less
+ *     MSG_NOSIGNAL). EDESTADDRREQ for unconnected datagram sockets.
+ *
+ *   - Terminals: May block on flow control. Canonical vs raw mode affects
+ *     behavior. Special characters may be interpreted.
+ *
+ *   - Device special files: Behavior is device-specific. Block devices be=
have
+ *     similarly to regular files. Character device behavior varies.
+ *
+ *   Race condition considerations: Concurrent writes from threads sharing=
 a
+ *   file description race on the file position. Linux 3.14+ provides atom=
ic
+ *   position updates via f_pos_lock for regular files (FMODE_ATOMIC_POS),=
 but
+ *   for maximum safety, use pwrite() for concurrent positioned writes.
+ *
+ *   O_DIRECT writes bypass the page cache and typically require buffer and
+ *   offset alignment to filesystem block size. Query requirements via sta=
tx()
+ *   with STATX_DIOALIGN (Linux 6.1+). Unaligned O_DIRECT writes return EI=
NVAL
+ *   on most filesystems.
+ *
+ *   For zero-copy writes, consider using splice(2), sendfile(2), or vmspl=
ice(2)
+ *   instead of copying data through user-space buffers with write().
+ *
+ *   Partial writes (short writes) must be handled by application code.
+ *   Applications should loop until all data is written or an error occurs.
+ *
+ * since-version: 1.0
+ */
 SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
 		size_t, count)
 {
--=20
2.51.0