From: Sasha Levin
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, tools@kernel.org, gpaoloni@redhat.com, Sasha Levin
Subject: [RFC PATCH v5 01/15] kernel/api: introduce kernel API specification framework
Date: Thu, 18 Dec 2025 15:42:23 -0500
Message-ID: <20251218204239.4159453-2-sashal@kernel.org>
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>

Add a framework for formally documenting kernel APIs with inline
specifications. This framework provides:

- Structured API documentation with parameter specifications, return
  values, error conditions, and execution context requirements
- Runtime validation capabilities for debugging (CONFIG_KAPI_RUNTIME_CHECKS)
- Export of specifications via debugfs for tooling integration
- Support for both internal kernel APIs and system calls

The framework stores specifications in a dedicated ELF section and
provides infrastructure for:

- Compile-time validation of specifications
- Runtime querying of API documentation
- Machine-readable export formats
- Integration with existing SYSCALL_DEFINE macros

This commit introduces the core infrastructure without modifying any
existing APIs. Subsequent patches will add specifications to individual
subsystems.
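The following minimal userspace sketch shows the same section-based registration and lookup pattern. It is illustrative only and is not part of this patch. Every demo_* name is hypothetical.

The sketch assumes GNU ld or LLD. These linkers synthesize __start_<sec>/__stop_<sec> symbols for sections whose names are valid C identifiers. The kernel instead defines __start_kapi_specs/__stop_kapi_specs explicitly in vmlinux.lds.h.

Two design choices differ from the kernel:

- The section stores pointers to specs rather than the spec structures themselves. This sidesteps the inter-entry alignment padding that the vmlinux.lds.h comment warns about.
- The range/mask check only loosely mirrors what KAPI_CONSTRAINT_RANGE and KAPI_CONSTRAINT_MASK describe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for struct kapi_param_spec. */
struct demo_param_spec {
	const char *name;
	int64_t min_value;	/* inclusive lower bound */
	int64_t max_value;	/* inclusive upper bound */
	uint64_t valid_mask;	/* 0 means "no mask constraint" */
};

/* Hypothetical stand-in for a per-API specification. */
struct demo_api_spec {
	const char *name;
	unsigned int param_count;
	struct demo_param_spec params[2];
};

/*
 * Register a spec by placing a pointer to it in the "demo_specs" section.
 * Assumes GNU ld or LLD, which define __start_demo_specs and
 * __stop_demo_specs around that section because its name is a valid C
 * identifier. Storing pointers keeps every entry pointer-sized, so no
 * padding appears between entries.
 */
#define DEMO_REGISTER_SPEC(spec)					\
	static const struct demo_api_spec *demo_ptr_##spec		\
		__attribute__((used, section("demo_specs"))) = &spec

extern const struct demo_api_spec *__start_demo_specs[];
extern const struct demo_api_spec *__stop_demo_specs[];

/* Values echo the sample debugfs output in the documentation. */
static const struct demo_api_spec demo_alloc_spec = {
	.name = "demo_alloc",
	.param_count = 2,
	.params = {
		{ .name = "size", .min_value = 0, .max_value = 4194304 },
		{ .name = "flags", .min_value = INT64_MIN,
		  .max_value = INT64_MAX, .valid_mask = 0x1ffffff },
	},
};
DEMO_REGISTER_SPEC(demo_alloc_spec);

/* Runtime lookup by name: a linear scan between the section bounds. */
static const struct demo_api_spec *demo_find_spec(const char *name)
{
	for (const struct demo_api_spec **p = __start_demo_specs;
	     p < __stop_demo_specs; p++)
		if (strcmp((*p)->name, name) == 0)
			return *p;
	return NULL;
}

/* Reject values outside [min_value, max_value] or with bits outside valid_mask. */
static bool demo_check_param(const struct demo_param_spec *ps, int64_t value)
{
	if (value < ps->min_value || value > ps->max_value)
		return false;
	if (ps->valid_mask && ((uint64_t)value & ~ps->valid_mask))
		return false;
	return true;
}
```

For example, demo_check_param(&demo_find_spec("demo_alloc")->params[0], 5242880) returns false, because 5242880 exceeds the sketch's 4194304 upper bound. A linear scan is enough for a sketch. A larger table might warrant building an index once at startup.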
Signed-off-by: Sasha Levin
---
 .gitignore                                  |    1 +
 Documentation/dev-tools/kernel-api-spec.rst |  507 ++++++
 MAINTAINERS                                 |    9 +
 include/asm-generic/vmlinux.lds.h           |   28 +
 include/linux/kernel_api_spec.h             | 1597 +++++++++++++++++++
 include/linux/syscall_api_spec.h            |  198 +++
 include/linux/syscalls.h                    |   38 +
 init/Kconfig                                |    2 +
 kernel/Makefile                             |    3 +
 kernel/api/Kconfig                          |   35 +
 kernel/api/Makefile                         |   29 +
 kernel/api/kernel_api_spec.c                | 1185 ++++++++++++++
 scripts/generate_api_specs.sh               |   18 +
 13 files changed, 3650 insertions(+)
 create mode 100644 Documentation/dev-tools/kernel-api-spec.rst
 create mode 100644 include/linux/kernel_api_spec.h
 create mode 100644 include/linux/syscall_api_spec.h
 create mode 100644 kernel/api/Kconfig
 create mode 100644 kernel/api/Makefile
 create mode 100644 kernel/api/kernel_api_spec.c
 create mode 100755 scripts/generate_api_specs.sh

diff --git a/.gitignore b/.gitignore
index 3a7241c941f5e..7130001e444f1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -12,6 +12,7 @@
 #
 .*
 *.a
+*.apispec.h
 *.asn1.[ch]
 *.bin
 *.bz2
diff --git a/Documentation/dev-tools/kernel-api-spec.rst b/Documentation/dev-tools/kernel-api-spec.rst
new file mode 100644
index 0000000000000..3a63f6711e27b
--- /dev/null
+++ b/Documentation/dev-tools/kernel-api-spec.rst
@@ -0,0 +1,507 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+Kernel API Specification Framework
+======================================
+
+:Author: Sasha Levin
+:Date: June 2025
+
+.. contents:: Table of Contents
+   :depth: 3
+   :local:
+
+Introduction
+============
+
+The Kernel API Specification Framework (KAPI) provides a comprehensive system for
+formally documenting, validating, and introspecting kernel APIs. This framework
+addresses the long-standing challenge of maintaining accurate, machine-readable
+documentation for the thousands of internal kernel APIs and system calls.
+
+Purpose and Goals
+-----------------
+
+The framework aims to:
+
+1. **Improve API Documentation**: Provide structured, inline documentation that
+   lives alongside the code and is maintained as part of the development process.
+
+2. **Enable Runtime Validation**: Optionally validate API usage at runtime to catch
+   common programming errors during development and testing.
+
+3. **Support Tooling**: Export API specifications in machine-readable formats for
+   use by static analyzers, documentation generators, and development tools.
+
+4. **Enhance Debugging**: Provide detailed API information at runtime through debugfs
+   for debugging and introspection.
+
+5. **Formalize Contracts**: Explicitly document API contracts including parameter
+   constraints, execution contexts, locking requirements, and side effects.
+
+Architecture Overview
+=====================
+
+Components
+----------
+
+The framework consists of several key components:
+
+1. **Core Framework** (``kernel/api/kernel_api_spec.c``)
+
+   - API specification registration and storage
+   - Runtime validation engine
+   - Specification lookup and querying
+
+2. **DebugFS Interface** (``kernel/api/kapi_debugfs.c``)
+
+   - Runtime introspection via ``/sys/kernel/debug/kapi/``
+   - JSON and XML export formats
+   - Per-API detailed information
+
+3. **IOCTL Support** (``kernel/api/ioctl_validation.c``)
+
+   - Extended framework for IOCTL specifications
+   - Automatic validation wrappers
+   - Structure field validation
+
+4. **Specification Macros** (``include/linux/kernel_api_spec.h``)
+
+   - Declarative macros for API documentation
+   - Type-safe parameter specifications
+   - Context and constraint definitions
+
+Data Model
+----------
+
+The framework uses a hierarchical data model::
+
+    kernel_api_spec
+    ├── Basic Information
+    │   ├── name (API function name)
+    │   ├── version (specification version)
+    │   ├── description (human-readable description)
+    │   └── kernel_version (when API was introduced)
+    │
+    ├── Parameters (up to 16)
+    │   └── kapi_param_spec
+    │       ├── name
+    │       ├── type (int, pointer, string, etc.)
+    │       ├── direction (in, out, inout)
+    │       ├── constraints (range, mask, enum values)
+    │       └── validation rules
+    │
+    ├── Return Value
+    │   └── kapi_return_spec
+    │       ├── type
+    │       ├── success conditions
+    │       └── validation rules
+    │
+    ├── Error Conditions (up to 32)
+    │   └── kapi_error_spec
+    │       ├── error code
+    │       ├── condition description
+    │       └── recovery advice
+    │
+    ├── Execution Context
+    │   ├── allowed contexts (process, interrupt, etc.)
+    │   ├── locking requirements
+    │   └── preemption/interrupt state
+    │
+    └── Side Effects
+        ├── memory allocation
+        ├── state changes
+        └── signal handling
+
+Usage Guide
+===========
+
+Basic API Specification
+-----------------------
+
+To document a kernel API, use the specification macros in the implementation file:
+
+.. code-block:: c
+
+    #include 
+
+    KAPI_DEFINE_SPEC(kmalloc_spec, kmalloc, "3.0")
+    KAPI_DESCRIPTION("Allocate kernel memory")
+    KAPI_PARAM(0, size, KAPI_TYPE_SIZE_T, KAPI_DIR_IN,
+               "Number of bytes to allocate")
+    KAPI_PARAM_RANGE(0, 0, KMALLOC_MAX_SIZE)
+    KAPI_PARAM(1, flags, KAPI_TYPE_FLAGS, KAPI_DIR_IN,
+               "Allocation flags (GFP_*)")
+    KAPI_PARAM_MASK(1, __GFP_BITS_MASK)
+    KAPI_RETURN(KAPI_TYPE_POINTER, "Pointer to allocated memory or NULL")
+    KAPI_ERROR(ENOMEM, "Out of memory")
+    KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SOFTIRQ | KAPI_CTX_HARDIRQ)
+    KAPI_SIDE_EFFECT("Allocates memory from kernel heap")
+    KAPI_LOCK_NOT_REQUIRED("Any lock")
+    KAPI_END_SPEC
+
+    void *kmalloc(size_t size, gfp_t flags)
+    {
+        /* Implementation */
+    }
+
+System Call Specification
+-------------------------
+
+System calls use specialized macros:
+
+.. code-block:: c
+
+    KAPI_DEFINE_SYSCALL_SPEC(open_spec, open, "1.0")
+    KAPI_DESCRIPTION("Open a file")
+    KAPI_PARAM(0, pathname, KAPI_TYPE_USER_STRING, KAPI_DIR_IN,
+               "Path to file")
+    KAPI_PARAM_PATH(0, PATH_MAX)
+    KAPI_PARAM(1, flags, KAPI_TYPE_FLAGS, KAPI_DIR_IN,
+               "Open flags (O_*)")
+    KAPI_PARAM(2, mode, KAPI_TYPE_MODE_T, KAPI_DIR_IN,
+               "File permissions (if creating)")
+    KAPI_RETURN(KAPI_TYPE_INT, "File descriptor or negative error code")
+    KAPI_ERROR(EACCES, "Permission denied")
+    KAPI_ERROR(ENOENT, "File does not exist")
+    KAPI_ERROR(EMFILE, "Too many open files")
+    KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+    KAPI_SIGNAL(EINTR, "Open can be interrupted by signal")
+    KAPI_END_SYSCALL_SPEC
+
+IOCTL Specification
+-------------------
+
+IOCTLs have extended support for structure validation:
+
+.. code-block:: c
+
+    KAPI_DEFINE_IOCTL_SPEC(vidioc_querycap_spec, VIDIOC_QUERYCAP,
+                           "VIDIOC_QUERYCAP",
+                           sizeof(struct v4l2_capability),
+                           sizeof(struct v4l2_capability),
+                           "video_fops")
+    KAPI_DESCRIPTION("Query device capabilities")
+    KAPI_IOCTL_FIELD(driver, KAPI_TYPE_CHAR_ARRAY, KAPI_DIR_OUT,
+                     "Driver name", 16)
+    KAPI_IOCTL_FIELD(card, KAPI_TYPE_CHAR_ARRAY, KAPI_DIR_OUT,
+                     "Device name", 32)
+    KAPI_IOCTL_FIELD(version, KAPI_TYPE_U32, KAPI_DIR_OUT,
+                     "Driver version")
+    KAPI_IOCTL_FIELD(capabilities, KAPI_TYPE_FLAGS, KAPI_DIR_OUT,
+                     "Device capabilities")
+    KAPI_END_IOCTL_SPEC
+
+Runtime Validation
+==================
+
+Enabling Validation
+-------------------
+
+Runtime validation is controlled by kernel configuration:
+
+1. Enable ``CONFIG_KAPI_SPEC`` to build the framework
+2. Enable ``CONFIG_KAPI_RUNTIME_CHECKS`` for runtime validation
+3. Optionally enable ``CONFIG_KAPI_SPEC_DEBUGFS`` for the debugfs interface
+
+Validation Modes
+----------------
+
+The framework supports several validation modes:
+
+.. code-block:: c
+
+    /* Enable validation for specific API */
+    kapi_enable_validation("kmalloc");
+
+    /* Enable validation for all APIs */
+    kapi_enable_all_validation();
+
+    /* Set validation level */
+    kapi_set_validation_level(KAPI_VALIDATE_FULL);
+
+Validation Levels:
+
+- ``KAPI_VALIDATE_NONE``: No validation
+- ``KAPI_VALIDATE_BASIC``: Type and NULL checks only
+- ``KAPI_VALIDATE_NORMAL``: Basic + range and constraint checks
+- ``KAPI_VALIDATE_FULL``: All checks including custom validators
+
+Custom Validators
+-----------------
+
+APIs can register custom validation functions:
+
+.. code-block:: c
+
+    static bool validate_buffer_size(const struct kapi_param_spec *spec,
+                                     const void *value, void *context)
+    {
+        size_t size = *(const size_t *)value;
+        struct my_context *ctx = context;
+
+        return size > 0 && size <= ctx->max_buffer_size;
+    }
+
+    KAPI_PARAM_CUSTOM_VALIDATOR(0, validate_buffer_size)
+
+DebugFS Interface
+=================
+
+The debugfs interface provides runtime access to API specifications:
+
+Directory Structure
+-------------------
+
+::
+
+    /sys/kernel/debug/kapi/
+    ├── apis/                   # All registered APIs
+    │   ├── kmalloc/
+    │   │   ├── specification   # Human-readable spec
+    │   │   ├── json            # JSON format
+    │   │   └── xml             # XML format
+    │   └── open/
+    │       └── ...
+    ├── summary                 # Overview of all APIs
+    ├── validation/             # Validation controls
+    │   ├── enabled             # Global enable/disable
+    │   ├── level               # Validation level
+    │   └── stats               # Validation statistics
+    └── export/                 # Bulk export options
+        ├── all.json            # All specs in JSON
+        └── all.xml             # All specs in XML
+
+Usage Examples
+--------------
+
+Query a specific API::
+
+    $ cat /sys/kernel/debug/kapi/apis/kmalloc/specification
+    API: kmalloc
+    Version: 3.0
+    Description: Allocate kernel memory
+
+    Parameters:
+      [0] size (size_t, in): Number of bytes to allocate
+          Range: 0 - 4194304
+      [1] flags (flags, in): Allocation flags (GFP_*)
+          Mask: 0x1ffffff
+
+    Returns: pointer - Pointer to allocated memory or NULL
+
+    Errors:
+      ENOMEM: Out of memory
+
+    Context: process, softirq, hardirq
+
+    Side Effects:
+      - Allocates memory from kernel heap
+
+Export all specifications::
+
+    $ cat /sys/kernel/debug/kapi/export/all.json > kernel-apis.json
+
+Enable validation for a specific API::
+
+    $ echo 1 > /sys/kernel/debug/kapi/apis/kmalloc/validate
+
+Performance Considerations
+==========================
+
+Memory Overhead
+---------------
+
+Each API specification consumes approximately 2-4KB of memory. With thousands
+of kernel APIs, this can add up to several megabytes. Consider:
+
+1. Building with ``CONFIG_KAPI_SPEC=n`` for production kernels
+2. Using ``__init`` annotations for APIs only used during boot
+3. Implementing lazy loading for rarely used specifications
+
+Runtime Overhead
+----------------
+
+When ``CONFIG_KAPI_RUNTIME_CHECKS`` is enabled:
+
+- Each validated API call adds 50-200 ns of overhead
+- Complex validations (custom validators) may add more
+- Use validation only in development/testing kernels
+
+Optimization Strategies
+-----------------------
+
+1. **Compile-time optimization**: When validation is disabled, all
+   validation code is optimized away by the compiler.
+
+2. **Selective validation**: Enable validation only for specific APIs
+   or subsystems under test.
+
+3. **Caching**: The framework caches validation results for repeated
+   calls with identical parameters.
+
+Documentation Generation
+------------------------
+
+The framework exports specifications via debugfs that can be used
+to generate documentation. Tools for automatic documentation generation
+from specifications are planned for future development.
+
+IDE Integration
+---------------
+
+Modern IDEs can use the JSON export for:
+
+- Parameter hints
+- Type checking
+- Context validation
+- Error code documentation
+
+Testing Framework
+-----------------
+
+The framework includes test helpers::
+
+    #ifdef CONFIG_KAPI_TESTING
+    /* Verify API behaves according to specification */
+    kapi_test_api("kmalloc", test_cases);
+    #endif
+
+Best Practices
+==============
+
+Writing Specifications
+----------------------
+
+1. **Be Comprehensive**: Document all parameters, errors, and side effects
+2. **Keep Updated**: Update specs when API behavior changes
+3. **Use Examples**: Include usage examples in descriptions
+4. **Validate Constraints**: Define realistic constraints for parameters
+5. **Document Context**: Clearly specify allowed execution contexts
+
+Maintenance
+-----------
+
+1. **Version Specifications**: Increment version when API changes
+2. **Deprecation**: Mark deprecated APIs and suggest replacements
+3. **Cross-reference**: Link related APIs in descriptions
+4. **Test Specifications**: Verify specs match implementation
+
+Common Patterns
+---------------
+
+**Optional Parameters**::
+
+    KAPI_PARAM(2, optional_arg, KAPI_TYPE_POINTER, KAPI_DIR_IN,
+               "Optional argument (may be NULL)")
+    KAPI_PARAM_OPTIONAL(2)
+
+**Variable Arguments**::
+
+    KAPI_PARAM(1, fmt, KAPI_TYPE_FORMAT_STRING, KAPI_DIR_IN,
+               "Printf-style format string")
+    KAPI_PARAM_VARIADIC(2, "Format arguments")
+
+**Callback Functions**::
+
+    KAPI_PARAM(1, callback, KAPI_TYPE_FUNCTION_PTR, KAPI_DIR_IN,
+               "Callback function")
+    KAPI_PARAM_CALLBACK(1, "int (*)(void *data)", "data")
+
+Troubleshooting
+===============
+
+Common Issues
+-------------
+
+**Specification Not Found**::
+
+    kernel: KAPI: Specification for 'my_api' not found
+
+Solution: Ensure KAPI_DEFINE_SPEC is in the same translation unit
+as the function implementation.
+
+**Validation Failures**::
+
+    kernel: KAPI: Validation failed for kmalloc parameter 'size':
+            value 5242880 exceeds maximum 4194304
+
+Solution: Check parameter constraints, or adjust the specification if
+the constraint is incorrect.
+
+**Build Errors**::
+
+    error: 'KAPI_TYPE_UNKNOWN' undeclared
+
+Solution: Include and ensure
+CONFIG_KAPI_SPEC is enabled.
+
+Debug Options
+-------------
+
+Enable verbose debugging::
+
+    echo 8 > /proc/sys/kernel/printk
+    echo 1 > /sys/kernel/debug/kapi/debug/verbose
+
+Future Directions
+=================
+
+Planned Features
+----------------
+
+1. **Automatic Extraction**: Tool to extract specifications from existing
+   kernel-doc comments
+
+2. **Contract Verification**: Static analysis to verify implementation
+   matches specification
+
+3. **Performance Profiling**: Measure actual API performance against
+   documented expectations
+
+4. **Fuzzing Integration**: Use specifications to guide intelligent
+   fuzzing of kernel APIs
+
+5. **Version Compatibility**: Track API changes across kernel versions
+
+Research Areas
+--------------
+
+1. **Formal Verification**: Use specifications for mathematical proofs
+   of correctness
+
+2. **Runtime Monitoring**: Detect specification violations in production
+   with minimal overhead
+
+3. **API Evolution**: Analyze how kernel APIs change over time
+
+4. **Security Applications**: Use specifications for security policy
+   enforcement
+
+Contributing
+============
+
+Submitting Specifications
+-------------------------
+
+1. Add specifications to the same file as the API implementation
+2. Follow existing patterns and naming conventions
+3. Test with CONFIG_KAPI_RUNTIME_CHECKS enabled
+4. Verify debugfs output is correct
+5. Run scripts/checkpatch.pl on your changes
+
+Review Criteria
+---------------
+
+Specifications will be reviewed for:
+
+1. **Completeness**: All parameters and errors documented
+2. **Accuracy**: Specification matches implementation
+3. **Clarity**: Descriptions are clear and helpful
+4. **Consistency**: Follows framework conventions
+5. **Performance**: No unnecessary runtime overhead
+
+Contact
+-------
+
+- Maintainer: Sasha Levin
diff --git a/MAINTAINERS b/MAINTAINERS
index 5b11839cba9de..14cd8b3c95e40 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13647,6 +13647,15 @@ W:	https://linuxtv.org
 T:	git git://linuxtv.org/media.git
 F:	drivers/media/radio/radio-keene*
 
+KERNEL API SPECIFICATION FRAMEWORK (KAPI)
+M:	Sasha Levin
+L:	linux-api@vger.kernel.org
+S:	Maintained
+F:	Documentation/dev-tools/kernel-api-spec.rst
+F:	include/linux/kernel_api_spec.h
+F:	kernel/api/
+F:	scripts/generate_api_specs.sh
+
 KERNEL AUTOMOUNTER
 M:	Ian Kent
 L:	autofs@vger.kernel.org
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 8ca130af301fc..658a14f8bf309 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -296,6 +296,33 @@
 #define TRACE_SYSCALLS()
 #endif
 
+#ifdef CONFIG_KAPI_SPEC
+/*
+ * KAPI_SPECS - Include kernel API specifications in the current section
+ *
+ * The .kapi_specs input section has a 32-byte alignment requirement from
+ * the compiler, so we must align to 32 bytes before setting the start
+ * symbol to avoid padding between the symbol and the actual data.
+ */
+#define KAPI_SPECS()						\
+	. = ALIGN(32);						\
+	__start_kapi_specs = .;					\
+	KEEP(*(.kapi_specs))					\
+	__stop_kapi_specs = .;
+
+/* For placing KAPI specs in a dedicated section */
+#define KAPI_SPECS_SECTION()					\
+	.kapi_specs : AT(ADDR(.kapi_specs) - LOAD_OFFSET) {	\
+		. = ALIGN(32);					\
+		__start_kapi_specs = .;				\
+		KEEP(*(.kapi_specs))				\
+		__stop_kapi_specs = .;				\
+	}
+#else
+#define KAPI_SPECS()
+#define KAPI_SPECS_SECTION()
+#endif
+
 #ifdef CONFIG_BPF_EVENTS
 #define BPF_RAW_TP() STRUCT_ALIGN();				\
 	BOUNDED_SECTION_BY(__bpf_raw_tp_map, __bpf_raw_tp)
@@ -485,6 +512,7 @@
 	. = ALIGN(8);							\
 	BOUNDED_SECTION_BY(__tracepoints_ptrs, ___tracepoints_ptrs)	\
 	*(__tracepoints_strings)/* Tracepoints: strings */		\
+	KAPI_SPECS()							\
 	}								\
 									\
 	.rodata1 : AT(ADDR(.rodata1) - LOAD_OFFSET) {			\
diff --git a/include/linux/kernel_api_spec.h b/include/linux/kernel_api_spec.h
new file mode 100644
index 0000000000000..b3460d156602e
--- /dev/null
+++ b/include/linux/kernel_api_spec.h
@@ -0,0 +1,1597 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * kernel_api_spec.h - Kernel API Formal Specification Framework
+ *
+ * This framework provides structures and macros to formally specify kernel APIs
+ * in both human and machine-readable formats. It supports comprehensive documentation
+ * of function signatures, parameters, return values, error conditions, and constraints.
+ */
+
+#ifndef _LINUX_KERNEL_API_SPEC_H
+#define _LINUX_KERNEL_API_SPEC_H
+
+#include 
+#include 
+#include 
+#include 
+
+struct sigaction;
+
+#define KAPI_MAX_PARAMS			16
+#define KAPI_MAX_ERRORS			32
+#define KAPI_MAX_CONSTRAINTS		32
+#define KAPI_MAX_SIGNALS		32
+#define KAPI_MAX_NAME_LEN		128
+#define KAPI_MAX_DESC_LEN		512
+#define KAPI_MAX_CAPABILITIES		8
+#define KAPI_MAX_SOCKET_STATES		16
+#define KAPI_MAX_PROTOCOL_BEHAVIORS	8
+#define KAPI_MAX_NET_ERRORS		16
+#define KAPI_MAX_SOCKOPTS		16
+#define KAPI_MAX_ADDR_FAMILIES		8
+
+/* Magic numbers for section validation (ASCII mnemonics) */
+#define KAPI_MAGIC_PARAMS	0x4B415031	/* 'KAP1' */
+#define KAPI_MAGIC_RETURN	0x4B415232	/* 'KAR2' */
+#define KAPI_MAGIC_ERRORS	0x4B414533	/* 'KAE3' */
+#define KAPI_MAGIC_LOCKS	0x4B414C34	/* 'KAL4' */
+#define KAPI_MAGIC_CONSTRAINTS	0x4B414335	/* 'KAC5' */
+#define KAPI_MAGIC_INFO		0x4B414936	/* 'KAI6' */
+#define KAPI_MAGIC_SIGNALS	0x4B415337	/* 'KAS7' */
+#define KAPI_MAGIC_SIGMASK	0x4B414D38	/* 'KAM8' */
+#define KAPI_MAGIC_STRUCTS	0x4B415439	/* 'KAT9' */
+#define KAPI_MAGIC_EFFECTS	0x4B414641	/* 'KAFA' */
+#define KAPI_MAGIC_TRANS	0x4B415442	/* 'KATB' */
+#define KAPI_MAGIC_CAPS		0x4B414343	/* 'KACC' */
+
+/**
+ * enum kapi_param_type - Parameter type classification + * @KAPI_TYPE_VOID: void type + * @KAPI_TYPE_INT: Integer types (int, long, etc.) + * @KAPI_TYPE_UINT: Unsigned integer types + * @KAPI_TYPE_PTR: Pointer types + * @KAPI_TYPE_STRUCT: Structure types + * @KAPI_TYPE_UNION: Union types + * @KAPI_TYPE_ENUM: Enumeration types + * @KAPI_TYPE_FUNC_PTR: Function pointer types + * @KAPI_TYPE_ARRAY: Array types + * @KAPI_TYPE_FD: File descriptor - validated in process context + * @KAPI_TYPE_USER_PTR: User space pointer - validated for access and size + * @KAPI_TYPE_PATH: Pathname - validated for access and path limits + * @KAPI_TYPE_CUSTOM: Custom/complex types + */ +enum kapi_param_type { + KAPI_TYPE_VOID =3D 0, + KAPI_TYPE_INT, + KAPI_TYPE_UINT, + KAPI_TYPE_PTR, + KAPI_TYPE_STRUCT, + KAPI_TYPE_UNION, + KAPI_TYPE_ENUM, + KAPI_TYPE_FUNC_PTR, + KAPI_TYPE_ARRAY, + KAPI_TYPE_FD, /* File descriptor - validated in process context */ + KAPI_TYPE_USER_PTR, /* User space pointer - validated for access and size= */ + KAPI_TYPE_PATH, /* Pathname - validated for access and path limits */ + KAPI_TYPE_CUSTOM, +}; + +/** + * enum kapi_param_flags - Parameter attribute flags + * @KAPI_PARAM_IN: Input parameter + * @KAPI_PARAM_OUT: Output parameter + * @KAPI_PARAM_INOUT: Input/output parameter + * @KAPI_PARAM_OPTIONAL: Optional parameter (can be NULL) + * @KAPI_PARAM_CONST: Const qualified parameter + * @KAPI_PARAM_VOLATILE: Volatile qualified parameter + * @KAPI_PARAM_USER: User space pointer + * @KAPI_PARAM_DMA: DMA-capable memory required + * @KAPI_PARAM_ALIGNED: Alignment requirements + */ +enum kapi_param_flags { + KAPI_PARAM_IN =3D (1 << 0), + KAPI_PARAM_OUT =3D (1 << 1), + KAPI_PARAM_INOUT =3D (1 << 2), + KAPI_PARAM_OPTIONAL =3D (1 << 3), + KAPI_PARAM_CONST =3D (1 << 4), + KAPI_PARAM_VOLATILE =3D (1 << 5), + KAPI_PARAM_USER =3D (1 << 6), + KAPI_PARAM_DMA =3D (1 << 7), + KAPI_PARAM_ALIGNED =3D (1 << 8), +}; + +/** + * enum kapi_context_flags - Function execution context flags + 
* @KAPI_CTX_PROCESS: Can be called from process context + * @KAPI_CTX_SOFTIRQ: Can be called from softirq context + * @KAPI_CTX_HARDIRQ: Can be called from hardirq context + * @KAPI_CTX_NMI: Can be called from NMI context + * @KAPI_CTX_ATOMIC: Must be called in atomic context + * @KAPI_CTX_SLEEPABLE: May sleep + * @KAPI_CTX_PREEMPT_DISABLED: Requires preemption disabled + * @KAPI_CTX_IRQ_DISABLED: Requires interrupts disabled + */ +enum kapi_context_flags { + KAPI_CTX_PROCESS =3D (1 << 0), + KAPI_CTX_SOFTIRQ =3D (1 << 1), + KAPI_CTX_HARDIRQ =3D (1 << 2), + KAPI_CTX_NMI =3D (1 << 3), + KAPI_CTX_ATOMIC =3D (1 << 4), + KAPI_CTX_SLEEPABLE =3D (1 << 5), + KAPI_CTX_PREEMPT_DISABLED =3D (1 << 6), + KAPI_CTX_IRQ_DISABLED =3D (1 << 7), +}; + +/** + * enum kapi_lock_type - Lock types used/required by the function + * @KAPI_LOCK_NONE: No locking requirements + * @KAPI_LOCK_MUTEX: Mutex lock + * @KAPI_LOCK_SPINLOCK: Spinlock + * @KAPI_LOCK_RWLOCK: Read-write lock + * @KAPI_LOCK_SEQLOCK: Sequence lock + * @KAPI_LOCK_RCU: RCU lock + * @KAPI_LOCK_SEMAPHORE: Semaphore + * @KAPI_LOCK_CUSTOM: Custom locking mechanism + */ +enum kapi_lock_type { + KAPI_LOCK_NONE =3D 0, + KAPI_LOCK_MUTEX, + KAPI_LOCK_SPINLOCK, + KAPI_LOCK_RWLOCK, + KAPI_LOCK_SEQLOCK, + KAPI_LOCK_RCU, + KAPI_LOCK_SEMAPHORE, + KAPI_LOCK_CUSTOM, +}; + +/** + * enum kapi_constraint_type - Types of parameter constraints + * @KAPI_CONSTRAINT_NONE: No constraint + * @KAPI_CONSTRAINT_RANGE: Numeric range constraint + * @KAPI_CONSTRAINT_MASK: Bitmask constraint + * @KAPI_CONSTRAINT_ENUM: Enumerated values constraint + * @KAPI_CONSTRAINT_ALIGNMENT: Alignment constraint (must be aligned to sp= ecified boundary) + * @KAPI_CONSTRAINT_POWER_OF_TWO: Value must be a power of two + * @KAPI_CONSTRAINT_PAGE_ALIGNED: Value must be page-aligned + * @KAPI_CONSTRAINT_NONZERO: Value must be non-zero + * @KAPI_CONSTRAINT_USER_STRING: Userspace null-terminated string with len= gth range + * @KAPI_CONSTRAINT_USER_PATH: Userspace pathname string 
(validated for ac= cessibility and PATH_MAX) + * @KAPI_CONSTRAINT_USER_PTR: Userspace pointer (validated for accessibili= ty and size) + * @KAPI_CONSTRAINT_CUSTOM: Custom validation function + */ +enum kapi_constraint_type { + KAPI_CONSTRAINT_NONE =3D 0, + KAPI_CONSTRAINT_RANGE, + KAPI_CONSTRAINT_MASK, + KAPI_CONSTRAINT_ENUM, + KAPI_CONSTRAINT_ALIGNMENT, + KAPI_CONSTRAINT_POWER_OF_TWO, + KAPI_CONSTRAINT_PAGE_ALIGNED, + KAPI_CONSTRAINT_NONZERO, + KAPI_CONSTRAINT_USER_STRING, + KAPI_CONSTRAINT_USER_PATH, + KAPI_CONSTRAINT_USER_PTR, + KAPI_CONSTRAINT_CUSTOM, +}; + +/** + * struct kapi_param_spec - Parameter specification + * @name: Parameter name + * @type_name: Type name as string + * @type: Parameter type classification + * @flags: Parameter attribute flags + * @size: Size in bytes (for arrays/buffers) + * @alignment: Required alignment + * @min_value: Minimum valid value (for numeric types) + * @max_value: Maximum valid value (for numeric types) + * @valid_mask: Valid bits mask (for flag parameters) + * @enum_values: Array of valid enumerated values + * @enum_count: Number of valid enumerated values + * @constraint_type: Type of constraint applied + * @validate: Custom validation function + * @description: Human-readable description + * @constraints: Additional constraints description + * @size_param_idx: Index of parameter that determines size (-1 if fixed s= ize) + * @size_multiplier: Multiplier for size calculation (e.g., sizeof(struct)) + */ +struct kapi_param_spec { + char name[KAPI_MAX_NAME_LEN]; + char type_name[KAPI_MAX_NAME_LEN]; + enum kapi_param_type type; + u32 flags; + size_t size; + size_t alignment; + s64 min_value; + s64 max_value; + u64 valid_mask; + const s64 *enum_values; + u32 enum_count; + enum kapi_constraint_type constraint_type; + bool (*validate)(s64 value); + char description[KAPI_MAX_DESC_LEN]; + char constraints[KAPI_MAX_DESC_LEN]; + int size_param_idx; /* Index of param that determines size, -1 if N/A */ + size_t size_multiplier; /* 
Size per unit (e.g., sizeof(struct epoll_event= )) */ +} __attribute__((packed)); + +/** + * struct kapi_error_spec - Error condition specification + * @error_code: Error code value + * @name: Error code name (e.g., "EINVAL") + * @condition: Condition that triggers this error + * @description: Detailed error description + */ +struct kapi_error_spec { + int error_code; + char name[KAPI_MAX_NAME_LEN]; + char condition[KAPI_MAX_DESC_LEN]; + char description[KAPI_MAX_DESC_LEN]; +} __attribute__((packed)); + +/** + * enum kapi_return_check_type - Return value check types + * @KAPI_RETURN_EXACT: Success is an exact value + * @KAPI_RETURN_RANGE: Success is within a range + * @KAPI_RETURN_ERROR_CHECK: Success is when NOT in error list + * @KAPI_RETURN_FD: Return value is a file descriptor (>=3D 0 is success) + * @KAPI_RETURN_CUSTOM: Custom validation function + * @KAPI_RETURN_NO_RETURN: Function does not return (e.g., exec on success) + */ +enum kapi_return_check_type { + KAPI_RETURN_EXACT, + KAPI_RETURN_RANGE, + KAPI_RETURN_ERROR_CHECK, + KAPI_RETURN_FD, + KAPI_RETURN_CUSTOM, + KAPI_RETURN_NO_RETURN, +}; + +/** + * struct kapi_return_spec - Return value specification + * @type_name: Return type name + * @type: Return type classification + * @check_type: Type of success check to perform + * @success_value: Exact value indicating success (for EXACT) + * @success_min: Minimum success value (for RANGE) + * @success_max: Maximum success value (for RANGE) + * @error_values: Array of error values (for ERROR_CHECK) + * @error_count: Number of error values + * @is_success: Custom function to check success + * @description: Return value description + */ +struct kapi_return_spec { + char type_name[KAPI_MAX_NAME_LEN]; + enum kapi_param_type type; + enum kapi_return_check_type check_type; + s64 success_value; + s64 success_min; + s64 success_max; + const s64 *error_values; + u32 error_count; + bool (*is_success)(s64 retval); + char description[KAPI_MAX_DESC_LEN]; +} 
__attribute__((packed));
+
+/**
+ * enum kapi_lock_scope - Lock acquisition/release scope
+ * @KAPI_LOCK_INTERNAL: Lock is acquired and released within the function (common case)
+ * @KAPI_LOCK_ACQUIRES: Function acquires lock but does not release it
+ * @KAPI_LOCK_RELEASES: Function releases lock (must be held on entry)
+ * @KAPI_LOCK_CALLER_HELD: Lock must be held by caller throughout the call
+ */
+enum kapi_lock_scope {
+	KAPI_LOCK_INTERNAL = 0,
+	KAPI_LOCK_ACQUIRES,
+	KAPI_LOCK_RELEASES,
+	KAPI_LOCK_CALLER_HELD,
+};
+
+/**
+ * struct kapi_lock_spec - Lock requirement specification
+ * @lock_name: Name of the lock
+ * @lock_type: Type of lock
+ * @scope: Lock scope (internal, acquires, releases, or caller-held)
+ * @description: Additional lock requirements
+ */
+struct kapi_lock_spec {
+	char lock_name[KAPI_MAX_NAME_LEN];
+	enum kapi_lock_type lock_type;
+	enum kapi_lock_scope scope;
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_constraint_spec - Additional constraint specification
+ * @name: Constraint name
+ * @description: Constraint description
+ * @expression: Formal expression (if applicable)
+ */
+struct kapi_constraint_spec {
+	char name[KAPI_MAX_NAME_LEN];
+	char description[KAPI_MAX_DESC_LEN];
+	char expression[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_signal_direction - Signal flow direction
+ * @KAPI_SIGNAL_RECEIVE: Function may receive this signal
+ * @KAPI_SIGNAL_SEND: Function may send this signal
+ * @KAPI_SIGNAL_HANDLE: Function handles this signal specially
+ * @KAPI_SIGNAL_BLOCK: Function blocks this signal
+ * @KAPI_SIGNAL_IGNORE: Function ignores this signal
+ */
+enum kapi_signal_direction {
+	KAPI_SIGNAL_RECEIVE = (1 << 0),
+	KAPI_SIGNAL_SEND = (1 << 1),
+	KAPI_SIGNAL_HANDLE = (1 << 2),
+	KAPI_SIGNAL_BLOCK = (1 << 3),
+	KAPI_SIGNAL_IGNORE = (1 << 4),
+};
+
+/**
+ * enum kapi_signal_action - What the function does with the signal
+ *
@KAPI_SIGNAL_ACTION_DEFAULT: Default signal action applies
+ * @KAPI_SIGNAL_ACTION_TERMINATE: Causes termination
+ * @KAPI_SIGNAL_ACTION_COREDUMP: Causes termination with core dump
+ * @KAPI_SIGNAL_ACTION_STOP: Stops the process
+ * @KAPI_SIGNAL_ACTION_CONTINUE: Continues a stopped process
+ * @KAPI_SIGNAL_ACTION_CUSTOM: Custom handling described in notes
+ * @KAPI_SIGNAL_ACTION_RETURN: Returns from syscall with EINTR
+ * @KAPI_SIGNAL_ACTION_RESTART: Restarts the syscall
+ * @KAPI_SIGNAL_ACTION_QUEUE: Queues the signal for later delivery
+ * @KAPI_SIGNAL_ACTION_DISCARD: Discards the signal
+ * @KAPI_SIGNAL_ACTION_TRANSFORM: Transforms to another signal
+ */
+enum kapi_signal_action {
+	KAPI_SIGNAL_ACTION_DEFAULT = 0,
+	KAPI_SIGNAL_ACTION_TERMINATE,
+	KAPI_SIGNAL_ACTION_COREDUMP,
+	KAPI_SIGNAL_ACTION_STOP,
+	KAPI_SIGNAL_ACTION_CONTINUE,
+	KAPI_SIGNAL_ACTION_CUSTOM,
+	KAPI_SIGNAL_ACTION_RETURN,
+	KAPI_SIGNAL_ACTION_RESTART,
+	KAPI_SIGNAL_ACTION_QUEUE,
+	KAPI_SIGNAL_ACTION_DISCARD,
+	KAPI_SIGNAL_ACTION_TRANSFORM,
+};
+
+/**
+ * struct kapi_signal_spec - Signal specification
+ * @signal_num: Signal number (e.g., SIGKILL, SIGTERM)
+ * @signal_name: Signal name as string
+ * @direction: Direction flags (OR of kapi_signal_direction)
+ * @action: What happens when signal is received
+ * @target: Description of target process/thread for sent signals
+ * @condition: Condition under which signal is sent/received/handled
+ * @description: Detailed description of signal handling
+ * @restartable: Whether syscall is restartable after this signal
+ * @sa_flags_required: Required signal action flags (SA_*)
+ * @sa_flags_forbidden: Forbidden signal action flags
+ * @error_on_signal: Error code returned when signal occurs (-EINTR, etc)
+ * @transform_to: Signal number to transform to (if action is TRANSFORM)
+ * @timing: When signal can occur ("entry", "during", "exit", "anytime")
+ * @priority: Signal handling priority (lower processed first)
+ * @interruptible: Whether this
operation is interruptible by this signal
+ * @queue_behavior: How signal is queued ("realtime", "standard", "coalesce")
+ * @state_required: Required process state for signal to be delivered
+ * @state_forbidden: Forbidden process state for signal delivery
+ */
+struct kapi_signal_spec {
+	int signal_num;
+	char signal_name[32];
+	u32 direction;
+	enum kapi_signal_action action;
+	char target[KAPI_MAX_DESC_LEN];
+	char condition[KAPI_MAX_DESC_LEN];
+	char description[KAPI_MAX_DESC_LEN];
+	bool restartable;
+	u32 sa_flags_required;
+	u32 sa_flags_forbidden;
+	int error_on_signal;
+	int transform_to;
+	char timing[32];
+	u8 priority;
+	bool interruptible;
+	char queue_behavior[128];
+	u32 state_required;
+	u32 state_forbidden;
+} __attribute__((packed));
+
+/**
+ * struct kapi_signal_mask_spec - Signal mask specification
+ * @mask_name: Name of the signal mask
+ * @signals: Array of signal numbers in the mask
+ * @signal_count: Number of signals in the mask
+ * @description: Description of what this mask represents
+ */
+struct kapi_signal_mask_spec {
+	char mask_name[KAPI_MAX_NAME_LEN];
+	int signals[KAPI_MAX_SIGNALS];
+	u32 signal_count;
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_struct_field - Structure field specification
+ * @name: Field name
+ * @type: Field type classification
+ * @type_name: Type name as string
+ * @offset: Offset within structure
+ * @size: Size of field in bytes
+ * @flags: Field attribute flags
+ * @constraint_type: Type of constraint applied
+ * @min_value: Minimum valid value (for numeric types)
+ * @max_value: Maximum valid value (for numeric types)
+ * @valid_mask: Valid bits mask (for flag fields)
+ * @enum_values: Comma-separated list of valid enum values (for enum types)
+ * @description: Field description
+ */
+struct kapi_struct_field {
+	char name[KAPI_MAX_NAME_LEN];
+	enum kapi_param_type type;
+	char type_name[KAPI_MAX_NAME_LEN];
+	size_t offset;
+	size_t size;
+	u32 flags;
+
	enum kapi_constraint_type constraint_type;
+	s64 min_value;
+	s64 max_value;
+	u64 valid_mask;
+	char enum_values[KAPI_MAX_DESC_LEN];	/* Comma-separated list of valid enum values */
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_struct_spec - Structure type specification
+ * @name: Structure name
+ * @size: Total size of structure
+ * @alignment: Required alignment
+ * @field_count: Number of fields
+ * @fields: Field specifications
+ * @description: Structure description
+ */
+struct kapi_struct_spec {
+	char name[KAPI_MAX_NAME_LEN];
+	size_t size;
+	size_t alignment;
+	u32 field_count;
+	struct kapi_struct_field fields[KAPI_MAX_PARAMS];
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * enum kapi_capability_action - What the capability allows
+ * @KAPI_CAP_BYPASS_CHECK: Bypasses a check entirely
+ * @KAPI_CAP_INCREASE_LIMIT: Increases or removes a limit
+ * @KAPI_CAP_OVERRIDE_RESTRICTION: Overrides a restriction
+ * @KAPI_CAP_GRANT_PERMISSION: Grants permission that would otherwise be denied
+ * @KAPI_CAP_MODIFY_BEHAVIOR: Changes the behavior of the operation
+ * @KAPI_CAP_ACCESS_RESOURCE: Allows access to restricted resources
+ * @KAPI_CAP_PERFORM_OPERATION: Allows performing a privileged operation
+ */
+enum kapi_capability_action {
+	KAPI_CAP_BYPASS_CHECK = 0,
+	KAPI_CAP_INCREASE_LIMIT,
+	KAPI_CAP_OVERRIDE_RESTRICTION,
+	KAPI_CAP_GRANT_PERMISSION,
+	KAPI_CAP_MODIFY_BEHAVIOR,
+	KAPI_CAP_ACCESS_RESOURCE,
+	KAPI_CAP_PERFORM_OPERATION,
+};
+
+/**
+ * struct kapi_capability_spec - Capability requirement specification
+ * @capability: The capability constant (e.g., CAP_IPC_LOCK)
+ * @cap_name: Capability name as string
+ * @action: What the capability allows (kapi_capability_action)
+ * @allows: Description of what the capability allows
+ * @without_cap: What happens without the capability
+ * @check_condition: Condition when capability is checked
+ * @priority: Check priority (lower checked
first)
+ * @alternative: Alternative capabilities that can be used
+ * @alternative_count: Number of alternative capabilities
+ */
+struct kapi_capability_spec {
+	int capability;
+	char cap_name[KAPI_MAX_NAME_LEN];
+	enum kapi_capability_action action;
+	char allows[KAPI_MAX_DESC_LEN];
+	char without_cap[KAPI_MAX_DESC_LEN];
+	char check_condition[KAPI_MAX_DESC_LEN];
+	u8 priority;
+	int alternative[KAPI_MAX_CAPABILITIES];
+	u32 alternative_count;
+} __attribute__((packed));
+
+/**
+ * enum kapi_side_effect_type - Types of side effects
+ * @KAPI_EFFECT_NONE: No side effects
+ * @KAPI_EFFECT_ALLOC_MEMORY: Allocates memory
+ * @KAPI_EFFECT_FREE_MEMORY: Frees memory
+ * @KAPI_EFFECT_MODIFY_STATE: Modifies global/shared state
+ * @KAPI_EFFECT_SIGNAL_SEND: Sends signals
+ * @KAPI_EFFECT_FILE_POSITION: Modifies file position
+ * @KAPI_EFFECT_LOCK_ACQUIRE: Acquires locks
+ * @KAPI_EFFECT_LOCK_RELEASE: Releases locks
+ * @KAPI_EFFECT_RESOURCE_CREATE: Creates system resources (FDs, PIDs, etc)
+ * @KAPI_EFFECT_RESOURCE_DESTROY: Destroys system resources
+ * @KAPI_EFFECT_SCHEDULE: May cause scheduling/context switch
+ * @KAPI_EFFECT_HARDWARE: Interacts with hardware
+ * @KAPI_EFFECT_NETWORK: Network I/O operation
+ * @KAPI_EFFECT_FILESYSTEM: Filesystem modification
+ * @KAPI_EFFECT_PROCESS_STATE: Modifies process state
+ * @KAPI_EFFECT_IRREVERSIBLE: Effect cannot be undone
+ */
+enum kapi_side_effect_type {
+	KAPI_EFFECT_NONE = 0,
+	KAPI_EFFECT_ALLOC_MEMORY = (1 << 0),
+	KAPI_EFFECT_FREE_MEMORY = (1 << 1),
+	KAPI_EFFECT_MODIFY_STATE = (1 << 2),
+	KAPI_EFFECT_SIGNAL_SEND = (1 << 3),
+	KAPI_EFFECT_FILE_POSITION = (1 << 4),
+	KAPI_EFFECT_LOCK_ACQUIRE = (1 << 5),
+	KAPI_EFFECT_LOCK_RELEASE = (1 << 6),
+	KAPI_EFFECT_RESOURCE_CREATE = (1 << 7),
+	KAPI_EFFECT_RESOURCE_DESTROY = (1 << 8),
+	KAPI_EFFECT_SCHEDULE = (1 << 9),
+	KAPI_EFFECT_HARDWARE = (1 << 10),
+	KAPI_EFFECT_NETWORK = (1 << 11),
+	KAPI_EFFECT_FILESYSTEM = (1 << 12),
+
	KAPI_EFFECT_PROCESS_STATE = (1 << 13),
+	KAPI_EFFECT_IRREVERSIBLE = (1 << 14),
+};
+
+/**
+ * struct kapi_side_effect - Side effect specification
+ * @type: Bitmask of effect types
+ * @target: What is affected (e.g., "process memory", "file descriptor table")
+ * @condition: Condition under which effect occurs
+ * @description: Detailed description of the effect
+ * @reversible: Whether the effect can be undone
+ */
+struct kapi_side_effect {
+	u32 type;
+	char target[KAPI_MAX_NAME_LEN];
+	char condition[KAPI_MAX_DESC_LEN];
+	char description[KAPI_MAX_DESC_LEN];
+	bool reversible;
+} __attribute__((packed));
+
+/**
+ * struct kapi_state_transition - State transition specification
+ * @from_state: Starting state description
+ * @to_state: Ending state description
+ * @condition: Condition for transition
+ * @object: Object whose state changes
+ * @description: Detailed description
+ */
+struct kapi_state_transition {
+	char from_state[KAPI_MAX_NAME_LEN];
+	char to_state[KAPI_MAX_NAME_LEN];
+	char condition[KAPI_MAX_DESC_LEN];
+	char object[KAPI_MAX_NAME_LEN];
+	char description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+#define KAPI_MAX_STRUCT_SPECS	8
+#define KAPI_MAX_SIDE_EFFECTS	32
+#define KAPI_MAX_STATE_TRANS	8
+
+/**
+ * enum kapi_socket_state - Socket states for state machine
+ */
+enum kapi_socket_state {
+	KAPI_SOCK_STATE_UNSPEC = 0,
+	KAPI_SOCK_STATE_CLOSED,
+	KAPI_SOCK_STATE_OPEN,
+	KAPI_SOCK_STATE_BOUND,
+	KAPI_SOCK_STATE_LISTEN,
+	KAPI_SOCK_STATE_SYN_SENT,
+	KAPI_SOCK_STATE_SYN_RECV,
+	KAPI_SOCK_STATE_ESTABLISHED,
+	KAPI_SOCK_STATE_FIN_WAIT1,
+	KAPI_SOCK_STATE_FIN_WAIT2,
+	KAPI_SOCK_STATE_CLOSE_WAIT,
+	KAPI_SOCK_STATE_CLOSING,
+	KAPI_SOCK_STATE_LAST_ACK,
+	KAPI_SOCK_STATE_TIME_WAIT,
+	KAPI_SOCK_STATE_CONNECTED,
+	KAPI_SOCK_STATE_DISCONNECTED,
+};
+
+/**
+ * enum kapi_socket_protocol - Socket protocol types
+ */
+enum kapi_socket_protocol {
+	KAPI_PROTO_TCP = (1 << 0),
+	KAPI_PROTO_UDP = (1 << 1),
+	KAPI_PROTO_UNIX = (1 << 2),
+
	KAPI_PROTO_RAW = (1 << 3),
+	KAPI_PROTO_PACKET = (1 << 4),
+	KAPI_PROTO_NETLINK = (1 << 5),
+	KAPI_PROTO_SCTP = (1 << 6),
+	KAPI_PROTO_DCCP = (1 << 7),
+	KAPI_PROTO_ALL = 0xFFFFFFFF,
+};
+
+/**
+ * enum kapi_buffer_behavior - Network buffer handling behaviors
+ */
+enum kapi_buffer_behavior {
+	KAPI_BUF_PEEK = (1 << 0),
+	KAPI_BUF_TRUNCATE = (1 << 1),
+	KAPI_BUF_SCATTER = (1 << 2),
+	KAPI_BUF_ZERO_COPY = (1 << 3),
+	KAPI_BUF_KERNEL_ALLOC = (1 << 4),
+	KAPI_BUF_DMA_CAPABLE = (1 << 5),
+	KAPI_BUF_FRAGMENT = (1 << 6),
+};
+
+/**
+ * enum kapi_async_behavior - Asynchronous operation behaviors
+ */
+enum kapi_async_behavior {
+	KAPI_ASYNC_BLOCK = 0,
+	KAPI_ASYNC_NONBLOCK = (1 << 0),
+	KAPI_ASYNC_POLL_READY = (1 << 1),
+	KAPI_ASYNC_SIGNAL_DRIVEN = (1 << 2),
+	KAPI_ASYNC_AIO = (1 << 3),
+	KAPI_ASYNC_IO_URING = (1 << 4),
+	KAPI_ASYNC_EPOLL = (1 << 5),
+};
+
+/**
+ * struct kapi_socket_state_spec - Socket state requirement/transition
+ */
+struct kapi_socket_state_spec {
+	enum kapi_socket_state required_states[KAPI_MAX_SOCKET_STATES];
+	u32 required_state_count;
+	enum kapi_socket_state forbidden_states[KAPI_MAX_SOCKET_STATES];
+	u32 forbidden_state_count;
+	enum kapi_socket_state resulting_state;
+	char state_condition[KAPI_MAX_DESC_LEN];
+	u32 applicable_protocols;
+} __attribute__((packed));
+
+/**
+ * struct kapi_protocol_behavior - Protocol-specific behavior
+ */
+struct kapi_protocol_behavior {
+	u32 applicable_protocols;
+	char behavior[KAPI_MAX_DESC_LEN];
+	s64 protocol_flags;
+	char flag_description[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_buffer_spec - Network buffer specification
+ */
+struct kapi_buffer_spec {
+	u32 buffer_behaviors;
+	size_t min_buffer_size;
+	size_t max_buffer_size;
+	size_t optimal_buffer_size;
+	char fragmentation_rules[KAPI_MAX_DESC_LEN];
+	bool can_partial_transfer;
+	char partial_transfer_rules[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct
kapi_async_spec - Asynchronous behavior specification
+ */
+struct kapi_async_spec {
+	enum kapi_async_behavior supported_modes;
+	int nonblock_errno;
+	u32 poll_events_in;
+	u32 poll_events_out;
+	char completion_condition[KAPI_MAX_DESC_LEN];
+	bool supports_timeout;
+	char timeout_behavior[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/**
+ * struct kapi_addr_family_spec - Address family specification
+ */
+struct kapi_addr_family_spec {
+	int family;
+	char family_name[32];
+	size_t addr_struct_size;
+	size_t min_addr_len;
+	size_t max_addr_len;
+	char addr_format[KAPI_MAX_DESC_LEN];
+	bool supports_wildcard;
+	bool supports_multicast;
+	bool supports_broadcast;
+	char special_addresses[KAPI_MAX_DESC_LEN];
+	u32 port_range_min;
+	u32 port_range_max;
+} __attribute__((packed));
+
+/**
+ * struct kernel_api_spec - Complete kernel API specification
+ * @name: Function name
+ * @version: API version
+ * @description: Brief description
+ * @long_description: Detailed description
+ * @context_flags: Execution context flags
+ * @param_count: Number of parameters
+ * @params: Parameter specifications
+ * @return_spec: Return value specification
+ * @error_count: Number of possible errors
+ * @errors: Error specifications
+ * @lock_count: Number of lock specifications
+ * @locks: Lock requirement specifications
+ * @constraint_count: Number of additional constraints
+ * @constraints: Additional constraint specifications
+ * @examples: Usage examples
+ * @notes: Additional notes
+ * @since_version: Kernel version when introduced
+ * @signal_count: Number of signal specifications
+ * @signals: Signal handling specifications
+ * @signal_mask_count: Number of signal mask specifications
+ * @signal_masks: Signal mask specifications
+ * @struct_spec_count: Number of structure specifications
+ * @struct_specs: Structure type specifications
+ * @side_effect_count: Number of side effect specifications
+ * @side_effects: Side effect specifications
+ * @state_trans_count:
Number of state transition specifications
+ * @state_transitions: State transition specifications
+ */
+struct kernel_api_spec {
+	char name[KAPI_MAX_NAME_LEN];
+	u32 version;
+	char description[KAPI_MAX_DESC_LEN];
+	char long_description[KAPI_MAX_DESC_LEN * 4];
+	u32 context_flags;
+
+	/* Parameters */
+	u32 param_magic;	/* 0x4B415031 = 'KAP1' */
+	u32 param_count;
+	struct kapi_param_spec params[KAPI_MAX_PARAMS];
+
+	/* Return value */
+	u32 return_magic;	/* 0x4B415232 = 'KAR2' */
+	struct kapi_return_spec return_spec;
+
+	/* Errors */
+	u32 error_magic;	/* 0x4B414533 = 'KAE3' */
+	u32 error_count;
+	struct kapi_error_spec errors[KAPI_MAX_ERRORS];
+
+	/* Locking */
+	u32 lock_magic;	/* 0x4B414C34 = 'KAL4' */
+	u32 lock_count;
+	struct kapi_lock_spec locks[KAPI_MAX_CONSTRAINTS];
+
+	/* Constraints */
+	u32 constraint_magic;	/* 0x4B414335 = 'KAC5' */
+	u32 constraint_count;
+	struct kapi_constraint_spec constraints[KAPI_MAX_CONSTRAINTS];
+
+	/* Additional information */
+	u32 info_magic;	/* 0x4B414936 = 'KAI6' */
+	char examples[KAPI_MAX_DESC_LEN * 2];
+	char notes[KAPI_MAX_DESC_LEN * 2];
+	char since_version[32];
+
+	/* Signal specifications */
+	u32 signal_magic;	/* 0x4B415337 = 'KAS7' */
+	u32 signal_count;
+	struct kapi_signal_spec signals[KAPI_MAX_SIGNALS];
+
+	/* Signal mask specifications */
+	u32 sigmask_magic;	/* 0x4B414D38 = 'KAM8' */
+	u32 signal_mask_count;
+	struct kapi_signal_mask_spec signal_masks[KAPI_MAX_SIGNALS];
+
+	/* Structure specifications */
+	u32 struct_magic;	/* 0x4B415439 = 'KAT9' */
+	u32 struct_spec_count;
+	struct kapi_struct_spec struct_specs[KAPI_MAX_STRUCT_SPECS];
+
+	/* Side effects */
+	u32 effect_magic;	/* 0x4B414641 = 'KAFA' */
+	u32 side_effect_count;
+	struct kapi_side_effect side_effects[KAPI_MAX_SIDE_EFFECTS];
+
+	/* State transitions */
+	u32 trans_magic;	/* 0x4B415442 = 'KATB' */
+	u32 state_trans_count;
+	struct kapi_state_transition state_transitions[KAPI_MAX_STATE_TRANS];
+
+	/* Capability
specifications */
+	u32 cap_magic;	/* 0x4B414343 = 'KACC' */
+	u32 capability_count;
+	struct kapi_capability_spec capabilities[KAPI_MAX_CAPABILITIES];
+
+	/* Extended fields for socket and network operations */
+	struct kapi_socket_state_spec socket_state;
+	struct kapi_protocol_behavior protocol_behaviors[KAPI_MAX_PROTOCOL_BEHAVIORS];
+	u32 protocol_behavior_count;
+	struct kapi_buffer_spec buffer_spec;
+	struct kapi_async_spec async_spec;
+	struct kapi_addr_family_spec addr_families[KAPI_MAX_ADDR_FAMILIES];
+	u32 addr_family_count;
+
+	/* Operation characteristics */
+	bool is_connection_oriented;
+	bool is_message_oriented;
+	bool supports_oob_data;
+	bool supports_peek;
+	bool supports_select_poll;
+	bool is_reentrant;
+
+	/* Semantic descriptions */
+	char connection_establishment[KAPI_MAX_DESC_LEN];
+	char connection_termination[KAPI_MAX_DESC_LEN];
+	char data_transfer_semantics[KAPI_MAX_DESC_LEN];
+} __attribute__((packed));
+
+/* Macros for defining API specifications */
+
+/**
+ * DEFINE_KERNEL_API_SPEC - Define a kernel API specification
+ * @func_name: Function name to specify
+ */
+#define DEFINE_KERNEL_API_SPEC(func_name) \
+	static struct kernel_api_spec __kapi_spec_##func_name \
+	__used __section(".kapi_specs") = { \
+		.name = __stringify(func_name), \
+		.version = 1,
+
+#define KAPI_END_SPEC };
+
+/**
+ * KAPI_DESCRIPTION - Set API description
+ * @desc: Description string
+ */
+#define KAPI_DESCRIPTION(desc) \
+	.description = desc,
+
+/**
+ * KAPI_LONG_DESC - Set detailed API description
+ * @desc: Detailed description string
+ */
+#define KAPI_LONG_DESC(desc) \
+	.long_description = desc,
+
+/**
+ * KAPI_CONTEXT - Set execution context flags
+ * @flags: Context flags (OR'ed KAPI_CTX_* values)
+ */
+#define KAPI_CONTEXT(flags) \
+	.context_flags = flags,
+
+/**
+ * KAPI_PARAM - Define a parameter specification
+ * @idx: Parameter index (0-based)
+ * @pname: Parameter name
+ * @ptype: Type name string
+ * @pdesc: Parameter
description
+ */
+#define KAPI_PARAM(idx, pname, ptype, pdesc) \
+	.params[idx] = { \
+		.name = pname, \
+		.type_name = ptype, \
+		.description = pdesc, \
+		.size_param_idx = -1,	/* Default: no dynamic sizing */
+
+#define KAPI_PARAM_TYPE(ptype) \
+	.type = ptype,
+
+#define KAPI_PARAM_FLAGS(pflags) \
+	.flags = pflags,
+
+#define KAPI_PARAM_SIZE(psize) \
+	.size = psize,
+
+#define KAPI_PARAM_RANGE(pmin, pmax) \
+	.min_value = pmin, \
+	.max_value = pmax,
+
+#define KAPI_PARAM_CONSTRAINT_TYPE(ctype) \
+	.constraint_type = ctype,
+
+#define KAPI_PARAM_CONSTRAINT(desc) \
+	.constraints = desc,
+
+#define KAPI_PARAM_VALID_MASK(mask) \
+	.valid_mask = mask,
+
+#define KAPI_PARAM_ENUM_VALUES(values) \
+	.enum_values = values, \
+	.enum_count = ARRAY_SIZE(values),
+
+#define KAPI_PARAM_ALIGNMENT(align) \
+	.alignment = align,
+
+#define KAPI_PARAM_SIZE_PARAM(idx) \
+	.size_param_idx = idx,
+
+#define KAPI_PARAM_END },
+
+/**
+ * KAPI_PARAM_COUNT - Set the number of parameters
+ * @n: Number of parameters
+ */
+#define KAPI_PARAM_COUNT(n) \
+	.param_magic = KAPI_MAGIC_PARAMS, \
+	.param_count = n,
+
+/**
+ * KAPI_RETURN - Define return value specification
+ * @rtype: Return type name
+ * @rdesc: Return value description
+ */
+#define KAPI_RETURN(rtype, rdesc) \
+	.return_spec = { \
+		.type_name = rtype, \
+		.description = rdesc,
+
+#define KAPI_RETURN_SUCCESS(val) \
+	.success_value = val,
+
+#define KAPI_RETURN_TYPE(rtype) \
+	.type = rtype,
+
+#define KAPI_RETURN_CHECK_TYPE(ctype) \
+	.check_type = ctype,
+
+#define KAPI_RETURN_ERROR_VALUES(values) \
+	.error_values = values,
+
+#define KAPI_RETURN_ERROR_COUNT(count) \
+	.error_count = count,
+
+#define KAPI_RETURN_SUCCESS_RANGE(min, max) \
+	.success_min = min, \
+	.success_max = max,
+
+#define KAPI_RETURN_END },
+
+/**
+ * KAPI_ERROR - Define an error condition
+ * @idx: Error index
+ * @ecode: Error code value
+ * @ename: Error name
+ * @econd: Error
condition
+ * @edesc: Error description
+ */
+#define KAPI_ERROR(idx, ecode, ename, econd, edesc) \
+	.errors[idx] = { \
+		.error_code = ecode, \
+		.name = ename, \
+		.condition = econd, \
+		.description = edesc, \
+	},
+
+/**
+ * KAPI_ERROR_COUNT - Set the number of errors
+ * @n: Number of errors
+ */
+#define KAPI_ERROR_COUNT(n) \
+	.error_magic = KAPI_MAGIC_ERRORS, \
+	.error_count = n,
+
+/**
+ * KAPI_LOCK - Define a lock requirement
+ * @idx: Lock index
+ * @lname: Lock name
+ * @ltype: Lock type
+ */
+#define KAPI_LOCK(idx, lname, ltype) \
+	.locks[idx] = { \
+		.lock_name = lname, \
+		.lock_type = ltype,
+
+/*
+ * Lock scope helpers. struct kapi_lock_spec records a single scope per
+ * lock, so use at most one of these per KAPI_LOCK() entry.
+ */
+#define KAPI_LOCK_ACQUIRED \
+	.scope = KAPI_LOCK_ACQUIRES,
+
+#define KAPI_LOCK_RELEASED \
+	.scope = KAPI_LOCK_RELEASES,
+
+#define KAPI_LOCK_HELD_ENTRY \
+	.scope = KAPI_LOCK_CALLER_HELD,
+
+#define KAPI_LOCK_HELD_EXIT \
+	.scope = KAPI_LOCK_CALLER_HELD,
+
+#define KAPI_LOCK_DESC(ldesc) \
+	.description = ldesc,
+
+#define KAPI_LOCK_END },
+
+/**
+ * KAPI_CONSTRAINT - Define an additional constraint
+ * @idx: Constraint index
+ * @cname: Constraint name
+ * @cdesc: Constraint description
+ */
+#define KAPI_CONSTRAINT(idx, cname, cdesc) \
+	.constraints[idx] = { \
+		.name = cname, \
+		.description = cdesc,
+
+#define KAPI_CONSTRAINT_EXPR(expr) \
+	.expression = expr,
+
+#define KAPI_CONSTRAINT_END },
+
+/**
+ * KAPI_EXAMPLES - Set API usage examples
+ * @ex: Examples string
+ */
+#define KAPI_EXAMPLES(ex) \
+	.info_magic = KAPI_MAGIC_INFO, \
+	.examples = ex,
+
+/**
+ * KAPI_NOTES - Set API notes
+ * @n: Notes string
+ */
+#define KAPI_NOTES(n) \
+	.notes = n,
+
+/**
+ * KAPI_SIGNAL - Define a signal specification
+ * @idx: Signal index
+ * @signum: Signal number (e.g., SIGKILL)
+ * @signame: Signal name string
+ * @dir: Direction flags
+ * @act: Action taken
+ */
+#define KAPI_SIGNAL(idx, signum, signame, dir, act) \
+	.signals[idx] = { \
+		.signal_num = signum, \
+		.signal_name = signame, \
+		.direction = dir, \
+		.action = act,
+
+#define KAPI_SIGNAL_TARGET(tgt) \
+	.target = tgt,
+
+#define KAPI_SIGNAL_CONDITION(cond) \
+	.condition = cond,
+
+#define KAPI_SIGNAL_DESC(desc) \
+	.description = desc,
+
+#define KAPI_SIGNAL_RESTARTABLE \
+	.restartable = true,
+
+#define KAPI_SIGNAL_SA_FLAGS_REQ(flags) \
+	.sa_flags_required = flags,
+
+#define KAPI_SIGNAL_SA_FLAGS_FORBID(flags) \
+	.sa_flags_forbidden = flags,
+
+#define KAPI_SIGNAL_ERROR(err) \
+	.error_on_signal = err,
+
+#define KAPI_SIGNAL_TRANSFORM(sig) \
+	.transform_to = sig,
+
+#define KAPI_SIGNAL_TIMING(when) \
+	.timing = when,
+
+#define KAPI_SIGNAL_PRIORITY(prio) \
+	.priority = prio,
+
+#define KAPI_SIGNAL_INTERRUPTIBLE \
+	.interruptible = true,
+
+#define KAPI_SIGNAL_QUEUE(behavior) \
+	.queue_behavior = behavior,
+
+#define KAPI_SIGNAL_STATE_REQ(state) \
+	.state_required = state,
+
+#define KAPI_SIGNAL_STATE_FORBID(state) \
+	.state_forbidden = state,
+
+#define KAPI_SIGNAL_END },
+
+#define KAPI_SIGNAL_COUNT(n) \
+	.signal_magic = KAPI_MAGIC_SIGNALS, \
+	.signal_count = n,
+
+/**
+ * KAPI_SIGNAL_MASK - Define a signal mask specification
+ * @idx: Mask index
+ * @name: Mask name
+ * @desc: Mask description
+ */
+#define KAPI_SIGNAL_MASK(idx, name, desc) \
+	.signal_masks[idx] = { \
+		.mask_name = name, \
+		.description = desc,
+
+/*
+ * KAPI_SIGNAL_MASK_SIGNALS - Specify signals in a signal mask
+ * @...: Variadic list of signal numbers
+ *
+ * Usage:
+ *   KAPI_SIGNAL_MASK(0, "blocked", "Signals blocked during operation")
+ *     KAPI_SIGNAL_MASK_SIGNALS(SIGINT, SIGTERM, SIGQUIT)
+ *   KAPI_SIGNAL_MASK_END
+ */
+#define KAPI_SIGNAL_MASK_SIGNALS(...) \
+	.signals = { __VA_ARGS__ }, \
+	.signal_count = sizeof((int[]){ __VA_ARGS__ }) / sizeof(int),
+
+#define KAPI_SIGNAL_MASK_END },
+
+/**
+ * KAPI_STRUCT_SPEC - Define a structure specification
+ * @idx: Structure spec index
+ * @sname: Structure name
+ * @sdesc: Structure description
+ */
+#define KAPI_STRUCT_SPEC(idx, sname, sdesc) \
+	.struct_specs[idx] = { \
+		.name = #sname, \
+		.description = sdesc,
+
+#define KAPI_STRUCT_SIZE(ssize, salign) \
+	.size = ssize, \
+	.alignment = salign,
+
+#define KAPI_STRUCT_FIELD_COUNT(n) \
+	.field_count = n,
+
+/**
+ * KAPI_STRUCT_FIELD - Define a structure field
+ * @fidx: Field index
+ * @fname: Field name
+ * @ftype: Field type (KAPI_TYPE_*)
+ * @ftype_name: Type name as string
+ * @fdesc: Field description
+ */
+#define KAPI_STRUCT_FIELD(fidx, fname, ftype, ftype_name, fdesc) \
+	.fields[fidx] = { \
+		.name = fname, \
+		.type = ftype, \
+		.type_name = ftype_name, \
+		.description = fdesc,
+
+#define KAPI_FIELD_OFFSET(foffset) \
+	.offset = foffset,
+
+#define KAPI_FIELD_SIZE(fsize) \
+	.size = fsize,
+
+#define KAPI_FIELD_FLAGS(fflags) \
+	.flags = fflags,
+
+#define KAPI_FIELD_CONSTRAINT_RANGE(min, max) \
+	.constraint_type = KAPI_CONSTRAINT_RANGE, \
+	.min_value = min, \
+	.max_value = max,
+
+#define KAPI_FIELD_CONSTRAINT_MASK(mask) \
+	.constraint_type = KAPI_CONSTRAINT_MASK, \
+	.valid_mask = mask,
+
+#define KAPI_FIELD_CONSTRAINT_ENUM(values) \
+	.constraint_type = KAPI_CONSTRAINT_ENUM, \
+	.enum_values = values,
+
+#define KAPI_STRUCT_FIELD_END },
+
+#define KAPI_STRUCT_SPEC_END },
+
+/* Counter for structure specifications */
+#define KAPI_STRUCT_SPEC_COUNT(n) \
+	.struct_magic = KAPI_MAGIC_STRUCTS, \
+	.struct_spec_count = n,
+
+/* Additional lock-related macros */
+#define KAPI_LOCK_COUNT(n) \
+	.lock_magic = KAPI_MAGIC_LOCKS, \
+	.lock_count = n,
+
+/**
+ * KAPI_SIDE_EFFECT - Define a side effect
+ * @idx: Side effect index
+ * @etype: Effect type
bitmask (OR'ed KAPI_EFFECT_* values)
+ * @etarget: What is affected
+ * @edesc: Effect description
+ */
+#define KAPI_SIDE_EFFECT(idx, etype, etarget, edesc) \
+	.side_effects[idx] = { \
+		.type = etype, \
+		.target = etarget, \
+		.description = edesc, \
+		.reversible = false,	/* Default to non-reversible */
+
+#define KAPI_EFFECT_CONDITION(cond) \
+	.condition = cond,
+
+#define KAPI_EFFECT_REVERSIBLE \
+	.reversible = true,
+
+#define KAPI_SIDE_EFFECT_END },
+
+/**
+ * KAPI_STATE_TRANS - Define a state transition
+ * @idx: State transition index
+ * @obj: Object whose state changes
+ * @from: From state
+ * @to: To state
+ * @desc: Transition description
+ */
+#define KAPI_STATE_TRANS(idx, obj, from, to, desc) \
+	.state_transitions[idx] = { \
+		.object = obj, \
+		.from_state = from, \
+		.to_state = to, \
+		.description = desc,
+
+#define KAPI_STATE_TRANS_COND(cond) \
+	.condition = cond,
+
+#define KAPI_STATE_TRANS_END },
+
+/* Counters for side effects and state transitions */
+#define KAPI_SIDE_EFFECT_COUNT(n) \
+	.effect_magic = KAPI_MAGIC_EFFECTS, \
+	.side_effect_count = n,
+
+#define KAPI_STATE_TRANS_COUNT(n) \
+	.trans_magic = KAPI_MAGIC_TRANS, \
+	.state_trans_count = n,
+
+/* Helper macros for common side effect patterns */
+#define KAPI_EFFECTS_MEMORY	(KAPI_EFFECT_ALLOC_MEMORY | KAPI_EFFECT_FREE_MEMORY)
+#define KAPI_EFFECTS_LOCKING	(KAPI_EFFECT_LOCK_ACQUIRE | KAPI_EFFECT_LOCK_RELEASE)
+#define KAPI_EFFECTS_RESOURCES	(KAPI_EFFECT_RESOURCE_CREATE | KAPI_EFFECT_RESOURCE_DESTROY)
+#define KAPI_EFFECTS_IO	(KAPI_EFFECT_NETWORK | KAPI_EFFECT_FILESYSTEM)
+
+/*
+ * Helper macros for combining common parameter flag patterns.
+ * Note: KAPI_PARAM_IN, KAPI_PARAM_OUT, KAPI_PARAM_INOUT, and KAPI_PARAM_OPTIONAL
+ * are already defined in enum kapi_param_flags - use those directly.
+ */
+#define KAPI_PARAM_FLAGS_INOUT	(KAPI_PARAM_IN | KAPI_PARAM_OUT)
+#define KAPI_PARAM_FLAGS_USER	(KAPI_PARAM_USER | KAPI_PARAM_IN)
+
+/* Common signal timing constants */
+#define KAPI_SIGNAL_TIME_ENTRY	"entry"
+#define KAPI_SIGNAL_TIME_DURING	"during"
+#define KAPI_SIGNAL_TIME_EXIT	"exit"
+#define KAPI_SIGNAL_TIME_ANYTIME	"anytime"
+#define KAPI_SIGNAL_TIME_BLOCKING	"while_blocked"
+#define KAPI_SIGNAL_TIME_SLEEPING	"while_sleeping"
+#define KAPI_SIGNAL_TIME_BEFORE	"before"
+#define KAPI_SIGNAL_TIME_AFTER	"after"
+
+/* Common signal queue behaviors */
+#define KAPI_SIGNAL_QUEUE_STANDARD	"standard"
+#define KAPI_SIGNAL_QUEUE_REALTIME	"realtime"
+#define KAPI_SIGNAL_QUEUE_COALESCE	"coalesce"
+#define KAPI_SIGNAL_QUEUE_REPLACE	"replace"
+#define KAPI_SIGNAL_QUEUE_DISCARD	"discard"
+
+/* Process state flags for signal delivery */
+#define KAPI_SIGNAL_STATE_RUNNING	(1 << 0)
+#define KAPI_SIGNAL_STATE_SLEEPING	(1 << 1)
+#define KAPI_SIGNAL_STATE_STOPPED	(1 << 2)
+#define KAPI_SIGNAL_STATE_TRACED	(1 << 3)
+#define KAPI_SIGNAL_STATE_ZOMBIE	(1 << 4)
+#define KAPI_SIGNAL_STATE_DEAD	(1 << 5)
+
+/* Capability specification macros */
+
+/**
+ * KAPI_CAPABILITY - Define a capability requirement
+ * @idx: Capability index
+ * @cap: Capability constant (e.g., CAP_IPC_LOCK)
+ * @name: Capability name string
+ * @act: Action type (kapi_capability_action)
+ */
+#define KAPI_CAPABILITY(idx, cap, name, act) \
+	.capabilities[idx] = { \
+		.capability = cap, \
+		.cap_name = name, \
+		.action = act,
+
+#define KAPI_CAP_ALLOWS(desc) \
+	.allows = desc,
+
+#define KAPI_CAP_WITHOUT(desc) \
+	.without_cap = desc,
+
+#define KAPI_CAP_CONDITION(cond) \
+	.check_condition = cond,
+
+#define KAPI_CAP_PRIORITY(prio) \
+	.priority = prio,
+
+/*
+ * KAPI_CAP_ALTERNATIVE - List alternative capabilities
+ * @...: Variadic list of capability constants
+ *
+ * alternative[] is a fixed-size array, so it is initialized from a brace
+ * list (as in KAPI_SIGNAL_MASK_SIGNALS) rather than from another array.
+ */
+#define KAPI_CAP_ALTERNATIVE(...) \
+	.alternative = { __VA_ARGS__ }, \
+	.alternative_count = sizeof((int[]){ __VA_ARGS__ }) / sizeof(int),
+
+#define KAPI_CAPABILITY_END },
+
+/* Counter for capability specifications */
+#define KAPI_CAPABILITY_COUNT(n) \
+	.cap_magic = KAPI_MAGIC_CAPS, \
+	.capability_count = n,
+
+/*
+ * Common signal patterns for syscalls. KAPI_SIGNAL_END already supplies
+ * the trailing comma, so consecutive entries are not separated by another.
+ */
+#define KAPI_SIGNAL_INTERRUPTIBLE_SLEEP \
+	KAPI_SIGNAL(0, SIGINT, "SIGINT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN) \
+	KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_SLEEPING) \
+	KAPI_SIGNAL_ERROR(-EINTR) \
+	KAPI_SIGNAL_RESTARTABLE \
+	KAPI_SIGNAL_DESC("Interrupts sleep, returns -EINTR") \
+	KAPI_SIGNAL_END \
+	KAPI_SIGNAL(1, SIGTERM, "SIGTERM", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RETURN) \
+	KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_SLEEPING) \
+	KAPI_SIGNAL_ERROR(-EINTR) \
+	KAPI_SIGNAL_RESTARTABLE \
+	KAPI_SIGNAL_DESC("Interrupts sleep, returns -EINTR") \
+	KAPI_SIGNAL_END
+
+#define KAPI_SIGNAL_FATAL_DEFAULT \
+	KAPI_SIGNAL(2, SIGKILL, "SIGKILL", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_TERMINATE) \
+	KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME) \
+	KAPI_SIGNAL_PRIORITY(0) \
+	KAPI_SIGNAL_DESC("Process terminated immediately") \
+	KAPI_SIGNAL_END
+
+#define KAPI_SIGNAL_STOP_CONT \
+	KAPI_SIGNAL(3, SIGSTOP, "SIGSTOP", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_STOP) \
+	KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME) \
+	KAPI_SIGNAL_DESC("Process stopped") \
+	KAPI_SIGNAL_END \
+	KAPI_SIGNAL(4, SIGCONT, "SIGCONT", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_CONTINUE) \
+	KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_ANYTIME) \
+	KAPI_SIGNAL_DESC("Process continued") \
+	KAPI_SIGNAL_END
+
+/* Validation and runtime checking */
+
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+bool kapi_validate_params(const struct kernel_api_spec *spec, ...);
+bool kapi_validate_param(const struct kapi_param_spec *param_spec, s64 value);
+bool kapi_validate_param_with_context(const struct kapi_param_spec *param_spec,
+				      s64 value, const s64 *all_params, int param_count);
+int kapi_validate_syscall_param(const struct kernel_api_spec *spec,
+				int param_idx, s64 value);
+int kapi_validate_syscall_params(const struct kernel_api_spec *spec,
+				 const s64 *params, int param_count);
+bool
kapi_check_return_success(const struct kapi_return_spec *return_spec, s64 retval);
+bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 retval);
+int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 retval);
+void kapi_check_context(const struct kernel_api_spec *spec);
+void kapi_check_locks(const struct kernel_api_spec *spec);
+bool kapi_check_signal_allowed(const struct kernel_api_spec *spec, int signum);
+bool kapi_validate_signal_action(const struct kernel_api_spec *spec, int signum,
+				 struct sigaction *act);
+int kapi_get_signal_error(const struct kernel_api_spec *spec, int signum);
+bool kapi_is_signal_restartable(const struct kernel_api_spec *spec, int signum);
+#else
+static inline bool kapi_validate_params(const struct kernel_api_spec *spec, ...)
+{
+	return true;
+}
+static inline bool kapi_validate_param(const struct kapi_param_spec *param_spec, s64 value)
+{
+	return true;
+}
+static inline bool kapi_validate_param_with_context(const struct kapi_param_spec *param_spec,
+						    s64 value, const s64 *all_params, int param_count)
+{
+	return true;
+}
+static inline int kapi_validate_syscall_param(const struct kernel_api_spec *spec,
+					      int param_idx, s64 value)
+{
+	return 0;
+}
+static inline int kapi_validate_syscall_params(const struct kernel_api_spec *spec,
+					       const s64 *params, int param_count)
+{
+	return 0;
+}
+static inline bool kapi_check_return_success(const struct kapi_return_spec *return_spec, s64 retval)
+{
+	return true;
+}
+static inline bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 retval)
+{
+	return true;
+}
+static inline int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 retval)
+{
+	return 0;
+}
+static inline void kapi_check_context(const struct kernel_api_spec *spec) {}
+static inline void kapi_check_locks(const struct kernel_api_spec *spec) {}
+static inline bool kapi_check_signal_allowed(const struct kernel_api_spec *spec, int
signum)
+{
+	return true;
+}
+static inline bool kapi_validate_signal_action(const struct kernel_api_spec *spec, int signum,
+					       struct sigaction *act)
+{
+	return true;
+}
+static inline int kapi_get_signal_error(const struct kernel_api_spec *spec, int signum)
+{
+	return -EINTR;
+}
+static inline bool kapi_is_signal_restartable(const struct kernel_api_spec *spec, int signum)
+{
+	return false;
+}
+#endif
+
+/* Export/query functions */
+const struct kernel_api_spec *kapi_get_spec(const char *name);
+int kapi_export_json(const struct kernel_api_spec *spec, char *buf, size_t size);
+void kapi_print_spec(const struct kernel_api_spec *spec);
+
+/* Registration for dynamic APIs */
+int kapi_register_spec(struct kernel_api_spec *spec);
+void kapi_unregister_spec(const char *name);
+
+/* Helper to get parameter constraint info */
+static inline bool kapi_get_param_constraint(const char *api_name, int param_idx,
+					     enum kapi_constraint_type *type,
+					     u64 *valid_mask, s64 *min_val, s64 *max_val)
+{
+	const struct kernel_api_spec *spec = kapi_get_spec(api_name);
+
+	if (!spec || param_idx < 0 || param_idx >= spec->param_count)
+		return false;
+
+	if (type)
+		*type = spec->params[param_idx].constraint_type;
+	if (valid_mask)
+		*valid_mask = spec->params[param_idx].valid_mask;
+	if (min_val)
+		*min_val = spec->params[param_idx].min_value;
+	if (max_val)
+		*max_val = spec->params[param_idx].max_value;
+
+	return true;
+}
+
+/* Socket state requirement macros */
+#define KAPI_SOCKET_STATE_REQ(...) \
+	.socket_state = { \
+		.required_states = { __VA_ARGS__ }, \
+		.required_state_count = sizeof((enum kapi_socket_state[]){__VA_ARGS__})/sizeof(enum kapi_socket_state),
+
+#define KAPI_SOCKET_STATE_FORBID(...)
\
+		.forbidden_states = { __VA_ARGS__ }, \
+		.forbidden_state_count = sizeof((enum kapi_socket_state[]){__VA_ARGS__})/sizeof(enum kapi_socket_state),
+
+#define KAPI_SOCKET_STATE_RESULT(state) \
+		.resulting_state = state,
+
+#define KAPI_SOCKET_STATE_COND(cond) \
+		.state_condition = cond,
+
+#define KAPI_SOCKET_STATE_PROTOS(protos) \
+		.applicable_protocols = protos,
+
+#define KAPI_SOCKET_STATE_END },
+
+/* Protocol behavior macros */
+#define KAPI_PROTOCOL_BEHAVIOR(idx, protos, desc) \
+	.protocol_behaviors[idx] = { \
+		.applicable_protocols = protos, \
+		.behavior = desc,
+
+#define KAPI_PROTOCOL_FLAGS(flags, desc) \
+		.protocol_flags = flags, \
+		.flag_description = desc,
+
+#define KAPI_PROTOCOL_BEHAVIOR_END },
+
+/* Async behavior macros */
+#define KAPI_ASYNC_SPEC(modes, errno) \
+	.async_spec = { \
+		.supported_modes = modes, \
+		.nonblock_errno = errno,
+
+#define KAPI_ASYNC_POLL(in, out) \
+		.poll_events_in = in, \
+		.poll_events_out = out,
+
+#define KAPI_ASYNC_COMPLETION(cond) \
+		.completion_condition = cond,
+
+#define KAPI_ASYNC_TIMEOUT(supported, desc) \
+		.supports_timeout = supported, \
+		.timeout_behavior = desc,
+
+#define KAPI_ASYNC_END },
+
+/* Buffer behavior macros */
+#define KAPI_BUFFER_SPEC(behaviors) \
+	.buffer_spec = { \
+		.buffer_behaviors = behaviors,
+
+#define KAPI_BUFFER_SIZE(min, max, optimal) \
+		.min_buffer_size = min, \
+		.max_buffer_size = max, \
+		.optimal_buffer_size = optimal,
+
+#define KAPI_BUFFER_PARTIAL(allowed, rules) \
+		.can_partial_transfer = allowed, \
+		.partial_transfer_rules = rules,
+
+#define KAPI_BUFFER_FRAGMENT(rules) \
+		.fragmentation_rules = rules,
+
+#define KAPI_BUFFER_END },
+
+/* Address family macros */
+#define KAPI_ADDR_FAMILY(idx, fam, name, struct_sz, min_len, max_len) \
+	.addr_families[idx] = { \
+		.family = fam, \
+		.family_name = name, \
+		.addr_struct_size = struct_sz, \
+		.min_addr_len = min_len, \
+		.max_addr_len =
max_len,
+
+#define KAPI_ADDR_FORMAT(fmt) \
+		.addr_format = fmt,
+
+#define KAPI_ADDR_FEATURES(wildcard, multicast, broadcast) \
+		.supports_wildcard = wildcard, \
+		.supports_multicast = multicast, \
+		.supports_broadcast = broadcast,
+
+#define KAPI_ADDR_SPECIAL(addrs) \
+		.special_addresses = addrs,
+
+#define KAPI_ADDR_PORTS(min, max) \
+		.port_range_min = min, \
+		.port_range_max = max,
+
+#define KAPI_ADDR_FAMILY_END },
+
+#define KAPI_ADDR_FAMILY_COUNT(n) \
+	.addr_family_count = n,
+
+#define KAPI_PROTOCOL_BEHAVIOR_COUNT(n) \
+	.protocol_behavior_count = n,
+
+#define KAPI_CONSTRAINT_COUNT(n) \
+	.constraint_magic = KAPI_MAGIC_CONSTRAINTS, \
+	.constraint_count = n,
+
+/* Network operation characteristics macros */
+#define KAPI_NET_CONNECTION_ORIENTED \
+	.is_connection_oriented = true,
+
+#define KAPI_NET_MESSAGE_ORIENTED \
+	.is_message_oriented = true,
+
+#define KAPI_NET_SUPPORTS_OOB \
+	.supports_oob_data = true,
+
+#define KAPI_NET_SUPPORTS_PEEK \
+	.supports_peek = true,
+
+#define KAPI_NET_REENTRANT \
+	.is_reentrant = true,
+
+/* Semantic description macros */
+#define KAPI_NET_CONN_ESTABLISH(desc) \
+	.connection_establishment = desc,
+
+#define KAPI_NET_CONN_TERMINATE(desc) \
+	.connection_termination = desc,
+
+#define KAPI_NET_DATA_TRANSFER(desc) \
+	.data_transfer_semantics = desc,
+
+#endif /* _LINUX_KERNEL_API_SPEC_H */
diff --git a/include/linux/syscall_api_spec.h b/include/linux/syscall_api_spec.h
new file mode 100644
index 0000000000000..b7f9ba0f978ab
--- /dev/null
+++ b/include/linux/syscall_api_spec.h
@@ -0,0 +1,198 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * syscall_api_spec.h - System Call API Specification Integration
+ *
+ * This header complements the SYSCALL_DEFINEx macros with helpers for inline API specifications,
+ * allowing syscall documentation to be written alongside the implementation in a
+ * human-readable and machine-parseable format.
+ */
+
+#ifndef _LINUX_SYSCALL_API_SPEC_H
+#define _LINUX_SYSCALL_API_SPEC_H
+
+#include
+
+
+
+/* Automatic syscall validation infrastructure */
+/*
+ * Validation is integrated directly into the SYSCALL_DEFINEx macros
+ * in syscalls.h when CONFIG_KAPI_RUNTIME_CHECKS is enabled.
+ *
+ * The validation happens in the __do_kapi_sys##name wrapper function, which:
+ * 1. Validates all parameters (on failure, returns the error without calling the syscall)
+ * 2. Calls the real syscall implementation
+ * 3. Validates the return value
+ * 4. Returns the result
+ */
+
+
+/*
+ * Helper macros for common syscall patterns
+ */
+
+/* For syscalls that can sleep */
+#define KAPI_SYSCALL_SLEEPABLE \
+	KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+/* For syscalls that must be atomic */
+#define KAPI_SYSCALL_ATOMIC \
+	KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_ATOMIC)
+
+/* Common parameter specifications */
+#define KAPI_PARAM_FD(idx, desc) \
+	KAPI_PARAM(idx, "fd", "int", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_IN) \
+		.type = KAPI_TYPE_FD, \
+		.constraint_type = KAPI_CONSTRAINT_NONE, \
+	KAPI_PARAM_END
+
+#define KAPI_PARAM_USER_BUF(idx, name, desc) \
+	KAPI_PARAM(idx, name, "void __user *", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+	KAPI_PARAM_END
+
+/**
+ * KAPI_PARAM_USER_STRUCT - Define a userspace struct pointer parameter
+ * @idx: Parameter index (0-based)
+ * @name: Parameter name
+ * @struct_type: The struct type (e.g., struct iocb)
+ * @desc: Parameter description
+ *
+ * This macro defines a parameter that is a userspace pointer to a struct.
+ * The pointer will be validated to ensure:
+ * - The pointer is accessible in userspace
+ * - The memory region of sizeof(struct_type) bytes is accessible
+ */
+#define KAPI_PARAM_USER_STRUCT(idx, name, struct_type, desc) \
+	KAPI_PARAM(idx, name, #struct_type " __user *", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+		.type = KAPI_TYPE_USER_PTR, \
+		.size = sizeof(struct_type), \
+		.constraint_type = KAPI_CONSTRAINT_USER_PTR, \
+	KAPI_PARAM_END
+
+/**
+ * KAPI_PARAM_USER_PTR_SIZED - Define a userspace pointer with explicit size
+ * @idx: Parameter index (0-based)
+ * @name: Parameter name
+ * @ptr_size: Size in bytes of the memory region
+ * @desc: Parameter description
+ *
+ * This macro defines a parameter that is a userspace pointer to a memory
+ * region of a specific size. The pointer will be validated to ensure:
+ * - The pointer is accessible in userspace
+ * - The memory region of ptr_size bytes is accessible
+ */
+#define KAPI_PARAM_USER_PTR_SIZED(idx, name, ptr_size, desc) \
+	KAPI_PARAM(idx, name, "void __user *", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+		.type = KAPI_TYPE_USER_PTR, \
+		.size = ptr_size, \
+		.constraint_type = KAPI_CONSTRAINT_USER_PTR, \
+	KAPI_PARAM_END
+
+/**
+ * KAPI_PARAM_USER_STRING - Define a userspace null-terminated string parameter
+ * @idx: Parameter index (0-based)
+ * @name: Parameter name
+ * @min_len: Minimum string length (excluding null terminator)
+ * @max_len: Maximum string length (excluding null terminator)
+ * @desc: Parameter description
+ *
+ * This macro defines a parameter that is a userspace pointer to a
+ * null-terminated string.
The string will be validated to ensure:
+ * - The pointer is accessible in userspace
+ * - The string length (excluding null terminator) is within [min_len, max_len]
+ */
+#define KAPI_PARAM_USER_STRING(idx, name, min_len, max_len, desc) \
+	KAPI_PARAM(idx, name, "const char __user *", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+		.type = KAPI_TYPE_USER_PTR, \
+		.constraint_type = KAPI_CONSTRAINT_USER_STRING, \
+		.min_value = min_len, \
+		.max_value = max_len, \
+	KAPI_PARAM_END
+
+/**
+ * KAPI_PARAM_USER_PATH - Define a userspace pathname parameter
+ * @idx: Parameter index (0-based)
+ * @name: Parameter name
+ * @desc: Parameter description
+ *
+ * This macro defines a parameter that is a userspace pointer to a
+ * null-terminated pathname string. The path will be validated to ensure:
+ * - The pointer is accessible in userspace
+ * - The path is a valid null-terminated string
+ * - The path length does not exceed PATH_MAX (4096 bytes)
+ */
+#define KAPI_PARAM_USER_PATH(idx, name, desc) \
+	KAPI_PARAM(idx, name, "const char __user *", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_USER | KAPI_PARAM_IN) \
+		.type = KAPI_TYPE_PATH, \
+		.constraint_type = KAPI_CONSTRAINT_USER_PATH, \
+	KAPI_PARAM_END
+
+/* The range bounds are s64; SIZE_MAX would wrap to -1, so use S64_MAX */
+#define KAPI_PARAM_SIZE_T(idx, name, desc) \
+	KAPI_PARAM(idx, name, "size_t", desc) \
+		KAPI_PARAM_FLAGS(KAPI_PARAM_IN) \
+		KAPI_PARAM_RANGE(0, S64_MAX) \
+	KAPI_PARAM_END
+
+/* Common error specifications */
+#define KAPI_ERROR_EBADF(idx) \
+	KAPI_ERROR(idx, -EBADF, "EBADF", "Invalid file descriptor", \
+		   "The file descriptor is not valid or has been closed")
+
+#define KAPI_ERROR_EINVAL(idx, condition) \
+	KAPI_ERROR(idx, -EINVAL, "EINVAL", condition, \
+		   "Invalid argument provided")
+
+#define KAPI_ERROR_ENOMEM(idx) \
+	KAPI_ERROR(idx, -ENOMEM, "ENOMEM", "Insufficient memory", \
+		   "Cannot allocate memory for the operation")
+
+#define KAPI_ERROR_EPERM(idx) \
+	KAPI_ERROR(idx, -EPERM, "EPERM", "Operation not permitted", \
+		   "The calling
process does not have the required permissions")
+
+#define KAPI_ERROR_EFAULT(idx) \
+	KAPI_ERROR(idx, -EFAULT, "EFAULT", "Bad address", \
+		   "Invalid user space address provided")
+
+/* Standard return value specifications */
+#define KAPI_RETURN_SUCCESS_ZERO \
+	KAPI_RETURN("long", "0 on success, negative error code on failure") \
+		KAPI_RETURN_SUCCESS(0, "== 0") \
+	KAPI_RETURN_END
+
+#define KAPI_RETURN_FD_SPEC \
+	KAPI_RETURN("long", "File descriptor on success, negative error code on failure") \
+		.check_type = KAPI_RETURN_FD, \
+	KAPI_RETURN_END
+
+#define KAPI_RETURN_COUNT \
+	KAPI_RETURN("long", "Number of bytes processed on success, negative error code on failure") \
+		KAPI_RETURN_SUCCESS(0, ">= 0") \
+	KAPI_RETURN_END
+
+/* KAPI_ERROR_COUNT and KAPI_PARAM_COUNT are now defined in kernel_api_spec.h */
+
+/**
+ * KAPI_SINCE_VERSION - Set the kernel version that introduced the API
+ * @version: Version string when the API was introduced
+ */
+#define KAPI_SINCE_VERSION(version) \
+	.since_version = version,
+
+
+/**
+ * KAPI_SIGNAL_MASK_COUNT - Set the signal mask count
+ * @count: Number of signal masks defined
+ */
+#define KAPI_SIGNAL_MASK_COUNT(count) \
+	.signal_mask_count = count,
+
+
+
+#endif /* _LINUX_SYSCALL_API_SPEC_H */
\ No newline at end of file
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index cf84d98964b2f..eafda2f509999 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -89,6 +89,7 @@ struct file_attr;
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -134,6 +135,7 @@ struct file_attr;
 #define __SC_TYPE(t, a)	t
 #define __SC_ARGS(t, a)	a
 #define __SC_TEST(t, a) (void)BUILD_BUG_ON_ZERO(!__TYPE_IS_LL(t) && sizeof(t) > sizeof(long))
+#define __SC_CAST_TO_S64(t, a)	(s64)(a)
 
 #ifdef CONFIG_FTRACE_SYSCALLS
 #define __SC_STR_ADECL(t, a)	#a
@@ -244,6 +246,41 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
  * done within __do_sys_*().
*/
 #ifndef __SYSCALL_DEFINEx
+#ifdef CONFIG_KAPI_RUNTIME_CHECKS
+#define __SYSCALL_DEFINEx(x, name, ...) \
+	__diag_push(); \
+	__diag_ignore(GCC, 8, "-Wattribute-alias", \
+		      "Type aliasing is used to sanitize syscall arguments");\
+	asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
+		__attribute__((alias(__stringify(__se_sys##name)))); \
+	ALLOW_ERROR_INJECTION(sys##name, ERRNO); \
+	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
+	static inline long __do_kapi_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \
+	asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
+	asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
+	{ \
+		long ret = __do_kapi_sys##name(__MAP(x,__SC_CAST,__VA_ARGS__));\
+		__MAP(x,__SC_TEST,__VA_ARGS__); \
+		__PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \
+		return ret; \
+	} \
+	__diag_pop(); \
+	static inline long __do_kapi_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
+	{ \
+		const struct kernel_api_spec *__spec = kapi_get_spec("sys_" #name); \
+		if (__spec) { \
+			s64 __params[x] = { __MAP(x,__SC_CAST_TO_S64,__VA_ARGS__) }; \
+			int __ret = kapi_validate_syscall_params(__spec, __params, x); \
+			if (__ret) return __ret; \
+		} \
+		long ret = __do_sys##name(__MAP(x,__SC_ARGS,__VA_ARGS__)); \
+		if (__spec) { \
+			kapi_validate_syscall_return(__spec, (s64)ret); \
+		} \
+		return ret; \
+	} \
+	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
+#else /* !CONFIG_KAPI_RUNTIME_CHECKS */
 #define __SYSCALL_DEFINEx(x, name, ...)
\
 	__diag_push(); \
 	__diag_ignore(GCC, 8, "-Wattribute-alias", \
@@ -262,6 +299,7 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
 	} \
 	__diag_pop(); \
 	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
+#endif /* CONFIG_KAPI_RUNTIME_CHECKS */
 #endif /* __SYSCALL_DEFINEx */
 
 /* For split 64-bit arguments on 32-bit architectures */
diff --git a/init/Kconfig b/init/Kconfig
index fa79feb8fe57b..6a2a88de3aa84 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2191,6 +2191,8 @@ source "kernel/Kconfig.kexec"
 
 source "kernel/liveupdate/Kconfig"
 
+source "kernel/api/Kconfig"
+
 endmenu # General setup
 
 source "arch/Kconfig"
diff --git a/kernel/Makefile b/kernel/Makefile
index e83669841b8cc..0531ed6973619 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -57,6 +57,9 @@ obj-y += dma/
 obj-y += entry/
 obj-y += unwind/
 obj-$(CONFIG_MODULES) += module/
+obj-$(CONFIG_KAPI_SPEC) += api/
+# Ensure api/ is always cleaned even when CONFIG_KAPI_SPEC is not set
+obj- += api/
 
 obj-$(CONFIG_KCMP) += kcmp.o
 obj-$(CONFIG_FREEZER) += freezer.o
diff --git a/kernel/api/Kconfig b/kernel/api/Kconfig
new file mode 100644
index 0000000000000..fde25ec70e134
--- /dev/null
+++ b/kernel/api/Kconfig
@@ -0,0 +1,35 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Kernel API Specification Framework Configuration
+#
+
+config KAPI_SPEC
+	bool "Kernel API Specification Framework"
+	help
+	  This option enables the kernel API specification framework,
+	  which provides formal documentation of kernel APIs in both
+	  human- and machine-readable formats.
+
+	  The framework allows developers to document APIs inline with
+	  their implementation, including parameter specifications,
+	  return values, error conditions, locking requirements, and
+	  execution context constraints.
+
+	  When enabled, API specifications can be queried at runtime
+	  and exported in machine-readable form (for example JSON) through debugfs.
+
+	  If unsure, say N.
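The KAPI_SPEC help text above describes specifications written inline with the implementation and queried by name at runtime. In the header, each specification is assembled from designated-initializer macros that open a sub-object (for example KAPI_CAPABILITY()) and close it with a matching *_END macro. The following self-contained userspace sketch illustrates only that open/close pattern plus a linear name lookup; every demo_* identifier is hypothetical and not part of the patch, and the real structures, ELF section placement and api_spec_mutex locking are deliberately not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the framework's structures. */
#define DEMO_MAX_ERRORS 4

struct demo_error_spec {
	int code;
	const char *name;
};

struct demo_api_spec {
	const char *name;
	const char *description;
	struct demo_error_spec errors[DEMO_MAX_ERRORS];
	int error_count;
};

/*
 * Open/close macro pair in the style of KAPI_CAPABILITY()/KAPI_CAPABILITY_END:
 * DEMO_ERROR() opens a designated initializer for one array element and
 * DEMO_ERROR_END closes it, so optional fields could be placed in between.
 */
#define DEMO_ERROR(idx, c, n)	.errors[idx] = { .code = c, .name = n,
#define DEMO_ERROR_END		},
#define DEMO_ERROR_COUNT(n)	.error_count = n,

/* The specification is plain static data written next to the code it describes. */
static const struct demo_api_spec demo_specs[] = {
	{
		.name = "sys_demo",
		.description = "Demonstration entry",
		DEMO_ERROR(0, -EBADF, "EBADF") DEMO_ERROR_END
		DEMO_ERROR(1, -EFAULT, "EFAULT") DEMO_ERROR_END
		DEMO_ERROR_COUNT(2)
	},
};

/* Linear lookup by name, analogous to scanning a static table of specs. */
static const struct demo_api_spec *demo_find_spec(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(demo_specs) / sizeof(demo_specs[0]); i++) {
		if (strcmp(demo_specs[i].name, name) == 0)
			return &demo_specs[i];
	}
	return NULL;
}
```

With these definitions, demo_find_spec("sys_demo")->error_count is 2 and an unknown name yields NULL. The open/close split lets each entry carry a variable set of optional fields between the two macros; the cost is that an unbalanced pair surfaces as a syntax error some distance from the macro that caused it.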
+
+config KAPI_RUNTIME_CHECKS
+	bool "Runtime API specification checks"
+	depends on KAPI_SPEC
+	depends on DEBUG_KERNEL
+	help
+	  Enable runtime validation of API usage against specifications.
+	  This includes checking execution context requirements, parameter
+	  validation, and lock state verification.
+
+	  This adds overhead and should only be used for debugging and
+	  development. The checks use WARN_ONCE to report violations.
+
+	  If unsure, say N.
diff --git a/kernel/api/Makefile b/kernel/api/Makefile
new file mode 100644
index 0000000000000..acab17c78afa3
--- /dev/null
+++ b/kernel/api/Makefile
@@ -0,0 +1,29 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the Kernel API Specification Framework
+#
+
+# Core API specification framework
+obj-$(CONFIG_KAPI_SPEC) += kernel_api_spec.o
+
+# Auto-generated API specifications collector
+ifeq ($(CONFIG_KAPI_SPEC),y)
+obj-$(CONFIG_KAPI_SPEC) += generated_api_specs.o
+
+# Find all potential apispec files (this is evaluated at make time)
+apispec-files := $(shell find $(objtree) -name "*.apispec.h" -type f 2>/dev/null)
+
+# Generate the collector file
+# Note: FORCE ensures this is always regenerated to pick up new apispec files
+$(obj)/generated_api_specs.c: $(srctree)/scripts/generate_api_specs.sh FORCE
+	$(Q)$(CONFIG_SHELL) $< $(srctree) $(objtree) > $@
+
+targets += generated_api_specs.c
+
+# Make the collector object depend on its generated source
+$(obj)/generated_api_specs.o: $(obj)/generated_api_specs.c
+endif
+
+# Always clean generated files, regardless of config
+clean-files += generated_api_specs.c
+
diff --git a/kernel/api/kernel_api_spec.c b/kernel/api/kernel_api_spec.c
new file mode 100644
index 0000000000000..252000db38d2c
--- /dev/null
+++ b/kernel/api/kernel_api_spec.c
@@ -0,0 +1,1185 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * kernel_api_spec.c - Kernel API Specification Framework Implementation
+ *
+ * Provides runtime support for kernel API specifications including validation,
+ *
export to various formats, and querying capabilities.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/* Section where API specifications are stored */
+extern struct kernel_api_spec __start_kapi_specs[];
+extern struct kernel_api_spec __stop_kapi_specs[];
+
+/* Dynamic API registration */
+static LIST_HEAD(dynamic_api_specs);
+static DEFINE_MUTEX(api_spec_mutex);
+
+struct dynamic_api_spec {
+	struct list_head list;
+	struct kernel_api_spec *spec;
+};
+
+/*
+ * __kapi_find_spec_locked - Internal lookup, caller must hold api_spec_mutex
+ */
+static const struct kernel_api_spec *__kapi_find_spec_locked(const char *name)
+{
+	struct kernel_api_spec *spec;
+	struct dynamic_api_spec *dyn_spec;
+
+	/* Search static specifications */
+	for (spec = __start_kapi_specs; spec < __stop_kapi_specs; spec++) {
+		if (strcmp(spec->name, name) == 0)
+			return spec;
+	}
+
+	/* Search dynamic specifications (mutex already held) */
+	list_for_each_entry(dyn_spec, &dynamic_api_specs, list) {
+		if (strcmp(dyn_spec->spec->name, name) == 0)
+			return dyn_spec->spec;
+	}
+
+	return NULL;
+}
+
+/**
+ * kapi_get_spec - Get API specification by name
+ * @name: Function name to look up
+ *
+ * Return: Pointer to API specification or NULL if not found
+ */
+const struct kernel_api_spec *kapi_get_spec(const char *name)
+{
+	const struct kernel_api_spec *spec;
+
+	mutex_lock(&api_spec_mutex);
+	spec = __kapi_find_spec_locked(name);
+	mutex_unlock(&api_spec_mutex);
+
+	return spec;
+}
+EXPORT_SYMBOL_GPL(kapi_get_spec);
+
+/**
+ * kapi_register_spec - Register a dynamic API specification
+ * @spec: API specification to register
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int kapi_register_spec(struct kernel_api_spec *spec)
+{
+	struct dynamic_api_spec *dyn_spec;
+	int ret = 0;
+
+	if (!spec || !spec->name[0])
+		return -EINVAL;
+
+	dyn_spec =
kzalloc(sizeof(*dyn_spec), GFP_KERNEL);
+	if (!dyn_spec)
+		return -ENOMEM;
+
+	dyn_spec->spec = spec;
+
+	mutex_lock(&api_spec_mutex);
+
+	/* Check if already exists while holding lock to prevent races */
+	if (__kapi_find_spec_locked(spec->name)) {
+		ret = -EEXIST;
+		kfree(dyn_spec);
+	} else {
+		list_add_tail(&dyn_spec->list, &dynamic_api_specs);
+	}
+
+	mutex_unlock(&api_spec_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kapi_register_spec);
+
+/**
+ * kapi_unregister_spec - Unregister a dynamic API specification
+ * @name: Name of API to unregister
+ */
+void kapi_unregister_spec(const char *name)
+{
+	struct dynamic_api_spec *dyn_spec, *tmp;
+
+	mutex_lock(&api_spec_mutex);
+	list_for_each_entry_safe(dyn_spec, tmp, &dynamic_api_specs, list) {
+		if (strcmp(dyn_spec->spec->name, name) == 0) {
+			list_del(&dyn_spec->list);
+			kfree(dyn_spec);
+			break;
+		}
+	}
+	mutex_unlock(&api_spec_mutex);
+}
+EXPORT_SYMBOL_GPL(kapi_unregister_spec);
+
+/**
+ * param_type_to_string - Convert parameter type to string
+ * @type: Parameter type
+ *
+ * Return: String representation of type
+ */
+static const char *param_type_to_string(enum kapi_param_type type)
+{
+	static const char * const type_names[] = {
+		[KAPI_TYPE_VOID] = "void",
+		[KAPI_TYPE_INT] = "int",
+		[KAPI_TYPE_UINT] = "uint",
+		[KAPI_TYPE_PTR] = "pointer",
+		[KAPI_TYPE_STRUCT] = "struct",
+		[KAPI_TYPE_UNION] = "union",
+		[KAPI_TYPE_ENUM] = "enum",
+		[KAPI_TYPE_FUNC_PTR] = "function_pointer",
+		[KAPI_TYPE_ARRAY] = "array",
+		[KAPI_TYPE_FD] = "file_descriptor",
+		[KAPI_TYPE_USER_PTR] = "user_pointer",
+		[KAPI_TYPE_PATH] = "pathname",
+		[KAPI_TYPE_CUSTOM] = "custom",
+	};
+
+	if (type >= ARRAY_SIZE(type_names))
+		return "unknown";
+
+	return type_names[type];
+}
+
+/**
+ * lock_type_to_string - Convert lock type to string
+ * @type: Lock type
+ *
+ * Return: String representation of lock type
+ */
+static const char *lock_type_to_string(enum kapi_lock_type type)
+{
+	static const
char * const lock_names[] = {
+		[KAPI_LOCK_NONE] = "none",
+		[KAPI_LOCK_MUTEX] = "mutex",
+		[KAPI_LOCK_SPINLOCK] = "spinlock",
+		[KAPI_LOCK_RWLOCK] = "rwlock",
+		[KAPI_LOCK_SEQLOCK] = "seqlock",
+		[KAPI_LOCK_RCU] = "rcu",
+		[KAPI_LOCK_SEMAPHORE] = "semaphore",
+		[KAPI_LOCK_CUSTOM] = "custom",
+	};
+
+	if (type >= ARRAY_SIZE(lock_names))
+		return "unknown";
+
+	return lock_names[type];
+}
+
+/**
+ * lock_scope_to_string - Convert lock scope to string
+ * @scope: Lock scope
+ *
+ * Return: String representation of lock scope
+ */
+static const char *lock_scope_to_string(enum kapi_lock_scope scope)
+{
+	static const char * const scope_names[] = {
+		[KAPI_LOCK_INTERNAL] = "internal",
+		[KAPI_LOCK_ACQUIRES] = "acquires",
+		[KAPI_LOCK_RELEASES] = "releases",
+		[KAPI_LOCK_CALLER_HELD] = "caller_held",
+	};
+
+	if (scope >= ARRAY_SIZE(scope_names))
+		return "unknown";
+
+	return scope_names[scope];
+}
+
+/**
+ * return_check_type_to_string - Convert return check type to string
+ * @type: Return check type
+ *
+ * Return: String representation of return check type
+ */
+static const char *return_check_type_to_string(enum kapi_return_check_type type)
+{
+	static const char * const check_names[] = {
+		[KAPI_RETURN_EXACT] = "exact",
+		[KAPI_RETURN_RANGE] = "range",
+		[KAPI_RETURN_ERROR_CHECK] = "error_check",
+		[KAPI_RETURN_FD] = "file_descriptor",
+		[KAPI_RETURN_CUSTOM] = "custom",
+		[KAPI_RETURN_NO_RETURN] = "no_return",
+	};
+
+	if (type >= ARRAY_SIZE(check_names))
+		return "unknown";
+
+	return check_names[type];
+}
+
+/**
+ * capability_action_to_string - Convert capability action to string
+ * @action: Capability action
+ *
+ * Return: String representation of capability action
+ */
+static const char *capability_action_to_string(enum kapi_capability_action action)
+{
+	static const char * const action_names[] = {
+		[KAPI_CAP_BYPASS_CHECK] = "bypass_check",
+		[KAPI_CAP_INCREASE_LIMIT] = "increase_limit",
[KAPI_CAP_OVERRIDE_RESTRICTION] = "override_restriction",
+		[KAPI_CAP_GRANT_PERMISSION] = "grant_permission",
+		[KAPI_CAP_MODIFY_BEHAVIOR] = "modify_behavior",
+		[KAPI_CAP_ACCESS_RESOURCE] = "access_resource",
+		[KAPI_CAP_PERFORM_OPERATION] = "perform_operation",
+	};
+
+	if (action >= ARRAY_SIZE(action_names))
+		return "unknown";
+
+	return action_names[action];
+}
+
+/**
+ * kapi_export_json - Export API specification to JSON format
+ * @spec: API specification to export
+ * @buf: Buffer to write JSON to
+ * @size: Size of buffer
+ *
+ * Return: Number of bytes written or negative error
+ */
+int kapi_export_json(const struct kernel_api_spec *spec, char *buf, size_t size)
+{
+	int ret = 0;
+	int i;
+
+	if (!spec || !buf || size == 0)
+		return -EINVAL;
+
+	ret = scnprintf(buf, size,
+			"{\n"
+			" \"name\": \"%s\",\n"
+			" \"version\": %u,\n"
+			" \"description\": \"%s\",\n"
+			" \"long_description\": \"%s\",\n"
+			" \"context_flags\": \"0x%x\",\n",
+			spec->name,
+			spec->version,
+			spec->description,
+			spec->long_description,
+			spec->context_flags);
+
+	/* Parameters */
+	ret += scnprintf(buf + ret, size - ret,
+			 " \"parameters\": [\n");
+
+	for (i = 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+		const struct kapi_param_spec *param = &spec->params[i];
+
+		ret += scnprintf(buf + ret, size - ret,
+				 " {\n"
+				 " \"name\": \"%s\",\n"
+				 " \"type\": \"%s\",\n"
+				 " \"type_class\": \"%s\",\n"
+				 " \"flags\": \"0x%x\",\n"
+				 " \"description\": \"%s\"\n"
+				 " }%s\n",
+				 param->name,
+				 param->type_name,
+				 param_type_to_string(param->type),
+				 param->flags,
+				 param->description,
+				 (i < spec->param_count - 1) ? "," : "");
+	}
+
+	ret += scnprintf(buf + ret, size - ret, " ],\n");
+
+	/* Return value */
+	ret += scnprintf(buf + ret, size - ret,
+			 " \"return\": {\n"
+			 " \"type\": \"%s\",\n"
+			 " \"type_class\": \"%s\",\n"
+			 " \"check_type\": \"%s\",\n",
+			 spec->return_spec.type_name,
+			 param_type_to_string(spec->return_spec.type),
+			 return_check_type_to_string(spec->return_spec.check_type));
+
+	switch (spec->return_spec.check_type) {
+	case KAPI_RETURN_EXACT:
+		ret += scnprintf(buf + ret, size - ret,
+				 " \"success_value\": %lld,\n",
+				 spec->return_spec.success_value);
+		break;
+	case KAPI_RETURN_RANGE:
+		ret += scnprintf(buf + ret, size - ret,
+				 " \"success_min\": %lld,\n"
+				 " \"success_max\": %lld,\n",
+				 spec->return_spec.success_min,
+				 spec->return_spec.success_max);
+		break;
+	case KAPI_RETURN_ERROR_CHECK:
+		ret += scnprintf(buf + ret, size - ret,
+				 " \"error_count\": %u,\n",
+				 spec->return_spec.error_count);
+		break;
+	default:
+		break;
+	}
+
+	ret += scnprintf(buf + ret, size - ret,
+			 " \"description\": \"%s\"\n"
+			 " },\n",
+			 spec->return_spec.description);
+
+	/* Errors */
+	ret += scnprintf(buf + ret, size - ret,
+			 " \"errors\": [\n");
+
+	for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+		const struct kapi_error_spec *error = &spec->errors[i];
+
+		ret += scnprintf(buf + ret, size - ret,
+				 " {\n"
+				 " \"code\": %d,\n"
+				 " \"name\": \"%s\",\n"
+				 " \"condition\": \"%s\",\n"
+				 " \"description\": \"%s\"\n"
+				 " }%s\n",
+				 error->error_code,
+				 error->name,
+				 error->condition,
+				 error->description,
+				 (i < spec->error_count - 1) ? "," : "");
+	}
+
+	ret += scnprintf(buf + ret, size - ret, " ],\n");
+
+	/* Locks */
+	ret += scnprintf(buf + ret, size - ret,
+			 " \"locks\": [\n");
+
+	for (i = 0; i < spec->lock_count && i < KAPI_MAX_CONSTRAINTS; i++) {
+		const struct kapi_lock_spec *lock = &spec->locks[i];
+
+		ret += scnprintf(buf + ret, size - ret,
+				 " {\n"
+				 " \"name\": \"%s\",\n"
+				 " \"type\": \"%s\",\n"
+				 " \"scope\": \"%s\",\n"
+				 " \"description\": \"%s\"\n"
+				 " }%s\n",
+				 lock->lock_name,
+				 lock_type_to_string(lock->lock_type),
+				 lock_scope_to_string(lock->scope),
+				 lock->description,
+				 (i < spec->lock_count - 1) ? "," : "");
+	}
+
+	ret += scnprintf(buf + ret, size - ret, " ],\n");
+
+	/* Capabilities */
+	ret += scnprintf(buf + ret, size - ret,
+			 " \"capabilities\": [\n");
+
+	for (i = 0; i < spec->capability_count && i < KAPI_MAX_CAPABILITIES; i++) {
+		const struct kapi_capability_spec *cap = &spec->capabilities[i];
+
+		ret += scnprintf(buf + ret, size - ret,
+				 " {\n"
+				 " \"capability\": %d,\n"
+				 " \"name\": \"%s\",\n"
+				 " \"action\": \"%s\",\n"
+				 " \"allows\": \"%s\",\n"
+				 " \"without_cap\": \"%s\",\n"
+				 " \"check_condition\": \"%s\",\n"
+				 " \"priority\": %u",
+				 cap->capability,
+				 cap->cap_name,
+				 capability_action_to_string(cap->action),
+				 cap->allows,
+				 cap->without_cap,
+				 cap->check_condition,
+				 cap->priority);
+
+		if (cap->alternative_count > 0) {
+			int j;
+			ret += scnprintf(buf + ret, size - ret,
+					 ",\n \"alternatives\": [");
+			for (j = 0; j < cap->alternative_count; j++) {
+				ret += scnprintf(buf + ret, size - ret,
+						 "%d%s", cap->alternative[j],
+						 (j < cap->alternative_count - 1) ? ", " : "");
+			}
+			ret += scnprintf(buf + ret, size - ret, "]");
+		}
+
+		ret += scnprintf(buf + ret, size - ret,
+				 "\n }%s\n",
+				 (i < spec->capability_count - 1) ? "," : "");
+	}
+
+	ret += scnprintf(buf + ret, size - ret, " ],\n");
+
+	/* Additional info */
+	ret += scnprintf(buf + ret, size - ret,
+			 " \"since_version\": \"%s\",\n"
+			 " \"examples\": \"%s\",\n"
+			 " \"notes\": \"%s\"\n"
+			 "}\n",
+			 spec->since_version,
+			 spec->examples,
+			 spec->notes);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kapi_export_json);
+
+
+/**
+ * kapi_print_spec - Print API specification to kernel log
+ * @spec: API specification to print
+ */
+void kapi_print_spec(const struct kernel_api_spec *spec)
+{
+	int i;
+
+	if (!spec)
+		return;
+
+	pr_info("=== Kernel API Specification ===\n");
+	pr_info("Name: %s\n", spec->name);
+	pr_info("Version: %u\n", spec->version);
+	pr_info("Description: %s\n", spec->description);
+
+	if (spec->long_description[0])
+		pr_info("Long Description: %s\n", spec->long_description);
+
+	pr_info("Context Flags: 0x%x\n", spec->context_flags);
+
+	/* Parameters */
+	if (spec->param_count > 0) {
+		pr_info("Parameters:\n");
+		for (i = 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) {
+			const struct kapi_param_spec *param = &spec->params[i];
+			pr_info(" [%d] %s: %s (flags: 0x%x)\n",
+				i, param->name, param->type_name, param->flags);
+			if (param->description[0])
+				pr_info(" Description: %s\n", param->description);
+		}
+	}
+
+	/* Return value */
+	pr_info("Return: %s\n", spec->return_spec.type_name);
+	if (spec->return_spec.description[0])
+		pr_info(" Description: %s\n", spec->return_spec.description);
+
+	/* Errors */
+	if (spec->error_count > 0) {
+		pr_info("Possible Errors:\n");
+		for (i = 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) {
+			const struct kapi_error_spec *error = &spec->errors[i];
+			pr_info(" %s (%d): %s\n",
+				error->name, error->error_code, error->condition);
+		}
+	}
+
+	/* Capabilities */
+	if (spec->capability_count > 0) {
+		pr_info("Capabilities:\n");
+		for (i = 0; i < spec->capability_count && i < KAPI_MAX_CAPABILITIES; i++) {
+			const struct kapi_capability_spec *cap =
&spec->capabilities[i]; + pr_info(" %s (%d):\n", cap->cap_name, cap->capability); + pr_info(" Action: %s\n", capability_action_to_string(cap->action)); + pr_info(" Allows: %s\n", cap->allows); + pr_info(" Without: %s\n", cap->without_cap); + if (cap->check_condition[0]) + pr_info(" Condition: %s\n", cap->check_condition); + } + } + + pr_info("=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D\n"); +} +EXPORT_SYMBOL_GPL(kapi_print_spec); + +#ifdef CONFIG_KAPI_RUNTIME_CHECKS + +/** + * kapi_validate_fd - Validate that a file descriptor is valid in current = context + * @fd: File descriptor to validate + * + * Return: true if fd is valid in current process context, false otherwise + */ +static bool kapi_validate_fd(int fd) +{ + struct fd f; + + /* Special case: AT_FDCWD is always valid */ + if (fd =3D=3D AT_FDCWD) + return true; + + /* Check basic range */ + if (fd < 0) + return false; + + /* Check if fd is valid in current process context */ + f =3D fdget(fd); + if (fd_empty(f)) { + return false; + } + + /* fd is valid, release reference */ + fdput(f); + return true; +} + +/** + * kapi_validate_user_ptr - Validate that a user pointer is accessible + * @ptr: User pointer to validate + * @size: Size in bytes to validate + * + * Return: true if user memory is accessible, false otherwise + */ +static bool kapi_validate_user_ptr(const void __user *ptr, size_t size) +{ + /* NULL pointers are not valid; caller handles optional case */ + if (!ptr) + return false; + + return access_ok(ptr, size); +} + +/** + * kapi_validate_user_ptr_with_params - Validate user pointer with dynamic= size + * @param_spec: Parameter specification + * @ptr: User pointer to validate + * @all_params: Array of all parameter values + * @param_count: Number of parameters + * + * Return: true if user memory is accessible, false otherwise + */ +static bool kapi_validate_user_ptr_with_params(const struct kapi_param_spe= c *param_spec, + const void __user 
*ptr, + const s64 *all_params, + int param_count) +{ + size_t actual_size; + + /* NULL is allowed for optional parameters */ + if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL)) + return true; + + /* Calculate actual size based on related parameter */ + if (param_spec->size_param_idx >=3D 0 && + param_spec->size_param_idx < param_count) { + s64 count =3D all_params[param_spec->size_param_idx]; + + /* Validate count is positive */ + if (count <=3D 0) { + pr_warn("Parameter %s: size determinant is non-positive (%lld)\n", + param_spec->name, count); + return false; + } + + /* Check for multiplication overflow */ + if (param_spec->size_multiplier > 0 && + count > SIZE_MAX / param_spec->size_multiplier) { + pr_warn("Parameter %s: size calculation overflow\n", + param_spec->name); + return false; + } + + actual_size =3D count * param_spec->size_multiplier; + } else { + /* Use fixed size */ + actual_size =3D param_spec->size; + } + + return kapi_validate_user_ptr(ptr, actual_size); +} + +/** + * kapi_validate_path - Validate that a pathname is accessible and within = limits + * @path: User pointer to pathname + * @param_spec: Parameter specification + * + * Return: true if path is valid, false otherwise + */ +static bool kapi_validate_path(const char __user *path, + const struct kapi_param_spec *param_spec) +{ + size_t len; + + /* NULL is allowed for optional parameters */ + if (!path && (param_spec->flags & KAPI_PARAM_OPTIONAL)) + return true; + + if (!path) { + pr_warn("Parameter %s: NULL path not allowed\n", param_spec->name); + return false; + } + + /* Check if the path is accessible */ + if (!access_ok(path, 1)) { + pr_warn("Parameter %s: path pointer %p not accessible\n", + param_spec->name, path); + return false; + } + + /* Use strnlen_user to get the length and validate accessibility */ + len =3D strnlen_user(path, PATH_MAX + 1); + if (len =3D=3D 0) { + pr_warn("Parameter %s: invalid path pointer %p\n", + param_spec->name, path); + return false; + } + + /* 
Check path length limit */ + if (len > PATH_MAX) { + pr_warn("Parameter %s: path too long (exceeds PATH_MAX)\n", + param_spec->name); + return false; + } + + return true; +} + +/** + * kapi_validate_user_string - Validate a userspace null-terminated string + * @str: User pointer to string + * @param_spec: Parameter specification containing length constraints + * + * Validates that the userspace string pointer is accessible and that the + * string length (excluding null terminator) is within the range specified + * by min_value and max_value in the parameter specification. + * + * Return: true if string is valid, false otherwise + */ +static bool kapi_validate_user_string(const char __user *str, + const struct kapi_param_spec *param_spec) +{ + size_t len; + size_t max_check_len; + + /* NULL is allowed for optional parameters */ + if (!str && (param_spec->flags & KAPI_PARAM_OPTIONAL)) + return true; + + if (!str) { + pr_warn("Parameter %s: NULL string not allowed\n", param_spec->name); + return false; + } + + /* Check if the string pointer is accessible */ + if (!access_ok(str, 1)) { + pr_warn("Parameter %s: string pointer %p not accessible\n", + param_spec->name, str); + return false; + } + + /* + * Use strnlen_user to get the string length and validate accessibility. + * Check up to max_value + 1 to detect strings that are too long. + * If max_value is 0 or unset, use PATH_MAX as a reasonable default. + */ + max_check_len =3D param_spec->max_value > 0 ? + (size_t)param_spec->max_value + 1 : PATH_MAX + 1; + len =3D strnlen_user(str, max_check_len); + + if (len =3D=3D 0) { + pr_warn("Parameter %s: invalid string pointer %p\n", + param_spec->name, str); + return false; + } + + /* + * strnlen_user returns the length including the null terminator. + * Convert to string length (excluding terminator) for range check. 
+ */ + len--; + + /* Check minimum length constraint */ + if (param_spec->min_value > 0 && len < (size_t)param_spec->min_value) { + pr_warn("Parameter %s: string too short (%zu < %lld)\n", + param_spec->name, len, param_spec->min_value); + return false; + } + + /* Check maximum length constraint */ + if (param_spec->max_value > 0 && len > (size_t)param_spec->max_value) { + pr_warn("Parameter %s: string too long (%zu > %lld)\n", + param_spec->name, len, param_spec->max_value); + return false; + } + + return true; +} + +/** + * kapi_validate_user_ptr_constraint - Validate a userspace pointer with s= ize + * @ptr: User pointer to validate + * @param_spec: Parameter specification containing size + * + * Validates that the userspace pointer is accessible and that the memory + * region of the specified size can be accessed. The size is taken from + * the param_spec->size field. + * + * Return: true if pointer is valid, false otherwise + */ +static bool kapi_validate_user_ptr_constraint(const void __user *ptr, + const struct kapi_param_spec *param_spec) +{ + /* NULL is allowed for optional parameters */ + if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL)) + return true; + + if (!ptr) { + pr_warn("Parameter %s: NULL pointer not allowed\n", param_spec->name); + return false; + } + + /* Validate size is specified */ + if (param_spec->size =3D=3D 0) { + pr_warn("Parameter %s: size not specified for user pointer validation\n", + param_spec->name); + return false; + } + + /* Check if the memory region is accessible */ + if (!access_ok(ptr, param_spec->size)) { + pr_warn("Parameter %s: user pointer %p not accessible for %zu bytes\n", + param_spec->name, ptr, param_spec->size); + return false; + } + + return true; +} + +/** + * kapi_validate_param - Validate a parameter against its specification + * @param_spec: Parameter specification + * @value: Parameter value to validate + * + * Return: true if valid, false otherwise + */ +bool kapi_validate_param(const struct 
kapi_param_spec *param_spec, s64 val= ue) +{ + int i; + + /* Special handling for file descriptor type */ + if (param_spec->type =3D=3D KAPI_TYPE_FD) { + if (!kapi_validate_fd((int)value)) { + pr_warn("Parameter %s: invalid file descriptor %lld\n", + param_spec->name, value); + return false; + } + /* Continue with additional constraint checks if needed */ + } + + /* Special handling for user pointer type */ + if (param_spec->type =3D=3D KAPI_TYPE_USER_PTR) { + const void __user *ptr =3D (const void __user *)value; + + /* NULL is allowed for optional parameters */ + if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL)) + return true; + + if (!kapi_validate_user_ptr(ptr, param_spec->size)) { + pr_warn("Parameter %s: invalid user pointer %p (size: %zu)\n", + param_spec->name, ptr, param_spec->size); + return false; + } + /* Continue with additional constraint checks if needed */ + } + + /* Special handling for path type */ + if (param_spec->type =3D=3D KAPI_TYPE_PATH) { + const char __user *path =3D (const char __user *)value; + + if (!kapi_validate_path(path, param_spec)) { + return false; + } + /* Continue with additional constraint checks if needed */ + } + + switch (param_spec->constraint_type) { + case KAPI_CONSTRAINT_NONE: + return true; + + case KAPI_CONSTRAINT_RANGE: + if (value < param_spec->min_value || value > param_spec->max_value) { + pr_warn("Parameter %s value %lld out of range [%lld, %lld]\n", + param_spec->name, value, + param_spec->min_value, param_spec->max_value); + return false; + } + return true; + + case KAPI_CONSTRAINT_MASK: + if (value & ~param_spec->valid_mask) { + pr_warn("Parameter %s value 0x%llx contains invalid bits (valid mask: 0= x%llx)\n", + param_spec->name, value, param_spec->valid_mask); + return false; + } + return true; + + case KAPI_CONSTRAINT_ENUM: + if (!param_spec->enum_values || param_spec->enum_count =3D=3D 0) + return true; + + for (i =3D 0; i < param_spec->enum_count; i++) { + if (value =3D=3D param_spec->enum_values[i]) 
+ return true; + } + pr_warn("Parameter %s value %lld not in valid enumeration\n", + param_spec->name, value); + return false; + + case KAPI_CONSTRAINT_ALIGNMENT: + if (param_spec->alignment =3D=3D 0) { + pr_warn("Parameter %s: alignment constraint specified but alignment is = 0\n", + param_spec->name); + return false; + } + if (value & (param_spec->alignment - 1)) { + pr_warn("Parameter %s value 0x%llx not aligned to %zu boundary\n", + param_spec->name, value, param_spec->alignment); + return false; + } + return true; + + case KAPI_CONSTRAINT_POWER_OF_TWO: + if (value =3D=3D 0 || (value & (value - 1))) { + pr_warn("Parameter %s value %lld is not a power of two\n", + param_spec->name, value); + return false; + } + return true; + + case KAPI_CONSTRAINT_PAGE_ALIGNED: + if (value & (PAGE_SIZE - 1)) { + pr_warn("Parameter %s value 0x%llx not page-aligned (PAGE_SIZE=3D%ld)\n= ", + param_spec->name, value, PAGE_SIZE); + return false; + } + return true; + + case KAPI_CONSTRAINT_NONZERO: + if (value =3D=3D 0) { + pr_warn("Parameter %s must be non-zero\n", param_spec->name); + return false; + } + return true; + + case KAPI_CONSTRAINT_USER_STRING: + return kapi_validate_user_string((const char __user *)value, param_spec); + + case KAPI_CONSTRAINT_USER_PATH: + return kapi_validate_path((const char __user *)value, param_spec); + + case KAPI_CONSTRAINT_USER_PTR: + return kapi_validate_user_ptr_constraint((const void __user *)value, par= am_spec); + + case KAPI_CONSTRAINT_CUSTOM: + if (param_spec->validate) + return param_spec->validate(value); + return true; + + default: + return true; + } +} +EXPORT_SYMBOL_GPL(kapi_validate_param); + +/** + * kapi_validate_param_with_context - Validate parameter with access to al= l params + * @param_spec: Parameter specification + * @value: Parameter value to validate + * @all_params: Array of all parameter values + * @param_count: Number of parameters + * + * Return: true if valid, false otherwise + */ +bool 
kapi_validate_param_with_context(const struct kapi_param_spec *param_= spec, + s64 value, const s64 *all_params, int param_count) +{ + /* Special handling for user pointer type with dynamic sizing */ + if (param_spec->type =3D=3D KAPI_TYPE_USER_PTR) { + const void __user *ptr =3D (const void __user *)value; + + /* NULL is allowed for optional parameters */ + if (!ptr && (param_spec->flags & KAPI_PARAM_OPTIONAL)) + return true; + + if (!kapi_validate_user_ptr_with_params(param_spec, ptr, all_params, par= am_count)) { + pr_warn("Parameter %s: invalid user pointer %p\n", + param_spec->name, ptr); + return false; + } + /* Continue with additional constraint checks if needed */ + } + + /* For other types, fall back to regular validation */ + return kapi_validate_param(param_spec, value); +} +EXPORT_SYMBOL_GPL(kapi_validate_param_with_context); + +/** + * kapi_validate_syscall_param - Validate syscall parameter with enforceme= nt + * @spec: API specification + * @param_idx: Parameter index + * @value: Parameter value + * + * Return: -EINVAL if invalid, 0 if valid + */ +int kapi_validate_syscall_param(const struct kernel_api_spec *spec, + int param_idx, s64 value) +{ + const struct kapi_param_spec *param_spec; + + if (!spec || param_idx >=3D spec->param_count) + return 0; + + param_spec =3D &spec->params[param_idx]; + + if (!kapi_validate_param(param_spec, value)) { + if (strncmp(spec->name, "sys_", 4) =3D=3D 0) { + /* For syscalls, we can return EINVAL to userspace */ + return -EINVAL; + } + } + + return 0; +} +EXPORT_SYMBOL_GPL(kapi_validate_syscall_param); + +/** + * kapi_validate_syscall_params - Validate all syscall parameters together + * @spec: API specification + * @params: Array of parameter values + * @param_count: Number of parameters + * + * Return: -EINVAL if any parameter is invalid, 0 if all valid + */ +int kapi_validate_syscall_params(const struct kernel_api_spec *spec, + const s64 *params, int param_count) +{ + int i; + + if (!spec || !params) + return 0; 
+ + /* Validate that we have the expected number of parameters */ + if (param_count !=3D spec->param_count) { + pr_warn("API %s: parameter count mismatch (expected %u, got %d)\n", + spec->name, spec->param_count, param_count); + return -EINVAL; + } + + /* Validate each parameter with context */ + for (i =3D 0; i < spec->param_count && i < KAPI_MAX_PARAMS; i++) { + const struct kapi_param_spec *param_spec =3D &spec->params[i]; + + if (!kapi_validate_param_with_context(param_spec, params[i], params, par= am_count)) { + if (strncmp(spec->name, "sys_", 4) =3D=3D 0) { + /* For syscalls, we can return EINVAL to userspace */ + return -EINVAL; + } + } + } + + return 0; +} +EXPORT_SYMBOL_GPL(kapi_validate_syscall_params); + +/** + * kapi_check_return_success - Check if return value indicates success + * @return_spec: Return specification + * @retval: Return value to check + * + * Returns true if the return value indicates success according to the spe= c. + */ +bool kapi_check_return_success(const struct kapi_return_spec *return_spec,= s64 retval) +{ + u32 i; + + if (!return_spec) + return true; /* No spec means we can't validate */ + + switch (return_spec->check_type) { + case KAPI_RETURN_EXACT: + return retval =3D=3D return_spec->success_value; + + case KAPI_RETURN_RANGE: + return retval >=3D return_spec->success_min && + retval <=3D return_spec->success_max; + + case KAPI_RETURN_ERROR_CHECK: + /* Success if NOT in error list */ + if (return_spec->error_values) { + for (i =3D 0; i < return_spec->error_count; i++) { + if (retval =3D=3D return_spec->error_values[i]) + return false; /* Found in error list */ + } + } + return true; /* Not in error list =3D success */ + + case KAPI_RETURN_FD: + /* File descriptors: >=3D 0 is success, < 0 is error */ + return retval >=3D 0; + + case KAPI_RETURN_CUSTOM: + if (return_spec->is_success) + return return_spec->is_success(retval); + fallthrough; + + default: + return true; /* Unknown check type, assume success */ + } +} 
+EXPORT_SYMBOL_GPL(kapi_check_return_success); + +/** + * kapi_validate_return_value - Validate that return value matches spec + * @spec: API specification + * @retval: Return value to validate + * + * Return: true if return value is valid according to spec, false otherwis= e. + * + * This function checks: + * 1. If the value indicates success, it must match the success criteria + * 2. If the value indicates error, it must be one of the specified error = codes + */ +bool kapi_validate_return_value(const struct kernel_api_spec *spec, s64 re= tval) +{ + int i; + bool is_success; + + if (!spec) + return true; /* No spec means we can't validate */ + + /* First check if this is a success return */ + is_success =3D kapi_check_return_success(&spec->return_spec, retval); + + if (is_success) { + /* Special validation for file descriptor returns */ + if (spec->return_spec.check_type =3D=3D KAPI_RETURN_FD) { + /* For successful FD returns, validate it's a valid FD */ + if (!kapi_validate_fd((int)retval)) { + pr_warn("API %s returned invalid file descriptor %lld\n", + spec->name, retval); + return false; + } + } + return true; + } + + /* Error case - check if it's one of the specified errors */ + if (spec->error_count =3D=3D 0) { + /* No errors specified, so any error is potentially valid */ + pr_debug("API %s returned unspecified error %lld\n", + spec->name, retval); + return true; + } + + /* Check if the error is in our list of specified errors */ + for (i =3D 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) { + if (retval =3D=3D spec->errors[i].error_code) + return true; + } + + /* Error not in spec */ + pr_warn("API %s returned unspecified error code %lld. 
Valid errors are:\n= ", + spec->name, retval); + for (i =3D 0; i < spec->error_count && i < KAPI_MAX_ERRORS; i++) { + pr_warn(" %s (%d): %s\n", + spec->errors[i].name, + spec->errors[i].error_code, + spec->errors[i].condition); + } + + return false; +} +EXPORT_SYMBOL_GPL(kapi_validate_return_value); + +/** + * kapi_validate_syscall_return - Validate syscall return value with enfor= cement + * @spec: API specification + * @retval: Return value + * + * Return: 0 if valid, -EINVAL if the return value doesn't match spec + * + * For syscalls, this can help detect kernel bugs where unspecified error + * codes are returned to userspace. + */ +int kapi_validate_syscall_return(const struct kernel_api_spec *spec, s64 r= etval) +{ + if (!spec) + return 0; + + if (!kapi_validate_return_value(spec, retval)) { + /* Log the violation but don't change the return value */ + WARN_ONCE(1, "Syscall %s returned unspecified value %lld\n", + spec->name, retval); + /* Could return -EINVAL here to enforce, but that might break userspace = */ + } + + return 0; +} +EXPORT_SYMBOL_GPL(kapi_validate_syscall_return); + +/** + * kapi_check_context - Check if current context matches API requirements + * @spec: API specification to check against + */ +void kapi_check_context(const struct kernel_api_spec *spec) +{ + u32 ctx =3D spec->context_flags; + bool valid =3D false; + + if (!ctx) + return; + + /* Check if we're in an allowed context */ + if ((ctx & KAPI_CTX_PROCESS) && !in_interrupt()) + valid =3D true; + + if ((ctx & KAPI_CTX_SOFTIRQ) && in_softirq()) + valid =3D true; + + if ((ctx & KAPI_CTX_HARDIRQ) && in_hardirq()) + valid =3D true; + + if ((ctx & KAPI_CTX_NMI) && in_nmi()) + valid =3D true; + + if (!valid) { + WARN_ONCE(1, "API %s called from invalid context\n", spec->name); + } + + /* Check specific requirements */ + if ((ctx & KAPI_CTX_ATOMIC) && preemptible()) { + WARN_ONCE(1, "API %s requires atomic context\n", spec->name); + } + + if ((ctx & KAPI_CTX_SLEEPABLE) && !preemptible()) { + 
WARN_ONCE(1, "API %s requires sleepable context\n", spec->name); + } +} +EXPORT_SYMBOL_GPL(kapi_check_context); + +#endif /* CONFIG_KAPI_RUNTIME_CHECKS */ diff --git a/scripts/generate_api_specs.sh b/scripts/generate_api_specs.sh new file mode 100755 index 0000000000000..2c3078a508fef --- /dev/null +++ b/scripts/generate_api_specs.sh @@ -0,0 +1,18 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Stub script for generating API specifications collector +# This is a placeholder until the full implementation is available +# + +cat << 'EOF' +// SPDX-License-Identifier: GPL-2.0 +/* + * Auto-generated API specifications collector (stub) + * Generated by scripts/generate_api_specs.sh + */ + +#include + +/* No API specifications collected yet */ +EOF --=20 2.51.0 From nobody Sat Feb 7 10:08:18 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1AB222D97BD; Thu, 18 Dec 2025 20:42:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1766090568; cv=none; b=Rb9zk8ijzkzRAa1sdSOW7HcbBxa7+g5Yvggx2J0NN979xYBbHDir4qkmKRoAtkqnnXRlKa4TTuUMDSbqtAnNLDzr6S8AdYhap2NZIhtxNmeHP8u2ZH+zn5N9m8zBzvixLwi2aie3X4Ak0D/x5EM2zx8YHpzE2srYuBrBiKR7sM8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1766090568; c=relaxed/simple; bh=lDfn1hgkc+g2P4PzDyhasO3+1sCtVJHnt7XbBNuByqE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=CGamYOpf80ArdbRm+k8+fgbYZ9vJXw4XSpFbvh8AC9+NXkBPef9bMEvh9zj+PALjVFcCXDzV1yNb2YxYTYxQkrSntfWyMZ8+5uu6bwD8zkiXqRcbXIWy0WueJ9m/C6nSXXyvwZ+wZ8k6cHwp6FSogyYFFHD1PpG6uCTYaog6Yrg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org 
header.i=@kernel.org header.b=lhxttUnh; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="lhxttUnh" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 56770C19421; Thu, 18 Dec 2025 20:42:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1766090568; bh=lDfn1hgkc+g2P4PzDyhasO3+1sCtVJHnt7XbBNuByqE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=lhxttUnhw8JzdeH7QFLreeAdhesw7gaymacZllocrsn8XRKgc9sfuRCgoUeZBXv8X NERLbSA4sBCFQWOoId7zBT8ub2LnH2I1UscFA8WPgfcvIteeIiWvjg2Q5gnorQcTqx GJ7w6dEssOoKXEMZIqGKKJ2gDdU6PVU7nSwJKEcaPvLss7EmOZ2yjNz22s62ZoO9Ly C7r1hol+LVoI/mg2YD03lMZixbSuyJOjLt4SmNbpxbDpYz0VcvhzKE367oezj+gcKv 881WOlv9pl33ticJ/FgZMp4bh7wID46pKudRXarRazbJ6nYxO7xh4im49NcujfE/M+ Ccj4pItP9o+ug== From: Sasha Levin To: linux-api@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, tools@kernel.org, gpaoloni@redhat.com, Sasha Levin Subject: [RFC PATCH v5 02/15] kernel/api: enable kerneldoc-based API specifications Date: Thu, 18 Dec 2025 15:42:24 -0500 Message-ID: <20251218204239.4159453-3-sashal@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org> References: <20251218204239.4159453-1-sashal@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This patch adds support for extracting API specifications from kernel-doc comments and generating C macro invocations for the kernel API specification framework. 
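The detection and quoting rules this tooling relies on can be sketched in isolation. The following standalone Python sketch is illustrative only and is not part of the patch:

- `has_apispec()` applies the same pattern that `scripts/Makefile.build` passes to `grep -E` when deciding whether a `.c` file gets an `.apispec.h` target.
- `format_macro_param()` reproduces the escaping and truncation rules of `_format_macro_param()` in `kdoc_apispec.py`.

Both function names are hypothetical. The `max_len` argument stands in for `KAPI_MAX_DESC_LEN`, whose actual value is not shown here.

```python
import re

# Same pattern Makefile.build hands to `grep -E` to detect kernel-doc
# comment lines that carry API-spec sections.
APISPEC_LINE = re.compile(r'^\s*\*\s*(long-desc|context-flags|state-trans):')


def has_apispec(source_text: str) -> bool:
    """Return True if any line looks like an API-spec kernel-doc section."""
    return any(APISPEC_LINE.match(line) for line in source_text.splitlines())


def format_macro_param(value, max_len: int) -> str:
    """Quote a value as a C string literal for use as a macro argument.

    Mirrors _format_macro_param(): backslashes and double quotes are
    escaped, newlines become spaces, and values longer than max_len - 10
    characters are cut to max_len - 13 characters plus '...'.
    max_len is a stand-in for KAPI_MAX_DESC_LEN (assumption).
    """
    if value is None:
        return '""'
    value = str(value).replace('\\', '\\\\').replace('"', '\\"')
    value = value.replace('\n', ' ')
    if len(value) > max_len - 10:
        value = value[:max_len - 13] + '...'
    return f'"{value}"'
```

For example, a comment line such as ` * long-desc: Does something "useful"` makes `has_apispec()` return True. Likewise, `format_macro_param('Does something "useful"', 64)` yields the C literal `"Does something \"useful\""`, ready to be pasted into a generated macro invocation.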
Changes include: - New kdoc_apispec.py module for generating API spec macros - Updates to kernel-doc.py to support -apispec output format - Build system integration in Makefile.build - Generator script for collecting all API specifications - Support for API-specific sections in kernel-doc comments Signed-off-by: Sasha Levin --- scripts/Makefile.build | 28 + scripts/Makefile.clean | 3 + scripts/generate_api_specs.sh | 83 ++- scripts/kernel-doc.py | 5 + tools/lib/python/kdoc/kdoc_apispec.py | 755 ++++++++++++++++++++++++++ tools/lib/python/kdoc/kdoc_output.py | 9 +- tools/lib/python/kdoc/kdoc_parser.py | 86 ++- 7 files changed, 957 insertions(+), 12 deletions(-) create mode 100644 tools/lib/python/kdoc/kdoc_apispec.py diff --git a/scripts/Makefile.build b/scripts/Makefile.build index 52c08c4eb0b9a..7a192d29a01f6 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -172,6 +172,34 @@ ifneq ($(KBUILD_EXTRA_WARN),) $< endif =20 +# Generate API spec headers from kernel-doc comments +ifeq ($(CONFIG_KAPI_SPEC),y) +# Function to check if a file has API specifications +has-apispec =3D $(shell grep -qE '^\s*\*\s*(long-desc|context-flags|state-= trans):' $(src)/$(1) 2>/dev/null && echo $(1)) + +# Get base names without directory prefix +c-objs-base :=3D $(notdir $(real-obj-y) $(real-obj-m)) +# Filter to only .o files with corresponding .c source files +c-files :=3D $(foreach o,$(c-objs-base),$(if $(wildcard $(src)/$(o:.o=3D.c= )),$(o:.o=3D.c))) +# Also check for any additional .c files that contain API specs but are in= cluded +extra-c-files :=3D $(shell find $(src) -maxdepth 1 -name "*.c" -exec grep = -l '^\s*\*\s*\(long-desc\|context-flags\|state-trans\):' {} \; 2>/dev/null = | xargs -r basename -a) +# Combine both lists and remove duplicates +all-c-files :=3D $(sort $(c-files) $(extra-c-files)) +# Only include files that actually have API specifications +apispec-files :=3D $(foreach f,$(all-c-files),$(call has-apispec,$(f))) +# Generate apispec targets with 
proper directory prefix +apispec-y :=3D $(addprefix $(obj)/,$(apispec-files:.c=3D.apispec.h)) +always-y +=3D $(apispec-y) +targets +=3D $(apispec-y) + +quiet_cmd_apispec =3D APISPEC $@ + cmd_apispec =3D PYTHONDONTWRITEBYTECODE=3D1 $(KERNELDOC) -apispec \ + $(KDOCFLAGS) $< > $@ || rm -f $@ + +$(obj)/%.apispec.h: $(src)/%.c FORCE + $(call if_changed,apispec) +endif + # Compile C sources (.c) # ------------------------------------------------------------------------= --- =20 diff --git a/scripts/Makefile.clean b/scripts/Makefile.clean index 6ead00ec7313b..f78dbbe637f27 100644 --- a/scripts/Makefile.clean +++ b/scripts/Makefile.clean @@ -35,6 +35,9 @@ __clean-files :=3D $(filter-out $(no-clean-files), $(__= clean-files)) =20 __clean-files :=3D $(wildcard $(addprefix $(obj)/, $(__clean-files))) =20 +# Also clean generated apispec headers (computed dynamically in Makefile.b= uild) +__clean-files +=3D $(wildcard $(obj)/*.apispec.h) + # =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 # To make this rule robust against "Argument list too long" error, diff --git a/scripts/generate_api_specs.sh b/scripts/generate_api_specs.sh index 2c3078a508fef..3ac6be9b4fe98 100755 --- a/scripts/generate_api_specs.sh +++ b/scripts/generate_api_specs.sh @@ -1,18 +1,87 @@ #!/bin/bash # SPDX-License-Identifier: GPL-2.0 # -# Stub script for generating API specifications collector -# This is a placeholder until the full implementation is available +# generate_api_specs.sh - Generate C file that includes all API specificat= ion headers # +# Usage: generate_api_specs.sh =20 -cat << 'EOF' -// SPDX-License-Identifier: GPL-2.0 +SRCTREE=3D"$1" +OBJTREE=3D"$2" + +if [ -z "$SRCTREE" ] || [ -z "$OBJTREE" ]; then + echo "Usage: $0 " >&2 + exit 1 +fi + +# Generate header +cat < #include +#include +#include +#include +#include 
+#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifdef CONFIG_KAPI_SPEC + +EOF =20 -/* No API specifications collected yet */ +# Find all .apispec.h files and generate includes +# Look in both source tree and object tree +(find "$SRCTREE" -name "*.apispec.h" -type f 2>/dev/null; \ + find "$OBJTREE" -name "*.apispec.h" -type f 2>/dev/null) | \ + grep -v "/generated_api_specs.c" | \ + sort -u | \ + while read -r apispec_file; do + # Get relative path from srctree or objtree + case "$apispec_file" in + "$SRCTREE"*) + rel_path=3D"${apispec_file#$SRCTREE/}" + ;; + *) + rel_path=3D"${apispec_file#$OBJTREE/}" + ;; + esac + + # Skip if file is empty + if [ ! -s "$apispec_file" ]; then + continue + fi + + # Generate include statement with relative path from kernel/api/ + # The generated file is always at kernel/api/generated_api_specs.c, + # so we need to go up two directories to reach the root + echo "#include \"../../${rel_path}\"" + done + +# Close the ifdef +cat <\n" + "#include \n\n" + ) + + def _format_macro_param(self, value, max_len=3DKAPI_MAX_DESC_LEN): + """Format a value for use in C macro parameter, truncating if need= ed""" + if value is None: + return '""' + value =3D str(value).replace('\\', '\\\\').replace('"', '\\"') + value =3D value.replace('\n', ' ') + # Truncate to fit within max_len, accounting for escaping overhead + if len(value) > max_len - 10: + value =3D value[:max_len - 13] + '...' 
+ return f'"{value}"' + + def _get_section(self, sections, key): + """Get first line from sections, checking with and without @ prefi= x""" + for prefix in ['', '@']: + full_key =3D prefix + key + if full_key in sections: + content =3D sections[full_key].strip() + # Return only first line to avoid mixing sections + return content.split('\n')[0].strip() if content else '' + return None + + def _get_raw_section(self, sections, key): + """Get full section content, checking with and without @ prefix""" + for prefix in ['', '@']: + full_key =3D prefix + key + if full_key in sections: + return sections[full_key] + return '' + + def _parse_indented_items(self, section_content, item_parser): + """Generic parser for indented items. + + Args: + section_content: Raw section content + item_parser: Function that takes (lines, start_index) and retu= rns (item, next_index) + + Returns: + List of parsed items + """ + if not section_content: + return [] + + items =3D [] + lines =3D section_content.strip().split('\n') + i =3D 0 + + while i < len(lines): + if not lines[i].strip(): + i +=3D 1 + continue + + # Check if this is a main item (not indented) + if not lines[i].startswith((' ', '\t')): + item, i =3D item_parser(lines, i) + if item: + items.append(item) + else: + i +=3D 1 + + return items + + def _parse_subfields(self, lines, start_idx): + """Parse indented subfields starting from start_idx+1. 
+
+        Returns: (dict of subfields, next index)
+        """
+        subfields = {}
+        i = start_idx + 1
+
+        while i < len(lines) and (lines[i].startswith((' ', '\t'))):
+            line = lines[i].strip()
+            if ':' in line:
+                key, value = line.split(':', 1)
+                subfields[key.strip()] = value.strip()
+            i += 1
+
+        return subfields, i
+
+    def _parse_signal_item(self, lines, i):
+        """Parse a single signal specification"""
+        signal = {'name': lines[i].strip()}
+        subfields, next_i = self._parse_subfields(lines, i)
+
+        # Map subfields to signal attributes
+        signal.update({
+            'direction': subfields.get('direction', 'KAPI_SIGNAL_RECEIVE'),
+            'action': subfields.get('action', 'KAPI_SIGNAL_ACTION_RETURN'),
+            'condition': subfields.get('condition'),
+            'desc': subfields.get('desc'),
+            'error': subfields.get('error'),
+            'timing': subfields.get('timing'),
+            'priority': subfields.get('priority'),
+            'interruptible': subfields.get('interruptible', '').lower() == 'yes',
+            'number': subfields.get('number', '0'),
+        })
+
+        return signal, next_i
+
+    def _parse_error_item(self, lines, i):
+        """Parse a single error specification"""
+        line = lines[i].strip()
+
+        # Skip desc: lines
+        if line.startswith('desc:'):
+            return None, i + 1
+
+        # Check for error pattern
+        if not re.match(r'^[A-Z][A-Z0-9_]+,', line):
+            return None, i + 1
+
+        error = {'line': line, 'desc': ''}
+
+        # Look for desc: continuation
+        i += 1
+        desc_lines = []
+        while i < len(lines):
+            next_line = lines[i].strip()
+            if next_line.startswith('desc:'):
+                desc_lines.append(next_line[5:].strip())
+                i += 1
+            elif not next_line or re.match(r'^[A-Z][A-Z0-9_]+,', next_line):
+                break
+            else:
+                desc_lines.append(next_line)
+                i += 1
+
+        if desc_lines:
+            error['desc'] = ' '.join(desc_lines)
+
+        return error, i
+
+    def _parse_lock_item(self, lines, i):
+        """Parse a single lock specification"""
+        line = lines[i].strip()
+        if ':' not in line:
+            return None, i + 1
+
+        parts = line.split(':', 1)[1].strip().split(',', 1)
+        if len(parts) < 2:
+            return None, i + 1
+
+        lock = {
+            'name': parts[0].strip(),
+            'type': parts[1].strip()
+        }
+
+        subfields, next_i = self._parse_subfields(lines, i)
+
+        # Map boolean fields
+        for field in ['acquired', 'released', 'held-on-entry', 'held-on-exit']:
+            if subfields.get(field, '').lower() == 'true':
+                lock[field] = True
+
+        lock['desc'] = subfields.get('desc', '')
+
+        return lock, next_i
+
+    def _parse_constraint_item(self, lines, i):
+        """Parse a single constraint specification"""
+        line = lines[i].strip()
+
+        # Check for old format with comma
+        if ',' in line:
+            parts = line.split(',', 1)
+            constraint = {
+                'name': parts[0].strip(),
+                'desc': parts[1].strip() if len(parts) > 1 else '',
+                'expr': None
+            }
+        else:
+            constraint = {'name': line, 'desc': '', 'expr': None}
+
+        subfields, next_i = self._parse_subfields(lines, i)
+
+        if 'desc' in subfields:
+            constraint['desc'] = (constraint['desc'] + ' ' + subfields['desc']).strip()
+        constraint['expr'] = subfields.get('expr')
+
+        return constraint, next_i
+
+    def _parse_side_effect_item(self, lines, i):
+        """Parse a single side effect specification"""
+        line = lines[i].strip()
+
+        # Default to new format
+        effect = {
+            'type': line,
+            'target': '',
+            'desc': '',
+            'condition': None,
+            'reversible': False
+        }
+
+        # Check for old format with commas
+        if ',' in line:
+            # Handle condition and reversible flags
+            cond_match = re.search(r',\s*condition=([^,]+?)(?:\s*,\s*reversible=(yes|no)\s*)?$', line)
+            if cond_match:
+                effect['condition'] = cond_match.group(1).strip()
+                effect['reversible'] = cond_match.group(2) == 'yes'
+                line = line[:cond_match.start()]
+            elif ', reversible=yes' in line:
+                effect['reversible'] = True
+                line = line.replace(', reversible=yes', '')
+            elif ', reversible=no' in line:
+                line = line.replace(', reversible=no', '')
+
+            parts = line.split(',', 2)
+            if len(parts) >= 1:
+                effect['type'] = parts[0].strip()
+            if len(parts) >= 2:
+                effect['target'] = parts[1].strip()
+            if len(parts) >= 3:
+                effect['desc'] = parts[2].strip()
+        else:
+            # Multi-line format with subfields
+            subfields, next_i = self._parse_subfields(lines, i)
+            effect.update({
+                'target': subfields.get('target', ''),
+                'desc': subfields.get('desc', ''),
+                'condition': subfields.get('condition'),
+                'reversible': subfields.get('reversible', '').lower() == 'yes'
+            })
+            return effect, next_i
+
+        return effect, i + 1
+
+    def _parse_state_trans_item(self, lines, i):
+        """Parse a single state transition specification"""
+        line = lines[i].strip()
+
+        trans = {
+            'target': line,
+            'from': '',
+            'to': '',
+            'condition': '',
+            'desc': ''
+        }
+
+        # Check for old format with commas
+        if ',' in line:
+            parts = line.split(',', 3)
+            if len(parts) >= 1:
+                trans['target'] = parts[0].strip()
+            if len(parts) >= 2:
+                trans['from'] = parts[1].strip()
+            if len(parts) >= 3:
+                trans['to'] = parts[2].strip()
+            if len(parts) >= 4:
+                desc_part = parts[3].strip()
+                desc_parts = desc_part.split(',', 1)
+                if len(desc_parts) > 1:
+                    trans['condition'] = desc_parts[0].strip()
+                    trans['desc'] = desc_parts[1].strip()
+                else:
+                    trans['desc'] = desc_part
+            return trans, i + 1
+        else:
+            # Multi-line format with subfields
+            subfields, next_i = self._parse_subfields(lines, i)
+            trans.update({
+                'from': subfields.get('from', ''),
+                'to': subfields.get('to', ''),
+                'condition': subfields.get('condition', ''),
+                'desc': subfields.get('desc', '')
+            })
+            return trans, next_i
+
+    def _process_parameters(self, sections, parameterlist, parameterdescs, parametertypes):
+        """Process and output parameter specifications"""
+        param_count = len(parameterlist)
+        if param_count > 0:
+            self.data += f"\n\tKAPI_PARAM_COUNT({param_count})\n"
+
+        for param_idx, param in enumerate(parameterlist):
+            param_name = param.strip()
+            param_desc = parameterdescs.get(param_name, '')
+            param_ctype = parametertypes.get(param_name, '')
+
+            # Parse parameter specifications
+            param_section = self._get_raw_section(sections, 'param')
+            param_specs = {}
+            if param_section:
+                param_specs = self._parse_param_spec(param_section, param_name)
+
+            self.data += f"\n\tKAPI_PARAM({param_idx}, {self._format_macro_param(param_name)}, "
+            self.data += f"{self._format_macro_param(param_ctype)}, {self._format_macro_param(param_desc)})\n"
+
+            # Add parameter attributes
+            for key, macro in [
+                ('param-type', 'KAPI_PARAM_TYPE'),
+                ('param-flags', 'KAPI_PARAM_FLAGS'),
+                ('param-size', 'KAPI_PARAM_SIZE'),
+                ('param-alignment', 'KAPI_PARAM_ALIGNMENT'),
+            ]:
+                if key in param_specs:
+                    self.data += f"\t\t{macro}({param_specs[key]})\n"
+
+            # Handle constraint type
+            if 'param-constraint-type' in param_specs:
+                ctype = param_specs['param-constraint-type']
+                if ctype == 'KAPI_CONSTRAINT_BITMASK':
+                    ctype = 'KAPI_CONSTRAINT_MASK'
+                self.data += f"\t\tKAPI_PARAM_CONSTRAINT_TYPE({ctype})\n"
+
+            # Handle range
+            if 'param-range' in param_specs and ',' in param_specs['param-range']:
+                min_val, max_val = param_specs['param-range'].split(',', 1)
+                self.data += f"\t\tKAPI_PARAM_RANGE({min_val.strip()}, {max_val.strip()})\n"
+
+            # Handle mask
+            if 'param-mask' in param_specs:
+                self.data += f"\t\tKAPI_PARAM_VALID_MASK({param_specs['param-mask']})\n"
+
+            # Handle enum values
+            if 'param-enum-values' in param_specs:
+                self.data += f"\t\tKAPI_PARAM_ENUM_VALUES({param_specs['param-enum-values']})\n"
+
+            # Handle constraint description
+            if 'param-constraint' in param_specs:
+                self.data += f"\t\tKAPI_PARAM_CONSTRAINT({self._format_macro_param(param_specs['param-constraint'])})\n"
+
+            self.data += "\tKAPI_PARAM_END\n"
+
+    def _parse_param_spec(self, section_content, param_name):
+        """Parse parameter specifications from indented format"""
+        specs = {}
+        lines = section_content.strip().split('\n')
+        current_item = None
+
+        # Map to expected keys
+        field_map = {
+            'flags': 'param-flags',
+            'size': 'param-size',
+            'constraint-type': 'param-constraint-type',
+            'constraint': 'param-constraint',
+            'range': 'param-range',
+            'mask': 'param-mask',
+            'valid-mask': 'param-mask',
+            'valid-values': 'param-enum-values',
+            'alignment': 'param-alignment',
+            'struct-type': 'param-struct-type',
+        }
+
+        i = 0
+        while i < len(lines):
+            line = lines[i]
+            if not line.strip():
+                i += 1
+                continue
+
+            # Check if this is our parameter (non-indented line)
+            if not line.startswith((' ', '\t')):
+                parts = line.strip().split(',', 1)
+                current_item = param_name if parts[0].strip() == param_name else None
+                if current_item and len(parts) > 1:
+                    specs['param-type'] = parts[1].strip()
+                i += 1
+            elif current_item == param_name:
+                # Parse subfield
+                stripped = line.strip()
+                if ':' in stripped:
+                    key, value = stripped.split(':', 1)
+                    key = key.strip()
+                    value = value.strip()
+
+                    # Collect continuation lines (indented lines without a colon that
+                    # defines a new key, i.e., lines that are pure continuations)
+                    i += 1
+                    while i < len(lines):
+                        next_line = lines[i]
+                        # Stop if we hit a non-indented line (new param)
+                        if next_line.strip() and not next_line.startswith((' ', '\t')):
+                            break
+                        next_stripped = next_line.strip()
+                        # Stop if we hit a new key (contains colon with known key prefix)
+                        if next_stripped and ':' in next_stripped:
+                            potential_key = next_stripped.split(':', 1)[0].strip()
+                            if potential_key in field_map or potential_key in ['type', 'desc']:
+                                break
+                        # This is a continuation line
+                        if next_stripped:
+                            value = value + ' ' + next_stripped
+                        i += 1
+
+                    if key in field_map:
+                        # Clean up the value - remove excessive whitespace
+                        value = ' '.join(value.split())
+                        specs[field_map[key]] = value
+            else:
+                i += 1
+
+        return specs
+
+    def _validate_effect_type(self, effect_type):
+        """Validate and normalize effect type"""
+        if 'KAPI_EFFECT_SCHEDULER' in effect_type:
+            return effect_type.replace('KAPI_EFFECT_SCHEDULER', 'KAPI_EFFECT_SCHEDULE')
+
+        if 'KAPI_EFFECT_' in effect_type and effect_type not in VALID_EFFECT_TYPES:
+            if '|' in effect_type:
+                parts = [p.strip() for p in effect_type.split('|')]
+                valid_parts = [p if p in VALID_EFFECT_TYPES else 'KAPI_EFFECT_MODIFY_STATE' for p in parts]
+                return ' | '.join(valid_parts)
+            return 'KAPI_EFFECT_MODIFY_STATE'
+
+        return effect_type
+
+    def _has_api_spec(self, sections):
+        """Check if this function has an API specification.
+
+        Returns True if at least 2 KAPI-specific section indicators are present.
+        We require 2+ indicators (not just 1) to avoid false positives from
+        regular kernel-doc comments that happen to use a common section name
+        like 'return' or 'error'. Having multiple KAPI sections strongly
+        suggests intentional API specification rather than coincidence.
+        """
+        indicators = [
+            'api-type', 'context-flags', 'param-type', 'error-code',
+            'capability', 'signal', 'lock', 'state-trans', 'constraint',
+            'return', 'error', 'side-effects', 'struct'
+        ]
+
+        count = sum(1 for ind in indicators
+                    if any(key.lower().startswith(ind.lower()) or
+                           key.lower().startswith('@' + ind.lower())
+                           for key in sections.keys()))
+
+        # Require 2+ indicators to distinguish from regular kernel-doc
+        return count >= 2
+
+    def out_function(self, fname, name, args):
+        """Generate API spec for a function"""
+        function_name = args.get('function', name)
+        sections = args.sections if hasattr(args, 'sections') else args.get('sections', {})
+
+        if not self._has_api_spec(sections):
+            return
+
+        parameterlist = args.parameterlist if hasattr(args, 'parameterlist') else args.get('parameterlist', [])
+        parameterdescs = args.parameterdescs if hasattr(args, 'parameterdescs') else args.get('parameterdescs', {})
+        parametertypes = args.parametertypes if hasattr(args, 'parametertypes') else args.get('parametertypes', {})
+        purpose = args.get('purpose', '')
+
+        # Start macro invocation
+        self.data += f"DEFINE_KERNEL_API_SPEC({function_name})\n"
+
+        # Basic info
+        if purpose:
+            self.data += f"\tKAPI_DESCRIPTION({self._format_macro_param(purpose)})\n"
+
+        long_desc = self._get_section(sections, 'long-desc')
+        if long_desc:
+            self.data += f"\tKAPI_LONG_DESC({self._format_macro_param(long_desc)})\n"
+
+        # Context flags
+        context = self._get_section(sections, 'context-flags') or self._get_section(sections, 'context')
+        if context:
+            self.data += f"\tKAPI_CONTEXT({context})\n"
+
+        # Process parameters
+        self._process_parameters(sections, parameterlist, parameterdescs, parametertypes)
+
+        # Process errors
+        errors = self._parse_indented_items(
+            self._get_raw_section(sections, 'error'),
+            self._parse_error_item
+        )
+
+        if errors:
+            self.data += f"\n\tKAPI_RETURN_ERROR_COUNT({len(errors)})\n"
+            self.data += f"\n\tKAPI_ERROR_COUNT({len(errors)})\n"
+
+            for idx, error in enumerate(errors):
+                self._output_error(idx, error)
+
+        # Process signals
+        signals = self._parse_indented_items(
+            self._get_raw_section(sections, 'signal'),
+            self._parse_signal_item
+        )
+
+        if signals:
+            self.data += f"\n\tKAPI_SIGNAL_COUNT({len(signals)})\n"
+
+            for idx, signal in enumerate(signals):
+                self._output_signal(idx, signal)
+
+        # Process other specifications
+        self._process_locks(sections)
+        self._process_constraints(sections)
+        self._process_side_effects(sections)
+        self._process_state_transitions(sections)
+        self._process_capabilities(sections)
+
+        # Add examples and notes
+        for key, macro in [('examples', 'KAPI_EXAMPLES'), ('notes', 'KAPI_NOTES')]:
+            value = self._get_section(sections, key)
+            if value:
+                self.data += f"\n\t{macro}({self._format_macro_param(value)})\n"
+
+        self.data += "\nKAPI_END_SPEC;\n\n"
+
+    def _output_error(self, idx, error):
+        """Output a single error specification"""
+        line = error['line']
+        if line.startswith('-'):
+            line = line[1:].strip()
+
+        parts = line.split(',', 2)
+        if len(parts) == 2:
+            # Format: NAME, description
+            name = parts[0].strip()
+            short_desc = parts[1].strip()
+            code = f"-{name}"
+        elif len(parts) >= 3:
+            # Format: code, name, description
+            code = parts[0].strip()
+            name = parts[1].strip()
+            short_desc = parts[2].strip()
+            if not code.startswith('-'):
+                code = f"-{code}"
+        else:
+            return
+
+        long_desc = error.get('desc', '') or short_desc
+
+        self.data += f"\n\tKAPI_ERROR({idx}, {code}, {self._format_macro_param(name)}, "
+        self.data += f"{self._format_macro_param(short_desc)},\n\t\t {self._format_macro_param(long_desc)})\n"
+
+    def _output_signal(self, idx, signal):
+        """Output a single signal specification"""
+        self.data += f"\n\tKAPI_SIGNAL({idx}, {signal['number']}, "
+        self.data += f"{self._format_macro_param(signal['name'], KAPI_MAX_SIGNAL_NAME_LEN)}, "
+        self.data += f"{signal['direction']}, {signal['action']})\n"
+
+        for key, macro in [
+            ('condition', 'KAPI_SIGNAL_CONDITION'),
+            ('desc', 'KAPI_SIGNAL_DESC'),
+            ('error', 'KAPI_SIGNAL_ERROR'),
+            ('timing', 'KAPI_SIGNAL_TIMING'),
+            ('priority', 'KAPI_SIGNAL_PRIORITY'),
+        ]:
+            if signal.get(key):
+                # Priority field is numeric
+                if key == 'priority':
+                    self.data += f"\t\t{macro}({signal[key]})\n"
+                else:
+                    self.data += f"\t\t{macro}({self._format_macro_param(signal[key])})\n"
+
+        if signal.get('interruptible'):
+            self.data += "\t\tKAPI_SIGNAL_INTERRUPTIBLE\n"
+
+        self.data += "\tKAPI_SIGNAL_END\n"
+
+    def _process_locks(self, sections):
+        """Process lock specifications"""
+        locks = self._parse_indented_items(
+            self._get_raw_section(sections, 'lock'),
+            self._parse_lock_item
+        )
+
+        if locks:
+            self.data += f"\n\tKAPI_LOCK_COUNT({len(locks)})\n"
+
+            for idx, lock in enumerate(locks):
+                self.data += f"\n\tKAPI_LOCK({idx}, {self._format_macro_param(lock['name'])}, {lock['type']})\n"
+
+                for flag in ['acquired', 'released']:
+                    if lock.get(flag):
+                        self.data += f"\t\tKAPI_LOCK_{flag.upper()}\n"
+
+                if lock.get('desc'):
+                    self.data += f"\t\tKAPI_LOCK_DESC({self._format_macro_param(lock['desc'])})\n"
+
+                self.data += "\tKAPI_LOCK_END\n"
+
+    def _process_constraints(self, sections):
+        """Process constraint specifications"""
+        constraints = self._parse_indented_items(
+            self._get_raw_section(sections, 'constraint'),
+            self._parse_constraint_item
+        )
+
+        if constraints:
+            self.data += f"\n\tKAPI_CONSTRAINT_COUNT({len(constraints)})\n"
+
+            for idx, constraint in enumerate(constraints):
+                self.data += f"\n\tKAPI_CONSTRAINT({idx}, {self._format_macro_param(constraint['name'])},\n"
+                self.data += f"\t\t\t{self._format_macro_param(constraint['desc'])})\n"
+
+                if constraint.get('expr'):
+                    self.data += f"\t\tKAPI_CONSTRAINT_EXPR({self._format_macro_param(constraint['expr'])})\n"
+
+                self.data += "\tKAPI_CONSTRAINT_END\n"
+
+    def _process_side_effects(self, sections):
+        """Process side effect specifications"""
+        effects = self._parse_indented_items(
+            self._get_raw_section(sections, 'side-effect'),
+            self._parse_side_effect_item
+        )
+
+        if effects:
+            self.data += f"\n\tKAPI_SIDE_EFFECT_COUNT({len(effects)})\n"
+
+            for idx, effect in enumerate(effects):
+                effect_type = self._validate_effect_type(effect['type'])
+
+                self.data += f"\n\tKAPI_SIDE_EFFECT({idx}, {effect_type},\n"
+                self.data += f"\t\t\t {self._format_macro_param(effect['target'])},\n"
+                self.data += f"\t\t\t {self._format_macro_param(effect['desc'])})\n"
+
+                if effect.get('condition'):
+                    self.data += f"\t\tKAPI_EFFECT_CONDITION({self._format_macro_param(effect['condition'])})\n"
+
+                if effect.get('reversible'):
+                    self.data += "\t\tKAPI_EFFECT_REVERSIBLE\n"
+
+                self.data += "\tKAPI_SIDE_EFFECT_END\n"
+
+    def _process_state_transitions(self, sections):
+        """Process state transition specifications"""
+        transitions = self._parse_indented_items(
+            self._get_raw_section(sections, 'state-trans'),
+            self._parse_state_trans_item
+        )
+
+        if transitions:
+            self.data += f"\n\tKAPI_STATE_TRANS_COUNT({len(transitions)})\n"
+
+            for idx, trans in enumerate(transitions):
+                desc = trans['desc']
+                if trans.get('condition'):
+                    desc = trans['condition'] + (', ' + desc if desc else '')
+
+                self.data += f"\n\tKAPI_STATE_TRANS({idx}, {self._format_macro_param(trans['target'])}, "
+                self.data += f"{self._format_macro_param(trans['from'])}, {self._format_macro_param(trans['to'])},\n"
+                self.data += f"\t\t\t {self._format_macro_param(desc)})\n"
+                self.data += "\tKAPI_STATE_TRANS_END\n"
+
+    def _process_capabilities(self, sections):
+        """Process capability specifications"""
+        cap_section = self._get_raw_section(sections, 'capability')
+        if not cap_section:
+            return
+
+        lines = cap_section.strip().split('\n')
+        capabilities = []
+        i = 0
+
+        while i < len(lines):
+            line = lines[i].strip()
+            if not line or line.startswith(('allows:', 'without:', 'condition:', 'priority:')):
+                i += 1
+                continue
+
+            cap_info = {'line': line}
+
+            # Parse subfields
+            subfields, next_i = self._parse_subfields(lines, i)
+            cap_info.update(subfields)
+            capabilities.append(cap_info)
+            i = next_i
+
+        if capabilities:
+            self.data += f"\n\tKAPI_CAPABILITY_COUNT({len(capabilities)})\n"
+
+            for idx, cap in enumerate(capabilities):
+                parts = cap['line'].split(',', 2)
+                if len(parts) >= 2:
+                    cap_name = parts[0].strip()
+                    cap_type = parts[1].strip()
+                    cap_desc = parts[2].strip() if len(parts) > 2 else cap_name
+
+                    # Fix common type issues
+                    if 'BYPASS' in cap_type and cap_type != 'KAPI_CAP_BYPASS_CHECK':
+                        cap_type = 'KAPI_CAP_BYPASS_CHECK'
+
+                    self.data += f"\n\tKAPI_CAPABILITY({idx}, {cap_name}, {self._format_macro_param(cap_desc)}, {cap_type})\n"
+
+                    for key, macro in [
+                        ('allows', 'KAPI_CAP_ALLOWS'),
+                        ('without', 'KAPI_CAP_WITHOUT'),
+                        ('condition', 'KAPI_CAP_CONDITION'),
+                        ('priority', 'KAPI_CAP_PRIORITY'),
+                    ]:
+                        if cap.get(key):
+                            value = self._format_macro_param(cap[key]) if key != 'priority' else cap[key]
+                            self.data += f"\t\t{macro}({value})\n"
+
+                    self.data += "\tKAPI_CAPABILITY_END\n"
+
+    # Skip output methods for non-function types
+    def out_enum(self, fname, name, args): pass
+    def out_typedef(self, fname, name, args): pass
+    def out_struct(self, fname, name, args): pass
+    def out_doc(self, fname, name, args): pass
diff --git a/tools/lib/python/kdoc/kdoc_output.py b/tools/lib/python/kdoc/kdoc_output.py
index b1aaa7fc36041..cc5752cd76a8d 100644
--- a/tools/lib/python/kdoc/kdoc_output.py
+++ b/tools/lib/python/kdoc/kdoc_output.py
@@ -124,8 +124,13 @@ class OutputFormat:
         Output warnings for identifiers that will be displayed.
         """
 
-        for log_msg in args.warnings:
-            self.config.warning(log_msg)
+        warnings = getattr(args, 'warnings', [])
+
+        for log_msg in warnings:
+            # Skip numeric warnings (line numbers) which are false positives
+            # from parameter-specific sections like "param-constraint: name, value"
+            if not isinstance(log_msg, int):
+                self.config.warning(log_msg)
 
     def check_doc(self, name, args):
         """Check if DOC should be output"""
diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index 500aafc500322..ecd218e762a34 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -31,6 +31,23 @@ from kdoc.kdoc_item import KdocItem
 # Allow whitespace at end of comment start.
 doc_start = KernRe(r'^/\*\*\s*$', cache=False)
 
+# Sections that are allowed to be duplicated for API specifications
+# These represent lists of items (multiple errors, signals, etc.)
+ALLOWED_DUPLICATE_SECTIONS = {
+    'param', '@param',
+    'error', '@error',
+    'signal', '@signal',
+    'lock', '@lock',
+    'side-effect', '@side-effect',
+    'state-trans', '@state-trans',
+    'capability', '@capability',
+    'constraint', '@constraint',
+    'validation-group', '@validation-group',
+    'validation-rule', '@validation-rule',
+    'validation-flag', '@validation-flag',
+    'struct-field', '@struct-field',
+}
+
 doc_end = KernRe(r'\*/', cache=False)
 doc_com = KernRe(r'\s*\*\s*', cache=False)
 doc_com_body = KernRe(r'\s*\* ?', cache=False)
@@ -43,10 +60,71 @@ doc_decl = doc_com + KernRe(r'(\w+)', cache=False)
 # @{section-name}:
 # while trying to not match literal block starts like "example::"
 #
+# Base kernel-doc section names
 known_section_names = 'description|context|returns?|notes?|examples?'
-known_sections = KernRe(known_section_names, flags = re.I)
+
+# API specification section names (for KAPI spec framework)
+# Format: (base_name, has_count_variant, has_other_variants)
+# Sections with has_count_variant=True need negative lookahead in doc_sect
+# to avoid matching 'error' when 'error-count' is intended
+_kapi_base_sections = [
+    # (name, needs_lookahead, additional_variants)
+    ('api-type', False, []),
+    ('api-version', False, []),
+    ('param', True, []),  # has param-count
+    ('struct', True, ['struct-type', 'struct-field', 'struct-field-[a-z\\-]+']),
+    ('validation-group', False, []),
+    ('validation-policy', False, []),
+    ('validation-flag', False, []),
+    ('validation-rule', False, []),
+    ('error', True, ['error-code', 'error-condition']),
+    ('capability', True, []),
+    ('signal', True, []),
+    ('lock', True, []),
+    ('since', False, ['since-version']),
+    ('context-flags', False, []),
+    ('return', True, ['return-type', 'return-check', 'return-check-type',
+                      'return-success', 'return-desc']),
+    ('long-desc', False, []),
+    ('constraint', True, []),
+    ('side-effect', True, []),
+    ('state-trans', True, []),
+]
+
+def _build_kapi_patterns():
+    """Build KAPI section patterns from the base definitions."""
+    validation_parts = []  # For known_sections (simple validation)
+    parsing_parts = []     # For doc_sect (with negative lookaheads)
+
+    for name, has_count, variants in _kapi_base_sections:
+        # Add base name (with optional @ prefix)
+        validation_parts.append(f'@?{name}')
+        if has_count:
+            # Need negative lookahead to not match 'name-count' or 'name-*'
+            parsing_parts.append(f'@?{name}(?!-)')
+            validation_parts.append(f'@?{name}-count')
+            parsing_parts.append(f'@?{name}-count')
+        else:
+            parsing_parts.append(f'@?{name}')
+
+        # Add variants
+        for variant in variants:
+            validation_parts.append(f'@?{variant}')
+            parsing_parts.append(f'@?{variant}')
+
+    # Add catch-all for kapi-* extensions
+    validation_parts.append(r'@?kapi-.*')
+    parsing_parts.append(r'@?kapi-.*')
+
+    return '|'.join(validation_parts), '|'.join(parsing_parts)
+
+_kapi_validation_pattern, _kapi_parsing_pattern = _build_kapi_patterns()
+
+known_sections = KernRe(known_section_names + '|' + _kapi_validation_pattern,
+                        flags=re.I)
 doc_sect = doc_com + \
-    KernRe(r'\s*(@[.\w]+|@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$',
+    KernRe(r'\s*(@[.\w\-]+|@\.\.\.|' + known_section_names + '|' +
+           _kapi_parsing_pattern + r')\s*:([^:].*)?$',
            flags=re.I, cache=False)
 
 doc_content = doc_com_body + KernRe(r'(.*)', cache=False)
@@ -342,7 +420,9 @@ class KernelEntry:
         else:
             if name in self.sections and self.sections[name] != "":
                 # Only warn on user-specified duplicate section names
-                if name != SECTION_DEFAULT:
+                # Skip warning for sections that are expected to have duplicates
+                # (like error, param, signal, etc. for API specifications)
+                if name != SECTION_DEFAULT and name not in ALLOWED_DUPLICATE_SECTIONS:
                     self.emit_msg(self.new_start_line,
                                   f"duplicate section name '{name}'")
                 # Treat as a new paragraph - add a blank line
-- 
2.51.0

From nobody Sat Feb 7 10:08:18 2026
From: Sasha Levin
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, tools@kernel.org, gpaoloni@redhat.com, Sasha Levin
Subject: [RFC PATCH v5 03/15] kernel/api: add debugfs interface for kernel API specifications
Date: Thu, 18 Dec 2025 15:42:25 -0500
Message-ID: <20251218204239.4159453-4-sashal@kernel.org>
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>

Add a debugfs interface to expose kernel API specifications at runtime.
This allows tools and users to query the complete API specifications
through the debugfs filesystem.

The interface provides:
- /sys/kernel/debug/kapi/list - lists all available API specifications
- /sys/kernel/debug/kapi/specs/ - detailed info for each API

Each specification file includes:
- Function name, version, and descriptions
- Execution context requirements and flags
- Parameter details with types, flags, and constraints
- Return value specifications and success conditions
- Error codes with descriptions and conditions
- Locking requirements and constraints
- Signal handling specifications
- Examples, notes, and deprecation status

This enables runtime introspection of kernel APIs for documentation
tools, static analyzers, and debugging purposes.
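As an illustration of how a tool might consume these files, here is a minimal user-space sketch. It assumes debugfs is mounted at /sys/kernel/debug, that `list` contains one API name per line, and that each per-API file sits directly under `specs/` and is named after the API. The list format and the per-file naming are not shown in this excerpt, so treat them as assumptions. The helper names `list_specs`, `read_spec` and `parse_spec_header` are hypothetical. The parser relies only on the unindented `Key: value` lines that `kapi_spec_show()` prints before the parameter details.

```python
import os

# Mount point named in the KAPI_SPEC_DEBUGFS help text; callers may pass
# another root (for example, a copied tree used in tests).
KAPI_ROOT = "/sys/kernel/debug/kapi"

# Top-level fields printed by kapi_spec_show() before the parameter section.
HEADER_KEYS = ("Name", "Version", "Description", "Long description",
               "Context flags")


def parse_spec_header(text):
    """Extract the unindented 'Key: value' header fields of one spec file."""
    fields = {}
    for line in text.splitlines():
        # Nested parameter/error/lock details are printed with leading
        # whitespace; titles and section headings contain no ': '.
        if line.startswith((" ", "\t")) or ": " not in line:
            continue
        key, value = line.split(": ", 1)
        if key in HEADER_KEYS and key not in fields:
            fields[key] = value.strip()
    if "Version" in fields:
        fields["Version"] = int(fields["Version"])
    if "Context flags" in fields:
        # print_context_flags() emits space-separated flag names.
        fields["Context flags"] = fields["Context flags"].split()
    return fields


def list_specs(root=KAPI_ROOT):
    """Return API names from <root>/list (assumed: one name per line)."""
    with open(os.path.join(root, "list"), encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def read_spec(name, root=KAPI_ROOT):
    """Read and parse <root>/specs/<name> (assumed file naming)."""
    with open(os.path.join(root, "specs", name), encoding="utf-8") as f:
        return parse_spec_header(f.read())
```

For example, `for api in list_specs(): print(api, read_spec(api).get("Description", ""))` prints each API with its short description. Parsing only the top-level lines keeps the sketch independent of the indentation used for nested parameter and error entries.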
Signed-off-by: Sasha Levin
---
 kernel/api/Kconfig        |  20 +++
 kernel/api/Makefile       |   3 +
 kernel/api/kapi_debugfs.c | 358 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 381 insertions(+)
 create mode 100644 kernel/api/kapi_debugfs.c

diff --git a/kernel/api/Kconfig b/kernel/api/Kconfig
index fde25ec70e134..d2754b21acc43 100644
--- a/kernel/api/Kconfig
+++ b/kernel/api/Kconfig
@@ -33,3 +33,23 @@ config KAPI_RUNTIME_CHECKS
 	  development. The checks use WARN_ONCE to report violations.
 
 	  If unsure, say N.
+
+config KAPI_SPEC_DEBUGFS
+	bool "Export kernel API specifications via debugfs"
+	depends on KAPI_SPEC
+	depends on DEBUG_FS
+	help
+	  This option enables exporting kernel API specifications through
+	  the debugfs filesystem. When enabled, specifications can be
+	  accessed at /sys/kernel/debug/kapi/.
+
+	  The debugfs interface provides:
+	  - A list of all available API specifications
+	  - Detailed information for each API including parameters,
+	    return values, errors, locking requirements, and constraints
+	  - Complete machine-readable representation of the specs
+
+	  This is useful for documentation tools, static analyzers, and
+	  runtime introspection of kernel APIs.
+
+	  If unsure, say N.
diff --git a/kernel/api/Makefile b/kernel/api/Makefile
index acab17c78afa3..716f128eea71d 100644
--- a/kernel/api/Makefile
+++ b/kernel/api/Makefile
@@ -10,6 +10,9 @@ obj-$(CONFIG_KAPI_SPEC) += kernel_api_spec.o
 ifeq ($(CONFIG_KAPI_SPEC),y)
 obj-$(CONFIG_KAPI_SPEC) += generated_api_specs.o
 
+# Debugfs interface for kernel API specs
+obj-$(CONFIG_KAPI_SPEC_DEBUGFS) += kapi_debugfs.o
+
 # Find all potential apispec files (this is evaluated at make time)
 apispec-files := $(shell find $(objtree) -name "*.apispec.h" -type f 2>/dev/null)
 
diff --git a/kernel/api/kapi_debugfs.c b/kernel/api/kapi_debugfs.c
new file mode 100644
index 0000000000000..84d5446d93916
--- /dev/null
+++ b/kernel/api/kapi_debugfs.c
@@ -0,0 +1,358 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Kernel API specification debugfs interface
+ *
+ * This provides a debugfs interface to expose kernel API specifications
+ * at runtime, allowing tools and users to query the complete API specs.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/* External symbols for kernel API spec section */
+extern struct kernel_api_spec __start_kapi_specs[];
+extern struct kernel_api_spec __stop_kapi_specs[];
+
+static struct dentry *kapi_debugfs_root;
+
+/* Helper function to print parameter type as string */
+static const char *param_type_str(enum kapi_param_type type)
+{
+	switch (type) {
+	case KAPI_TYPE_INT: return "int";
+	case KAPI_TYPE_UINT: return "uint";
+	case KAPI_TYPE_PTR: return "ptr";
+	case KAPI_TYPE_STRUCT: return "struct";
+	case KAPI_TYPE_UNION: return "union";
+	case KAPI_TYPE_ARRAY: return "array";
+	case KAPI_TYPE_FD: return "fd";
+	case KAPI_TYPE_ENUM: return "enum";
+	case KAPI_TYPE_USER_PTR: return "user_ptr";
+	case KAPI_TYPE_PATH: return "path";
+	case KAPI_TYPE_FUNC_PTR: return "func_ptr";
+	case KAPI_TYPE_CUSTOM: return "custom";
+	default: return "unknown";
+	}
+}
+
+/* Helper to print parameter flags */
+static void print_param_flags(struct seq_file *m, u32 flags)
+{
+	seq_printf(m, " flags: ");
+	if (flags & KAPI_PARAM_IN) seq_printf(m, "IN ");
+	if (flags & KAPI_PARAM_OUT) seq_printf(m, "OUT ");
+	if (flags & KAPI_PARAM_INOUT) seq_printf(m, "INOUT ");
+	if (flags & KAPI_PARAM_OPTIONAL) seq_printf(m, "OPTIONAL ");
+	if (flags & KAPI_PARAM_CONST) seq_printf(m, "CONST ");
+	if (flags & KAPI_PARAM_USER) seq_printf(m, "USER ");
+	if (flags & KAPI_PARAM_VOLATILE) seq_printf(m, "VOLATILE ");
+	if (flags & KAPI_PARAM_DMA) seq_printf(m, "DMA ");
+	if (flags & KAPI_PARAM_ALIGNED) seq_printf(m, "ALIGNED ");
+	seq_printf(m, "\n");
+}
+
+/* Helper to print context flags */
+static void print_context_flags(struct seq_file *m, u32 flags)
+{
+	seq_printf(m, "Context flags: ");
+	if (flags & KAPI_CTX_PROCESS) seq_printf(m, "PROCESS ");
+	if (flags & KAPI_CTX_HARDIRQ) seq_printf(m, "HARDIRQ ");
+	if (flags & KAPI_CTX_SOFTIRQ) seq_printf(m, "SOFTIRQ ");
+	if (flags & KAPI_CTX_NMI) seq_printf(m, "NMI ");
+	if (flags & KAPI_CTX_SLEEPABLE) seq_printf(m, "SLEEPABLE ");
+	if (flags & KAPI_CTX_ATOMIC) seq_printf(m, "ATOMIC ");
+	if (flags & KAPI_CTX_PREEMPT_DISABLED) seq_printf(m, "PREEMPT_DISABLED ");
+	if (flags & KAPI_CTX_IRQ_DISABLED) seq_printf(m, "IRQ_DISABLED ");
+	seq_printf(m, "\n");
+}
+
+/* Show function for individual API spec */
+static int kapi_spec_show(struct seq_file *m, void *v)
+{
+	struct kernel_api_spec *spec = m->private;
+	int i;
+
+	seq_printf(m, "Kernel API Specification\n");
+	seq_printf(m, "========================\n\n");
+
+	/* Basic info */
+	seq_printf(m, "Name: %s\n", spec->name);
+	seq_printf(m, "Version: %u\n", spec->version);
+	seq_printf(m, "Description: %s\n", spec->description);
+	if (strlen(spec->long_description) > 0)
+		seq_printf(m, "Long description: %s\n", spec->long_description);
+
+	/* Context */
+	print_context_flags(m, spec->context_flags);
+	seq_printf(m, "\n");
+
+	/* Parameters */
+	if (spec->param_count > 0) {
+		seq_printf(m, "Parameters (%u):\n", spec->param_count);
+		for (i = 0; i < spec->param_count; i++) {
+			struct kapi_param_spec *param = &spec->params[i];
+			seq_printf(m, " [%d] %s:\n", i, param->name);
+			seq_printf(m, " type: %s (%s)\n",
+				   param_type_str(param->type), param->type_name);
+			print_param_flags(m, param->flags);
+			if (strlen(param->description) > 0)
+				seq_printf(m, " description: %s\n", param->description);
+			if (param->size > 0)
+				seq_printf(m, " size: %zu\n", param->size);
+			if (param->alignment > 0)
+				seq_printf(m, " alignment: %zu\n", param->alignment);
+
+			/* Print constraints if any */
+			if (param->constraint_type != KAPI_CONSTRAINT_NONE) {
+				seq_printf(m, " constraints:\n");
+				switch (param->constraint_type) {
+				case KAPI_CONSTRAINT_RANGE:
+					seq_printf(m, " type: range\n");
+					seq_printf(m, " min: %lld\n", param->min_value);
+					seq_printf(m, " max: %lld\n", param->max_value);
+					break;
+				case KAPI_CONSTRAINT_MASK:
+					seq_printf(m, " type: mask\n");
+					seq_printf(m, " valid_bits: 0x%llx\n", param->valid_mask);
+					break;
+				case KAPI_CONSTRAINT_ENUM:
+					seq_printf(m, " type: enum\n");
+					seq_printf(m, " count: %u\n", param->enum_count);
+					break;
+				case KAPI_CONSTRAINT_USER_STRING:
+					seq_printf(m, " type: user_string\n");
+					seq_printf(m, " min_len: %lld\n", param->min_value);
+					seq_printf(m, " max_len: %lld\n", param->max_value);
+					break;
+				case KAPI_CONSTRAINT_USER_PATH:
+					seq_printf(m, " type: user_path\n");
+					seq_printf(m, " max_len: PATH_MAX (4096)\n");
+					break;
+				case KAPI_CONSTRAINT_USER_PTR:
+					seq_printf(m, " type: user_ptr\n");
+					seq_printf(m, " size: %zu bytes\n", param->size);
+					break;
+				case KAPI_CONSTRAINT_CUSTOM:
+					seq_printf(m, " type: custom\n");
+					if (strlen(param->constraints) > 0)
+						seq_printf(m, " description: %s\n",
+							   param->constraints);
+					break;
+				default:
+					break;
+				}
+			}
+			seq_printf(m, "\n");
+		}
+	}
+
+	/* Return value */
+	seq_printf(m, "Return value:\n");
+	seq_printf(m, " type: %s\n", spec->return_spec.type_name);
+	if (strlen(spec->return_spec.description) > 0)
+		seq_printf(m, " description: %s\n", spec->return_spec.description);
+
+	switch (spec->return_spec.check_type) {
+	case KAPI_RETURN_EXACT:
+		seq_printf(m, " success: == %lld\n", spec->return_spec.success_value);
+		break;
+	case KAPI_RETURN_RANGE:
+		seq_printf(m, " success: [%lld, %lld]\n",
+			   spec->return_spec.success_min,
+			   spec->return_spec.success_max);
+		break;
+	case KAPI_RETURN_FD:
+		seq_printf(m, " success: valid file descriptor (>= 0)\n");
+		break;
+	case KAPI_RETURN_ERROR_CHECK:
+		seq_printf(m, " success: error check\n");
+		break;
+	case KAPI_RETURN_CUSTOM:
+		seq_printf(m, " success: custom check\n");
+		break;
+	default:
+		break;
+	}
+	seq_printf(m, "\n");
+
+	/* Errors */
+	if (spec->error_count > 0) {
+		seq_printf(m, "Errors (%u):\n", spec->error_count);
+		for (i = 0; i < spec->error_count; i++) {
+			struct kapi_error_spec *err = &spec->errors[i];
+			seq_printf(m, " %s (%d): %s\n",
+				   err->name, err->error_code, err->description);
+			if (strlen(err->condition) > 0)
+				seq_printf(m, " condition: %s\n", err->condition);
+		}
+		seq_printf(m, "\n");
+	}
+
+	/* Locks */
+	if (spec->lock_count > 0) {
+		seq_printf(m, "Locks (%u):\n", spec->lock_count);
+		for (i = 0; i < spec->lock_count; i++) {
+			struct kapi_lock_spec *lock = &spec->locks[i];
+			const char *type_str, *scope_str;
+			switch (lock->lock_type) {
+			case KAPI_LOCK_MUTEX: type_str = "mutex"; break;
+			case KAPI_LOCK_SPINLOCK: type_str = "spinlock"; break;
+			case KAPI_LOCK_RWLOCK: type_str = "rwlock"; break;
+			case KAPI_LOCK_SEMAPHORE: type_str = "semaphore"; break;
+			case KAPI_LOCK_RCU: type_str = "rcu"; break;
+			case KAPI_LOCK_SEQLOCK: type_str = "seqlock"; break;
+			default: type_str = "unknown"; break;
+			}
+			switch (lock->scope) {
+			case KAPI_LOCK_INTERNAL: scope_str = "acquired and released"; break;
+			case KAPI_LOCK_ACQUIRES: scope_str = "acquired (not released)"; break;
+			case KAPI_LOCK_RELEASES: scope_str = "released (held on
entry)"; brea= k; + case KAPI_LOCK_CALLER_HELD: scope_str =3D "held by caller"; break; + default: scope_str =3D "unknown"; break; + } + seq_printf(m, " %s (%s): %s\n", + lock->lock_name, type_str, lock->description); + seq_printf(m, " scope: %s\n", scope_str); + } + seq_printf(m, "\n"); + } + + /* Constraints */ + if (spec->constraint_count > 0) { + seq_printf(m, "Additional constraints (%u):\n", spec->constraint_count); + for (i =3D 0; i < spec->constraint_count; i++) { + struct kapi_constraint_spec *cons =3D &spec->constraints[i]; + + seq_printf(m, " - %s", cons->name); + if (cons->description[0]) + seq_printf(m, ": %s", cons->description); + seq_printf(m, "\n"); + if (cons->expression[0]) + seq_printf(m, " expression: %s\n", cons->expression); + } + seq_printf(m, "\n"); + } + + /* Signals */ + if (spec->signal_count > 0) { + seq_printf(m, "Signal handling (%u):\n", spec->signal_count); + for (i =3D 0; i < spec->signal_count; i++) { + struct kapi_signal_spec *sig =3D &spec->signals[i]; + seq_printf(m, " %s (%d):\n", sig->signal_name, sig->signal_num); + seq_printf(m, " direction: "); + if (sig->direction & KAPI_SIGNAL_SEND) seq_printf(m, "send "); + if (sig->direction & KAPI_SIGNAL_RECEIVE) seq_printf(m, "receive "); + if (sig->direction & KAPI_SIGNAL_HANDLE) seq_printf(m, "handle "); + if (sig->direction & KAPI_SIGNAL_BLOCK) seq_printf(m, "block "); + if (sig->direction & KAPI_SIGNAL_IGNORE) seq_printf(m, "ignore "); + seq_printf(m, "\n"); + seq_printf(m, " action: "); + switch (sig->action) { + case KAPI_SIGNAL_ACTION_DEFAULT: seq_printf(m, "default"); break; + case KAPI_SIGNAL_ACTION_TERMINATE: seq_printf(m, "terminate"); break; + case KAPI_SIGNAL_ACTION_COREDUMP: seq_printf(m, "coredump"); break; + case KAPI_SIGNAL_ACTION_STOP: seq_printf(m, "stop"); break; + case KAPI_SIGNAL_ACTION_CONTINUE: seq_printf(m, "continue"); break; + case KAPI_SIGNAL_ACTION_CUSTOM: seq_printf(m, "custom"); break; + case KAPI_SIGNAL_ACTION_RETURN: seq_printf(m, "return"); break; + 
case KAPI_SIGNAL_ACTION_RESTART: seq_printf(m, "restart"); break; + default: seq_printf(m, "unknown"); break; + } + seq_printf(m, "\n"); + if (strlen(sig->description) > 0) + seq_printf(m, " description: %s\n", sig->description); + } + seq_printf(m, "\n"); + } + + /* Additional info */ + if (strlen(spec->examples) > 0) { + seq_printf(m, "Examples:\n%s\n\n", spec->examples); + } + if (strlen(spec->notes) > 0) { + seq_printf(m, "Notes:\n%s\n\n", spec->notes); + } + if (strlen(spec->since_version) > 0) { + seq_printf(m, "Since: %s\n", spec->since_version); + } + + return 0; +} + +static int kapi_spec_open(struct inode *inode, struct file *file) +{ + return single_open(file, kapi_spec_show, inode->i_private); +} + +static const struct file_operations kapi_spec_fops =3D { + .open =3D kapi_spec_open, + .read =3D seq_read, + .llseek =3D seq_lseek, + .release =3D single_release, +}; + +/* Show all available API specs */ +static int kapi_list_show(struct seq_file *m, void *v) +{ + struct kernel_api_spec *spec; + int count =3D 0; + + seq_printf(m, "Available Kernel API Specifications\n"); + seq_printf(m, "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D\n\n"); + + for (spec =3D __start_kapi_specs; spec < __stop_kapi_specs; spec++) { + seq_printf(m, "%s - %s\n", spec->name, spec->description); + count++; + } + + seq_printf(m, "\nTotal: %d specifications\n", count); + return 0; +} + +static int kapi_list_open(struct inode *inode, struct file *file) +{ + return single_open(file, kapi_list_show, NULL); +} + +static const struct file_operations kapi_list_fops =3D { + .open =3D kapi_list_open, + .read =3D seq_read, + .llseek =3D seq_lseek, + .release =3D single_release, +}; + +static int __init kapi_debugfs_init(void) +{ + struct kernel_api_spec *spec; + struct dentry *spec_dir; + + /* Create main directory */ + kapi_debugfs_root =3D debugfs_create_dir("kapi", NULL); + + /* Create list file */ + debugfs_create_file("list", 
0444, kapi_debugfs_root, NULL, &kapi_list_fop= s); + + /* Create specs subdirectory */ + spec_dir =3D debugfs_create_dir("specs", kapi_debugfs_root); + + /* Create a file for each API spec */ + for (spec =3D __start_kapi_specs; spec < __stop_kapi_specs; spec++) { + debugfs_create_file(spec->name, 0444, spec_dir, spec, &kapi_spec_fops); + } + + pr_info("Kernel API debugfs interface initialized\n"); + return 0; +} + +static void __exit kapi_debugfs_exit(void) +{ + debugfs_remove_recursive(kapi_debugfs_root); +} + +/* Initialize as part of kernel, not as a module */ +fs_initcall(kapi_debugfs_init); \ No newline at end of file --=20 2.51.0 From nobody Sat Feb 7 10:08:18 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 03BD32E266C; Thu, 18 Dec 2025 20:42:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1766090570; cv=none; b=YcVkuUqWiJmGSEq2anlyOHzfAmZILs7tWzeq9uYQKWS+UjQEPKLPlGeJ03cFx45BDViC8agWzGql1IUU+XVnsKBm9ImglMaQPX4gpmUURagk4GHeQPambZ1XwwhCns0CX44Y2ZAjmIv3vSPZAkPDUSi4s0cdgb/epC4WYyAYz1A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1766090570; c=relaxed/simple; bh=UqsaxVMM35ucTyEU6DqnkaL2FZGYvIUE6vZ09Qp+Ne0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=rCfOuXzyn28jity3Z71mLCitZeatkM5kEhVbPPzeOCPQVU0bOtpEt5ltrGW29451vdtmOi1mVcHeDqYoDcEPypD5g79LCVglNaoPRgQvVtjc6JXk5ukpt6z/vCXhWJTw1Se1I6cNEjY5Enc+EEIFlIUHScrPDW0yHRUgLZ+8GMo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=qRU4ny9j; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; 
dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="qRU4ny9j" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 26C8CC116D0; Thu, 18 Dec 2025 20:42:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1766090569; bh=UqsaxVMM35ucTyEU6DqnkaL2FZGYvIUE6vZ09Qp+Ne0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=qRU4ny9jvgtITbD7oN/6SnBhNtJUTUTPjqvCk/olIW44g+WsNl42uzy7llwYoJIUM ynRcu0TGSVu6e1RRccC/sMW31lhAks533b2V65txhfsZAPkiRu1yPceox796DFeSgd ueIJAkQ1ln/CeTYbItHNCNvfow7aLjYq9I9hVWHRVmXewOa7RGFwwZuBrmQTgeXMBR 92SbSNZ6cnuc0c3mn/HBpLZxKnUzd07610eRJpNjbBNYsHA6NMXy1szsb6UPyOvczV 5W0xPMPyAbCs7ehx3YEfLi44vTnxDE1NClmV/3aagtY7HO/Dj164AbQb7NGJIrC8GT V1Z6x220xLrng== From: Sasha Levin To: linux-api@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, tools@kernel.org, gpaoloni@redhat.com, Sasha Levin Subject: [RFC PATCH v5 04/15] tools/kapi: Add kernel API specification extraction tool Date: Thu, 18 Dec 2025 15:42:26 -0500 Message-ID: <20251218204239.4159453-5-sashal@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org> References: <20251218204239.4159453-1-sashal@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable The kapi tool extracts and displays kernel API specifications. 
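For reviewers: the debugfs interface from patch 01 and this tool meet at the kapi/list file, which kapi_list_show() emits as a title line, an underline of '=' characters, one "name - description" line per specification, and a "Total:" trailer. The standalone sketch below parses that format under those assumptions. list_api_names() is a hypothetical helper written only for illustration; the tool's actual parser is DebugfsExtractor::parse_list_file().

```rust
/// Hypothetical helper (illustration only) that mirrors the layout emitted by
/// kapi_list_show(): a title, a line of '=', one "name - description" line
/// per spec, a blank line, and a "Total: N specifications" trailer.
fn list_api_names(content: &str) -> Vec<String> {
    let mut names = Vec::new();
    let mut in_list = false;

    for line in content.lines() {
        if line.starts_with("===") {
            // The underline marks the start of the entries.
            in_list = true;
            continue;
        }
        // Skip the title (before the underline) and blank lines.
        if !in_list || line.trim().is_empty() {
            continue;
        }
        // The trailer ends the list.
        if line.starts_with("Total:") {
            break;
        }
        // Everything before the first " - " is the API name.
        if let Some((name, _desc)) = line.split_once(" - ") {
            names.push(name.trim().to_string());
        }
    }
    names
}

fn main() {
    let sample = "Available Kernel API Specifications\n\
                  ===================================\n\
                  \n\
                  sys_read - Read from a file descriptor\n\
                  sys_write - Write to a file descriptor\n\
                  \n\
                  Total: 2 specifications\n";
    let names = list_api_names(sample);
    // prints ["sys_read", "sys_write"]
    println!("{:?}", names);
}
```

Keying on the underline rather than on the title text keeps the parse independent of the header wording, and stopping at "Total:" keeps the trailer from being read as an API name.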
Signed-off-by: Sasha Levin
---
 Documentation/dev-tools/kernel-api-spec.rst  | 198 +++-
 tools/kapi/.gitignore                        |   4 +
 tools/kapi/Cargo.toml                        |  19 +
 tools/kapi/src/extractor/debugfs.rs          | 442 +++++++++
 tools/kapi/src/extractor/kerneldoc_parser.rs | 692 ++++++++++++++
 tools/kapi/src/extractor/mod.rs              | 464 ++++++++++
 tools/kapi/src/extractor/source_parser.rs    | 213 +++++
 .../src/extractor/vmlinux/binary_utils.rs    | 180 ++++
 .../src/extractor/vmlinux/magic_finder.rs    | 102 +++
 tools/kapi/src/extractor/vmlinux/mod.rs      | 864 ++++++++++++++++++
 tools/kapi/src/formatter/json.rs             | 468 ++++++++++
 tools/kapi/src/formatter/mod.rs              | 140 +++
 tools/kapi/src/formatter/plain.rs            | 549 +++++++++++
 tools/kapi/src/formatter/rst.rs              | 621 +++++++++++++
 tools/kapi/src/main.rs                       | 116 +++
 15 files changed, 5069 insertions(+), 3 deletions(-)
 create mode 100644 tools/kapi/.gitignore
 create mode 100644 tools/kapi/Cargo.toml
 create mode 100644 tools/kapi/src/extractor/debugfs.rs
 create mode 100644 tools/kapi/src/extractor/kerneldoc_parser.rs
 create mode 100644 tools/kapi/src/extractor/mod.rs
 create mode 100644 tools/kapi/src/extractor/source_parser.rs
 create mode 100644 tools/kapi/src/extractor/vmlinux/binary_utils.rs
 create mode 100644 tools/kapi/src/extractor/vmlinux/magic_finder.rs
 create mode 100644 tools/kapi/src/extractor/vmlinux/mod.rs
 create mode 100644 tools/kapi/src/formatter/json.rs
 create mode 100644 tools/kapi/src/formatter/mod.rs
 create mode 100644 tools/kapi/src/formatter/plain.rs
 create mode 100644 tools/kapi/src/formatter/rst.rs
 create mode 100644 tools/kapi/src/main.rs

diff --git a/Documentation/dev-tools/kernel-api-spec.rst b/Documentation/dev-tools/kernel-api-spec.rst
index 3a63f6711e27b..9b452753111ad 100644
--- a/Documentation/dev-tools/kernel-api-spec.rst
+++ b/Documentation/dev-tools/kernel-api-spec.rst
@@ -31,7 +31,9 @@ The framework aims to:
    common programming errors during development and testing.
 
 3. **Support Tooling**: Export API specifications in machine-readable formats for
-   use by static analyzers, documentation generators, and development tools.
+   use by static analyzers, documentation generators, and development tools. The
+   ``kapi`` tool (see `The kapi Tool`_) extracts these specifications and formats
+   them as plain text, JSON, or RST.
 
 4. **Enhance Debugging**: Provide detailed API information at runtime through
    debugfs for debugging and introspection.
@@ -71,6 +73,13 @@ The framework consists of several key components:
    - Type-safe parameter specifications
    - Context and constraint definitions
 
+5. **kapi Tool** (``tools/kapi/``)
+
+   - Userspace utility for extracting specifications
+   - Multiple input sources (source, binary, debugfs)
+   - Multiple output formats (plain, JSON, RST)
+   - Testing and validation utilities
+
 Data Model
 ----------
 
@@ -344,8 +353,177 @@ Documentation Generation
 ------------------------
 
 The framework exports specifications via debugfs that can be used
-to generate documentation. Tools for automatic documentation generation
-from specifications are planned for future development.
+to generate documentation. The ``kapi`` tool reads them from debugfs, source
+code, or a vmlinux image and renders them as plain text, JSON, or RST.
+
+The kapi Tool
+=============
+
+Overview
+--------
+
+The ``kapi`` tool is a userspace utility that extracts and displays kernel API
+specifications from multiple sources. It provides a single interface to API
+documentation, whether it comes from a compiled kernel, source code, or a running system.
+
+Installation
+------------
+
+Build the tool from the kernel source tree::
+
+    $ cd tools/kapi
+    $ cargo build --release
+
+    # Optional: install into ~/.cargo/bin
+    $ cargo install --path .
+
+Building the tool requires Rust and Cargo (Rust 1.85 or newer, since the crate
+uses the 2024 edition). The binary is written to ``tools/kapi/target/release/kapi``.
+
+Command-Line Usage
+------------------
+
+Basic syntax::
+
+    kapi [OPTIONS] [API_NAME]
+
+Options:
+
+- ``--vmlinux ``: Extract from a compiled kernel binary
+- ``--source ``: Extract from kernel source code
+- ``--debugfs ``: Extract from debugfs (default: /sys/kernel/debug)
+- ``-f, --format ``: Output format (plain, json, rst)
+- ``-h, --help``: Display help information
+- ``-V, --version``: Display version information
+
+Input Modes
+-----------
+
+**1. Source Code Mode**
+
+Extract specifications directly from kernel source::
+
+    # Scan the entire kernel source tree
+    $ kapi --source /path/to/linux
+
+    # Extract from a specific file
+    $ kapi --source kernel/sched/core.c
+
+    # Get details for a specific API
+    $ kapi --source /path/to/linux sys_sched_yield
+
+**2. Vmlinux Mode**
+
+Extract from a compiled kernel with debug symbols::
+
+    # List all APIs in vmlinux
+    $ kapi --vmlinux /boot/vmlinux-5.15.0
+
+    # Get specific syscall details
+    $ kapi --vmlinux ./vmlinux sys_read
+
+**3. Debugfs Mode**
+
+Extract from a running kernel via debugfs::
+
+    # Use the default debugfs path
+    $ kapi
+
+    # Use a custom debugfs mount point
+    $ kapi --debugfs /mnt/debugfs
+
+    # Get a specific API from the running kernel
+    $ kapi sys_write
+
+Output Formats
+--------------
+
+**Plain Text Format** (default)::
+
+    $ kapi sys_read
+
+    Detailed information for sys_read:
+    ==================================
+    Description: Read from a file descriptor
+
+    Detailed Description:
+      Reads up to count bytes from file descriptor fd into the buffer starting at buf.
+
+    Execution Context:
+      - KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+
+    Parameters (3):
+
+    Available since: 1.0
+
+**JSON Format**::
+
+    $ kapi --format json sys_read
+    {
+      "api_details": {
+        "name": "sys_read",
+        "description": "Read from a file descriptor",
+        "long_description": "Reads up to count bytes...",
+        "context_flags": ["KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE"],
+        "since_version": "1.0"
+      }
+    }
+
+**reStructuredText Format**::
+
+    $ kapi --format rst sys_read
+
+    sys_read
+    ========
+
+    **Read from a file descriptor**
+
+    Reads up to count bytes from file descriptor fd into the buffer...
+
+Usage Examples
+--------------
+
+**Generate complete API documentation**::
+
+    # Export all kernel APIs to JSON
+    $ kapi --source /path/to/linux --format json > kernel-apis.json
+
+    # Generate RST documentation for all syscalls
+    $ kapi --vmlinux ./vmlinux --format rst > syscalls.rst
+
+    # List APIs from a specific subsystem
+    $ kapi --source drivers/gpu/drm/
+
+**Integration with other tools**::
+
+    # Find all APIs that can sleep
+    $ kapi --format json | jq '.apis[] | select(.context_flags[] | contains("SLEEPABLE"))'
+
+    # Generate Markdown documentation
+    $ kapi --format rst sys_mmap | pandoc -f rst -t markdown
+
+**Debugging and analysis**::
+
+    # Compare APIs between kernel versions
+    $ diff <(kapi --vmlinux vmlinux-5.10) <(kapi --vmlinux vmlinux-5.15)
+
+    # Check whether a specific API exists
+    $ kapi --source . my_custom_api || echo "API not found"
+
+Implementation Details
+----------------------
+
+The tool extracts API specifications from three sources:
+
+1. **Source Code**: Parses KAPI specification macros using regular expressions
+2. **Vmlinux**: Reads the ``.kapi_specs`` ELF section from compiled kernels
+3. **Debugfs**: Reads the ``/sys/kernel/debug/kapi/`` interface of a running kernel
+
+The tool supports all KAPI specification types:
+
+- System calls (``DEFINE_KERNEL_API_SPEC``)
+- IOCTLs (``DEFINE_IOCTL_API_SPEC``)
+- Kernel functions (``KAPI_DEFINE_SPEC``)
 
 IDE Integration
 ---------------
@@ -357,6 +535,11 @@ Modern IDEs can use the JSON export for:
 - Context validation
 - Error code documentation
 
+Example IDE integration::
+
+    # Generate IDE completion data
+    $ kapi --format json > .vscode/kernel-apis.json
+
 Testing Framework
 -----------------
 
@@ -367,6 +550,15 @@ The framework includes test helpers::
     kapi_test_api("kmalloc", test_cases);
     #endif
 
+The kapi tool can verify specifications against implementations::
+
+    # Run consistency tests
+    $ cd tools/kapi
+    $ ./test_consistency.sh
+
+    # Compare source and binary specifications
+    $ ./compare_all_syscalls.sh
+
 Best Practices
 ==============
 
diff --git a/tools/kapi/.gitignore b/tools/kapi/.gitignore
new file mode 100644
index 0000000000000..1390bfc12686c
--- /dev/null
+++ b/tools/kapi/.gitignore
@@ -0,0 +1,4 @@
+# Rust build artifacts
+/target/
+**/*.rs.bk
+
diff --git a/tools/kapi/Cargo.toml b/tools/kapi/Cargo.toml
new file mode 100644
index 0000000000000..4e6bcb10d132f
--- /dev/null
+++ b/tools/kapi/Cargo.toml
@@ -0,0 +1,19 @@
+[package]
+name = "kapi"
+version = "0.1.0"
+edition = "2024"
+authors = ["Sasha Levin "]
+description = "Tool for extracting and displaying kernel API specifications"
+license = "GPL-2.0"
+
+[dependencies]
+goblin = "0.10"
+clap = { version = "4.4", features = ["derive"] }
+anyhow = "1.0"
+serde = { version = "1.0", features = ["derive"] }
+serde_json = "1.0"
+regex = "1.10"
+walkdir = "2.4"
+
+[dev-dependencies]
+tempfile = "3.8"
diff --git a/tools/kapi/src/extractor/debugfs.rs b/tools/kapi/src/extractor/debugfs.rs
new file mode 100644
index 0000000000000..698c51e50438f
--- /dev/null
+++ b/tools/kapi/src/extractor/debugfs.rs
@@ -0,0 +1,442 @@
+use crate::formatter::OutputFormatter;
+use anyhow::{Context, Result, bail};
+use serde::Deserialize;
+use std::fs;
+use std::io::Write;
+use std::path::PathBuf;
+
+use super::{ApiExtractor, ApiSpec, CapabilitySpec, display_api_spec};
+
+#[derive(Deserialize)]
+struct KernelApiJson {
+    name: String,
+    api_type: Option,
+    version: Option,
+    description: Option,
+    long_description: Option,
+    context_flags: Option,
+    since_version: Option,
+    examples: Option,
+    notes: Option,
+    capabilities: Option>,
+}
+
+#[derive(Deserialize)]
+struct KernelCapabilityJson {
+    capability: i32,
+    name: String,
+    action: String,
+    allows: String,
+    without_cap: String,
+    check_condition: Option,
+    priority: Option,
+    alternatives: Option>,
+}
+
+/// Extractor for kernel API specifications from debugfs
+pub struct DebugfsExtractor {
+    debugfs_path: PathBuf,
+}
+
+impl DebugfsExtractor {
+    /// Create a new debugfs extractor with the specified debugfs path
+    pub fn new(debugfs_path: Option) -> Result {
+        let path = match debugfs_path {
+            Some(p) => PathBuf::from(p),
+            None => PathBuf::from("/sys/kernel/debug"),
+        };
+
+        // Check if the debugfs path exists
+        if !path.exists() {
+            bail!("Debugfs path does not exist: {}", path.display());
+        }
+
+        // Check if kapi directory exists
+        let kapi_path = path.join("kapi");
+        if !kapi_path.exists() {
+            bail!(
+                "Kernel API debugfs interface not found at: {}",
+                kapi_path.display()
+            );
+        }
+
+        Ok(Self { debugfs_path: path })
+    }
+
+    /// Parse the list file to get all available API names
+    fn parse_list_file(&self) -> Result> {
+        let list_path = self.debugfs_path.join("kapi/list");
+        let content = fs::read_to_string(&list_path)
+            .with_context(|| format!("Failed to read {}", list_path.display()))?;
+
+        let mut apis = Vec::new();
+        let mut in_list = false;
+
+        for line in content.lines() {
+            if line.contains("===") {
+                in_list = true;
+                continue;
+            }
+
+            if in_list && line.starts_with("Total:") {
+                break;
+            }
+
+            if in_list && !line.trim().is_empty() {
+                // Extract API name from lines like "sys_read - Read from a file descriptor"
+                if let Some(name) = line.split(" - ").next() {
+                    apis.push(name.trim().to_string());
+                }
+            }
+        }
+
+        Ok(apis)
+    }
+
+    /// Convert a KAPI_CTX_* context bitmask into the corresponding flag names
+    fn parse_context_flags(flags: u32) -> Vec {
+        let mut result = Vec::new();
+
+        // These values should match KAPI_CTX_* flags from kernel
+        if flags & (1 << 0) != 0 {
+            result.push("PROCESS".to_string());
+        }
+        if flags & (1 << 1) != 0 {
+            result.push("SOFTIRQ".to_string());
+        }
+        if flags & (1 << 2) != 0 {
+            result.push("HARDIRQ".to_string());
+        }
+        if flags & (1 << 3) != 0 {
+            result.push("NMI".to_string());
+        }
+        if flags & (1 << 4) != 0 {
+            result.push("ATOMIC".to_string());
+        }
+        if flags & (1 << 5) != 0 {
+            result.push("SLEEPABLE".to_string());
+        }
+        if flags & (1 << 6) != 0 {
+            result.push("PREEMPT_DISABLED".to_string());
+        }
+        if flags & (1 << 7) != 0 {
+            result.push("IRQ_DISABLED".to_string());
+        }
+
+        result
+    }
+
+    /// Convert capability action from kernel representation
+    fn parse_capability_action(action: &str) -> String {
+        match action {
+            "bypass_check" => "Bypasses check".to_string(),
+            "increase_limit" => "Increases limit".to_string(),
+            "override_restriction" => "Overrides restriction".to_string(),
+            "grant_permission" => "Grants permission".to_string(),
+            "modify_behavior" => "Modifies behavior".to_string(),
+            "access_resource" => "Allows resource access".to_string(),
+            "perform_operation" => "Allows operation".to_string(),
+            _ => action.to_string(),
+        }
+    }
+
+    /// Try to parse as JSON first
+    fn try_parse_json(&self, content: &str) -> Option {
+        let json_data: KernelApiJson = serde_json::from_str(content).ok()?;
+
+        let mut spec = ApiSpec {
+            name: json_data.name,
+            api_type: json_data.api_type.unwrap_or_else(|| "unknown".to_string()),
+            description: json_data.description,
+            long_description: json_data.long_description,
+            version: json_data.version.map(|v| v.to_string()),
+            context_flags: json_data
+                .context_flags
+                .map_or_else(Vec::new, Self::parse_context_flags),
+            param_count: None,
+            error_count: None,
+            examples: json_data.examples,
+            notes: json_data.notes,
+            since_version: json_data.since_version,
+            subsystem: None,   // Not in current JSON format
+            sysfs_path: None,  // Not in current JSON format
+            permissions: None, // Not in current JSON format
+            socket_state: None,
+            protocol_behaviors: vec![],
+            addr_families: vec![],
+            buffer_spec: None,
+            async_spec: None,
+            net_data_transfer: None,
+            capabilities: vec![],
+            parameters: vec![],
+            return_spec: None,
+            errors: vec![],
+            signals: vec![],
+            signal_masks: vec![],
+            side_effects: vec![],
+            state_transitions: vec![],
+            constraints: vec![],
+            locks: vec![],
+            struct_specs: vec![],
+        };
+
+        // Convert capabilities
+        if let Some(caps) = json_data.capabilities {
+            for cap in caps {
+                spec.capabilities.push(CapabilitySpec {
+                    capability: cap.capability,
+                    name: cap.name,
+                    action: Self::parse_capability_action(&cap.action),
+                    allows: cap.allows,
+                    without_cap: cap.without_cap,
+                    check_condition: cap.check_condition,
+                    priority: cap.priority,
+                    alternatives: cap.alternatives.unwrap_or_default(),
+                });
+            }
+        }
+
+        Some(spec)
+    }
+
+    /// Parse a single API specification file
+    fn parse_spec_file(&self, api_name: &str) -> Result {
+        let spec_path = self.debugfs_path.join(format!("kapi/specs/{}", api_name));
+        let content = fs::read_to_string(&spec_path)
+            .with_context(|| format!("Failed to read {}", spec_path.display()))?;
+
+        // Try JSON parsing first
+        if let Some(spec) = self.try_parse_json(&content) {
+            return Ok(spec);
+        }
+
+        // Fall back to plain text parsing
+        let mut spec = ApiSpec {
+            name: api_name.to_string(),
+            api_type: "unknown".to_string(),
+            description: None,
+            long_description: None,
+            version: None,
+            context_flags: Vec::new(),
+            param_count: None,
+            error_count: None,
+            examples: None,
+            notes: None,
+            since_version: None,
+            subsystem: None,
+            sysfs_path: None,
+            permissions: None,
+            socket_state: None,
+            protocol_behaviors: vec![],
+            addr_families: vec![],
+            buffer_spec: None,
+            async_spec: None,
+            net_data_transfer: None,
+            capabilities: vec![],
+            parameters: vec![],
+            return_spec: None,
+            errors: vec![],
+            signals: vec![],
+            signal_masks: vec![],
+            side_effects: vec![],
+            state_transitions: vec![],
+            constraints: vec![],
+            locks: vec![],
+            struct_specs: vec![],
+        };
+
+        // Parse the content
+        let mut collecting_multiline = false;
+        let mut multiline_buffer = String::new();
+        let mut multiline_field = "";
+        let mut parsing_capability = false;
+        let mut current_capability: Option = None;
+
+        for line in content.lines() {
+            // Handle capability sections
+            if line.starts_with("Capabilities (") {
+                continue; // Skip the header
+            }
+            if line.starts_with(" ") && line.contains(" (") && line.ends_with("):") {
+                // Start of a capability entry like " CAP_IPC_LOCK (14):"
+                if let Some(cap) = current_capability.take() {
+                    spec.capabilities.push(cap);
+                }
+
+                let parts: Vec<&str> = line.trim().split(" (").collect();
+                if parts.len() == 2 {
+                    let cap_name = parts[0].to_string();
+                    let cap_id = parts[1].trim_end_matches("):").parse().unwrap_or(0);
+                    current_capability = Some(CapabilitySpec {
+                        capability: cap_id,
+                        name: cap_name,
+                        action: String::new(),
+                        allows: String::new(),
+                        without_cap: String::new(),
+                        check_condition: None,
+                        priority: None,
+                        alternatives: Vec::new(),
+                    });
+                    parsing_capability = true;
+                }
+                continue;
+            }
+            if parsing_capability && line.starts_with(" ") {
+                // Parse capability fields
+                if let Some(ref mut cap) = current_capability {
+                    if let Some(action) = line.strip_prefix(" Action: ") {
+                        cap.action = action.to_string();
+                    } else if let Some(allows) = line.strip_prefix(" Allows: ") {
+                        cap.allows = allows.to_string();
+                    } else if let Some(without) = line.strip_prefix(" Without: ") {
+                        cap.without_cap = without.to_string();
+                    } else if let Some(cond) = line.strip_prefix(" Condition: ") {
+                        cap.check_condition = Some(cond.to_string());
+                    } else if let Some(prio) = line.strip_prefix(" Priority: ") {
+                        cap.priority = prio.parse().ok();
+                    } else if let Some(alts) = line.strip_prefix(" Alternatives: ") {
+                        cap.alternatives =
+                            alts.split(", ").filter_map(|s| s.parse().ok()).collect();
+                    }
+                }
+                continue;
+            }
+            if parsing_capability && !line.starts_with(" ") {
+                // End of capabilities section
+                if let Some(cap) = current_capability.take() {
+                    spec.capabilities.push(cap);
+                }
+                parsing_capability = false;
+            }
+
+            // Handle section headers
+            if line.starts_with("Parameters (") {
+                if let Some(count_str) = line
+                    .strip_prefix("Parameters (")
+                    .and_then(|s| s.strip_suffix("):"))
+                {
+                    spec.param_count = count_str.parse().ok();
+                }
+                continue;
+            } else if line.starts_with("Errors (") {
+                if let Some(count_str) = line
+                    .strip_prefix("Errors (")
+                    .and_then(|s| s.strip_suffix("):"))
+                {
+                    spec.error_count = count_str.parse().ok();
+                }
+                continue;
+            } else if line.starts_with("Examples:") {
+                collecting_multiline = true;
+                multiline_field = "examples";
+                multiline_buffer.clear();
+                continue;
+            } else if line.starts_with("Notes:") {
+                collecting_multiline = true;
+                multiline_field = "notes";
+                multiline_buffer.clear();
+                continue;
+            }
+
+            // Handle multiline sections
+            if collecting_multiline {
+                if line.trim().is_empty() && multiline_buffer.ends_with("\n\n") {
+                    collecting_multiline = false;
+                    match multiline_field {
+                        "examples" => spec.examples = Some(multiline_buffer.trim().to_string()),
+                        "notes" => spec.notes = Some(multiline_buffer.trim().to_string()),
+                        _ => {}
+                    }
+                    multiline_buffer.clear();
+                } else {
+                    if !multiline_buffer.is_empty() {
+                        multiline_buffer.push('\n');
+                    }
+                    multiline_buffer.push_str(line);
+                }
+                continue;
+            }
+
+            // Parse regular fields
+            if let Some(desc) = line.strip_prefix("Description: ") {
+                spec.description = Some(desc.to_string());
+            } else if let Some(long_desc) = line.strip_prefix("Long description: ") {
+                spec.long_description = Some(long_desc.to_string());
+            } else if let Some(version) = line.strip_prefix("Version: ") {
+                spec.version = Some(version.to_string());
+            } else if let Some(since) = line.strip_prefix("Since: ") {
+                spec.since_version = Some(since.to_string());
+            } else if let Some(flags) = line.strip_prefix("Context flags: ") {
+                spec.context_flags = flags.split_whitespace().map(str::to_string).collect();
+            } else if let Some(subsys) = line.strip_prefix("Subsystem: ") {
+                spec.subsystem = Some(subsys.to_string());
+            } else if let Some(path) = line.strip_prefix("Sysfs Path: ") {
+                spec.sysfs_path = Some(path.to_string());
+            } else if let Some(perms) = line.strip_prefix("Permissions: ") {
+                spec.permissions = Some(perms.to_string());
+            }
+        }
+
+        // Handle any remaining capability
+        if let Some(cap) = current_capability.take() {
+            spec.capabilities.push(cap);
+        }
+
+        // Determine API type based on name
+        if api_name.starts_with("sys_") {
+            spec.api_type = "syscall".to_string();
+        } else if api_name.contains("_ioctl") || api_name.starts_with("ioctl_") {
+            spec.api_type = "ioctl".to_string();
+        } else if api_name.contains("sysfs")
+            || api_name.ends_with("_show")
+            || api_name.ends_with("_store")
+        {
+            spec.api_type = "sysfs".to_string();
+        } else {
+            spec.api_type = "function".to_string();
+        }
+
+        Ok(spec)
+    }
+}
+
+impl ApiExtractor for DebugfsExtractor {
+    fn extract_all(&self) -> Result> {
+        let api_names = self.parse_list_file()?;
+        let mut specs = Vec::new();
+
+        for name in api_names {
+            match self.parse_spec_file(&name) {
+                Ok(spec) => specs.push(spec),
+                Err(_e) => {} // Silently skip files that fail to parse
+            }
+        }
+
+        Ok(specs)
+ } + + fn extract_by_name(&self, name: &str) -> Result> { + let api_names =3D self.parse_list_file()?; + + if api_names.contains(&name.to_string()) { + Ok(Some(self.parse_spec_file(name)?)) + } else { + Ok(None) + } + } + + fn display_api_details( + &self, + api_name: &str, + formatter: &mut dyn OutputFormatter, + writer: &mut dyn Write, + ) -> Result<()> { + if let Some(spec) =3D self.extract_by_name(api_name)? { + display_api_spec(&spec, formatter, writer)?; + } else { + writeln!(writer, "API '{api_name}' not found in debugfs")?; + } + + Ok(()) + } +} diff --git a/tools/kapi/src/extractor/kerneldoc_parser.rs b/tools/kapi/src/= extractor/kerneldoc_parser.rs new file mode 100644 index 0000000000000..1c6924a0b5291 --- /dev/null +++ b/tools/kapi/src/extractor/kerneldoc_parser.rs @@ -0,0 +1,692 @@ +use super::{ + ApiSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec, ParamSpe= c, + ReturnSpec, SideEffectSpec, SignalSpec, StateTransitionSpec, StructSpe= c, + StructFieldSpec, +}; +use anyhow::Result; +use std::collections::HashMap; + +/// Real kerneldoc parser that extracts KAPI annotations +pub struct KerneldocParserImpl; + +impl KerneldocParserImpl { + pub fn new() -> Self { + KerneldocParserImpl + } + + pub fn parse_kerneldoc( + &self, + doc: &str, + name: &str, + api_type: &str, + _signature: Option<&str>, + ) -> Result { + let mut spec =3D ApiSpec { + name: name.to_string(), + api_type: api_type.to_string(), + description: None, + long_description: None, + version: None, + context_flags: vec![], + param_count: None, + error_count: None, + examples: None, + notes: None, + since_version: None, + subsystem: None, + sysfs_path: None, + permissions: None, + socket_state: None, + protocol_behaviors: vec![], + addr_families: vec![], + buffer_spec: None, + async_spec: None, + net_data_transfer: None, + capabilities: vec![], + parameters: vec![], + return_spec: None, + errors: vec![], + signals: vec![], + signal_masks: vec![], + side_effects: vec![], + 
state_transitions: vec![], + constraints: vec![], + locks: vec![], + struct_specs: vec![], + }; + + // Parse line by line + let lines: Vec<&str> =3D doc.lines().collect(); + let mut i =3D 0; + + // Extract main description from function name line + if let Some(first_line) =3D lines.first() { + if let Some((_, desc)) =3D first_line.split_once(" - ") { + spec.description =3D Some(desc.trim().to_string()); + } + } + + // Keep track of parameters we've seen + let mut param_map: HashMap =3D HashMap::new(); + let mut struct_fields: Vec =3D Vec::new(); + let mut current_lock: Option =3D None; + let mut current_signal: Option =3D None; + let mut current_capability: Option =3D None; + + while i < lines.len() { + let line =3D lines[i].trim(); + + // Skip empty lines + if line.is_empty() { + i +=3D 1; + continue; + } + + // Parse @param lines + if let Some(rest) =3D line.strip_prefix("@") { + if let Some((param_name, desc)) =3D rest.split_once(':') { + let param_name =3D param_name.trim(); + let desc =3D desc.trim(); + if !param_name.contains('-') { + // This is a basic parameter description - add to = map + param_map.insert(param_name.to_string(), ParamSpec= { + index: param_map.len() as u32, + name: param_name.to_string(), + type_name: String::new(), + description: desc.to_string(), + flags: 0, + param_type: 0, + constraint_type: 0, + constraint: None, + min_value: None, + max_value: None, + valid_mask: None, + enum_values: vec![], + size: None, + alignment: None, + }); + } + } + } + // Parse long-desc + else if let Some(rest) =3D line.strip_prefix("long-desc:") { + spec.long_description =3D Some(self.collect_multiline_valu= e(&lines, i, rest)); + } + // Parse context-flags + else if let Some(rest) =3D line.strip_prefix("context-flags:")= { + spec.context_flags =3D self.parse_context_flags(rest.trim(= )); + } + // Parse param-count + else if let Some(rest) =3D line.strip_prefix("param-count:") { + spec.param_count =3D rest.trim().parse().ok(); + } + // Parse param-type + 
else if let Some(rest) =3D line.strip_prefix("param-type:") { + let parts: Vec<&str> =3D rest.split(',').map(|s| s.trim())= .collect(); + if parts.len() >=3D 2 { + if let Some(param) =3D param_map.get_mut(parts[0]) { + param.param_type =3D self.parse_param_type(parts[1= ]); + } + } + } + // Parse param-flags + else if let Some(rest) =3D line.strip_prefix("param-flags:") { + let parts: Vec<&str> =3D rest.split(',').map(|s| s.trim())= .collect(); + if parts.len() >=3D 2 { + if let Some(param) =3D param_map.get_mut(parts[0]) { + param.flags =3D self.parse_param_flags(parts[1]); + } + } + } + // Parse param-range + else if let Some(rest) =3D line.strip_prefix("param-range:") { + let parts: Vec<&str> =3D rest.split(',').map(|s| s.trim())= .collect(); + if parts.len() >=3D 3 { + if let Some(param) =3D param_map.get_mut(parts[0]) { + param.min_value =3D parts[1].parse().ok(); + param.max_value =3D parts[2].parse().ok(); + param.constraint_type =3D 1; // KAPI_CONSTRAINT_RA= NGE + } + } + } + // Parse param-constraint + else if let Some(rest) =3D line.strip_prefix("param-constraint= :") { + let parts: Vec<&str> =3D rest.splitn(2, ',').map(|s| s.tri= m()).collect(); + if parts.len() >=3D 2 { + if let Some(param) =3D param_map.get_mut(parts[0]) { + param.constraint =3D Some(parts[1].to_string()); + } + } + } + // Parse error + else if let Some(rest) =3D line.strip_prefix("error:") { + // Parse error in format: "ERROR_CODE, description" + let parts: Vec<&str> =3D rest.splitn(2, ',').map(|s| s.tri= m()).collect(); + if parts.len() >=3D 2 { + let error_name =3D parts[0].to_string(); + let description =3D parts[1].to_string(); + + // Look for a desc: line on the next line, with or without the leading "* " + let mut full_description =3D description; + if i + 1 < lines.len() { + if let Some(desc_line) =3D lines[i + 1].trim_start().strip_prefix("desc:") { + full_description =3D desc_line.trim().to_strin= g(); + } else if let Some(desc_line) =3D lines[i + 1].trim_start().strip_prefix("* desc:") { + full_description =3D
desc_line.trim().to_strin= g(); + } + } + + // Map common error names to codes + let error_code =3D match error_name.as_str() { + "E2BIG" =3D> -7, + "EACCES" =3D> -13, + "EAGAIN" =3D> -11, + "EBADF" =3D> -9, + "EBUSY" =3D> -16, + "EFAULT" =3D> -14, + "EINTR" =3D> -4, + "EINVAL" =3D> -22, + "EIO" =3D> -5, + "EISDIR" =3D> -21, + "ELIBBAD" =3D> -80, + "ELOOP" =3D> -40, + "EMFILE" =3D> -24, + "ENAMETOOLONG" =3D> -36, + "ENFILE" =3D> -23, + "ENOENT" =3D> -2, + "ENOEXEC" =3D> -8, + "ENOMEM" =3D> -12, + "ENOTDIR" =3D> -20, + "EOPNOTSUPP" =3D> -95, + "EPERM" =3D> -1, + "ESRCH" =3D> -3, + "ETXTBSY" =3D> -26, + _ =3D> 0, + }; + + spec.errors.push(ErrorSpec { + error_code, + name: error_name, + condition: String::new(), + description: full_description, + }); + } + } + // Parse lock + else if let Some(rest) =3D line.strip_prefix("lock:") { + // Save previous lock if any + if let Some(lock) =3D current_lock.take() { + spec.locks.push(lock); + } + + let parts: Vec<&str> =3D rest.split(',').map(|s| s.trim())= .collect(); + if parts.len() >=3D 2 { + current_lock =3D Some(LockSpec { + lock_name: parts[0].to_string(), + lock_type: self.parse_lock_type(parts[1]), + scope: super::KAPI_LOCK_INTERNAL, // default: acqu= ired and released + description: String::new(), + }); + } + } + // Parse lock scope + else if let Some(rest) =3D line.strip_prefix("lock-scope:") { + if let Some(lock) =3D current_lock.as_mut() { + lock.scope =3D match rest.trim() { + "internal" =3D> super::KAPI_LOCK_INTERNAL, + "acquires" =3D> super::KAPI_LOCK_ACQUIRES, + "releases" =3D> super::KAPI_LOCK_RELEASES, + "caller_held" =3D> super::KAPI_LOCK_CALLER_HELD, + _ =3D> super::KAPI_LOCK_INTERNAL, + }; + } + } + else if let Some(rest) =3D line.strip_prefix("lock-desc:") { + if let Some(lock) =3D current_lock.as_mut() { + lock.description =3D self.collect_multiline_value(&lin= es, i, rest); + } + } + // Parse signal + else if let Some(rest) =3D line.strip_prefix("signal:") { + // Save previous signal if any + if let 
Some(signal) =3D current_signal.take() { + spec.signals.push(signal); + } + + let signal_name =3D rest.trim().to_string(); + current_signal =3D Some(SignalSpec { + signal_num: 0, + signal_name, + direction: 1, + action: 0, + target: None, + condition: None, + description: None, + restartable: false, + timing: 0, + priority: 0, + interruptible: false, + queue: None, + sa_flags: 0, + sa_flags_required: 0, + sa_flags_forbidden: 0, + state_required: 0, + state_forbidden: 0, + error_on_signal: None, + }); + } + // Parse signal attributes + else if let Some(rest) =3D line.strip_prefix("signal-direction= :") { + if let Some(signal) =3D current_signal.as_mut() { + signal.direction =3D self.parse_signal_direction(rest.= trim()); + } + } + else if let Some(rest) =3D line.strip_prefix("signal-action:")= { + if let Some(signal) =3D current_signal.as_mut() { + signal.action =3D self.parse_signal_action(rest.trim()= ); + } + } + else if let Some(rest) =3D line.strip_prefix("signal-condition= :") { + if let Some(signal) =3D current_signal.as_mut() { + signal.condition =3D Some(self.collect_multiline_value= (&lines, i, rest)); + } + } + else if let Some(rest) =3D line.strip_prefix("signal-desc:") { + if let Some(signal) =3D current_signal.as_mut() { + signal.description =3D Some(self.collect_multiline_val= ue(&lines, i, rest)); + } + } + else if let Some(rest) =3D line.strip_prefix("signal-timing:")= { + if let Some(signal) =3D current_signal.as_mut() { + signal.timing =3D self.parse_signal_timing(rest.trim()= ); + } + } + else if let Some(rest) =3D line.strip_prefix("signal-priority:= ") { + if let Some(signal) =3D current_signal.as_mut() { + signal.priority =3D rest.trim().parse().unwrap_or(0); + } + } + else if line.strip_prefix("signal-interruptible:").is_some() { + if let Some(signal) =3D current_signal.as_mut() { + signal.interruptible =3D true; + } + } + else if let Some(rest) =3D line.strip_prefix("signal-state-req= :") { + if let Some(signal) =3D current_signal.as_mut() { 
+ signal.state_required =3D self.parse_signal_state(rest= .trim()); + } + } + // Parse side-effect + else if let Some(rest) =3D line.strip_prefix("side-effect:") { + let full_effect =3D self.collect_multiline_value(&lines, i= , rest); + let parts: Vec<&str> =3D full_effect.splitn(3, ',').map(|s= | s.trim()).collect(); + if parts.len() >=3D 3 { + let mut effect =3D SideEffectSpec { + effect_type: self.parse_effect_type(parts[0]), + target: parts[1].to_string(), + condition: None, + description: parts[2].to_string(), + reversible: false, + }; + + // Check for additional attributes + if let Some(pos) =3D parts[2].find("condition=3D") { + let cond_str =3D &parts[2][pos + 10..]; + if let Some(end) =3D cond_str.find(',') { + effect.condition =3D Some(cond_str[..end].to_s= tring()); + } else { + effect.condition =3D Some(cond_str.to_string()= ); + } + } + + if parts[2].contains("reversible=3Dyes") { + effect.reversible =3D true; + } + + spec.side_effects.push(effect); + } + } + // Parse state-trans + else if let Some(rest) =3D line.strip_prefix("state-trans:") { + let parts: Vec<&str> =3D rest.split(',').map(|s| s.trim())= .collect(); + if parts.len() >=3D 4 { + spec.state_transitions.push(StateTransitionSpec { + object: parts[0].to_string(), + from_state: parts[1].to_string(), + to_state: parts[2].to_string(), + condition: None, + description: parts[3].to_string(), + }); + } + } + // Parse capability + else if let Some(rest) =3D line.strip_prefix("capability:") { + // Save previous capability if any + if let Some(cap) =3D current_capability.take() { + spec.capabilities.push(cap); + } + + let parts: Vec<&str> =3D rest.split(',').map(|s| s.trim())= .collect(); + if parts.len() >=3D 3 { + current_capability =3D Some(CapabilitySpec { + capability: self.parse_capability_value(parts[0]), + action: parts[1].to_string(), + name: parts[2].to_string(), + allows: String::new(), + without_cap: String::new(), + check_condition: None, + priority: Some(0), + alternatives: vec![], + }); 
+ } + } + // Parse capability attributes + else if let Some(rest) =3D line.strip_prefix("capability-allow= s:") { + if let Some(cap) =3D current_capability.as_mut() { + cap.allows =3D self.collect_multiline_value(&lines, i,= rest); + } + } + else if let Some(rest) =3D line.strip_prefix("capability-witho= ut:") { + if let Some(cap) =3D current_capability.as_mut() { + cap.without_cap =3D self.collect_multiline_value(&line= s, i, rest); + } + } + else if let Some(rest) =3D line.strip_prefix("capability-condi= tion:") { + if let Some(cap) =3D current_capability.as_mut() { + cap.check_condition =3D Some(self.collect_multiline_va= lue(&lines, i, rest)); + } + } + else if let Some(rest) =3D line.strip_prefix("capability-prior= ity:") { + if let Some(cap) =3D current_capability.as_mut() { + cap.priority =3D rest.trim().parse().ok(); + } + } + // Parse constraint + else if let Some(rest) =3D line.strip_prefix("constraint:") { + let parts: Vec<&str> =3D rest.splitn(2, ',').map(|s| s.tri= m()).collect(); + if parts.len() >=3D 2 { + spec.constraints.push(ConstraintSpec { + name: parts[0].to_string(), + description: parts[1].to_string(), + expression: None, + }); + } + } + // Parse constraint-expr + else if let Some(rest) =3D line.strip_prefix("constraint-expr:= ") { + let parts: Vec<&str> =3D rest.splitn(2, ',').map(|s| s.tri= m()).collect(); + if parts.len() >=3D 2 { + // Find matching constraint and update it + if let Some(constraint) =3D spec.constraints.iter_mut(= ).find(|c| c.name =3D=3D parts[0]) { + constraint.expression =3D Some(parts[1].to_string(= )); + } + } + } + // Parse struct-field + else if let Some(rest) =3D line.strip_prefix("struct-field:") { + let parts: Vec<&str> =3D rest.split(',').map(|s| s.trim())= .collect(); + if parts.len() >=3D 3 { + struct_fields.push(StructFieldSpec { + name: parts[0].to_string(), + field_type: self.parse_field_type(parts[1]), + type_name: parts[1].to_string(), + offset: 0, + size: 0, + flags: 0, + constraint_type: 0, + min_value: 
0, + max_value: 0, + valid_mask: 0, + description: parts[2].to_string(), + }); + } + } + // Parse struct-field-range + else if let Some(rest) =3D line.strip_prefix("struct-field-ran= ge:") { + let parts: Vec<&str> =3D rest.split(',').map(|s| s.trim())= .collect(); + if parts.len() >=3D 3 { + // Update the field with range + if let Some(field) =3D struct_fields.iter_mut().find(|= f| f.name =3D=3D parts[0]) { + field.min_value =3D parts[1].parse().unwrap_or(0); + field.max_value =3D parts[2].parse().unwrap_or(0); + field.constraint_type =3D 1; // KAPI_CONSTRAINT_RA= NGE + } + } + } + // Parse examples + else if let Some(rest) =3D line.strip_prefix("examples:") { + spec.examples =3D Some(self.collect_multiline_value(&lines= , i, rest)); + } + // Parse notes + else if let Some(rest) =3D line.strip_prefix("notes:") { + spec.notes =3D Some(self.collect_multiline_value(&lines, i= , rest)); + } + // Parse since-version + else if let Some(rest) =3D line.strip_prefix("since-version:")= { + spec.since_version =3D Some(rest.trim().to_string()); + } + // Parse return-type + else if let Some(rest) =3D line.strip_prefix("return-type:") { + if spec.return_spec.is_none() { + spec.return_spec =3D Some(ReturnSpec { + type_name: rest.trim().to_string(), + description: String::new(), + return_type: self.parse_param_type(rest.trim()), + check_type: 0, + success_value: None, + success_min: None, + success_max: None, + error_values: vec![], + }); + } + } + // Parse return-check-type + else if let Some(rest) =3D line.strip_prefix("return-check-typ= e:") { + if let Some(ret) =3D spec.return_spec.as_mut() { + ret.check_type =3D self.parse_return_check_type(rest.t= rim()); + } + } + // Parse return-success + else if let Some(rest) =3D line.strip_prefix("return-success:"= ) { + if let Some(ret) =3D spec.return_spec.as_mut() { + ret.success_value =3D rest.trim().parse().ok(); + } + } + + i +=3D 1; + } + + // Save any remaining items + if let Some(lock) =3D current_lock { + 
spec.locks.push(lock); + } + if let Some(signal) =3D current_signal { + spec.signals.push(signal); + } + if let Some(cap) =3D current_capability { + spec.capabilities.push(cap); + } + + // Convert param_map to vec preserving order + let mut params: Vec =3D param_map.into_values().collect= (); + params.sort_by_key(|p| p.index); + spec.parameters =3D params; + + // Create struct spec if we have fields + if !struct_fields.is_empty() { + spec.struct_specs.push(StructSpec { + name: "struct sched_attr".to_string(), + size: 120, // Default for sched_attr + alignment: 8, + field_count: struct_fields.len() as u32, + fields: struct_fields, + description: "Structure specification".to_string(), + }); + } + + Ok(spec) + } + + fn collect_multiline_value(&self, lines: &[&str], start_idx: usize, fi= rst_part: &str) -> String { + let mut result =3D String::from(first_part.trim()); + let mut i =3D start_idx + 1; + + // Continue collecting lines until we hit another annotation or end + while i < lines.len() { + let line =3D lines[i]; + + // Stop if we hit another annotation (contains ':' and starts = with valid keyword) + if self.is_annotation_line(line) { + break; + } + + // Add continuation lines + if !line.trim().is_empty() && line.starts_with(" ") { + if !result.is_empty() { + result.push(' '); + } + result.push_str(line.trim()); + } else if line.trim().is_empty() { + // Empty line might be part of multiline + i +=3D 1; + continue; + } else { + // Non-continuation line, stop + break; + } + + i +=3D 1; + } + + result + } + + fn is_annotation_line(&self, line: &str) -> bool { + let annotations =3D [ + "param-", "error-", "lock", "signal", "side-effect:", + "state-trans:", "capability", "constraint", "struct-", + "return-", "examples:", "notes:", "since-", "context-", + "long-desc:" + ]; + + for ann in &annotations { + if line.trim_start().starts_with(ann) { + return true; + } + } + false + } + + fn parse_context_flags(&self, flags: &str) -> Vec { + flags.split('|') + .map(|f| 
f.trim().to_string()) + .collect() + } + + fn parse_param_type(&self, type_str: &str) -> u32 { + match type_str { + "KAPI_TYPE_INT" =3D> 1, + "KAPI_TYPE_UINT" =3D> 2, + "KAPI_TYPE_LONG" =3D> 3, + "KAPI_TYPE_ULONG" =3D> 4, + "KAPI_TYPE_STRING" =3D> 5, + "KAPI_TYPE_USER_PTR" =3D> 6, + _ =3D> 0, + } + } + + fn parse_field_type(&self, type_str: &str) -> u32 { + match type_str { + "__s32" | "int" =3D> 1, + "__u32" | "unsigned int" =3D> 2, + "__s64" | "long" =3D> 3, + "__u64" | "unsigned long" =3D> 4, + _ =3D> 0, + } + } + + fn parse_param_flags(&self, flags: &str) -> u32 { + let mut result =3D 0; + for flag in flags.split('|') { + match flag.trim() { + "KAPI_PARAM_IN" =3D> result |=3D 1, + "KAPI_PARAM_OUT" =3D> result |=3D 2, + "KAPI_PARAM_INOUT" =3D> result |=3D 3, + "KAPI_PARAM_USER" =3D> result |=3D 4, + _ =3D> {} + } + } + result + } + + fn parse_lock_type(&self, type_str: &str) -> u32 { + match type_str { + "KAPI_LOCK_SPINLOCK" =3D> 0, + "KAPI_LOCK_MUTEX" =3D> 1, + "KAPI_LOCK_RWLOCK" =3D> 2, + _ =3D> 3, + } + } + + fn parse_signal_direction(&self, dir: &str) -> u32 { + match dir { + "KAPI_SIGNAL_SEND" =3D> 1, + "KAPI_SIGNAL_RECEIVE" =3D> 2, + _ =3D> 0, + } + } + + fn parse_signal_action(&self, action: &str) -> u32 { + match action { + "KAPI_SIGNAL_ACTION_DEFAULT" =3D> 0, + "KAPI_SIGNAL_ACTION_IGNORE" =3D> 1, + "KAPI_SIGNAL_ACTION_CUSTOM" =3D> 2, + _ =3D> 0, + } + } + + fn parse_signal_timing(&self, timing: &str) -> u32 { + match timing { + "KAPI_SIGNAL_TIME_BEFORE" =3D> 0, + "KAPI_SIGNAL_TIME_DURING" =3D> 1, + "KAPI_SIGNAL_TIME_AFTER" =3D> 2, + _ =3D> 0, + } + } + + fn parse_signal_state(&self, state: &str) -> u32 { + match state { + "KAPI_SIGNAL_STATE_RUNNING" =3D> 1, + "KAPI_SIGNAL_STATE_SLEEPING" =3D> 2, + _ =3D> 0, + } + } + + fn parse_effect_type(&self, type_str: &str) -> u32 { + let mut result =3D 0; + for flag in type_str.split('|') { + match flag.trim() { + "KAPI_EFFECT_MODIFY_STATE" =3D> result |=3D 1, + "KAPI_EFFECT_PROCESS_STATE" =3D> result |=3D 2, + 
"KAPI_EFFECT_SCHEDULE" =3D> result |=3D 4, + _ =3D> {} + } + } + result + } + + fn parse_capability_value(&self, cap: &str) -> i32 { + match cap { + "CAP_SYS_NICE" =3D> 23, + _ =3D> 0, + } + } + + fn parse_return_check_type(&self, check: &str) -> u32 { + match check { + "KAPI_RETURN_ERROR_CHECK" =3D> 1, + "KAPI_RETURN_SUCCESS_CHECK" =3D> 2, + _ =3D> 0, + } + } +} \ No newline at end of file diff --git a/tools/kapi/src/extractor/mod.rs b/tools/kapi/src/extractor/mod= .rs new file mode 100644 index 0000000000000..4eeb03b9a4ca3 --- /dev/null +++ b/tools/kapi/src/extractor/mod.rs @@ -0,0 +1,464 @@ +use crate::formatter::OutputFormatter; +use anyhow::Result; +use std::convert::TryInto; +use std::io::Write; + +pub mod debugfs; +pub mod kerneldoc_parser; +pub mod source_parser; +pub mod vmlinux; + +pub use debugfs::DebugfsExtractor; +pub use source_parser::SourceExtractor; +pub use vmlinux::VmlinuxExtractor; + +/// Socket state specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct SocketStateSpec { + pub required_states: Vec, + pub forbidden_states: Vec, + pub resulting_state: Option, + pub condition: Option, + pub applicable_protocols: Option, +} + +/// Protocol behavior specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct ProtocolBehaviorSpec { + pub applicable_protocols: String, + pub behavior: String, + pub protocol_flags: Option, + pub flag_description: Option, +} + +/// Address family specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct AddrFamilySpec { + pub family: i32, + pub family_name: String, + pub addr_struct_size: usize, + pub min_addr_len: usize, + pub max_addr_len: usize, + pub addr_format: Option, + pub supports_wildcard: bool, + pub supports_multicast: bool, + pub supports_broadcast: bool, + pub special_addresses: Option, + pub port_range_min: u32, + pub port_range_max: u32, +} + +/// Buffer specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct BufferSpec { + pub buffer_behaviors: Option, + 
pub min_buffer_size: Option, + pub max_buffer_size: Option, + pub optimal_buffer_size: Option, +} + +/// Async specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct AsyncSpec { + pub supported_modes: Option, + pub nonblock_errno: Option, +} + +/// Capability specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct CapabilitySpec { + pub capability: i32, + pub name: String, + pub action: String, + pub allows: String, + pub without_cap: String, + pub check_condition: Option, + pub priority: Option, + pub alternatives: Vec, +} + +/// Parameter specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct ParamSpec { + pub index: u32, + pub name: String, + pub type_name: String, + pub description: String, + pub flags: u32, + pub param_type: u32, + pub constraint_type: u32, + pub constraint: Option, + pub min_value: Option, + pub max_value: Option, + pub valid_mask: Option, + pub enum_values: Vec, + pub size: Option, + pub alignment: Option, +} + +/// Return value specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct ReturnSpec { + pub type_name: String, + pub description: String, + pub return_type: u32, + pub check_type: u32, + pub success_value: Option, + pub success_min: Option, + pub success_max: Option, + pub error_values: Vec, +} + +/// Error specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct ErrorSpec { + pub error_code: i32, + pub name: String, + pub condition: String, + pub description: String, +} + +/// Signal specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct SignalSpec { + pub signal_num: i32, + pub signal_name: String, + pub direction: u32, + pub action: u32, + pub target: Option, + pub condition: Option, + pub description: Option, + pub timing: u32, + pub priority: u32, + pub restartable: bool, + pub interruptible: bool, + pub queue: Option, + pub sa_flags: u32, + pub sa_flags_required: u32, + pub sa_flags_forbidden: u32, + pub state_required: u32, + pub state_forbidden: 
u32, + pub error_on_signal: Option, +} + +/// Signal mask specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct SignalMaskSpec { + pub name: String, + pub description: String, +} + +/// Side effect specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct SideEffectSpec { + pub effect_type: u32, + pub target: String, + pub condition: Option, + pub description: String, + pub reversible: bool, +} + +/// State transition specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct StateTransitionSpec { + pub object: String, + pub from_state: String, + pub to_state: String, + pub condition: Option, + pub description: String, +} + +/// Constraint specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct ConstraintSpec { + pub name: String, + pub description: String, + pub expression: Option, +} + +/// Lock scope enum values matching kernel enum kapi_lock_scope +pub const KAPI_LOCK_INTERNAL: u32 =3D 0; +pub const KAPI_LOCK_ACQUIRES: u32 =3D 1; +pub const KAPI_LOCK_RELEASES: u32 =3D 2; +pub const KAPI_LOCK_CALLER_HELD: u32 =3D 3; + +/// Lock specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct LockSpec { + pub lock_name: String, + pub lock_type: u32, + pub scope: u32, + pub description: String, +} + +/// Struct field specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct StructFieldSpec { + pub name: String, + pub field_type: u32, + pub type_name: String, + pub offset: usize, + pub size: usize, + pub flags: u32, + pub constraint_type: u32, + pub min_value: i64, + pub max_value: i64, + pub valid_mask: u64, + pub description: String, +} + +/// Struct specification +#[derive(Debug, Clone, serde::Serialize)] +pub struct StructSpec { + pub name: String, + pub size: usize, + pub alignment: usize, + pub field_count: u32, + pub fields: Vec, + pub description: String, +} + +/// Common API specification information that all extractors should provide +#[derive(Debug, Clone)] +pub struct ApiSpec { + pub name: 
String, + pub api_type: String, + pub description: Option, + pub long_description: Option, + pub version: Option, + pub context_flags: Vec, + pub param_count: Option, + pub error_count: Option, + pub examples: Option, + pub notes: Option, + pub since_version: Option, + // Sysfs-specific fields + pub subsystem: Option, + pub sysfs_path: Option, + pub permissions: Option, + // Networking-specific fields + pub socket_state: Option, + pub protocol_behaviors: Vec, + pub addr_families: Vec, + pub buffer_spec: Option, + pub async_spec: Option, + pub net_data_transfer: Option, + pub capabilities: Vec, + pub parameters: Vec, + pub return_spec: Option, + pub errors: Vec, + pub signals: Vec, + pub signal_masks: Vec, + pub side_effects: Vec, + pub state_transitions: Vec, + pub constraints: Vec, + pub locks: Vec, + pub struct_specs: Vec, +} + +/// Trait for extracting API specifications from different sources +pub trait ApiExtractor { + /// Extract all API specifications from the source + fn extract_all(&self) -> Result>; + + /// Extract a specific API specification by name + fn extract_by_name(&self, name: &str) -> Result>; + + /// Display detailed information about a specific API + fn display_api_details( + &self, + api_name: &str, + formatter: &mut dyn OutputFormatter, + writer: &mut dyn Write, + ) -> Result<()>; +} + +/// Helper function to display an ApiSpec using a formatter +pub fn display_api_spec( + spec: &ApiSpec, + formatter: &mut dyn OutputFormatter, + writer: &mut dyn Write, +) -> Result<()> { + formatter.begin_api_details(writer, &spec.name)?; + + if let Some(desc) =3D &spec.description { + formatter.description(writer, desc)?; + } + + if let Some(long_desc) =3D &spec.long_description { + formatter.long_description(writer, long_desc)?; + } + + if let Some(version) =3D &spec.since_version { + formatter.since_version(writer, version)?; + } + + if !spec.context_flags.is_empty() { + formatter.begin_context_flags(writer)?; + for flag in &spec.context_flags { + 
formatter.context_flag(writer, flag)?; + } + formatter.end_context_flags(writer)?; + } + + if !spec.parameters.is_empty() { + formatter.begin_parameters(writer, spec.parameters.len().try_into(= ).unwrap_or(u32::MAX))?; + for param in &spec.parameters { + formatter.parameter(writer, param)?; + } + formatter.end_parameters(writer)?; + } + + if let Some(ret) =3D &spec.return_spec { + formatter.return_spec(writer, ret)?; + } + + if !spec.errors.is_empty() { + formatter.begin_errors(writer, spec.errors.len().try_into().unwrap= _or(u32::MAX))?; + for error in &spec.errors { + formatter.error(writer, error)?; + } + formatter.end_errors(writer)?; + } + + if let Some(notes) =3D &spec.notes { + formatter.notes(writer, notes)?; + } + + if let Some(examples) =3D &spec.examples { + formatter.examples(writer, examples)?; + } + + // Display sysfs-specific fields + if spec.api_type =3D=3D "sysfs" { + if let Some(subsystem) =3D &spec.subsystem { + formatter.sysfs_subsystem(writer, subsystem)?; + } + if let Some(path) =3D &spec.sysfs_path { + formatter.sysfs_path(writer, path)?; + } + if let Some(perms) =3D &spec.permissions { + formatter.sysfs_permissions(writer, perms)?; + } + } + + // Display networking-specific fields + if let Some(socket_state) =3D &spec.socket_state { + formatter.socket_state(writer, socket_state)?; + } + + if !spec.protocol_behaviors.is_empty() { + formatter.begin_protocol_behaviors(writer)?; + for behavior in &spec.protocol_behaviors { + formatter.protocol_behavior(writer, behavior)?; + } + formatter.end_protocol_behaviors(writer)?; + } + + if !spec.addr_families.is_empty() { + formatter.begin_addr_families(writer)?; + for family in &spec.addr_families { + formatter.addr_family(writer, family)?; + } + formatter.end_addr_families(writer)?; + } + + if let Some(buffer_spec) =3D &spec.buffer_spec { + formatter.buffer_spec(writer, buffer_spec)?; + } + + if let Some(async_spec) =3D &spec.async_spec { + formatter.async_spec(writer, async_spec)?; + } + + if let 
Some(net_data_transfer) =3D &spec.net_data_transfer { + formatter.net_data_transfer(writer, net_data_transfer)?; + } + + if !spec.capabilities.is_empty() { + formatter.begin_capabilities(writer)?; + for cap in &spec.capabilities { + formatter.capability(writer, cap)?; + } + formatter.end_capabilities(writer)?; + } + + // Display signals + if !spec.signals.is_empty() { + formatter.begin_signals(writer, spec.signals.len().try_into().unwr= ap_or(u32::MAX))?; + for signal in &spec.signals { + formatter.signal(writer, signal)?; + } + formatter.end_signals(writer)?; + } + + // Display signal masks + if !spec.signal_masks.is_empty() { + formatter.begin_signal_masks( + writer, + spec.signal_masks.len().try_into().unwrap_or(u32::MAX), + )?; + for mask in &spec.signal_masks { + formatter.signal_mask(writer, mask)?; + } + formatter.end_signal_masks(writer)?; + } + + // Display side effects + if !spec.side_effects.is_empty() { + formatter.begin_side_effects( + writer, + spec.side_effects.len().try_into().unwrap_or(u32::MAX), + )?; + for effect in &spec.side_effects { + formatter.side_effect(writer, effect)?; + } + formatter.end_side_effects(writer)?; + } + + // Display state transitions + if !spec.state_transitions.is_empty() { + formatter.begin_state_transitions( + writer, + spec.state_transitions.len().try_into().unwrap_or(u32::MAX), + )?; + for trans in &spec.state_transitions { + formatter.state_transition(writer, trans)?; + } + formatter.end_state_transitions(writer)?; + } + + // Display constraints + if !spec.constraints.is_empty() { + formatter.begin_constraints( + writer, + spec.constraints.len().try_into().unwrap_or(u32::MAX), + )?; + for constraint in &spec.constraints { + formatter.constraint(writer, constraint)?; + } + formatter.end_constraints(writer)?; + } + + // Display locks + if !spec.locks.is_empty() { + formatter.begin_locks(writer, spec.locks.len().try_into().unwrap_o= r(u32::MAX))?; + for lock in &spec.locks { + formatter.lock(writer, lock)?; + } + 
formatter.end_locks(writer)?;
+    }
+
+    // Display struct specs
+    if !spec.struct_specs.is_empty() {
+        formatter.begin_struct_specs(writer, spec.struct_specs.len().try_into().unwrap_or(u32::MAX))?;
+        for struct_spec in &spec.struct_specs {
+            formatter.struct_spec(writer, struct_spec)?;
+        }
+        formatter.end_struct_specs(writer)?;
+    }
+
+    formatter.end_api_details(writer)?;
+
+    Ok(())
+}
diff --git a/tools/kapi/src/extractor/source_parser.rs b/tools/kapi/src/extractor/source_parser.rs
new file mode 100644
index 0000000000000..7a72b85a83bea
--- /dev/null
+++ b/tools/kapi/src/extractor/source_parser.rs
@@ -0,0 +1,213 @@
+use super::{
+    ApiExtractor, ApiSpec, display_api_spec,
+};
+use super::kerneldoc_parser::KerneldocParserImpl;
+use crate::formatter::OutputFormatter;
+use anyhow::{Context, Result};
+use regex::Regex;
+use std::fs;
+use std::io::Write;
+use std::path::Path;
+use walkdir::WalkDir;
+
+/// Extractor for kernel source files with KAPI-annotated kerneldoc
+pub struct SourceExtractor {
+    path: String,
+    parser: KerneldocParserImpl,
+    syscall_regex: Regex,
+    ioctl_regex: Regex,
+    function_regex: Regex,
+}
+
+impl SourceExtractor {
+    pub fn new(path: &str) -> Result<Self> {
+        Ok(SourceExtractor {
+            path: path.to_string(),
+            parser: KerneldocParserImpl::new(),
+            syscall_regex: Regex::new(r"SYSCALL_DEFINE\d+\((\w+)")?,
+            ioctl_regex: Regex::new(r"(?:static\s+)?long\s+(\w+_ioctl)\s*\(")?,
+            function_regex: Regex::new(
+                r"(?m)^(?:static\s+)?(?:inline\s+)?(?:(?:unsigned\s+)?(?:long|int|void|char|short|struct\s+\w+\s*\*?|[\w_]+_t)\s*\*?\s+)?(\w+)\s*\([^)]*\)",
+            )?,
+        })
+    }
+
+    fn extract_from_file(&self, path: &Path) -> Result<Vec<ApiSpec>> {
+        let content = fs::read_to_string(path)
+            .with_context(|| format!("Failed to read file: {}", path.display()))?;
+
+        self.extract_from_content(&content)
+    }
+
+    fn extract_from_content(&self, content: &str) -> Result<Vec<ApiSpec>> {
+        let mut specs = Vec::new();
+        let mut in_kerneldoc = false;
+        let mut current_doc = String::new();
+        let lines: Vec<&str> = content.lines().collect();
+        let mut i = 0;
+
+        while i < lines.len() {
+            let line = lines[i];
+
+            // Start of kerneldoc comment
+            if line.trim_start().starts_with("/**") {
+                in_kerneldoc = true;
+                current_doc.clear();
+                i += 1;
+                continue;
+            }
+
+            // Inside kerneldoc comment
+            if in_kerneldoc {
+                if line.contains("*/") {
+                    in_kerneldoc = false;
+
+                    // Check if this kerneldoc has KAPI annotations
+                    if current_doc.contains("context-flags:") ||
+                       current_doc.contains("param-count:") ||
+                       current_doc.contains("side-effect:") ||
+                       current_doc.contains("state-trans:") ||
+                       current_doc.contains("error-code:") {
+
+                        // Look ahead for the function declaration
+                        if let Some((name, api_type, signature)) = self.find_function_after(&lines, i + 1) {
+                            if let Ok(spec) = self.parser.parse_kerneldoc(&current_doc, &name, &api_type, Some(&signature)) {
+                                specs.push(spec);
+                            }
+                        }
+                    }
+                } else {
+                    // Remove leading asterisk and preserve content
+                    let cleaned = if let Some(stripped) = line.trim_start().strip_prefix("*") {
+                        if let Some(no_space) = stripped.strip_prefix(' ') {
+                            no_space
+                        } else {
+                            stripped
+                        }
+                    } else {
+                        line.trim_start()
+                    };
+                    current_doc.push_str(cleaned);
+                    current_doc.push('\n');
+                }
+            }
+
+            i += 1;
+        }
+
+        Ok(specs)
+    }
+
+    fn find_function_after(&self, lines: &[&str], start: usize) -> Option<(String, String, String)> {
+        for i in start..lines.len().min(start + 10) {
+            let line = lines[i];
+
+            // Skip empty lines
+            if line.trim().is_empty() {
+                continue;
+            }
+
+            // Check for SYSCALL_DEFINE
+            if let Some(caps) = self.syscall_regex.captures(line) {
+                let name = format!("sys_{}", caps.get(1).unwrap().as_str());
+                let signature = self.extract_syscall_signature(lines, i);
+                return Some((name, "syscall".to_string(), signature));
+            }
+
+            // Check for ioctl function
+            if let Some(caps) = self.ioctl_regex.captures(line) {
+                let name = caps.get(1).unwrap().as_str().to_string();
+                return Some((name, "ioctl".to_string(), line.to_string()));
+            }
+
+            // Check for regular function
+            if let Some(caps) = self.function_regex.captures(line) {
+                let name = caps.get(1).unwrap().as_str().to_string();
+                return Some((name, "function".to_string(), line.to_string()));
+            }
+
+            // Stop if we hit something that's clearly not part of the function declaration
+            if !line.starts_with(' ') && !line.starts_with('\t') && !line.trim().is_empty() {
+                break;
+            }
+        }
+
+        None
+    }
+
+    fn extract_syscall_signature(&self, lines: &[&str], start: usize) -> String {
+        // Extract the full SYSCALL_DEFINE signature
+        let mut sig = String::new();
+        let mut in_paren = false;
+        let mut paren_count = 0;
+
+        for line in lines.iter().skip(start).take(20) {
+            let line = *line;
+
+            // Start of SYSCALL_DEFINE
+            if line.contains("SYSCALL_DEFINE") {
+                if let Some(pos) = line.find('(') {
+                    sig.push_str(&line[pos..]);
+                    in_paren = true;
+                    paren_count = line[pos..].chars().filter(|&c| c == '(').count() -
+                                  line[pos..].chars().filter(|&c| c == ')').count();
+                }
+            } else if in_paren {
+                sig.push(' ');
+                sig.push_str(line.trim());
+                paren_count += line.chars().filter(|&c| c == '(').count();
+                paren_count -= line.chars().filter(|&c| c == ')').count();
+
+                if paren_count == 0 {
+                    break;
+                }
+            }
+        }
+
+        sig
+    }
+}
+
+impl ApiExtractor for SourceExtractor {
+    fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+        let path = Path::new(&self.path);
+        let mut all_specs = Vec::new();
+
+        if path.is_file() {
+            // Single file
+            all_specs.extend(self.extract_from_file(path)?);
+        } else if path.is_dir() {
+            // Directory - walk all .c files
+            for entry in WalkDir::new(path)
+                .into_iter()
+                .filter_map(|e| e.ok())
+                .filter(|e| e.path().extension().is_some_and(|ext| ext == "c"))
+            {
+                if let Ok(specs) = self.extract_from_file(entry.path()) {
+                    all_specs.extend(specs);
+                }
+            }
+        }
+
+        Ok(all_specs)
+    }
+
+    fn extract_by_name(&self, name: &str) -> Result<Option<ApiSpec>> {
+        let all_specs = self.extract_all()?;
+        Ok(all_specs.into_iter().find(|s| s.name == name))
+    }
+
+    fn display_api_details(
+        &self,
+        api_name: &str,
+        formatter: &mut dyn OutputFormatter,
+        output: &mut dyn Write,
+    ) -> Result<()> {
+        if let Some(spec) = self.extract_by_name(api_name)? {
+            display_api_spec(&spec, formatter, output)?;
+        } else {
+            writeln!(output, "API '{}' not found", api_name)?;
+        }
+        Ok(())
+    }
+}
\ No newline at end of file
diff --git a/tools/kapi/src/extractor/vmlinux/binary_utils.rs b/tools/kapi/src/extractor/vmlinux/binary_utils.rs
new file mode 100644
index 0000000000000..0a51943e1c027
--- /dev/null
+++ b/tools/kapi/src/extractor/vmlinux/binary_utils.rs
@@ -0,0 +1,180 @@
+// Constants for all structure field sizes
+pub mod sizes {
+    pub const NAME: usize = 128;
+    pub const DESC: usize = 512;
+    pub const MAX_PARAMS: usize = 16;
+    pub const MAX_ERRORS: usize = 32;
+    pub const MAX_CONSTRAINTS: usize = 16;
+    pub const MAX_CAPABILITIES: usize = 8;
+    pub const MAX_SIGNALS: usize = 16;
+    pub const MAX_STRUCT_SPECS: usize = 8;
+    pub const MAX_SIDE_EFFECTS: usize = 32;
+    pub const MAX_STATE_TRANS: usize = 16;
+    pub const MAX_PROTOCOL_BEHAVIORS: usize = 8;
+    pub const MAX_ADDR_FAMILIES: usize = 8;
+}
+
+// Helper for reading data at specific offsets
+pub struct DataReader<'a> {
+    pub data: &'a [u8],
+    pub pos: usize,
+}
+
+impl<'a> DataReader<'a> {
+    pub fn new(data: &'a [u8], offset: usize) -> Self {
+        Self { data, pos: offset }
+    }
+
+    pub fn read_bytes(&mut self, len: usize) -> Option<&'a [u8]> {
+        if self.pos + len <= self.data.len() {
+            let bytes = &self.data[self.pos..self.pos + len];
+            self.pos += len;
+            Some(bytes)
+        } else {
+            None
+        }
+    }
+
+    pub fn read_cstring(&mut self, max_len: usize) -> Option<String> {
+        let bytes = self.read_bytes(max_len)?;
+        if let Some(null_pos) = bytes.iter().position(|&b| b == 0) {
+            if null_pos > 0 {
+                if let Ok(s) = std::str::from_utf8(&bytes[..null_pos]) {
+                    return Some(s.to_string());
+                }
+            }
+        }
+        None
+    }
+
+    pub fn read_u32(&mut self) -> Option<u32> {
+        self.read_bytes(4).map(|b| u32::from_le_bytes(b.try_into().unwrap()))
+    }
+
+    pub fn read_u8(&mut self) -> Option<u8> {
+        self.read_bytes(1).map(|b| b[0])
+    }
+
+    pub fn read_i32(&mut self) -> Option<i32> {
+        self.read_bytes(4).map(|b| i32::from_le_bytes(b.try_into().unwrap()))
+    }
+
+    pub fn read_u64(&mut self) -> Option<u64> {
+        self.read_bytes(8).map(|b| u64::from_le_bytes(b.try_into().unwrap()))
+    }
+
+    pub fn read_i64(&mut self) -> Option<i64> {
+        self.read_bytes(8).map(|b| i64::from_le_bytes(b.try_into().unwrap()))
+    }
+
+    pub fn read_usize(&mut self) -> Option<usize> {
+        self.read_u64().map(|v| v as usize)
+    }
+
+    pub fn skip(&mut self, len: usize) {
+        self.pos = (self.pos + len).min(self.data.len());
+    }
+
+    // Helper methods for common patterns
+    pub fn read_bool(&mut self) -> Option<bool> {
+        self.read_u8().map(|v| v != 0)
+    }
+
+    pub fn read_optional_string(&mut self, max_len: usize) -> Option<String> {
+        self.read_cstring(max_len).filter(|s| !s.is_empty())
+    }
+
+    pub fn read_string_or_default(&mut self, max_len: usize) -> String {
+        self.read_cstring(max_len).unwrap_or_default()
+    }
+
+    // Skip and discard - advances position by reading and discarding
+    pub fn discard_cstring(&mut self, max_len: usize) {
+        let _ = self.read_cstring(max_len);
+    }
+
+    // Read multiple booleans at once
+    pub fn read_bools<const N: usize>(&mut self) -> Option<[bool; N]> {
+        let mut result = [false; N];
+        for item in &mut result {
+            *item = self.read_bool()?;
+        }
+        Some(result)
+    }
+
+
+}
+
+// Structure layout definitions for calculating sizes
+pub fn signal_mask_spec_layout_size() -> usize {
+    // Packed structure from struct kapi_signal_mask_spec
+    sizes::NAME +            // mask_name
+    4 * sizes::MAX_SIGNALS + // signals array
+    4 +                      // signal_count
+    sizes::DESC              // description
+}
+
+pub fn struct_field_layout_size() -> usize {
+    // Packed structure from struct kapi_struct_field
+    sizes::NAME +            // name
+    4 +                      // type (enum)
+    sizes::NAME +            // type_name
+    8 +                      // offset (size_t)
+    8 +                      // size (size_t)
+    4 +                      // flags
+    4 +                      // constraint_type (enum)
+    8 +                      // min_value (s64)
+    8 +                      // max_value (s64)
+    8 +                      // valid_mask (u64)
+    sizes::DESC +            // enum_values
+    sizes::DESC              // description
+}
+
+pub fn socket_state_spec_layout_size() -> usize {
+    // struct kapi_socket_state_spec
+    sizes::NAME * sizes::MAX_CONSTRAINTS + // required_states array
+    sizes::NAME * sizes::MAX_CONSTRAINTS + // forbidden_states array
+    sizes::NAME +                          // resulting_state
+    sizes::DESC +                          // condition
+    sizes::NAME +                          // applicable_protocols
+    4 +                                    // required_count
+    4                                      // forbidden_count
+}
+
+pub fn protocol_behavior_spec_layout_size() -> usize {
+    // struct kapi_protocol_behavior
+    sizes::NAME +            // applicable_protocols
+    sizes::DESC +            // behavior
+    sizes::NAME +            // protocol_flags
+    sizes::DESC              // flag_description
+}
+
+pub fn buffer_spec_layout_size() -> usize {
+    // struct kapi_buffer_spec
+    sizes::DESC +            // buffer_behaviors
+    8 +                      // min_buffer_size (size_t)
+    8 +                      // max_buffer_size (size_t)
+    8                        // optimal_buffer_size (size_t)
+}
+
+pub fn async_spec_layout_size() -> usize {
+    // struct kapi_async_spec
+    sizes::NAME +            // supported_modes
+    4                        // nonblock_errno (int)
+}
+
+pub fn addr_family_spec_layout_size() -> usize {
+    // struct kapi_addr_family_spec
+    4 +                      // family (int)
+    sizes::NAME +            // family_name
+    8 +                      // addr_struct_size (size_t)
+    8 +                      // min_addr_len (size_t)
+    8 +                      // max_addr_len (size_t)
+    sizes::DESC +            // addr_format
+    1 +                      // supports_wildcard (bool)
+    1 +                      // supports_multicast (bool)
+    1 +                      // supports_broadcast (bool)
+    sizes::DESC +            // special_addresses
+    4 +                      // port_range_min (u32)
+    4                        // port_range_max (u32)
+}
diff --git a/tools/kapi/src/extractor/vmlinux/magic_finder.rs b/tools/kapi/src/extractor/vmlinux/magic_finder.rs
new file mode 100644
index 0000000000000..cb7dc535801a0
--- /dev/null
+++ b/tools/kapi/src/extractor/vmlinux/magic_finder.rs
@@ -0,0 +1,102 @@
+// Magic markers for each section
+pub const MAGIC_PARAM: u32 = 0x4B415031;      // 'KAP1'
+pub const MAGIC_RETURN: u32 = 0x4B415232;     // 'KAR2'
+pub const MAGIC_ERROR: u32 = 0x4B414533;      // 'KAE3'
+pub const MAGIC_LOCK: u32 = 0x4B414C34;       // 'KAL4'
+pub const MAGIC_CONSTRAINT: u32 = 0x4B414335; // 'KAC5'
+pub const MAGIC_INFO: u32 = 0x4B414936;       // 'KAI6'
+pub const MAGIC_SIGNAL: u32 = 0x4B415337;     // 'KAS7'
+pub const MAGIC_SIGMASK: u32 = 0x4B414D38;    // 'KAM8'
+pub const MAGIC_STRUCT: u32 = 0x4B415439;     // 'KAT9'
+pub const MAGIC_EFFECT: u32 = 0x4B414641;     // 'KAFA'
+pub const MAGIC_TRANS: u32 = 0x4B415442;      // 'KATB'
+pub const MAGIC_CAP: u32 = 0x4B414343;        // 'KACC'
+
+pub struct MagicOffsets {
+    pub param_offset: Option<usize>,
+    pub return_offset: Option<usize>,
+    pub error_offset: Option<usize>,
+    pub lock_offset: Option<usize>,
+    pub constraint_offset: Option<usize>,
+    pub info_offset: Option<usize>,
+    pub signal_offset: Option<usize>,
+    pub sigmask_offset: Option<usize>,
+    pub struct_offset: Option<usize>,
+    pub effect_offset: Option<usize>,
+    pub trans_offset: Option<usize>,
+    pub cap_offset: Option<usize>,
+}
+
+impl MagicOffsets {
+    /// Find magic markers in the provided data slice
+    /// data: slice of data to search (typically one spec's worth)
+    /// base_offset: absolute offset where this slice starts in the full buffer
+    pub fn find_in_data(data: &[u8], base_offset: usize) -> Self {
+        let mut offsets = MagicOffsets {
+            param_offset: None,
+            return_offset: None,
+            error_offset: None,
+            lock_offset: None,
+            constraint_offset: None,
+            info_offset: None,
+            signal_offset: None,
+            sigmask_offset: None,
+            struct_offset: None,
+            effect_offset: None,
+            trans_offset: None,
+            cap_offset: None,
+        };
+
+        // Scan through data looking for magic markers
+        // Only find the first occurrence of each magic to avoid cross-spec contamination
+        let mut i = 0;
+        while i + 4 <= data.len() {
+            let bytes = &data[i..i + 4];
+            let value = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
+
+            match value {
+                MAGIC_PARAM if offsets.param_offset.is_none() => {
+                    offsets.param_offset = Some(base_offset + i);
+                },
+                MAGIC_RETURN if offsets.return_offset.is_none() => {
+                    offsets.return_offset = Some(base_offset + i);
+                },
+                MAGIC_ERROR if offsets.error_offset.is_none() => {
+                    offsets.error_offset = Some(base_offset + i);
+                },
+                MAGIC_LOCK if offsets.lock_offset.is_none() => {
+                    offsets.lock_offset = Some(base_offset + i);
+                },
+                MAGIC_CONSTRAINT if offsets.constraint_offset.is_none() => {
+                    offsets.constraint_offset = Some(base_offset + i);
+                },
+                MAGIC_INFO if offsets.info_offset.is_none() => {
+                    offsets.info_offset = Some(base_offset + i);
+                },
+                MAGIC_SIGNAL if offsets.signal_offset.is_none() => {
+                    offsets.signal_offset = Some(base_offset + i);
+                },
+                MAGIC_SIGMASK if offsets.sigmask_offset.is_none() => {
+                    offsets.sigmask_offset = Some(base_offset + i);
+                },
+                MAGIC_STRUCT if offsets.struct_offset.is_none() => {
+                    offsets.struct_offset = Some(base_offset + i);
+                },
+                MAGIC_EFFECT if offsets.effect_offset.is_none() => {
+                    offsets.effect_offset = Some(base_offset + i);
+                },
+                MAGIC_TRANS if offsets.trans_offset.is_none() => {
+                    offsets.trans_offset = Some(base_offset + i);
+                },
+                MAGIC_CAP if offsets.cap_offset.is_none() => {
+                    offsets.cap_offset = Some(base_offset + i);
+                },
+                _ => {}
+            }
+
+            i += 1;
+        }
+
+        offsets
+    }
+}
\ No newline at end of file
diff --git a/tools/kapi/src/extractor/vmlinux/mod.rs b/tools/kapi/src/extractor/vmlinux/mod.rs
new file mode 100644
index 0000000000000..bf3da4df6e66a
--- /dev/null
+++ b/tools/kapi/src/extractor/vmlinux/mod.rs
@@ -0,0 +1,864 @@
+use super::{
+    ApiExtractor, ApiSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec, ParamSpec,
+    ReturnSpec, SideEffectSpec, SignalMaskSpec, SignalSpec, StateTransitionSpec, StructSpec,
+    StructFieldSpec,
+};
+use crate::formatter::OutputFormatter;
+use anyhow::{Context, Result};
+use goblin::elf::Elf;
+use std::convert::TryInto;
+use std::fs;
+use std::io::Write;
+
+mod binary_utils;
+mod magic_finder;
+use binary_utils::{
+    DataReader, addr_family_spec_layout_size, async_spec_layout_size, buffer_spec_layout_size,
+    protocol_behavior_spec_layout_size, signal_mask_spec_layout_size,
+    sizes, socket_state_spec_layout_size, struct_field_layout_size,
+};
+
+// Helper to convert empty strings to None
+fn opt_string(s: String) -> Option<String> {
+    if s.is_empty() { None } else { Some(s) }
+}
+
+pub struct VmlinuxExtractor {
+    kapi_data: Vec<u8>,
+    specs: Vec<KapiSpec>,
+}
+
+#[derive(Debug)]
+struct KapiSpec {
+    name: String,
+    api_type: String,
+    offset: usize,
+}
+
+impl VmlinuxExtractor {
+    pub fn new(vmlinux_path: &str) -> Result<Self> {
+        let vmlinux_data = fs::read(vmlinux_path)
+            .with_context(|| format!("Failed to read vmlinux file: {vmlinux_path}"))?;
+
+        let elf = Elf::parse(&vmlinux_data).context("Failed to parse ELF file")?;
+
+        // Find __start_kapi_specs and __stop_kapi_specs symbols first
+        let mut start_addr = None;
+        let mut stop_addr = None;
+
+        for sym in &elf.syms {
+            if let Some(name) = elf.strtab.get_at(sym.st_name) {
+                match name {
+                    "__start_kapi_specs" => start_addr = Some(sym.st_value),
+                    "__stop_kapi_specs" => stop_addr = Some(sym.st_value),
+                    _ => {}
+                }
+            }
+        }
+
+        let start = start_addr.context("Could not find __start_kapi_specs symbol")?;
+        let stop = stop_addr.context("Could not find __stop_kapi_specs symbol")?;
+
+        if stop <= start {
+            anyhow::bail!("No kernel API specifications found in vmlinux");
+        }
+
+        // Find the section containing the kapi specs data
+        // The specs may be in .kapi_specs (standalone) or .rodata (embedded in RO_DATA)
+        let containing_section = elf
+            .section_headers
+            .iter()
+            .find(|sh| {
+                // Check if this section contains the start address
+                start >= sh.sh_addr && start < sh.sh_addr + sh.sh_size
+            })
+            .context("Could not find section containing kapi_specs data")?;
+
+        // Calculate the offset within the file
+        let section_vaddr = containing_section.sh_addr;
+        let file_offset = containing_section.sh_offset + (start - section_vaddr);
+        let data_size: usize = (stop - start)
+            .try_into()
+            .context("Data size too large for platform")?;
+
+        let file_offset_usize: usize = file_offset
+            .try_into()
+            .context("File offset too large for platform")?;
+
+        if file_offset_usize + data_size > vmlinux_data.len() {
+            anyhow::bail!("Invalid offset/size for kapi_specs data");
+        }
+
+        // Extract the raw data
+        let kapi_data = vmlinux_data[file_offset_usize..(file_offset_usize + data_size)].to_vec();
+
+        // Parse the specifications
+        let specs = parse_kapi_specs(&kapi_data)?;
+
+        Ok(VmlinuxExtractor { kapi_data, specs })
+    }
+}
+
+fn parse_kapi_specs(data: &[u8]) -> Result<Vec<KapiSpec>> {
+    let mut specs = Vec::new();
+    let mut offset = 0;
+    let mut last_found_offset = None;
+
+    // Expected offset from struct start to param_magic based on struct layout
+    let param_magic_offset = sizes::NAME + 4 + sizes::DESC + (sizes::DESC * 4) + 4;
+
+    // Find specs by validating API name and magic marker pairs
+    while offset + param_magic_offset + 4 <= data.len() {
+        // Read potential API name
+        let name_bytes = &data[offset..offset + sizes::NAME.min(data.len() - offset)];
+
+        // Find null terminator
+        let name_len = name_bytes.iter().position(|&b| b == 0).unwrap_or(0);
+
+        if name_len > 0 && name_len < 100 {
+            let name = String::from_utf8_lossy(&name_bytes[..name_len]).to_string();
+
+            // Validate API name format
+            if is_valid_api_name(&name) {
+                // Verify magic marker at expected position
+                let magic_offset = offset + param_magic_offset;
+                if magic_offset + 4 <= data.len() {
+                    let magic_bytes = &data[magic_offset..magic_offset + 4];
+                    let magic_value = u32::from_le_bytes([magic_bytes[0], magic_bytes[1], magic_bytes[2], magic_bytes[3]]);
+
+                    if magic_value == magic_finder::MAGIC_PARAM {
+                        // Avoid duplicate detection of the same spec
+                        if last_found_offset.is_none() || offset >= last_found_offset.unwrap() + param_magic_offset {
+                            let api_type = if name.starts_with("sys_") {
+                                "syscall"
+                            } else if name.ends_with("_ioctl") {
+                                "ioctl"
+                            } else if name.contains("sysfs") {
+                                "sysfs"
+                            } else {
+                                "function"
+                            }
+                            .to_string();
+
+                            specs.push(KapiSpec {
+                                name: name.clone(),
+                                api_type,
+                                offset,
+                            });
+
+                            last_found_offset = Some(offset);
+                        }
+                    }
+                }
+            }
+        }
+
+        // Scan byte by byte to find all specs
+        offset += 1;
+    }
+
+    Ok(specs)
+}
+
+
+
+
+fn is_valid_api_name(name: &str) -> bool {
+    // Validate API name format and length
+    if name.is_empty() || name.len() < 3 || name.len() > 100 {
+        return false;
+    }
+
+    // Alphanumeric and underscore characters only
+    if !name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
+        return false;
+    }
+
+    // Must start with letter or underscore
+    let first_char = name.chars().next().unwrap();
+    if !first_char.is_ascii_alphabetic() && first_char != '_' {
+        return false;
+    }
+
+    // Match common kernel API patterns
+    name.starts_with("sys_") ||
+    name.starts_with("__") ||
+    name.ends_with("_ioctl") ||
+    name.contains("_") ||
+    name.len() > 6
+}
+
+impl ApiExtractor for VmlinuxExtractor {
+    fn extract_all(&self) -> Result<Vec<ApiSpec>> {
+        Ok(self
+            .specs
+            .iter()
+            .map(|spec| {
+                // Parse the full spec for listing
+                parse_binary_to_api_spec(&self.kapi_data, spec.offset)
+                    .unwrap_or_else(|_| ApiSpec {
+                        name: spec.name.clone(),
+                        api_type: spec.api_type.clone(),
+                        description: None,
+                        long_description: None,
+                        version: None,
+                        context_flags: vec![],
+                        param_count: None,
+                        error_count: None,
+                        examples: None,
+                        notes: None,
+                        since_version: None,
+                        subsystem: None,
+                        sysfs_path: None,
+                        permissions: None,
+                        socket_state: None,
+                        protocol_behaviors: vec![],
+                        addr_families: vec![],
+                        buffer_spec: None,
+                        async_spec: None,
+                        net_data_transfer: None,
+                        capabilities: vec![],
+                        parameters: vec![],
+                        return_spec: None,
+                        errors: vec![],
+                        signals: vec![],
+                        signal_masks: vec![],
+                        side_effects: vec![],
+                        state_transitions: vec![],
+                        constraints: vec![],
+                        locks: vec![],
+                        struct_specs: vec![],
+                    })
+            })
+            .collect())
+    }
+
+    fn extract_by_name(&self, api_name: &str) -> Result<Option<ApiSpec>> {
+        if let Some(spec) = self.specs.iter().find(|s| s.name == api_name) {
+            Ok(Some(parse_binary_to_api_spec(&self.kapi_data, spec.offset)?))
+        } else {
+            Ok(None)
+        }
+    }
+
+    fn display_api_details(
+        &self,
+        api_name: &str,
+        formatter: &mut dyn OutputFormatter,
+        writer: &mut dyn Write,
+    ) -> Result<()> {
+        if let Some(spec) = self.specs.iter().find(|s| s.name == api_name) {
+            let api_spec = parse_binary_to_api_spec(&self.kapi_data, spec.offset)?;
+            super::display_api_spec(&api_spec, formatter, writer)?;
+        }
+        Ok(())
+    }
+}
+
+/// Helper to read count and parse array items with optional magic offset
+fn parse_array_with_magic<T, F>(
+    reader: &mut DataReader,
+    magic_offset: Option<usize>,
+    max_items: u32,
+    parse_fn: F,
+) -> Vec<T>
+where
+    F: Fn(&mut DataReader) -> Option<T>,
+{
+    // Read count - position at magic+4 if magic offset exists
+    let count = if let Some(offset) = magic_offset {
+        reader.pos = offset + 4;
+        reader.read_u32()
+    } else {
+        reader.read_u32()
+    };
+
+    let mut items = Vec::new();
+    if let Some(count) = count {
+        // Position at start of array data if magic offset exists
+        if let Some(offset) = magic_offset {
+            reader.pos = offset + 8; // +4 for magic, +4 for count
+        }
+        // Parse items up to max_items
+        for _ in 0..count.min(max_items) as usize {
+            if let Some(item) = parse_fn(reader) {
+                items.push(item);
+            }
+        }
+    }
+    items
+}
+
+fn parse_binary_to_api_spec(data: &[u8], offset: usize) -> Result<ApiSpec> {
+    let mut reader = DataReader::new(data, offset);
+
+    // Search for magic markers in the entire spec data
+    let search_end = (offset + 0x70000).min(data.len()); // Search full spec size
+    let spec_data = &data[offset..search_end];
+
+    // Find magic markers relative to the spec start
+    let magic_offsets = magic_finder::MagicOffsets::find_in_data(spec_data, offset);
+
+    // Read fields in exact order of struct kernel_api_spec
+
+    // Read name (128 bytes)
+    let name = reader
+        .read_cstring(sizes::NAME)
+        .ok_or_else(|| anyhow::anyhow!("Failed to read API name"))?;
+
+    // Determine API type
+    let api_type = if name.starts_with("sys_") {
+        "syscall"
+    } else if name.ends_with("_ioctl") {
+        "ioctl"
+    } else if name.contains("sysfs") {
+        "sysfs"
+    } else {
+        "function"
+    }
+    .to_string();
+
+    // Read version (u32)
+    let version = reader.read_u32().map(|v| v.to_string());
+
+    // Read description (512 bytes)
+    let description = reader.read_cstring(sizes::DESC).filter(|s| !s.is_empty());
+
+    // Read long_description (2048 bytes)
+    let long_description = reader
+        .read_cstring(sizes::DESC * 4)
+        .filter(|s| !s.is_empty());
+
+    // Read context_flags (u32)
+    let context_flags = parse_context_flags(&mut reader);
+
+    // Parse params array
+    let parameters = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.param_offset,
+        sizes::MAX_PARAMS as u32,
+        |r| parse_param(r, 0), // Index doesn't seem to be used in parse_param
+    );
+
+    // Read return_spec
+    let return_spec = parse_return_spec(&mut reader);
+
+    // Parse errors array
+    let errors = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.error_offset,
+        sizes::MAX_ERRORS as u32,
+        parse_error,
+    );
+
+    // Parse locks array
+    let locks = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.lock_offset,
+        sizes::MAX_CONSTRAINTS as u32,
+        parse_lock,
+    );
+
+    // Parse constraints array
+    let constraints = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.constraint_offset,
+        sizes::MAX_CONSTRAINTS as u32,
+        parse_constraint,
+    );
+
+    // Read examples and notes - position reader at info section if magic found
+    let (examples, notes) = if let Some(info_offset) = magic_offsets.info_offset {
+        reader.pos = info_offset + 4; // +4 to skip magic
+        let examples = reader.read_cstring(sizes::DESC * 2).filter(|s| !s.is_empty());
+        let notes = reader.read_cstring(sizes::DESC * 2).filter(|s| !s.is_empty());
+        (examples, notes)
+    } else {
+        let examples = reader.read_cstring(sizes::DESC * 2).filter(|s| !s.is_empty());
+        let notes = reader.read_cstring(sizes::DESC * 2).filter(|s| !s.is_empty());
+        (examples, notes)
+    };
+
+    // Read since_version (32 bytes)
+    let since_version = reader.read_cstring(32).filter(|s| !s.is_empty());
+
+    // Skip deprecated (bool = 1 byte + 3 bytes padding) and replacement (128 bytes)
+    // These fields were removed from kernel but we need to skip them for binary compatibility
+    reader.skip(4); // deprecated + padding
+    reader.discard_cstring(sizes::NAME); // replacement
+
+    // Parse signals array
+    let signals = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.signal_offset,
+        sizes::MAX_SIGNALS as u32,
+        parse_signal,
+    );
+
+    // Read signal_mask_count (u32)
+    let signal_mask_count = reader.read_u32();
+
+    // Parse signal_masks array
+    let mut signal_masks = Vec::new();
+    if let Some(count) = signal_mask_count {
+        for i in 0..sizes::MAX_SIGNALS {
+            if i < count as usize {
+                if let Some(mask) = parse_signal_mask(&mut reader) {
+                    signal_masks.push(mask);
+                }
+            } else {
+                reader.skip(signal_mask_spec_layout_size());
+            }
+        }
+    } else {
+        reader.skip(signal_mask_spec_layout_size() * sizes::MAX_SIGNALS);
+    }
+
+    // Parse struct_specs array
+    let struct_specs = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.struct_offset,
+        sizes::MAX_STRUCT_SPECS as u32,
+        parse_struct_spec,
+    );
+
+    // According to the C struct, the order is:
+    // side_effect_count, side_effects array, state_trans_count, state_transitions array,
+    // capability_count, capabilities array
+
+    // Parse side_effects array
+    let side_effects = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.effect_offset,
+        sizes::MAX_SIDE_EFFECTS as u32,
+        parse_side_effect,
+    );
+
+    // Parse state_transitions array
+    let state_transitions = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.trans_offset,
+        sizes::MAX_STATE_TRANS as u32,
+        parse_state_transition,
+    );
+
+    // Parse capabilities array
+    let capabilities = parse_array_with_magic(
+        &mut reader,
+        magic_offsets.cap_offset,
+        sizes::MAX_CAPABILITIES as u32,
+        parse_capability,
+    );
+
+    // Skip remaining network/socket fields
+    reader.skip(
+        socket_state_spec_layout_size() +
+        protocol_behavior_spec_layout_size() * sizes::MAX_PROTOCOL_BEHAVIORS +
+        4 + // protocol_behavior_count
+        buffer_spec_layout_size() +
+        async_spec_layout_size() +
+        addr_family_spec_layout_size() * sizes::MAX_ADDR_FAMILIES +
+        4 + // addr_family_count
+        6 + 2 + // 6 bool flags + padding
+        sizes::DESC * 3 // 3 semantic descriptions
+    );
+
+    Ok(ApiSpec {
+        name,
+        api_type,
+        description,
+        long_description,
+        version,
+        context_flags,
+        param_count: if parameters.is_empty() { None } else { Some(parameters.len() as u32) },
+        error_count: if errors.is_empty() { None } else { Some(errors.len() as u32) },
+        examples,
+        notes,
+        since_version,
+        subsystem: None,
+        sysfs_path: None,
+        permissions: None,
+        socket_state: None,
+        protocol_behaviors: vec![],
+        addr_families: vec![],
+        buffer_spec: None,
+        async_spec: None,
+        net_data_transfer: None,
+        capabilities,
+        parameters,
+        return_spec,
+        errors,
+        signals,
+        signal_masks,
+        side_effects,
+        state_transitions,
+        constraints,
+        locks,
+        struct_specs,
+    })
+}
+
+// Helper parsing functions
+
+fn parse_context_flags(reader: &mut DataReader) -> Vec<String> {
+    const KAPI_CTX_PROCESS: u32 = 1 << 0;
+    const KAPI_CTX_SOFTIRQ: u32 = 1 << 1;
+    const KAPI_CTX_HARDIRQ: u32 = 1 << 2;
+    const KAPI_CTX_NMI: u32 = 1 << 3;
+    const KAPI_CTX_ATOMIC: u32 = 1 << 4;
+    const KAPI_CTX_SLEEPABLE: u32 = 1 << 5;
+    const KAPI_CTX_PREEMPT_DISABLED: u32 = 1 << 6;
+    const KAPI_CTX_IRQ_DISABLED: u32 = 1 << 7;
+
+    if let Some(flags) = reader.read_u32() {
+        let mut parts = Vec::new();
+
+        if flags & KAPI_CTX_PROCESS != 0 {
+            parts.push("KAPI_CTX_PROCESS");
+        }
+        if flags & KAPI_CTX_SOFTIRQ != 0 {
+            parts.push("KAPI_CTX_SOFTIRQ");
+        }
+        if flags & KAPI_CTX_HARDIRQ != 0 {
+            parts.push("KAPI_CTX_HARDIRQ");
+        }
+        if flags & KAPI_CTX_NMI != 0 {
+            parts.push("KAPI_CTX_NMI");
+        }
+        if flags & KAPI_CTX_ATOMIC != 0 {
+            parts.push("KAPI_CTX_ATOMIC");
+        }
+        if flags & KAPI_CTX_SLEEPABLE != 0 {
+            parts.push("KAPI_CTX_SLEEPABLE");
+        }
+        if flags & KAPI_CTX_PREEMPT_DISABLED != 0 {
+            parts.push("KAPI_CTX_PREEMPT_DISABLED");
+        }
+        if flags & KAPI_CTX_IRQ_DISABLED != 0 {
+            parts.push("KAPI_CTX_IRQ_DISABLED");
+        }
+
+        if !parts.is_empty() {
+            vec![parts.join(" | ")]
+        } else {
+            vec![]
+        }
+    } else {
+        vec![]
+    }
+}
+
+fn parse_param(reader: &mut DataReader, index: usize) -> Option<ParamSpec> {
+    let name = reader.read_cstring(sizes::NAME)?;
+    let type_name = reader.read_cstring(sizes::NAME)?;
+    let param_type = reader.read_u32()?;
+    let flags = reader.read_u32()?;
+    let size = reader.read_usize()?;
+    let alignment = reader.read_usize()?;
+    let min_value = reader.read_i64()?;
+    let max_value = reader.read_i64()?;
+    let valid_mask = reader.read_u64()?;
+
+    // Skip enum_values pointer (8 bytes)
+    reader.skip(8);
+    let _enum_count = reader.read_u32()?; // Must use ? to propagate errors
+    let constraint_type = reader.read_u32()?;
+    // Skip validate function pointer (8 bytes)
+    reader.skip(8);
+
+    let description = reader.read_string_or_default(sizes::DESC);
+    let constraint = reader.read_optional_string(sizes::DESC);
+    let _size_param_idx = reader.read_i32()?; // Must use ? to propagate errors
+    let _size_multiplier = reader.read_usize()?; // Must use ? to propagate errors
+
+    Some(ParamSpec {
+        index: index as u32,
+        name,
+        type_name,
+        description,
+        flags,
+        param_type,
+        constraint_type,
+        constraint,
+        min_value: Some(min_value),
+        max_value: Some(max_value),
+        valid_mask: Some(valid_mask),
+        enum_values: vec![],
+        size: Some(size as u32),
+        alignment: Some(alignment as u32),
+    })
+}
+
+fn parse_return_spec(reader: &mut DataReader) -> Option<ReturnSpec> {
+    // Read type_name, but treat empty as valid (will be empty string)
+    let type_name = reader.read_string_or_default(sizes::NAME);
+
+    // Read return_type and check_type
+    let return_type = reader.read_u32().unwrap_or(0);
+    let check_type = reader.read_u32().unwrap_or(0);
+    let success_value = reader.read_i64().unwrap_or(0);
+    let success_min = reader.read_i64().unwrap_or(0);
+    let success_max = reader.read_i64().unwrap_or(0);
+
+    // Skip error_values pointer (8 bytes)
+    reader.skip(8);
+    let _error_count = reader.read_u32().unwrap_or(0); // Don't fail on return spec
+    // Skip is_success function pointer (8 bytes)
+    reader.skip(8);
+
+    let description = reader.read_string_or_default(sizes::DESC);
+
+    // Return a spec even if type_name is empty, as long as we have some data
+    // The type_name might be a string like "KAPI_TYPE_INT" that gets stored literally
+    if type_name.is_empty() && return_type == 0 && check_type == 0 && success_value == 0 {
+        // No return spec at all
+        return None;
+    }
+
+    Some(ReturnSpec {
+        type_name,
+        description,
+        return_type,
+        check_type,
+        success_value: Some(success_value),
+        success_min: Some(success_min),
+        success_max: Some(success_max),
+        error_values: vec![],
+    })
+}
+
+fn parse_error(reader: &mut DataReader) -> Option<ErrorSpec> {
+    let error_code = reader.read_i32()?;
+    let name = reader.read_cstring(sizes::NAME)?;
+    let condition = reader.read_string_or_default(sizes::DESC);
+    let description = reader.read_string_or_default(sizes::DESC);
+
+    Some(ErrorSpec {
+        error_code,
+        name,
+        condition,
+        description,
+    })
+}
+
+fn parse_lock(reader: &mut DataReader) -> Option<LockSpec> {
+    let lock_name = reader.read_cstring(sizes::NAME)?;
+    let lock_type = reader.read_u32()?;
+    let scope = reader.read_u32()?;
+    let description = reader.read_string_or_default(sizes::DESC);
+
+    Some(LockSpec {
+        lock_name,
+        lock_type,
+        scope,
+        description,
+    })
+}
+
+fn parse_constraint(reader: &mut DataReader) -> Option<ConstraintSpec> {
+    let name = reader.read_cstring(sizes::NAME)?;
+    let description = reader.read_string_or_default(sizes::DESC);
+    let expression = reader.read_string_or_default(sizes::DESC);
+
+    // No function pointer in packed struct
+
+    Some(ConstraintSpec {
+        name,
+        description,
+        expression: opt_string(expression),
+    })
+}
+
+fn parse_signal(reader: &mut DataReader) -> Option<SignalSpec> {
+    let signal_num = reader.read_i32()?;
+    let signal_name = reader.read_cstring(32)?; // signal_name[32]
+    let direction = reader.read_u32()?;
+    let action = reader.read_u32()?;
+    let target = reader.read_optional_string(sizes::DESC); // target[512]
+    let condition = reader.read_optional_string(sizes::DESC); // condition[512]
+    let description = reader.read_optional_string(sizes::DESC); // description[512]
+    let restartable = reader.read_bool()?;
+    let sa_flags_required = reader.read_u32()?;
+    let sa_flags_forbidden = reader.read_u32()?;
+    let error_on_signal = reader.read_i32()?;
+    let _transform_to = reader.read_i32()?; // transform_to
+    let timing_bytes = reader.read_bytes(32)?; // timing[32]
+    let timing = if let Some(end) = timing_bytes.iter().position(|&b| b == 0) {
+        String::from_utf8_lossy(&timing_bytes[..end]).parse().unwrap_or(0)
+    } else {
+        0
+    };
+    let priority = reader.read_u8()?;
+    let interruptible = reader.read_bool()?;
+    let _queue_behavior = reader.read_bytes(128)?; // queue_behavior[128]
+    let state_required = reader.read_u32()?;
+    let state_forbidden = reader.read_u32()?;
+
+    Some(SignalSpec {
+        signal_num,
+        signal_name,
+        direction,
+        action,
+        target,
+        condition,
+        description,
+        timing,
+        priority: priority as u32,
+        restartable,
+        interruptible,
+        queue: None, // queue_behavior not exposed in SignalSpec
+        sa_flags: 0, // Not directly available
+        sa_flags_required,
+        sa_flags_forbidden,
+        state_required,
+        state_forbidden,
+        error_on_signal: Some(error_on_signal),
+    })
+}
+
+fn parse_signal_mask(reader: &mut DataReader) -> Option<SignalMaskSpec> {
+    let name = reader.read_cstring(sizes::NAME)?;
+    let description = reader.read_string_or_default(sizes::DESC);
+
+    // Skip signals array
+    for _ in 0..sizes::MAX_SIGNALS {
+        reader.read_i32();
+    }
+
+    let _signal_count = reader.read_u32()?;
+
+    Some(SignalMaskSpec {
+        name,
+        description,
+    })
+}
+
+fn parse_struct_field(reader: &mut DataReader) -> Option<StructFieldSpec> {
+    let name = reader.read_cstring(sizes::NAME)?;
+    let field_type = reader.read_u32()?;
+    let type_name = reader.read_cstring(sizes::NAME)?;
+    let offset = reader.read_usize()?;
+    let size = reader.read_usize()?;
+    let flags = reader.read_u32()?;
+    let constraint_type = reader.read_u32()?;
+    let min_value = reader.read_i64()?;
+    let max_value = reader.read_i64()?;
+    let valid_mask = reader.read_u64()?;
+    // Skip enum_values field (512 bytes)
+    let _enum_values = reader.read_cstring(sizes::DESC); // Don't fail on optional field
+    let description = reader.read_string_or_default(sizes::DESC);
+
+    Some(StructFieldSpec {
+        name,
+        field_type,
+        type_name,
+        offset,
+        size,
+        flags,
+        constraint_type,
+        min_value,
+        max_value,
+        valid_mask,
+        description,
+    })
+}
+
+fn parse_struct_spec(reader: &mut DataReader) -> Option<StructSpec> {
+    let name = reader.read_cstring(sizes::NAME)?;
+    let size = reader.read_usize()?;
+    let alignment = reader.read_usize()?;
+    let field_count = reader.read_u32()?;
+
+    // Parse fields array
+    let mut fields = Vec::new();
+    for _ in 0..field_count.min(sizes::MAX_PARAMS as u32) {
+        if let Some(field) = parse_struct_field(reader) {
+            fields.push(field);
+        } else {
+            // Skip this field if we can't parse it
+            reader.skip(struct_field_layout_size());
+        }
+    }
+
+    // Skip remaining fields if any
+    let remaining = sizes::MAX_PARAMS as u32 - field_count.min(sizes::MAX_PARAMS as u32);
+    for _ in 0..remaining {
+        reader.skip(struct_field_layout_size());
+    }
+
+    let description = reader.read_string_or_default(sizes::DESC);
+
+    Some(StructSpec {
+        name,
+        size,
+        alignment,
+        field_count,
+        fields,
+        description,
+    })
+}
+
+fn parse_side_effect(reader: &mut DataReader) -> Option<SideEffectSpec> {
+    let effect_type = reader.read_u32()?;
+    let target = reader.read_cstring(sizes::NAME)?;
+    let condition = reader.read_string_or_default(sizes::DESC);
+    let description = reader.read_string_or_default(sizes::DESC);
+    let reversible = reader.read_bool()?;
+    // No padding needed for packed struct
+
+    Some(SideEffectSpec {
+        effect_type,
+        target,
+        condition: opt_string(condition),
+        description,
+        reversible,
+    })
+}
+
+fn parse_state_transition(reader: &mut DataReader) -> Option<StateTransitionSpec> {
+    let from_state = reader.read_cstring(sizes::NAME)?;
+    let to_state = reader.read_cstring(sizes::NAME)?;
+    let condition = reader.read_string_or_default(sizes::DESC);
+    let object = reader.read_cstring(sizes::NAME)?;
+    let description = reader.read_string_or_default(sizes::DESC);
+
+    Some(StateTransitionSpec {
+        object,
+        from_state,
+        to_state,
+        condition: opt_string(condition),
+        description,
+    })
+}
+
+fn parse_capability(reader: &mut DataReader) -> Option<CapabilitySpec> {
+    let capability = reader.read_i32()?;
+    let cap_name = reader.read_cstring(sizes::NAME)?;
+    let action = reader.read_u32()?;
+    let allows = reader.read_string_or_default(sizes::DESC);
+    let without_cap = reader.read_string_or_default(sizes::DESC);
+    let check_condition = reader.read_optional_string(sizes::DESC);
+    let priority = reader.read_u32()?;
+
+    let mut alternatives = Vec::new();
+    for _ in 0..sizes::MAX_CAPABILITIES {
+        if let Some(alt) =
reader.read_i32() { + if alt !=3D 0 { + alternatives.push(alt); + } + } + } + + let _alternative_count =3D reader.read_u32()?; // alternative_count + + Some(CapabilitySpec { + capability, + name: cap_name, + action: action.to_string(), + allows, + without_cap, + check_condition, + priority: Some(priority as u8), + alternatives, + }) +} \ No newline at end of file diff --git a/tools/kapi/src/formatter/json.rs b/tools/kapi/src/formatter/js= on.rs new file mode 100644 index 0000000000000..8025467409d64 --- /dev/null +++ b/tools/kapi/src/formatter/json.rs @@ -0,0 +1,468 @@ +use super::OutputFormatter; +use crate::extractor::{ + AddrFamilySpec, AsyncSpec, BufferSpec, CapabilitySpec, ConstraintSpec,= ErrorSpec, LockSpec, + ParamSpec, ProtocolBehaviorSpec, ReturnSpec, SideEffectSpec, SignalMas= kSpec, SignalSpec, + SocketStateSpec, StateTransitionSpec, StructSpec, +}; +use serde::Serialize; +use std::io::Write; + +pub struct JsonFormatter { + data: JsonData, +} + +#[derive(Serialize)] +struct JsonData { + #[serde(skip_serializing_if =3D "Option::is_none")] + apis: Option>, + #[serde(skip_serializing_if =3D "Option::is_none")] + api_details: Option, +} + +#[derive(Serialize)] +struct JsonApi { + name: String, + api_type: String, +} + +#[derive(Serialize)] +struct JsonApiDetails { + name: String, + #[serde(skip_serializing_if =3D "Option::is_none")] + description: Option, + #[serde(skip_serializing_if =3D "Option::is_none")] + long_description: Option, + #[serde(skip_serializing_if =3D "Vec::is_empty")] + context_flags: Vec, + #[serde(skip_serializing_if =3D "Option::is_none")] + examples: Option, + #[serde(skip_serializing_if =3D "Option::is_none")] + notes: Option, + #[serde(skip_serializing_if =3D "Option::is_none")] + since_version: Option, + // Sysfs-specific fields + #[serde(skip_serializing_if =3D "Option::is_none")] + subsystem: Option, + #[serde(skip_serializing_if =3D "Option::is_none")] + sysfs_path: Option, + #[serde(skip_serializing_if =3D "Option::is_none")] + 
permissions: Option, + // Networking-specific fields + #[serde(skip_serializing_if =3D "Option::is_none")] + socket_state: Option, + #[serde(skip_serializing_if =3D "Vec::is_empty")] + protocol_behaviors: Vec, + #[serde(skip_serializing_if =3D "Vec::is_empty")] + addr_families: Vec, + #[serde(skip_serializing_if =3D "Option::is_none")] + buffer_spec: Option, + #[serde(skip_serializing_if =3D "Option::is_none")] + async_spec: Option, + #[serde(skip_serializing_if =3D "Option::is_none")] + net_data_transfer: Option, + #[serde(skip_serializing_if =3D "Vec::is_empty")] + capabilities: Vec, + #[serde(skip_serializing_if =3D "Vec::is_empty")] + state_transitions: Vec, + #[serde(skip_serializing_if =3D "Vec::is_empty")] + side_effects: Vec, + #[serde(skip_serializing_if =3D "Vec::is_empty")] + parameters: Vec, + #[serde(skip_serializing_if =3D "Option::is_none")] + return_spec: Option, + #[serde(skip_serializing_if =3D "Vec::is_empty")] + errors: Vec, + #[serde(skip_serializing_if =3D "Vec::is_empty")] + locks: Vec, + #[serde(skip_serializing_if =3D "Vec::is_empty")] + struct_specs: Vec, + #[serde(skip_serializing_if =3D "Vec::is_empty")] + signals: Vec, + #[serde(skip_serializing_if =3D "Vec::is_empty")] + signal_masks: Vec, + #[serde(skip_serializing_if =3D "Vec::is_empty")] + constraints: Vec, +} + +impl JsonFormatter { + pub fn new() -> Self { + JsonFormatter { + data: JsonData { + apis: None, + api_details: None, + }, + } + } +} + +impl OutputFormatter for JsonFormatter { + fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()= > { + Ok(()) + } + + fn end_document(&mut self, w: &mut dyn Write) -> std::io::Result<()> { + let json =3D serde_json::to_string_pretty(&self.data)?; + writeln!(w, "{json}")?; + Ok(()) + } + + fn begin_api_list(&mut self, _w: &mut dyn Write, _title: &str) -> std:= :io::Result<()> { + self.data.apis =3D Some(Vec::new()); + Ok(()) + } + + fn api_item(&mut self, _w: &mut dyn Write, name: &str, api_type: &str)= -> 
std::io::Result<()> { + if let Some(apis) =3D &mut self.data.apis { + apis.push(JsonApi { + name: name.to_string(), + api_type: api_type.to_string(), + }); + } + Ok(()) + } + + fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> { + Ok(()) + } + + fn total_specs(&mut self, _w: &mut dyn Write, _count: usize) -> std::i= o::Result<()> { + Ok(()) + } + + fn begin_api_details(&mut self, _w: &mut dyn Write, name: &str) -> std= ::io::Result<()> { + self.data.api_details =3D Some(JsonApiDetails { + name: name.to_string(), + description: None, + long_description: None, + context_flags: Vec::new(), + examples: None, + notes: None, + since_version: None, + subsystem: None, + sysfs_path: None, + permissions: None, + socket_state: None, + protocol_behaviors: Vec::new(), + addr_families: Vec::new(), + buffer_spec: None, + async_spec: None, + net_data_transfer: None, + capabilities: Vec::new(), + state_transitions: Vec::new(), + side_effects: Vec::new(), + parameters: Vec::new(), + return_spec: None, + errors: Vec::new(), + locks: Vec::new(), + struct_specs: Vec::new(), + signals: Vec::new(), + signal_masks: Vec::new(), + constraints: Vec::new(), + }); + Ok(()) + } + + fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<(= )> { + Ok(()) + } + + fn description(&mut self, _w: &mut dyn Write, desc: &str) -> std::io::= Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.description =3D Some(desc.to_string()); + } + Ok(()) + } + + fn long_description(&mut self, _w: &mut dyn Write, desc: &str) -> std:= :io::Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.long_description =3D Some(desc.to_string()); + } + Ok(()) + } + + fn begin_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Resu= lt<()> { + Ok(()) + } + + fn context_flag(&mut self, _w: &mut dyn Write, flag: &str) -> std::io:= :Result<()> { + if let Some(details) =3D &mut self.data.api_details { + 
details.context_flags.push(flag.to_string()); + } + Ok(()) + } + + fn end_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result= <()> { + Ok(()) + } + + fn begin_parameters(&mut self, _w: &mut dyn Write, _count: u32) -> std= ::io::Result<()> { + Ok(()) + } + + fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()= > { + Ok(()) + } + + fn begin_errors(&mut self, _w: &mut dyn Write, _count: u32) -> std::io= ::Result<()> { + Ok(()) + } + + fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> { + Ok(()) + } + + fn examples(&mut self, _w: &mut dyn Write, examples: &str) -> std::io:= :Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.examples =3D Some(examples.to_string()); + } + Ok(()) + } + + fn notes(&mut self, _w: &mut dyn Write, notes: &str) -> std::io::Resul= t<()> { + if let Some(details) =3D &mut self.data.api_details { + details.notes =3D Some(notes.to_string()); + } + Ok(()) + } + + fn since_version(&mut self, _w: &mut dyn Write, version: &str) -> std:= :io::Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.since_version =3D Some(version.to_string()); + } + Ok(()) + } + + fn sysfs_subsystem(&mut self, _w: &mut dyn Write, subsystem: &str) -> = std::io::Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.subsystem =3D Some(subsystem.to_string()); + } + Ok(()) + } + + fn sysfs_path(&mut self, _w: &mut dyn Write, path: &str) -> std::io::R= esult<()> { + if let Some(details) =3D &mut self.data.api_details { + details.sysfs_path =3D Some(path.to_string()); + } + Ok(()) + } + + fn sysfs_permissions(&mut self, _w: &mut dyn Write, perms: &str) -> st= d::io::Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.permissions =3D Some(perms.to_string()); + } + Ok(()) + } + + // Networking-specific methods + fn socket_state(&mut self, _w: &mut dyn Write, state: &SocketStateSpec= ) -> std::io::Result<()> { + if let 
Some(details) =3D &mut self.data.api_details { + details.socket_state =3D Some(state.clone()); + } + Ok(()) + } + + fn begin_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io:= :Result<()> { + Ok(()) + } + + fn protocol_behavior( + &mut self, + _w: &mut dyn Write, + behavior: &ProtocolBehaviorSpec, + ) -> std::io::Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.protocol_behaviors.push(behavior.clone()); + } + Ok(()) + } + + fn end_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::R= esult<()> { + Ok(()) + } + + fn begin_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Resu= lt<()> { + Ok(()) + } + + fn addr_family(&mut self, _w: &mut dyn Write, family: &AddrFamilySpec)= -> std::io::Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.addr_families.push(family.clone()); + } + Ok(()) + } + + fn end_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result= <()> { + Ok(()) + } + + fn buffer_spec(&mut self, _w: &mut dyn Write, spec: &BufferSpec) -> st= d::io::Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.buffer_spec =3D Some(spec.clone()); + } + Ok(()) + } + + fn async_spec(&mut self, _w: &mut dyn Write, spec: &AsyncSpec) -> std:= :io::Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.async_spec =3D Some(spec.clone()); + } + Ok(()) + } + + fn net_data_transfer(&mut self, _w: &mut dyn Write, desc: &str) -> std= ::io::Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.net_data_transfer =3D Some(desc.to_string()); + } + Ok(()) + } + + fn begin_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Resul= t<()> { + Ok(()) + } + + fn capability(&mut self, _w: &mut dyn Write, cap: &CapabilitySpec) -> = std::io::Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.capabilities.push(cap.clone()); + } + Ok(()) + } + + fn end_capabilities(&mut self, _w: &mut dyn Write) -> 
std::io::Result<= ()> { + Ok(()) + } + + // Stub implementations for new methods + fn parameter(&mut self, _w: &mut dyn Write, param: &ParamSpec) -> std:= :io::Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.parameters.push(param.clone()); + } + Ok(()) + } + + fn return_spec(&mut self, _w: &mut dyn Write, ret: &ReturnSpec) -> std= ::io::Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.return_spec =3D Some(ret.clone()); + } + Ok(()) + } + + fn error(&mut self, _w: &mut dyn Write, error: &ErrorSpec) -> std::io:= :Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.errors.push(error.clone()); + } + Ok(()) + } + + fn begin_signals(&mut self, _w: &mut dyn Write, _count: u32) -> std::i= o::Result<()> { + Ok(()) + } + + fn signal(&mut self, _w: &mut dyn Write, signal: &SignalSpec) -> std::= io::Result<()> { + if let Some(api_details) =3D &mut self.data.api_details { + api_details.signals.push(signal.clone()); + } + Ok(()) + } + + fn end_signals(&mut self, _w: &mut dyn Write) -> std::io::Result<()> { + Ok(()) + } + + fn begin_signal_masks(&mut self, _w: &mut dyn Write, _count: u32) -> s= td::io::Result<()> { + Ok(()) + } + + fn signal_mask(&mut self, _w: &mut dyn Write, mask: &SignalMaskSpec) -= > std::io::Result<()> { + if let Some(api_details) =3D &mut self.data.api_details { + api_details.signal_masks.push(mask.clone()); + } + Ok(()) + } + + fn end_signal_masks(&mut self, _w: &mut dyn Write) -> std::io::Result<= ()> { + Ok(()) + } + + fn begin_side_effects(&mut self, _w: &mut dyn Write, _count: u32) -> s= td::io::Result<()> { + Ok(()) + } + + fn side_effect(&mut self, _w: &mut dyn Write, effect: &SideEffectSpec)= -> std::io::Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.side_effects.push(effect.clone()); + } + Ok(()) + } + + fn end_side_effects(&mut self, _w: &mut dyn Write) -> std::io::Result<= ()> { + Ok(()) + } + + fn begin_state_transitions(&mut 
self, _w: &mut dyn Write, _count: u32)= -> std::io::Result<()> { + Ok(()) + } + + fn state_transition( + &mut self, + _w: &mut dyn Write, + trans: &StateTransitionSpec, + ) -> std::io::Result<()> { + if let Some(details) =3D &mut self.data.api_details { + details.state_transitions.push(trans.clone()); + } + Ok(()) + } + + fn end_state_transitions(&mut self, _w: &mut dyn Write) -> std::io::Re= sult<()> { + Ok(()) + } + + fn begin_constraints(&mut self, _w: &mut dyn Write, _count: u32) -> st= d::io::Result<()> { + Ok(()) + } + + fn constraint( + &mut self, + _w: &mut dyn Write, + constraint: &ConstraintSpec, + ) -> std::io::Result<()> { + if let Some(api_details) =3D &mut self.data.api_details { + api_details.constraints.push(constraint.clone()); + } + Ok(()) + } + + fn end_constraints(&mut self, _w: &mut dyn Write) -> std::io::Result<(= )> { + Ok(()) + } + + fn begin_locks(&mut self, _w: &mut dyn Write, _count: u32) -> std::io:= :Result<()> { + Ok(()) + } + + fn lock(&mut self, _w: &mut dyn Write, lock: &LockSpec) -> std::io::Re= sult<()> { + if let Some(details) =3D &mut self.data.api_details { + details.locks.push(lock.clone()); + } + Ok(()) + } + + fn end_locks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> { + Ok(()) + } + + fn begin_struct_specs(&mut self, _w: &mut dyn Write, _count: u32) -> s= td::io::Result<()> { + Ok(()) + } + + fn struct_spec(&mut self, _w: &mut dyn Write, spec: &StructSpec) -> st= d::io::Result<()> { + if let Some(ref mut details) =3D self.data.api_details { + details.struct_specs.push(spec.clone()); + } + Ok(()) + } + + fn end_struct_specs(&mut self, _w: &mut dyn Write) -> std::io::Result<= ()> { + Ok(()) + } +} diff --git a/tools/kapi/src/formatter/mod.rs b/tools/kapi/src/formatter/mod= .rs new file mode 100644 index 0000000000000..3de8bf23bc29a --- /dev/null +++ b/tools/kapi/src/formatter/mod.rs @@ -0,0 +1,140 @@ +use crate::extractor::{ + AddrFamilySpec, AsyncSpec, BufferSpec, CapabilitySpec, ConstraintSpec,= ErrorSpec, 
LockSpec, + ParamSpec, ProtocolBehaviorSpec, ReturnSpec, SideEffectSpec, SignalMas= kSpec, SignalSpec, + SocketStateSpec, StateTransitionSpec, StructSpec, +}; +use std::io::Write; + +mod json; +mod plain; +mod rst; + +pub use json::JsonFormatter; +pub use plain::PlainFormatter; +pub use rst::RstFormatter; + +#[derive(Debug, Clone, Copy, PartialEq)] +pub enum OutputFormat { + Plain, + Json, + Rst, +} + +impl std::str::FromStr for OutputFormat { + type Err =3D String; + + fn from_str(s: &str) -> Result { + match s.to_lowercase().as_str() { + "plain" =3D> Ok(OutputFormat::Plain), + "json" =3D> Ok(OutputFormat::Json), + "rst" =3D> Ok(OutputFormat::Rst), + _ =3D> Err(format!("Unknown output format: {}", s)), + } + } +} + +pub trait OutputFormatter { + fn begin_document(&mut self, w: &mut dyn Write) -> std::io::Result<()>; + fn end_document(&mut self, w: &mut dyn Write) -> std::io::Result<()>; + + fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::i= o::Result<()>; + fn api_item(&mut self, w: &mut dyn Write, name: &str, api_type: &str) = -> std::io::Result<()>; + fn end_api_list(&mut self, w: &mut dyn Write) -> std::io::Result<()>; + + fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io:= :Result<()>; + + fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std:= :io::Result<()>; + fn end_api_details(&mut self, w: &mut dyn Write) -> std::io::Result<()= >; + fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::R= esult<()>; + fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::= io::Result<()>; + + fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Resul= t<()>; + fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::= Result<()>; + fn end_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<= ()>; + + fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::= io::Result<()>; + fn parameter(&mut self, w: &mut dyn Write, param: &ParamSpec) 
-> std::= io::Result<()>; + fn end_parameters(&mut self, w: &mut dyn Write) -> std::io::Result<()>; + + fn return_spec(&mut self, w: &mut dyn Write, ret: &ReturnSpec) -> std:= :io::Result<()>; + + fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::= Result<()>; + fn error(&mut self, w: &mut dyn Write, error: &ErrorSpec) -> std::io::= Result<()>; + fn end_errors(&mut self, w: &mut dyn Write) -> std::io::Result<()>; + + fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::= Result<()>; + fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result= <()>; + fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::= io::Result<()>; + + // Sysfs-specific methods + fn sysfs_subsystem(&mut self, w: &mut dyn Write, subsystem: &str) -> s= td::io::Result<()>; + fn sysfs_path(&mut self, w: &mut dyn Write, path: &str) -> std::io::Re= sult<()>; + fn sysfs_permissions(&mut self, w: &mut dyn Write, perms: &str) -> std= ::io::Result<()>; + + // Networking-specific methods + fn socket_state(&mut self, w: &mut dyn Write, state: &SocketStateSpec)= -> std::io::Result<()>; + + fn begin_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::= Result<()>; + fn protocol_behavior( + &mut self, + w: &mut dyn Write, + behavior: &ProtocolBehaviorSpec, + ) -> std::io::Result<()>; + fn end_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Re= sult<()>; + + fn begin_addr_families(&mut self, w: &mut dyn Write) -> std::io::Resul= t<()>; + fn addr_family(&mut self, w: &mut dyn Write, family: &AddrFamilySpec) = -> std::io::Result<()>; + fn end_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<= ()>; + + fn buffer_spec(&mut self, w: &mut dyn Write, spec: &BufferSpec) -> std= ::io::Result<()>; + fn async_spec(&mut self, w: &mut dyn Write, spec: &AsyncSpec) -> std::= io::Result<()>; + fn net_data_transfer(&mut self, w: &mut dyn Write, desc: &str) -> std:= :io::Result<()>; + + fn begin_capabilities(&mut self, w: 
&mut dyn Write) -> std::io::Result= <()>; + fn capability(&mut self, w: &mut dyn Write, cap: &CapabilitySpec) -> s= td::io::Result<()>; + fn end_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<(= )>; + + // Signal-related methods + fn begin_signals(&mut self, w: &mut dyn Write, count: u32) -> std::io:= :Result<()>; + fn signal(&mut self, w: &mut dyn Write, signal: &SignalSpec) -> std::i= o::Result<()>; + fn end_signals(&mut self, w: &mut dyn Write) -> std::io::Result<()>; + + fn begin_signal_masks(&mut self, w: &mut dyn Write, count: u32) -> std= ::io::Result<()>; + fn signal_mask(&mut self, w: &mut dyn Write, mask: &SignalMaskSpec) ->= std::io::Result<()>; + fn end_signal_masks(&mut self, w: &mut dyn Write) -> std::io::Result<(= )>; + + // Side effects and state transitions + fn begin_side_effects(&mut self, w: &mut dyn Write, count: u32) -> std= ::io::Result<()>; + fn side_effect(&mut self, w: &mut dyn Write, effect: &SideEffectSpec) = -> std::io::Result<()>; + fn end_side_effects(&mut self, w: &mut dyn Write) -> std::io::Result<(= )>; + + fn begin_state_transitions(&mut self, w: &mut dyn Write, count: u32) -= > std::io::Result<()>; + fn state_transition( + &mut self, + w: &mut dyn Write, + trans: &StateTransitionSpec, + ) -> std::io::Result<()>; + fn end_state_transitions(&mut self, w: &mut dyn Write) -> std::io::Res= ult<()>; + + // Constraints and locks + fn begin_constraints(&mut self, w: &mut dyn Write, count: u32) -> std:= :io::Result<()>; + fn constraint(&mut self, w: &mut dyn Write, constraint: &ConstraintSpe= c) + -> std::io::Result<()>; + fn end_constraints(&mut self, w: &mut dyn Write) -> std::io::Result<()= >; + + fn begin_locks(&mut self, w: &mut dyn Write, count: u32) -> std::io::R= esult<()>; + fn lock(&mut self, w: &mut dyn Write, lock: &LockSpec) -> std::io::Res= ult<()>; + fn end_locks(&mut self, w: &mut dyn Write) -> std::io::Result<()>; + + fn begin_struct_specs(&mut self, w: &mut dyn Write, count: u32) -> std= ::io::Result<()>; 
+    fn struct_spec(&mut self, w: &mut dyn Write, spec: &StructSpec) -> std::io::Result<()>;
+    fn end_struct_specs(&mut self, w: &mut dyn Write) -> std::io::Result<()>;
+}
+
+pub fn create_formatter(format: OutputFormat) -> Box<dyn OutputFormatter> {
+    match format {
+        OutputFormat::Plain => Box::new(PlainFormatter::new()),
+        OutputFormat::Json => Box::new(JsonFormatter::new()),
+        OutputFormat::Rst => Box::new(RstFormatter::new()),
+    }
+}
diff --git a/tools/kapi/src/formatter/plain.rs b/tools/kapi/src/formatter/plain.rs
new file mode 100644
index 0000000000000..569af9fd7b09b
--- /dev/null
+++ b/tools/kapi/src/formatter/plain.rs
@@ -0,0 +1,549 @@
+use super::OutputFormatter;
+use crate::extractor::{
+    AddrFamilySpec, AsyncSpec, BufferSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec,
+    ParamSpec, ProtocolBehaviorSpec, ReturnSpec, SideEffectSpec, SignalMaskSpec, SignalSpec,
+    SocketStateSpec, StateTransitionSpec,
+};
+use std::io::Write;
+
+pub struct PlainFormatter;
+
+impl PlainFormatter {
+    pub fn new() -> Self {
+        PlainFormatter
+    }
+}
+
+impl OutputFormatter for PlainFormatter {
+    fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn end_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::io::Result<()> {
+        writeln!(w, "\n{title}:")?;
+        writeln!(w, "{}", "-".repeat(title.len() + 1))
+    }
+
+    fn api_item(&mut self, w: &mut dyn Write, name: &str, _api_type: &str) -> std::io::Result<()> {
+        writeln!(w, " {name}")
+    }
+
+    fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io::Result<()> {
+        writeln!(w, "\nTotal specifications found: {count}")
+    }
+
+    fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std::io::Result<()> {
+        writeln!(w, "\nDetailed information for {name}:")?;
+        writeln!(w, "{}=", "=".repeat(25 + name.len()))
+    }
+
+    fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+        writeln!(w, "Description: {desc}")
+    }
+
+    fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+        writeln!(w, "\nDetailed Description:")?;
+        writeln!(w, "{desc}")
+    }
+
+    fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        writeln!(w, "\nExecution Context:")
+    }
+
+    fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::Result<()> {
+        writeln!(w, " - {flag}")
+    }
+
+    fn end_context_flags(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        writeln!(w, "\nParameters ({count}):")
+    }
+
+    fn parameter(&mut self, w: &mut dyn Write, param: &ParamSpec) -> std::io::Result<()> {
+        writeln!(
+            w,
+            " [{}] {} ({})",
+            param.index, param.name, param.type_name
+        )?;
+        if !param.description.is_empty() {
+            writeln!(w, " {}", param.description)?;
+        }
+
+        // Display flags
+        let mut flags = Vec::new();
+        if param.flags & 0x01 != 0 {
+            flags.push("IN");
+        }
+        if param.flags & 0x02 != 0 {
+            flags.push("OUT");
+        }
+        if param.flags & 0x04 != 0 {
+            flags.push("INOUT");
+        }
+        if param.flags & 0x08 != 0 {
+            flags.push("USER");
+        }
+        if param.flags & 0x10 != 0 {
+            flags.push("OPTIONAL");
+        }
+        if !flags.is_empty() {
+            writeln!(w, " Flags: {}", flags.join(" | "))?;
+        }
+
+        // Display constraints
+        if let Some(constraint) = &param.constraint {
+            writeln!(w, " Constraint: {constraint}")?;
+        }
+        if let (Some(min), Some(max)) = (param.min_value, param.max_value) {
+            writeln!(w, " Range: {min} to {max}")?;
+        }
+        if let Some(mask) = param.valid_mask {
+            writeln!(w, " Valid mask: 0x{mask:x}")?;
+        }
+        Ok(())
+    }
+
+    fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn return_spec(&mut self, w: &mut dyn Write, ret: &ReturnSpec) -> std::io::Result<()> {
+        writeln!(w, "\nReturn Value:")?;
+        writeln!(w, " Type: {}", ret.type_name)?;
+        writeln!(w, " {}", ret.description)?;
+        if let Some(val) = ret.success_value {
+            writeln!(w, " Success value: {val}")?;
+        }
+        if let (Some(min), Some(max)) = (ret.success_min, ret.success_max) {
+            writeln!(w, " Success range: {min} to {max}")?;
+        }
+        Ok(())
+    }
+
+    fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        writeln!(w, "\nPossible Errors ({count}):")
+    }
+
+    fn error(&mut self, w: &mut dyn Write, error: &ErrorSpec) -> std::io::Result<()> {
+        writeln!(w, " {} ({})", error.name, error.error_code)?;
+        if !error.condition.is_empty() {
+            writeln!(w, " Condition: {}", error.condition)?;
+        }
+        if !error.description.is_empty() {
+            writeln!(w, " {}", error.description)?;
+        }
+        Ok(())
+    }
+
+    fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::Result<()> {
+        writeln!(w, "\nExamples:")?;
+        writeln!(w, "{examples}")
+    }
+
+    fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result<()> {
+        writeln!(w, "\nNotes:")?;
+        writeln!(w, "{notes}")
+    }
+
+    fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::io::Result<()> {
+        writeln!(w, "\nAvailable since: {version}")
+    }
+
+    fn sysfs_subsystem(&mut self, w: &mut dyn Write, subsystem: &str) -> std::io::Result<()> {
+        writeln!(w, "Subsystem: {subsystem}")
+    }
+
+    fn sysfs_path(&mut self, w: &mut dyn Write, path: &str) -> std::io::Result<()> {
+        writeln!(w, "Sysfs Path: {path}")
+    }
+
+    fn sysfs_permissions(&mut self, w: &mut dyn Write, perms: &str) -> std::io::Result<()> {
+        writeln!(w, "Permissions: {perms}")
+    }
+
+    // Networking-specific methods
+    fn socket_state(&mut self, w: &mut dyn Write, state: &SocketStateSpec) -> std::io::Result<()> {
+        writeln!(w, "\nSocket State Requirements:")?;
+        if !state.required_states.is_empty() {
+            writeln!(w, " Required states: {:?}", state.required_states)?;
+        }
+        if !state.forbidden_states.is_empty() {
+            writeln!(w, " Forbidden states: {:?}", state.forbidden_states)?;
+        }
+        if let Some(result) = &state.resulting_state {
+            writeln!(w, " Resulting state: {result}")?;
+        }
+        if let Some(cond) = &state.condition {
+            writeln!(w, " Condition: {cond}")?;
+        }
+        if let Some(protos) = &state.applicable_protocols {
+            writeln!(w, " Applicable protocols: {protos}")?;
+        }
+        Ok(())
+    }
+
+    fn begin_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        writeln!(w, "\nProtocol-Specific Behaviors:")
+    }
+
+    fn protocol_behavior(
+        &mut self,
+        w: &mut dyn Write,
+        behavior: &ProtocolBehaviorSpec,
+    ) -> std::io::Result<()> {
+        writeln!(
+            w,
+            " {} - {}",
+            behavior.applicable_protocols, behavior.behavior
+        )?;
+        if let Some(flags) = &behavior.protocol_flags {
+            writeln!(w, " Flags: {flags}")?;
+        }
+        Ok(())
+    }
+
+    fn end_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        writeln!(w, "\nSupported Address Families:")
+    }
+
+    fn addr_family(&mut self, w: &mut dyn Write, family: &AddrFamilySpec) -> std::io::Result<()> {
+        writeln!(w, " {} ({}):", family.family_name, family.family)?;
+        writeln!(w, " Struct size: {} bytes", family.addr_struct_size)?;
+        writeln!(
+            w,
+            " Address length: {}-{} bytes",
+            family.min_addr_len, family.max_addr_len
+        )?;
+        if let Some(format) = &family.addr_format {
+            writeln!(w, " Format: {format}")?;
+        }
+        writeln!(
+            w,
+            " Features: wildcard={}, multicast={}, broadcast={}",
+            family.supports_wildcard, family.supports_multicast, family.supports_broadcast
+        )?;
+        if let Some(special) = &family.special_addresses {
+            writeln!(w, " Special addresses: {special}")?;
+        }
+        if family.port_range_max > 0 {
+            writeln!(
+                w,
+                " Port range: {}-{}",
+                family.port_range_min, family.port_range_max
+            )?;
+        }
+        Ok(())
+    }
+
+    fn end_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn buffer_spec(&mut self, w: &mut dyn Write, spec: &BufferSpec) -> std::io::Result<()> {
+        writeln!(w, "\nBuffer Specification:")?;
+        if let Some(behaviors) = &spec.buffer_behaviors {
+            writeln!(w, " Behaviors: {behaviors}")?;
+        }
+        if let Some(min) = spec.min_buffer_size {
+            writeln!(w, " Min size: {min} bytes")?;
+        }
+        if let Some(max) = spec.max_buffer_size {
+            writeln!(w, " Max size: {max} bytes")?;
+        }
+        if let Some(optimal) = spec.optimal_buffer_size {
+            writeln!(w, " Optimal size: {optimal} bytes")?;
+        }
+        Ok(())
+    }
+
+    fn async_spec(&mut self, w: &mut dyn Write, spec: &AsyncSpec) -> std::io::Result<()> {
+        writeln!(w, "\nAsynchronous Operation:")?;
+        if let Some(modes) = &spec.supported_modes {
+            writeln!(w, " Supported modes: {modes}")?;
+        }
+        if let Some(errno) = spec.nonblock_errno {
+            writeln!(w, " Non-blocking errno: {errno}")?;
+        }
+        Ok(())
+    }
+
+    fn net_data_transfer(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+        writeln!(w, "\nNetwork Data Transfer: {desc}")
+    }
+
+    fn begin_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        writeln!(w, "\nRequired Capabilities:")
+    }
+
+    fn capability(&mut self, w: &mut dyn Write, cap: &CapabilitySpec) -> std::io::Result<()> {
+        writeln!(w, " {} ({}) - {}", cap.name, cap.capability, cap.action)?;
+        if !cap.allows.is_empty() {
+            writeln!(w, " Allows: {}", cap.allows)?;
+        }
+        if !cap.without_cap.is_empty() {
+            writeln!(w, " Without capability: {}", cap.without_cap)?;
+        }
+        if let Some(cond) = &cap.check_condition {
+            writeln!(w, " Condition: {cond}")?;
+        }
+        Ok(())
+    }
+
+    fn end_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    // Signal-related methods
+    fn begin_signals(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        writeln!(w, "\nSignal Specifications ({count}):")
+    }
+
+    fn signal(&mut self, w: &mut dyn Write, signal: &SignalSpec) -> std::io::Result<()> {
+        write!(w, " {} ({})", signal.signal_name, signal.signal_num)?;
+
+        // Display direction
+        let direction = match signal.direction {
+            0 => "SEND",
+            1 => "RECEIVE",
+            2 => "HANDLE",
+            3 => "IGNORE",
+            _ => "UNKNOWN",
+        };
+        write!(w, " - {direction}")?;
+
+        // Display action
+        let action = match signal.action {
+            0 => "DEFAULT",
+            1 => "TERMINATE",
+            2 => "COREDUMP",
+            3 => "STOP",
+            4 => "CONTINUE",
+            5 => "IGNORE",
+            6 => "CUSTOM",
+            7 => "DISCARD",
+            _ => "UNKNOWN",
+        };
+        writeln!(w, " - {action}")?;
+
+        if let Some(target) = &signal.target {
+            writeln!(w, " Target: {target}")?;
+        }
+        if let Some(condition) = &signal.condition {
+            writeln!(w, " Condition: {condition}")?;
+        }
+        if let Some(desc) = &signal.description {
+            writeln!(w, " {desc}")?;
+        }
+
+        // Display timing
+        let timing = match signal.timing {
+            0 => "BEFORE",
+            1 => "DURING",
+            2 => "AFTER",
+            3 => "EXIT",
+            _ => "UNKNOWN",
+        };
+        writeln!(w, " Timing: {timing}")?;
+        writeln!(w, " Priority: {}", signal.priority)?;
+
+        if signal.restartable {
+            writeln!(w, " Restartable: yes")?;
+        }
+        if signal.interruptible {
+            writeln!(w, " Interruptible: yes")?;
+        }
+        if let Some(error) = signal.error_on_signal {
+            writeln!(w, " Error on signal: {error}")?;
+        }
+        Ok(())
+    }
+
+    fn end_signals(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_signal_masks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        writeln!(w, "\nSignal Masks ({count}):")
+    }
+
+    fn signal_mask(&mut self, w: &mut dyn Write, mask: &SignalMaskSpec) -> std::io::Result<()> {
+        writeln!(w, " {}", mask.name)?;
+        if !mask.description.is_empty() {
+            writeln!(w, " {}", mask.description)?;
+        }
+        Ok(())
+    }
+
+    fn end_signal_masks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    // Side effects and state transitions
+    fn begin_side_effects(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        writeln!(w, "\nSide Effects ({count}):")
+    }
+
+    fn side_effect(&mut self, w: &mut dyn Write, effect: &SideEffectSpec) -> std::io::Result<()> {
+        writeln!(w, " {} - {}", effect.target, effect.description)?;
+        if let Some(condition) = &effect.condition {
+            writeln!(w, " Condition: {condition}")?;
+        }
+        if effect.reversible {
+            writeln!(w, " Reversible: yes")?;
+        }
+        Ok(())
+    }
+
+    fn end_side_effects(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_state_transitions(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        writeln!(w, "\nState Transitions ({count}):")
+    }
+
+    fn state_transition(
+        &mut self,
+        w: &mut dyn Write,
+        trans: &StateTransitionSpec,
+    ) -> std::io::Result<()> {
+        writeln!(
+            w,
+            " {} : {} -> {}",
+            trans.object, trans.from_state, trans.to_state
+        )?;
+        if let Some(condition) = &trans.condition {
+            writeln!(w, " Condition: {condition}")?;
+        }
+        if !trans.description.is_empty() {
+            writeln!(w, " {}", trans.description)?;
+        }
+        Ok(())
+    }
+
+    fn end_state_transitions(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    // Constraints and locks
+    fn begin_constraints(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        writeln!(w, "\nAdditional Constraints ({count}):")
+    }
+
+    fn constraint(
+        &mut self,
+        w: &mut dyn Write,
+        constraint: &ConstraintSpec,
+    ) -> std::io::Result<()> {
+        writeln!(w, " {}", constraint.name)?;
+        if !constraint.description.is_empty() {
+            writeln!(w, " {}", constraint.description)?;
+        }
+        if let Some(expr) = &constraint.expression {
+            writeln!(w, " Expression: {expr}")?;
+        }
+        Ok(())
+    }
+
+    fn end_constraints(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_locks(&mut self, w: &mut dyn
Write, count: u32) -> std::io::R= esult<()> { + writeln!(w, "\nLocking Requirements ({count}):") + } + + fn lock(&mut self, w: &mut dyn Write, lock: &LockSpec) -> std::io::Res= ult<()> { + write!(w, " {}", lock.lock_name)?; + + // Display lock type + let lock_type =3D match lock.lock_type { + 0 =3D> "NONE", + 1 =3D> "MUTEX", + 2 =3D> "SPINLOCK", + 3 =3D> "RWLOCK", + 4 =3D> "SEQLOCK", + 5 =3D> "RCU", + 6 =3D> "SEMAPHORE", + 7 =3D> "CUSTOM", + _ =3D> "UNKNOWN", + }; + writeln!(w, " ({lock_type})")?; + + let scope_str =3D match lock.scope { + 0 =3D> "acquired and released", + 1 =3D> "acquired (not released)", + 2 =3D> "released (held on entry)", + 3 =3D> "held by caller", + _ =3D> "unknown", + }; + writeln!(w, " Scope: {scope_str}")?; + + if !lock.description.is_empty() { + writeln!(w, " {}", lock.description)?; + } + Ok(()) + } + + fn end_locks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> { + Ok(()) + } + + fn begin_struct_specs(&mut self, w: &mut dyn Write, count: u32) -> std= ::io::Result<()> { + writeln!(w, "\nStructure Specifications ({count}):") + } + + fn struct_spec(&mut self, w: &mut dyn Write, spec: &crate::extractor::= StructSpec) -> std::io::Result<()> { + writeln!(w, " {} (size=3D{}, align=3D{}):", spec.name, spec.size,= spec.alignment)?; + if !spec.description.is_empty() { + writeln!(w, " {}", spec.description)?; + } + + if !spec.fields.is_empty() { + writeln!(w, " Fields ({}):", spec.field_count)?; + for field in &spec.fields { + write!(w, " - {} ({}):", field.name, field.type_nam= e)?; + if !field.description.is_empty() { + write!(w, " {}", field.description)?; + } + writeln!(w)?; + + // Show constraints if present + if field.min_value !=3D 0 || field.max_value !=3D 0 { + writeln!(w, " Range: [{}, {}]", field.min_val= ue, field.max_value)?; + } + if field.valid_mask !=3D 0 { + writeln!(w, " Mask: {:#x}", field.valid_mask)= ?; + } + } + } + Ok(()) + } + + fn end_struct_specs(&mut self, _w: &mut dyn Write) -> std::io::Result<= ()> { + Ok(()) + } 
+}
diff --git a/tools/kapi/src/formatter/rst.rs b/tools/kapi/src/formatter/rst.rs
new file mode 100644
index 0000000000000..51d0be911480b
--- /dev/null
+++ b/tools/kapi/src/formatter/rst.rs
@@ -0,0 +1,621 @@
+use super::OutputFormatter;
+use crate::extractor::{
+    AddrFamilySpec, AsyncSpec, BufferSpec, CapabilitySpec, ConstraintSpec, ErrorSpec, LockSpec,
+    ParamSpec, ProtocolBehaviorSpec, ReturnSpec, SideEffectSpec, SignalMaskSpec, SignalSpec,
+    SocketStateSpec, StateTransitionSpec,
+};
+use std::io::Write;
+
+pub struct RstFormatter {
+    current_section_level: usize,
+}
+
+impl RstFormatter {
+    pub fn new() -> Self {
+        RstFormatter {
+            current_section_level: 0,
+        }
+    }
+
+    fn section_char(level: usize) -> char {
+        match level {
+            0 => '=',
+            1 => '-',
+            2 => '~',
+            3 => '^',
+            _ => '"',
+        }
+    }
+}
+
+impl OutputFormatter for RstFormatter {
+    fn begin_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn end_document(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_api_list(&mut self, w: &mut dyn Write, title: &str) -> std::io::Result<()> {
+        writeln!(w, "\n{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(0).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn api_item(&mut self, w: &mut dyn Write, name: &str, api_type: &str) -> std::io::Result<()> {
+        writeln!(w, "* **{name}** (*{api_type}*)")
+    }
+
+    fn end_api_list(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn total_specs(&mut self, w: &mut dyn Write, count: usize) -> std::io::Result<()> {
+        writeln!(w, "\n**Total specifications found:** {count}")
+    }
+
+    fn begin_api_details(&mut self, w: &mut dyn Write, name: &str) -> std::io::Result<()> {
+        self.current_section_level = 0;
+        writeln!(w, "\n{name}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(0).to_string().repeat(name.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn end_api_details(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+        writeln!(w, "**{desc}**")?;
+        writeln!(w)
+    }
+
+    fn long_description(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+        writeln!(w, "{desc}")?;
+        writeln!(w)
+    }
+
+    fn begin_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Execution Context";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn context_flag(&mut self, w: &mut dyn Write, flag: &str) -> std::io::Result<()> {
+        writeln!(w, "* {flag}")
+    }
+
+    fn end_context_flags(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        writeln!(w)
+    }
+
+    fn begin_parameters(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = format!("Parameters ({count})");
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn end_parameters(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_errors(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = format!("Possible Errors ({count})");
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn end_errors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn examples(&mut self, w: &mut dyn Write, examples: &str) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Examples";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)?;
+        writeln!(w, ".. code-block:: c")?;
+        writeln!(w)?;
+        for line in examples.lines() {
+            writeln!(w, " {line}")?;
+        }
+        writeln!(w)
+    }
+
+    fn notes(&mut self, w: &mut dyn Write, notes: &str) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Notes";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)?;
+        writeln!(w, "{notes}")?;
+        writeln!(w)
+    }
+
+    fn since_version(&mut self, w: &mut dyn Write, version: &str) -> std::io::Result<()> {
+        writeln!(w, ":Available since: {version}")?;
+        writeln!(w)
+    }
+
+    fn sysfs_subsystem(&mut self, w: &mut dyn Write, subsystem: &str) -> std::io::Result<()> {
+        writeln!(w, ":Subsystem: {subsystem}")?;
+        writeln!(w)
+    }
+
+    fn sysfs_path(&mut self, w: &mut dyn Write, path: &str) -> std::io::Result<()> {
+        writeln!(w, ":Sysfs Path: {path}")?;
+        writeln!(w)
+    }
+
+    fn sysfs_permissions(&mut self, w: &mut dyn Write, perms: &str) -> std::io::Result<()> {
+        writeln!(w, ":Permissions: {perms}")?;
+        writeln!(w)
+    }
+
+    // Networking-specific methods
+    fn socket_state(&mut self, w: &mut dyn Write, state: &SocketStateSpec) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Socket State Requirements";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)?;
+
+        if !state.required_states.is_empty() {
+            writeln!(
+                w,
+                "**Required states:** {}",
+                state.required_states.join(", ")
+            )?;
+        }
+        if !state.forbidden_states.is_empty() {
+            writeln!(
+                w,
+                "**Forbidden states:** {}",
+                state.forbidden_states.join(", ")
+            )?;
+        }
+        if let Some(result) = &state.resulting_state {
+            writeln!(w, "**Resulting state:** {result}")?;
+        }
+        if let Some(cond) = &state.condition {
+            writeln!(w, "**Condition:** {cond}")?;
+        }
+        if let Some(protos) = &state.applicable_protocols {
+            writeln!(w, "**Applicable protocols:** {protos}")?;
+        }
+        writeln!(w)
+    }
+
+    fn begin_protocol_behaviors(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Protocol-Specific Behaviors";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn protocol_behavior(
+        &mut self,
+        w: &mut dyn Write,
+        behavior: &ProtocolBehaviorSpec,
+    ) -> std::io::Result<()> {
+        writeln!(w, "**{}**", behavior.applicable_protocols)?;
+        writeln!(w)?;
+        writeln!(w, "{}", behavior.behavior)?;
+        if let Some(flags) = &behavior.protocol_flags {
+            writeln!(w)?;
+            writeln!(w, "*Flags:* {flags}")?;
+        }
+        writeln!(w)
+    }
+
+    fn end_protocol_behaviors(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_addr_families(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Supported Address Families";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn addr_family(&mut self, w: &mut dyn Write, family: &AddrFamilySpec) -> std::io::Result<()> {
+        writeln!(w, "**{} ({})**", family.family_name, family.family)?;
+        writeln!(w)?;
+        writeln!(w, "* **Struct size:** {} bytes", family.addr_struct_size)?;
+        writeln!(
+            w,
+            "* **Address length:** {}-{} bytes",
+            family.min_addr_len, family.max_addr_len
+        )?;
+        if let Some(format) = &family.addr_format {
+            writeln!(w, "* **Format:** ``{format}``")?;
+        }
+        writeln!(
+            w,
+            "* **Features:** wildcard={}, multicast={}, broadcast={}",
+            family.supports_wildcard, family.supports_multicast, family.supports_broadcast
+        )?;
+        if let Some(special) = &family.special_addresses {
+            writeln!(w, "* **Special addresses:** {special}")?;
+        }
+        if family.port_range_max > 0 {
+            writeln!(
+                w,
+                "* **Port range:** {}-{}",
+                family.port_range_min, family.port_range_max
+            )?;
+        }
+        writeln!(w)
+    }
+
+    fn end_addr_families(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn buffer_spec(&mut self, w: &mut dyn Write, spec: &BufferSpec) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Buffer Specification";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)?;
+
+        if let Some(behaviors) = &spec.buffer_behaviors {
+            writeln!(w, "**Behaviors:** {behaviors}")?;
+        }
+        if let Some(min) = spec.min_buffer_size {
+            writeln!(w, "**Min size:** {min} bytes")?;
+        }
+        if let Some(max) = spec.max_buffer_size {
+            writeln!(w, "**Max size:** {max} bytes")?;
+        }
+        if let Some(optimal) = spec.optimal_buffer_size {
+            writeln!(w, "**Optimal size:** {optimal} bytes")?;
+        }
+        writeln!(w)
+    }
+
+    fn async_spec(&mut self, w: &mut dyn Write, spec: &AsyncSpec) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Asynchronous Operation";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)?;
+
+        if let Some(modes) = &spec.supported_modes {
+            writeln!(w, "**Supported modes:** {modes}")?;
+        }
+        if let Some(errno) = spec.nonblock_errno {
+            writeln!(w, "**Non-blocking errno:** {errno}")?;
+        }
+        writeln!(w)
+    }
+
+    fn net_data_transfer(&mut self, w: &mut dyn Write, desc: &str) -> std::io::Result<()> {
+        writeln!(w, "**Network Data Transfer:** {desc}")?;
+        writeln!(w)
+    }
+
+    fn begin_capabilities(&mut self, w: &mut dyn Write) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = "Required Capabilities";
+        writeln!(w, "{title}")?;
+        writeln!(
+            w,
+            "{}",
+            Self::section_char(1).to_string().repeat(title.len())
+        )?;
+        writeln!(w)
+    }
+
+    fn capability(&mut self, w: &mut dyn Write, cap: &CapabilitySpec) -> std::io::Result<()> {
+        writeln!(w, "**{} ({})** - {}", cap.name, cap.capability, cap.action)?;
+        writeln!(w)?;
+        if !cap.allows.is_empty() {
+            writeln!(w, "* **Allows:** {}", cap.allows)?;
+        }
+        if !cap.without_cap.is_empty() {
+            writeln!(w, "* **Without capability:** {}", cap.without_cap)?;
+        }
+        if let Some(cond) = &cap.check_condition {
+            writeln!(w, "* **Condition:** {}", cond)?;
+        }
+        writeln!(w)
+    }
+
+    fn end_capabilities(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    // Stub implementations for new methods
+    fn parameter(&mut self, w: &mut dyn Write, param: &ParamSpec) -> std::io::Result<()> {
+        writeln!(
+            w,
+            "**[{}] {}** (*{}*)",
+            param.index, param.name, param.type_name
+        )?;
+        writeln!(w)?;
+        writeln!(w, " {}", param.description)?;
+
+        // Display flags
+        let mut flags = Vec::new();
+        if param.flags & 0x01 != 0 {
+            flags.push("IN");
+        }
+        if param.flags & 0x02 != 0 {
+            flags.push("OUT");
+        }
+        if param.flags & 0x04 != 0 {
+            flags.push("USER");
+        }
+        if param.flags & 0x08 != 0 {
+            flags.push("OPTIONAL");
+        }
+        if !flags.is_empty() {
+            writeln!(w, " :Flags: {}", flags.join(", "))?;
+        }
+
+        if let Some(constraint) = &param.constraint {
+            writeln!(w, " :Constraint: {}", constraint)?;
+        }
+
+        if let (Some(min), Some(max)) = (param.min_value, param.max_value) {
+            writeln!(w, " :Range: {} to {}", min, max)?;
+        }
+
+        writeln!(w)
+    }
+
+    fn return_spec(&mut self, w: &mut dyn Write, ret: &ReturnSpec) -> std::io::Result<()> {
+        writeln!(w, "\nReturn Value")?;
+        writeln!(w, "{}\n", Self::section_char(1).to_string().repeat(12))?;
+        writeln!(w)?;
+        writeln!(w, ":Type: {}", ret.type_name)?;
+        writeln!(w, ":Description: {}", ret.description)?;
+        if let Some(success) = ret.success_value {
+            writeln!(w, ":Success value: {}", success)?;
+        }
+        writeln!(w)
+    }
+
+    fn error(&mut self, w: &mut dyn Write, error: &ErrorSpec) -> std::io::Result<()> {
+        writeln!(w, "**{}** ({})", error.name, error.error_code)?;
+        writeln!(w)?;
+        writeln!(w, " :Condition: {}", error.condition)?;
+        if !error.description.is_empty() {
+            writeln!(w, " :Description: {}", error.description)?;
+        }
+        writeln!(w)
+    }
+
+    fn begin_signals(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn signal(&mut self, _w: &mut dyn Write, _signal: &SignalSpec) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn end_signals(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_signal_masks(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn signal_mask(&mut self, _w: &mut dyn Write, _mask: &SignalMaskSpec) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn end_signal_masks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_side_effects(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = format!("Side Effects ({count})");
+        writeln!(w, "{}\n", title)?;
+        writeln!(
+            w,
+            "{}\n",
+            Self::section_char(1).to_string().repeat(title.len())
+        )
+    }
+
+    fn side_effect(&mut self, w: &mut dyn Write, effect: &SideEffectSpec) -> std::io::Result<()> {
+        write!(w, "* **{}**", effect.target)?;
+        if effect.reversible {
+            write!(w, " *(reversible)*")?;
+        }
+        writeln!(w)?;
+        writeln!(w, " {}", effect.description)?;
+        if let Some(cond) = &effect.condition {
+            writeln!(w, " :Condition: {}", cond)?;
+        }
+        writeln!(w)
+    }
+
+    fn end_side_effects(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_state_transitions(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = format!("State Transitions ({count})");
+        writeln!(w, "{}\n", title)?;
+        writeln!(
+            w,
+            "{}\n",
+            Self::section_char(1).to_string().repeat(title.len())
+        )
+    }
+
+    fn state_transition(
+        &mut self,
+        w: &mut dyn Write,
+        trans: &StateTransitionSpec,
+    ) -> std::io::Result<()> {
+        writeln!(
+            w,
+            "* **{}**: {} → {}",
+            trans.object, trans.from_state, trans.to_state
+        )?;
+        writeln!(w, " {}", trans.description)?;
+        if let Some(cond) = &trans.condition {
+            writeln!(w, " :Condition: {}", cond)?;
+        }
+        writeln!(w)
+    }
+
+    fn end_state_transitions(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_constraints(&mut self, _w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn constraint(
+        &mut self,
+        _w: &mut dyn Write,
+        _constraint: &ConstraintSpec,
+    ) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn end_constraints(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_locks(&mut self, w: &mut dyn Write, count: u32) -> std::io::Result<()> {
+        self.current_section_level = 1;
+        let title = format!("Locks ({count})");
+        writeln!(w, "{}\n", title)?;
+        writeln!(
+            w,
+            "{}\n",
+            Self::section_char(1).to_string().repeat(title.len())
+        )
+    }
+
+    fn lock(&mut self, w: &mut dyn Write, lock: &LockSpec) -> std::io::Result<()> {
+        write!(w, "* **{}**", lock.lock_name)?;
+        let lock_type_str = match lock.lock_type {
+            1 => " *(mutex)*",
+            2 => " *(spinlock)*",
+            3 => " *(rwlock)*",
+            4 => " *(semaphore)*",
+            5 => " *(RCU)*",
+            _ => "",
+        };
+        writeln!(w, "{}", lock_type_str)?;
+        if !lock.description.is_empty() {
+            writeln!(w, " {}", lock.description)?;
+        }
+        writeln!(w)
+    }
+
+    fn end_locks(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+
+    fn begin_struct_specs(&mut self, w: &mut dyn Write, _count: u32) -> std::io::Result<()> {
+        writeln!(w)?;
+        writeln!(w, "Structure Specifications")?;
+        writeln!(w, "~~~~~~~~~~~~~~~~~~~~~~~")?;
+        writeln!(w)
+    }
+
+    fn struct_spec(&mut self, w: &mut dyn Write, spec: &crate::extractor::StructSpec) -> std::io::Result<()> {
+        writeln!(w, "**{}**", spec.name)?;
+        writeln!(w)?;
+
+        if !spec.description.is_empty() {
+            writeln!(w, " {}", spec.description)?;
+            writeln!(w)?;
+        }
+
+        writeln!(w, " :Size: {} bytes", spec.size)?;
+        writeln!(w, " :Alignment: {} bytes", spec.alignment)?;
+        writeln!(w, " :Fields: {}", spec.field_count)?;
+        writeln!(w)?;
+
+        if !spec.fields.is_empty() {
+            for field in &spec.fields {
+                writeln!(w, " * **{}** ({})", field.name, field.type_name)?;
+                if !field.description.is_empty() {
+                    writeln!(w, " {}", field.description)?;
+                }
+                if field.min_value != 0 || field.max_value != 0 {
+                    writeln!(w, " Range: [{}, {}]", field.min_value, field.max_value)?;
+                }
+            }
+            writeln!(w)?;
+        }
+
+        Ok(())
+    }
+
+    fn end_struct_specs(&mut self, _w: &mut dyn Write) -> std::io::Result<()> {
+        Ok(())
+    }
+}
diff --git a/tools/kapi/src/main.rs b/tools/kapi/src/main.rs
new file mode 100644
index 0000000000000..2d219046f3287
--- /dev/null
+++ b/tools/kapi/src/main.rs
@@ -0,0 +1,116 @@
+//! kapi - Kernel API Specification Tool
+//!
+//! This tool extracts and displays kernel API specifications from multiple sources:
+//! - Kernel source code (KAPI macros)
+//! - Compiled vmlinux binaries (`.kapi_specs` ELF section)
+//! - Running kernel via debugfs
+
+use anyhow::Result;
+use clap::Parser;
+use std::io::{self, Write};
+
+mod extractor;
+mod formatter;
+
+use extractor::{ApiExtractor, DebugfsExtractor, SourceExtractor, VmlinuxExtractor};
+use formatter::{OutputFormat, create_formatter};
+
+#[derive(Parser, Debug)]
+#[command(author, version, about, long_about = None)]
+struct Args {
+    /// Path to the vmlinux file
+    #[arg(long, value_name = "PATH", group = "input")]
+    vmlinux: Option,
+
+    /// Path to kernel source directory or file
+    #[arg(long, value_name = "PATH", group = "input")]
+    source: Option,
+
+    /// Path to debugfs (defaults to /sys/kernel/debug if not specified)
+    #[arg(long, value_name = "PATH", group = "input")]
+    debugfs: Option,
+
+    /// Optional: Name of specific API to show details for
+    api_name: Option,
+
+    /// Output format
+    #[arg(long, short = 'f', default_value = "plain")]
+    format: String,
+}
+
+fn main() -> Result<()> {
+    let args = Args::parse();
+
+    let output_format: OutputFormat = args
+        .format
+        .parse()
+        .map_err(|e: String| anyhow::anyhow!(e))?;
+
+    let extractor: Box = match (args.vmlinux, args.source, args.debugfs.clone()) {
+        (Some(vmlinux_path), None, None) => Box::new(VmlinuxExtractor::new(&vmlinux_path)?),
+        (None, Some(source_path), None) => Box::new(SourceExtractor::new(&source_path)?),
+        (None, None, Some(_) | None) => {
+            // If debugfs is specified or no input is provided, use debugfs
+            Box::new(DebugfsExtractor::new(args.debugfs)?)
+        }
+        _ => {
+            anyhow::bail!("Please specify only one of --vmlinux, --source, or --debugfs")
+        }
+    };
+
+    display_apis(extractor.as_ref(), args.api_name, output_format)
+}
+
+fn display_apis(
+    extractor: &dyn ApiExtractor,
+    api_name: Option,
+    output_format: OutputFormat,
+) -> Result<()> {
+    let mut formatter = create_formatter(output_format);
+    let mut stdout = io::stdout();
+
+    formatter.begin_document(&mut stdout)?;
+
+    if let Some(api_name_req) = api_name {
+        // Use the extractor to display API details
+        if let Some(_spec) = extractor.extract_by_name(&api_name_req)? {
+            extractor.display_api_details(&api_name_req, &mut *formatter, &mut stdout)?;
+        } else if output_format == OutputFormat::Plain {
+            writeln!(stdout, "\nAPI '{}' not found.", api_name_req)?;
+            writeln!(stdout, "\nAvailable APIs:")?;
+            for spec in extractor.extract_all()? {
+                writeln!(stdout, " {} ({})", spec.name, spec.api_type)?;
+            }
+        }
+    } else {
+        // Display list of APIs using the extractor
+        let all_specs = extractor.extract_all()?;
+
+        // Helper to display API list for a specific type
+        let mut display_api_type = |api_type: &str, title: &str| -> Result<()> {
+            let filtered: Vec<_> = all_specs.iter()
+                .filter(|s| s.api_type == api_type)
+                .collect();
+
+            if !filtered.is_empty() {
+                formatter.begin_api_list(&mut stdout, title)?;
+                for spec in filtered {
+                    formatter.api_item(&mut stdout, &spec.name, &spec.api_type)?;
+                }
+                formatter.end_api_list(&mut stdout)?;
+            }
+            Ok(())
+        };
+
+        display_api_type("syscall", "System Calls")?;
+        display_api_type("ioctl", "IOCTLs")?;
+        display_api_type("function", "Functions")?;
+        display_api_type("sysfs", "Sysfs Attributes")?;
+
+        formatter.total_specs(&mut stdout, all_specs.len())?;
+    }
+
+    formatter.end_document(&mut stdout)?;
+
+    Ok(())
+}
-- 
2.51.0

From nobody Sat Feb 7 10:08:18 2026
From: Sasha Levin
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, tools@kernel.org, gpaoloni@redhat.com, Sasha Levin
Subject: [RFC PATCH v5 05/15] kernel/api: add API specification for io_setup
Date: Thu, 18 Dec 2025 15:42:27 -0500
Message-ID: <20251218204239.4159453-6-sashal@kernel.org>
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Sasha Levin
---
 fs/aio.c | 228 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 216 insertions(+), 12 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 0a23a8c0717ff..36556e7a8e2c0 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1366,18 +1366,222 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
 	return ret;
 }
 
-/* sys_io_setup:
- * Create an aio_context capable of receiving at least nr_events.
- * ctxp must not point to an aio_context that already exists, and
- * must be initialized to 0 prior to the call. On successful
- * creation of the aio_context, *ctxp is filled in with the resulting
- * handle. May fail with -EINVAL if *ctxp is not initialized,
- * if the specified nr_events exceeds internal limits. May fail
- * with -EAGAIN if the specified nr_events exceeds the user's limit
- * of available events. May fail with -ENOMEM if insufficient kernel
- * resources are available. May fail with -EFAULT if an invalid
- * pointer is passed for ctxp. Will fail with -ENOSYS if not
- * implemented.
+/**
+ * sys_io_setup - Create an asynchronous I/O context
+ * @nr_events: Minimum number of concurrent AIO operations the context should support
+ * @ctxp: Pointer to aio_context_t variable to receive the context handle
+ *
+ * long-desc: Creates an asynchronous I/O context capable of receiving at least
+ *   nr_events concurrent operations. The context handle is returned via ctxp,
+ *   which must be initialized to 0 prior to the call. The returned context
+ *   handle is used with subsequent AIO operations (io_submit, io_getevents,
+ *   io_cancel, io_destroy).
+ *
+ *   The AIO context consists of a memory-mapped ring buffer shared between
+ *   kernel and userspace for efficient completion notification. The kernel
+ *   internally allocates more capacity than requested to account for percpu
+ *   batching (approximately nr_events * 2, but at least num_cpus * 8).
+ *
+ *   The context is bound to the calling process and cannot be shared across
+ *   processes. Each process can have multiple AIO contexts, limited only by
+ *   the system-wide aio-max-nr sysctl.
+ * + * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE + * + * param: nr_events + * type: KAPI_TYPE_UINT + * flags: KAPI_PARAM_IN + * constraint-type: KAPI_CONSTRAINT_RANGE + * range: 1, 8388608 + * constraint: Must be greater than 0. Internal limit of approximately 8= M events + * prevents overflow when calculating ring buffer size (0x10000000 / 3= 2 bytes + * per io_event). The kernel may allocate more capacity than requested= to + * optimize for percpu batching. + * + * param: ctxp + * type: KAPI_TYPE_USER_PTR + * flags: KAPI_PARAM_INOUT | KAPI_PARAM_USER + * size: sizeof(aio_context_t) + * constraint-type: KAPI_CONSTRAINT_USER_PTR + * constraint: Must be a valid userspace pointer to an aio_context_t var= iable. + * The memory pointed to MUST be initialized to 0 before the call. On = success, + * receives the context handle (actually the mmap address of the ring = buffer). + * The context handle is opaque and should not be interpreted by users= pace + * except to pass to other io_* syscalls. + * + * return: + * type: KAPI_TYPE_INT + * check-type: KAPI_RETURN_ERROR_CHECK + * success: 0 + * desc: Returns 0 on success. On success, *ctxp contains the new contex= t handle. + * + * error: EFAULT, Invalid pointer + * desc: The ctxp pointer is invalid, not accessible, or points to memor= y that + * cannot be read or written. Returned from get_user() when reading the + * initial value or from put_user() when storing the context handle. + * + * error: EINVAL, Invalid parameter + * desc: Either *ctxp is not initialized to 0 (indicating an existing co= ntext or + * uninitialized memory), or nr_events is 0, or nr_events is too large= causing + * internal overflow when calculating ring buffer size. The internal l= imit is + * approximately 0x10000000 / sizeof(struct io_event) events. + * + * error: EAGAIN, Resource limit exceeded + * desc: The system-wide limit on AIO contexts would be exceeded. The li= mit is + * controlled by /proc/sys/fs/aio-max-nr (default 65536). 
Each context counts + * as nr_events toward this limit. Also returned if nr_events exceeds the + * current aio-max-nr value. Unlike ENOMEM, this error indicates a policy + * limit rather than physical resource exhaustion. + * + * error: ENOMEM, Insufficient memory + * desc: Kernel could not allocate required memory for the AIO context. This + * includes the kioctx structure, percpu data, ring buffer pages, or the + * anonymous file backing the ring buffer. Also returned if the kernel could + * not establish the memory mapping for the ring buffer, or if ioctx_table + * expansion failed. + * + * error: EINTR, Interrupted by signal + * desc: A fatal signal was received while attempting to acquire the mmap_lock + * for the ring buffer memory mapping. The operation was aborted before any + * state was modified. Only fatal signals (SIGKILL, or any signal that will + * terminate the process) can cause this error; signals caught by a handler + * do not interrupt the operation. + * + * lock: aio_nr_lock + * type: KAPI_LOCK_SPINLOCK + * desc: Global spinlock protecting the system-wide aio_nr counter. Held briefly + * to check and update the system-wide AIO context count. + * + * lock: mm->ioctx_lock + * type: KAPI_LOCK_SPINLOCK + * desc: Per-mm spinlock protecting the ioctx_table. Held while adding the new + * context to the process's AIO context table. + * + * lock: ctx->ring_lock + * type: KAPI_LOCK_MUTEX + * desc: Per-context mutex protecting ring buffer setup. Held throughout context + * initialization to prevent page migration during setup, then released once + * the context is fully initialized. + * + * lock: mmap_lock + * type: KAPI_LOCK_RWLOCK + * desc: Process memory map write lock. Acquired via mmap_write_lock_killable() + * during ring buffer mmap operation. This is where EINTR can occur.
+ * + * signal: SIGKILL + * direction: KAPI_SIGNAL_RECEIVE + * action: KAPI_SIGNAL_ACTION_RETURN + * condition: Fatal signal pending during mmap_write_lock_killable + * desc: Fatal signals can interrupt the context creation during the mmap phase. + * The mmap_write_lock_killable() function checks for fatal signals and returns + * -EINTR if one is pending. Non-fatal signals do not interrupt this syscall. + * error: -EINTR + * timing: KAPI_SIGNAL_TIME_DURING + * priority: 0 + * restartable: no + * + * side-effect: KAPI_EFFECT_ALLOC_MEMORY + * target: kioctx structure + * desc: Allocates the main AIO context structure from kioctx_cachep slab cache. + * Contains ring buffer metadata, locks, and request tracking. + * reversible: yes + * + * side-effect: KAPI_EFFECT_ALLOC_MEMORY + * target: percpu kioctx_cpu structures + * desc: Allocates per-CPU structures for request batching via alloc_percpu(). + * Used to reduce contention on the global request counter. + * reversible: yes + * + * side-effect: KAPI_EFFECT_ALLOC_MEMORY + * target: ring buffer pages + * desc: Allocates pages for the completion event ring buffer. The ring is backed + * by an anonymous file on the internal "aio" filesystem and memory-mapped into + * the process address space. + * reversible: yes + * + * side-effect: KAPI_EFFECT_RESOURCE_CREATE + * target: anonymous inode and file + * desc: Creates an anonymous inode and file on the internal aio filesystem to + * back the ring buffer mapping. This enables proper page migration support. + * reversible: yes + * + * side-effect: KAPI_EFFECT_MODIFY_STATE + * target: process virtual memory + * desc: Creates a new memory mapping (VMA) for the ring buffer in the process + * address space. The mapping is marked VM_DONTEXPAND and uses aio_ring_vm_ops. + * reversible: yes + * + * side-effect: KAPI_EFFECT_MODIFY_STATE + * target: mm->ioctx_table + * desc: Adds the new context to the process's AIO context table.
The table is + * dynamically expanded if needed (grows by 4x each time). + * reversible: yes + * + * side-effect: KAPI_EFFECT_MODIFY_STATE + * target: aio_nr (global counter) + * desc: Increments the system-wide AIO context counter by nr_events. This counter + * is visible via /proc/sys/fs/aio-nr and counts toward the aio-max-nr limit. + * reversible: yes + * + * state-trans: process AIO state + * from: no AIO context (or fewer contexts) + * to: has AIO context + * condition: successful io_setup + * desc: Process gains an AIO context that can be used for asynchronous I/O + * operations. The context remains until explicitly destroyed via io_destroy + * or process exit. + * + * state-trans: system AIO resources + * from: aio_nr = N + * to: aio_nr = N + nr_events + * condition: successful io_setup + * desc: System-wide AIO resource counter increases. The counter tracks total + * requested AIO capacity across all processes. + * + * constraint: System-wide AIO limit (aio-max-nr) + * desc: The /proc/sys/fs/aio-max-nr sysctl (default 65536) limits the total + * number of AIO events system-wide. Each io_setup call adds nr_events to + * the aio_nr counter. If aio_nr + nr_events would exceed aio_max_nr, the + * call fails with EAGAIN. Administrators can increase aio-max-nr if needed. + * expr: aio_nr + nr_events <= aio_max_nr + * + * constraint: Per-process context limit + * desc: Each process can have multiple AIO contexts, limited only by the + * system-wide aio-max-nr limit and available memory. The ioctx_table grows + * dynamically to accommodate new contexts. + * + * constraint: CONFIG_AIO required + * desc: The kernel must be compiled with CONFIG_AIO=y for this syscall to be + * available. If not configured, the syscall returns -ENOSYS. This is typically + * enabled by default but may be disabled on embedded systems.
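The aio-max-nr constraint above amounts to a check-and-add on a shared counter. The following is a minimal sketch, assuming the counter and the limit are plain unsigned longs; aio_nr_reserve() is a hypothetical name, and where the kernel performs this step under aio_nr_lock, this single-threaded model takes no lock. It also treats an unsigned wrap-around of the sum as exceeding the limit, an assumption that keeps the expr line meaningful near ULONG_MAX.

```c
#include <errno.h>

/*
 * Hypothetical model of the io_setup() accounting step.
 * On success, charges max_reqs to *aio_nr and returns 0.
 * On failure, returns -EAGAIN and leaves *aio_nr unchanged.
 * (The kernel holds aio_nr_lock here; this sketch does not lock.)
 */
static int aio_nr_reserve(unsigned long *aio_nr, unsigned long aio_max_nr,
                          unsigned long max_reqs)
{
    unsigned long total = *aio_nr + max_reqs;

    if (total > aio_max_nr || total < *aio_nr)  /* over limit, or wrapped */
        return -EAGAIN;
    *aio_nr = total;
    return 0;
}
```

With the default limit of 65536, one context created with nr_events = 65536 uses up the whole budget, and any further reservation fails with EAGAIN until that context is destroyed.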
+ * + * constraint: Memory for ring buffer + * desc: The kernel must be able to allocate sufficient pages for the + * ring buffer and establish the memory mapping. Large nr_events values require + * more memory and may fail with ENOMEM on memory-constrained systems. + * + * examples: aio_context_t ctx = 0; io_setup(128, &ctx); // Create context for 128 events + * aio_context_t ctx = 0; io_setup(1024, &ctx); // Create context for 1024 events + * + * notes: The returned context handle is actually the virtual address of the ring + * buffer mapping in the process address space. This allows userspace libraries + * to directly access completion events without syscall overhead in some cases. + * + * The kernel internally doubles nr_events and ensures a minimum of num_cpus * 8 + * events for percpu batching efficiency. This means the actual ring capacity may + * be significantly larger than requested. + * + * Historical note: A race condition between io_setup and io_destroy was fixed + * in commit 86b62a2cb4fc ("aio: fix io_setup/io_destroy race"). Earlier kernels + * could have the context freed while io_setup was still completing. + * + * io_uring (since Linux 5.1) is a more modern alternative that provides better + * performance and more features. Consider using io_uring for new applications. + * + * There is no glibc wrapper for this syscall. Use syscall(SYS_io_setup, ...) or + * the libaio library wrapper (note: libaio has slightly different error semantics, + * returning negative error numbers directly instead of -1 with errno).
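Because there is no glibc wrapper, callers use either syscall(2) directly or libaio. The sketch below shows both error conventions side by side, using only syscall(2) and the UAPI header <linux/aio_abi.h>. The functions raw_io_setup(), raw_io_destroy() and neg_errno_io_setup() are hypothetical names; the last one merely imitates the libaio-style convention of returning a negative errno value instead of -1.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/aio_abi.h>   /* aio_context_t */
#include <sys/syscall.h>     /* SYS_io_setup, SYS_io_destroy */
#include <unistd.h>          /* syscall() */

/* Raw convention: returns 0 on success, or -1 with errno set. */
static long raw_io_setup(unsigned int nr_events, aio_context_t *ctxp)
{
    return syscall(SYS_io_setup, (unsigned long)nr_events, ctxp);
}

static long raw_io_destroy(aio_context_t ctx)
{
    return syscall(SYS_io_destroy, ctx);
}

/* libaio-style convention (imitated): returns 0 or a negative errno value. */
static int neg_errno_io_setup(unsigned int nr_events, aio_context_t *ctxp)
{
    return raw_io_setup(nr_events, ctxp) < 0 ? -errno : 0;
}
```

The variable passed as ctxp must be zeroed before every call; reusing one that still holds a live handle makes io_setup() fail with EINVAL.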
+ * + * since-version: 2.5 */ SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp) { -- 2.51.0
From nobody Sat Feb 7 10:08:18 2026 From: Sasha Levin To: linux-api@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, tools@kernel.org, gpaoloni@redhat.com, Sasha Levin Subject: [RFC PATCH v5 06/15] kernel/api: add API specification for io_destroy Date: Thu, 18 Dec 2025 15:42:28 -0500 Message-ID: <20251218204239.4159453-7-sashal@kernel.org> In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org> References: <20251218204239.4159453-1-sashal@kernel.org> Signed-off-by: Sasha Levin --- fs/aio.c | 189 +++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 184 insertions(+), 5 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 36556e7a8e2c0..ff2a8527e1b85 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1646,11 +1646,190 @@ COMPAT_SYSCALL_DEFINE2(io_setup, unsigned, nr_events, u32 __user *, ctx32p) } #endif -/* sys_io_destroy: - * Destroy the aio_context specified. May cancel any outstanding - * AIOs and block on completion. Will fail with -ENOSYS if not - * implemented. May fail with -EINVAL if the context pointed to - * is invalid. +/** + * sys_io_destroy - Destroy an asynchronous I/O context + * @ctx: AIO context handle returned by io_setup + * + * long-desc: Destroys the asynchronous I/O context identified by ctx. This + * syscall will attempt to cancel all outstanding asynchronous I/O operations + * against the context and block until all operations have completed. Once + * this syscall returns successfully, the context handle becomes invalid and + * must not be used with any other io_* syscalls.
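The lifecycle described above (a handle is valid until one successful io_destroy(), after which it is stale) can be observed from userspace. This is a minimal sketch using raw syscall(2) calls; struct destroy_demo and destroy_twice() are hypothetical names, and the expectation that the second call fails with EINVAL follows the EINVAL description for an already-destroyed context in this comment.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/aio_abi.h>   /* aio_context_t */
#include <sys/syscall.h>     /* SYS_io_setup, SYS_io_destroy */
#include <unistd.h>          /* syscall() */

struct destroy_demo {
    int setup_errno;    /* 0 if io_setup() succeeded, else its errno */
    long first;         /* return value of the first io_destroy() */
    long second;        /* return value of the second io_destroy() */
    int second_errno;   /* errno after the second call, if it failed */
};

/* Create a context, destroy it, then try to destroy the stale handle again. */
static struct destroy_demo destroy_twice(void)
{
    struct destroy_demo d = {0};
    aio_context_t ctx = 0;                    /* must be zero before io_setup() */

    if (syscall(SYS_io_setup, 4UL, &ctx) < 0) {
        d.setup_errno = errno;
        return d;
    }
    d.first = syscall(SYS_io_destroy, ctx);   /* waits for in-flight I/O */
    d.second = syscall(SYS_io_destroy, ctx);  /* handle is no longer valid */
    d.second_errno = d.second < 0 ? errno : 0;
    return d;
}
```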
+ * + * The context's memory-mapped ring buffer is unmapped from the process address + * space, and all associated kernel resources are freed. The system-wide AIO + * event counter (aio_nr) is decremented by the original nr_events value that + * was passed to io_setup when creating this context. + * + * This syscall blocks until all in-flight I/O operations have completed. This + * ensures that userspace buffers passed to io_submit are no longer accessed + * by the kernel after io_destroy returns. The wait is NOT interruptible by + * signals, so callers cannot cancel this blocking behavior. + * + * If two threads call io_destroy on the same context simultaneously, only the + * first call will succeed; subsequent calls return -EINVAL as the context is + * already marked as dead. + * + * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE + * + * param: ctx + * type: KAPI_TYPE_UINT + * flags: KAPI_PARAM_IN + * constraint-type: KAPI_CONSTRAINT_CUSTOM + * constraint: Must be a valid context handle previously returned by io_setup. + * The handle is actually the virtual address of the ring buffer mapping in + * the calling process's address space. A value of 0 is always invalid. + * The context must not have been previously destroyed. + * + * return: + * type: KAPI_TYPE_INT + * check-type: KAPI_RETURN_ERROR_CHECK + * success: 0 + * desc: Returns 0 on success. After successful return, the context handle is + * invalid and all resources have been released. All outstanding I/O + * operations have completed. + * + * error: EINVAL, Invalid context + * desc: The ctx argument does not refer to a valid AIO context in the calling + * process.
This can occur if: (1) ctx was never returned by io_setup, + * (2) ctx was returned by io_setup in a different process, (3) ctx was + * already destroyed by a previous io_destroy call, (4) ctx is 0 or an + * arbitrary invalid value, or (5) the ring buffer at the ctx address has + * been corrupted (e.g., the id field no longer matches). + * + * lock: mm->ioctx_lock + * type: KAPI_LOCK_SPINLOCK + * desc: Per-mm spinlock protecting the ioctx_table. Held briefly while + * marking the context as dead and removing it from the process's AIO + * context table. + * + * lock: RCU read lock + * type: KAPI_LOCK_RCU + * desc: RCU read-side critical section held during context lookup in + * lookup_ioctx(). Protects against concurrent modification of the + * ioctx_table. + * + * lock: ctx->ctx_lock + * type: KAPI_LOCK_SPINLOCK + * desc: Per-context spinlock held while cancelling outstanding I/O requests + * in free_ioctx_users(). Protects the active_reqs list. + * + * lock: mmap_lock + * type: KAPI_LOCK_RWLOCK + * desc: Process memory map write lock acquired during vm_munmap() when + * unmapping the ring buffer. May contend with other memory operations + * in the same process. + * + * side-effect: KAPI_EFFECT_MODIFY_STATE + * target: ctx->dead flag + * desc: Atomically sets the context's dead flag to 1, marking it as being + * destroyed. This prevents new I/O submissions and ensures subsequent + * io_destroy calls return -EINVAL. + * reversible: no + * + * side-effect: KAPI_EFFECT_MODIFY_STATE + * target: mm->ioctx_table + * desc: Removes the context from the process's AIO context table by setting + * the corresponding table entry to NULL. After this, lookup_ioctx will + * no longer find this context. + * reversible: no + * + * side-effect: KAPI_EFFECT_MODIFY_STATE + * target: aio_nr (global counter) + * desc: Decrements the system-wide AIO context counter by the context's + * max_reqs value (the nr_events originally passed to io_setup).
This + * counter is visible via /proc/sys/fs/aio-nr. + * reversible: no + * + * side-effect: KAPI_EFFECT_MODIFY_STATE + * target: process virtual memory + * desc: Unmaps the ring buffer from the process's address space via + * vm_munmap(). The memory region at ctx becomes invalid. + * condition: ctx->mmap_size > 0 + * reversible: no + * + * side-effect: KAPI_EFFECT_FREE_MEMORY + * target: kioctx structure and associated resources + * desc: Frees the AIO context structure, percpu data, ring buffer pages, and + * the anonymous file backing the ring buffer. Deferred via RCU work queue + * to ensure safe cleanup after all references are dropped. + * reversible: no + * + * side-effect: KAPI_EFFECT_SIGNAL_SEND + * target: outstanding AIO operations + * desc: Cancels all outstanding asynchronous I/O operations by invoking their + * ki_cancel callbacks. The specific effect depends on the operation type + * (read, write, fsync, poll). + * condition: active_reqs list is not empty + * reversible: no + * + * state-trans: AIO context state + * from: alive (ctx->dead == 0) + * to: dead (ctx->dead == 1) + * condition: successful atomic exchange in kill_ioctx + * desc: The context transitions from usable to destroyed. Once dead, the + * context cannot be used for any operations and will be freed after all + * references are dropped. + * + * state-trans: process AIO state + * from: has AIO context(s) + * to: context removed (or no contexts) + * condition: successful io_destroy + * desc: The destroyed context is removed from the process's context table. + * If this was the only context, the process no longer has any active + * AIO contexts. + * + * state-trans: system AIO resources + * from: aio_nr = N + * to: aio_nr = N - max_reqs + * condition: successful io_destroy + * desc: System-wide AIO resource counter decreases, making room for other + * processes to create new AIO contexts.
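The alive-to-dead transition relies on an atomic exchange, so that only one of several concurrent io_destroy() callers proceeds with teardown. Below is a hypothetical C11 model of that pattern; struct model_ctx and model_kill() are invented for illustration, and the model omits the mm->ioctx_lock and aio_nr_lock that the kernel holds around these steps.

```c
#include <errno.h>
#include <stdatomic.h>

struct model_ctx {
    atomic_int dead;            /* 0 = alive, 1 = dead */
    unsigned long max_reqs;     /* nr_events originally passed to io_setup */
};

/*
 * Hypothetical model of the teardown entry: the first caller flips dead
 * from 0 to 1 and releases the context's share of aio_nr; any later
 * caller sees the old value 1 and gets -EINVAL.
 */
static int model_kill(struct model_ctx *ctx, unsigned long *aio_nr)
{
    if (atomic_exchange(&ctx->dead, 1) != 0)
        return -EINVAL;
    *aio_nr -= ctx->max_reqs;   /* aio_nr = N - max_reqs */
    return 0;
}
```

Calling model_kill() twice on the same context returns the counter share only once, matching the state-trans entries above.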
+ * + * constraint: CONFIG_AIO required + * desc: The kernel must be compiled with CONFIG_AIO=y for this syscall to be + * available. If not configured, the syscall returns -ENOSYS. This is + * typically enabled by default but may be disabled on embedded systems. + * + * constraint: Context must belong to calling process + * desc: Each AIO context is bound to a specific process (mm_struct). A context + * created by one process cannot be destroyed by another process, even if + * the context handle value is somehow known. + * expr: ctx belongs to current->mm + * + * examples: io_destroy(ctx); // Destroy context and wait for completion + * if (io_destroy(ctx) == -EINVAL) handle_error(); // Invalid context + * + * notes: The man page documents EFAULT as a possible error, but code analysis + * shows that EFAULT conditions (e.g., invalid ring buffer pointer) actually + * result in EINVAL being returned, as lookup_ioctx returns NULL on any + * failure to access the ring buffer header. + * + * This syscall blocks in TASK_UNINTERRUPTIBLE state while waiting for + * outstanding I/O operations to complete. This means the process cannot be + * interrupted by signals during this wait. In extreme cases with very slow + * I/O devices, this could cause the process to appear hung. + * + * Historical note: Before kernel 3.11, io_destroy blocked waiting for I/O + * completion. A refactoring in 3.11 accidentally removed this behavior, + * creating a race where userspace buffers could be freed while the kernel + * was still using them. This was fixed by commit e02ba72aabfa, which blocks + * io_destroy until all context requests are completed. + * + * Race condition handling: A race between io_destroy and io_submit was fixed + * by commit 7137c6bd4552. A race between io_setup and io_destroy was fixed + * by commit 86b62a2cb4fc. Both fixes ensure proper synchronization via + * reference counting.
+ * + * io_uring (since Linux 5.1) is a more modern alternative that provides better + * performance and more features. Consider using io_uring for new applications. + * + * There is no glibc wrapper for this syscall. Use syscall(SYS_io_destroy, ctx) + * or the libaio library wrapper io_destroy(). Note: libaio has slightly + * different error semantics, returning negative error numbers directly instead + * of -1 with errno. + * + * since-version: 2.5 */ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx) { -- 2.51.0
From nobody Sat Feb 7 10:08:18 2026 From: Sasha Levin To: linux-api@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, tools@kernel.org, gpaoloni@redhat.com, Sasha Levin Subject: [RFC PATCH v5 07/15] kernel/api: add API specification for io_submit Date: Thu, 18 Dec 2025 15:42:29 -0500 Message-ID: <20251218204239.4159453-8-sashal@kernel.org> In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org> References: <20251218204239.4159453-1-sashal@kernel.org> Signed-off-by: Sasha Levin --- fs/aio.c | 319 +++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 308 insertions(+), 11 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index ff2a8527e1b85..f6f1b3790c88b 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -2450,17 +2450,314 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, return err; } -/* sys_io_submit: - * Queue the nr iocbs pointed to by iocbpp for processing. Returns - * the number of iocbs queued.
May return -EINVAL if the aio_context - * specified by ctx_id is invalid, if nr is < 0, if the iocb at - * *iocbpp[0] is not properly initialized, if the operation specified - * is invalid for the file descriptor in the iocb. May fail with - * -EFAULT if any of the data structures point to invalid data. May - * fail with -EBADF if the file descriptor specified in the first - * iocb is invalid. May fail with -EAGAIN if insufficient resources - * are available to queue any iocbs. Will return 0 if nr is 0. Will - * fail with -ENOSYS if not implemented. +/** + * sys_io_submit - Submit asynchronous I/O operations for processing + * @ctx_id: AIO context handle returned by io_setup + * @nr: Number of I/O control blocks to submit + * @iocbpp: Array of pointers to iocb structures describing the operations + * + * long-desc: Submits one or more asynchronous I/O operations for processing + * against a previously created AIO context. Each iocb structure describes + * a single I/O operation including the operation type, file descriptor, + * buffer, size, and offset. + * + * The syscall processes iocbs sequentially from the array. If an error + * occurs while processing an iocb, submission stops at that point and + * the number of successfully submitted operations is returned (if the + * very first iocb fails, its error code is returned instead). This means + * partial submission is possible: if submitting 10 iocbs and the 5th fails, + * 4 is returned and iocbs 0-3 are queued for processing.
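Because io_submit() stops at the first failing iocb and returns the count queued so far, callers typically loop. The sketch below is a hypothetical helper, submit_all(), written against a callback with io_submit()-like return semantics so that it can be exercised without a kernel. fake_submit() is an invented stand-in that accepts at most cap entries per call and rejects one chosen entry, loosely mimicking the per-call cap and a per-iocb failure.

```c
#include <errno.h>
#include <stddef.h>

/* Callback with io_submit()-like semantics: returns the number of entries
 * accepted (>= 1), 0 when nr is 0, or a negative errno if the first entry
 * was rejected. */
typedef long (*submit_fn)(void *state, long nr, void **items);

/*
 * Submit all nr items, resuming after each partial submission.  Returns nr
 * on full success.  On failure returns the negative errno and stores the
 * index of the rejected item in *failed_index; items before it are queued.
 */
static long submit_all(submit_fn submit, void *state, long nr, void **items,
                       long *failed_index)
{
    long done = 0;

    *failed_index = -1;
    while (done < nr) {
        long ret = submit(state, nr - done, items + done);

        if (ret < 0) {
            *failed_index = done;   /* a real caller might reap and retry on EAGAIN */
            return ret;
        }
        if (ret == 0)
            break;                  /* defensive: no progress */
        done += ret;
    }
    return done;
}

/* Hypothetical stand-in for io_submit(); each item points to an int. */
struct fake_submitter {
    long cap;           /* max entries accepted per call */
    int fail_value;     /* entry value rejected with -EINVAL */
    long calls;         /* number of calls made */
};

static long fake_submit(void *state, long nr, void **items)
{
    struct fake_submitter *f = state;
    long i;

    f->calls++;
    if (nr == 0)
        return 0;
    if (nr > f->cap)
        nr = f->cap;
    for (i = 0; i < nr; i++)
        if (*(const int *)items[i] == f->fail_value)
            break;
    return i > 0 ? i : -EINVAL;    /* partial count, or the first entry's error */
}
```

With ten items and a failure at index 4, the helper queues items 0-3 and reports index 4, which matches the partial-submission example in the paragraph above.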
+ * + * Supported operations (specified via aio_lio_opcode): + * - IOCB_CMD_PREAD (0): Positioned read from file + * - IOCB_CMD_PWRITE (1): Positioned write to file + * - IOCB_CMD_FSYNC (2): Sync file data and metadata + * - IOCB_CMD_FDSYNC (3): Sync file data only + * - IOCB_CMD_POLL (5): Poll for events on file descriptor + * - IOCB_CMD_NOOP (6): Defined in the UAPI header but not handled by + * io_submit(); an iocb with this opcode is rejected with EINVAL + * - IOCB_CMD_PREADV (7): Positioned scatter read + * - IOCB_CMD_PWRITEV (8): Positioned gather write + * + * The iocb structure fields include: + * - aio_data: User data copied to io_event on completion + * - aio_lio_opcode: Operation type (one of IOCB_CMD_*) + * - aio_fildes: File descriptor for the operation + * - aio_buf: Buffer address (or iovec array for vectored ops) + * - aio_nbytes: Buffer size (or iovec count for vectored ops) + * - aio_offset: File offset for positioned operations + * - aio_flags: Optional flags (IOCB_FLAG_RESFD, IOCB_FLAG_IOPRIO) + * - aio_resfd: eventfd to signal on completion (if IOCB_FLAG_RESFD set) + * - aio_rw_flags: Per-operation RWF_* flags + * - aio_reqprio: I/O priority (if IOCB_FLAG_IOPRIO set) + * + * After successful submission, operations complete asynchronously. Results + * are delivered to the completion ring buffer and can be retrieved via + * io_getevents(). If aio_resfd specifies a valid eventfd, it is signaled + * when each operation completes. + * + * The actual I/O may complete synchronously if the data is cached or if + * the underlying filesystem doesn't support truly asynchronous I/O. In + * such cases, the operation is still reported via the completion ring. + * + * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE + * + * param: ctx_id + * type: KAPI_TYPE_UINT + * flags: KAPI_PARAM_IN + * constraint-type: KAPI_CONSTRAINT_CUSTOM + * constraint: Must be a valid AIO context handle previously returned by + * io_setup() for the current process. The context must not have been + * destroyed. A value of 0 is always invalid.
The handle is actually + * the virtual address of the ring buffer mapping. + * + * param: nr + * type: KAPI_TYPE_INT + * flags: KAPI_PARAM_IN + * constraint-type: KAPI_CONSTRAINT_RANGE + * range: 0, LONG_MAX + * constraint: Must be >= 0. If 0, the syscall returns 0 without submitting + * anything (ctx_id is still looked up first, so an invalid context yields + * EINVAL). The actual number processed is capped to ctx->nr_events (the + * context's capacity). Very large values are effectively limited by the context + * capacity and available ring buffer slots. + * + * param: iocbpp + * type: KAPI_TYPE_USER_PTR + * flags: KAPI_PARAM_IN | KAPI_PARAM_USER + * constraint-type: KAPI_CONSTRAINT_CUSTOM + * constraint: Must be a valid userspace pointer to an array of nr pointers + * to struct iocb. Each iocb pointer must itself be valid and point to a + * properly initialized iocb structure. The iocb structures must have + * aio_reserved2 set to 0 for forward compatibility. + * + * return: + * type: KAPI_TYPE_INT + * check-type: KAPI_RETURN_RANGE + * success: >= 0 + * desc: Returns the number of iocbs successfully submitted (0 to nr). If + * partial submission occurs due to an error, returns the count of + * successfully submitted operations. Returns 0 if nr is 0. + * + * error: EFAULT, Invalid memory access + * desc: Returned if: (1) iocbpp is not a valid userspace pointer, (2) any + * pointer in the iocbpp array is invalid, (3) the iocb data cannot be + * copied from userspace, (4) aio_buf points to invalid memory, or + * (5) the kernel cannot write the aio_key field back to userspace.
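The iocbpp constraint above (a pointer array of fully initialized iocbs with aio_reserved2 set to 0) is easiest to satisfy by zeroing each iocb before filling it in. A minimal sketch using the UAPI definitions from <linux/aio_abi.h>; prep_pread() is a hypothetical helper, not part of any library.

```c
#include <linux/aio_abi.h>   /* struct iocb, IOCB_CMD_PREAD */
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: prepare a struct iocb for a positioned read.
 * Zeroing first keeps aio_reserved2, aio_flags, aio_rw_flags and
 * aio_reqprio at 0 unless the caller sets them afterwards.
 */
static void prep_pread(struct iocb *cb, int fd, void *buf, size_t len,
                       long long offset, uint64_t user_data)
{
    memset(cb, 0, sizeof(*cb));
    cb->aio_data = user_data;                /* echoed back in io_event.data */
    cb->aio_lio_opcode = IOCB_CMD_PREAD;
    cb->aio_fildes = (uint32_t)fd;
    cb->aio_buf = (uint64_t)(uintptr_t)buf;  /* user pointer carried as a u64 */
    cb->aio_nbytes = len;
    cb->aio_offset = offset;
}
```

The prepared iocbs are then passed through an array of pointers, for example `struct iocb *ptrs[1] = { &cb };` followed by io_submit(ctx, 1, ptrs).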
+ * + * error: EBADF, Bad file descriptor + * desc: Returned if: (1) aio_fildes in an iocb does not refer to an open + * file, (2) aio_resfd does not refer to a valid file descriptor when + * IOCB_FLAG_RESFD is set, (3) the file is not opened with appropriate + * mode for the operation (e.g., read on write-only file). + * + * error: EAGAIN, Resource temporarily unavailable + * desc: Returned if insufficient slots are available in the completion + * ring buffer. This typically means too many operations are already + * in flight and the application should call io_getevents() to consume + * completed events before submitting more. + * + * error: EPERM, Operation not permitted + * desc: Returned if: (1) IOCB_FLAG_IOPRIO is set and aio_reqprio specifies + * IOPRIO_CLASS_RT (real-time I/O priority) but the process lacks + * CAP_SYS_ADMIN or CAP_SYS_NICE capability, or (2) RWF_NOAPPEND is + * specified but the file has the append-only attribute (IS_APPEND). + * + * error: EOPNOTSUPP, Operation not supported + * desc: Returned if: (1) unsupported aio_rw_flags are specified, (2) + * RWF_NOWAIT is specified but the file doesn't support non-blocking I/O + * (FMODE_NOWAIT not set), (3) RWF_ATOMIC is specified for a read or + * the file doesn't support atomic writes, or (4) RWF_DONTCACHE is + * specified but not supported by the filesystem or file is DAX-mapped. + * + * error: EOVERFLOW, Value too large + * desc: Returned if aio_offset plus aio_nbytes would overflow and the + * file does not support unsigned offsets. This check prevents reading + * or writing past the maximum representable file position. + * + * error: ENOMEM, Out of memory + * desc: Returned if memory allocation fails when preparing credentials + * for IOCB_CMD_FSYNC operations, or if vectored I/O (preadv/pwritev) + * requires allocating iovec arrays larger than the stack buffer. + * + * lock: RCU read lock + * type: KAPI_LOCK_RCU + * desc: Acquired during context lookup in lookup_ioctx().
Protects against + * concurrent modification of the ioctx_table while looking up the + * context. Released before processing any iocbs. + * + * lock: ctx->completion_lock + * type: KAPI_LOCK_SPINLOCK + * desc: Per-context spinlock acquired briefly during request slot allocation + * via user_refill_reqs_available() if the percpu request counter is empty. + * Protects the ring buffer tail and completed_events counters. + * + * lock: ctx->ctx_lock + * type: KAPI_LOCK_SPINLOCK + * desc: Per-context spinlock acquired when adding cancellable requests to + * the active_reqs list. This enables io_cancel() to find and cancel + * in-flight operations. + * + * lock: blk_plug + * type: KAPI_LOCK_CUSTOM + * desc: Block layer plugging is enabled when nr > 2 (AIO_PLUG_THRESHOLD) + * to batch block I/O requests for better performance. This is not a + * traditional lock but affects I/O scheduling. + * + * signal: any + * direction: KAPI_SIGNAL_RECEIVE + * action: KAPI_SIGNAL_ACTION_TRANSFORM + * condition: Signal arrives during underlying read/write operation + * desc: If a signal arrives during the underlying file read/write operation + * and the operation returns ERESTARTSYS/ERESTARTNOINTR/etc., the error + * is transformed to EINTR for the completion event. AIO operations cannot + * be restarted in the traditional sense because other operations may have + * already been submitted. The syscall itself (io_submit) is NOT interrupted + * by signals - only the individual async operations can be. + * error: -EINTR (in io_event.res, not syscall return) + * timing: KAPI_SIGNAL_TIME_DURING + * restartable: no + * + * side-effect: KAPI_EFFECT_ALLOC_MEMORY + * target: aio_kiocb structures + * desc: Allocates one aio_kiocb structure per submitted operation from the + * kiocb_cachep slab cache. These structures track the in-flight operations + * and are freed after completion is recorded in the ring buffer.
+ * reversible: yes + * + * side-effect: KAPI_EFFECT_MODIFY_STATE + * target: AIO context request counters + * desc: Decrements the available request slot counter in the context. + * Slots are reclaimed when completion events are consumed from the ring + * buffer via io_getevents(). + * reversible: yes + * + * side-effect: KAPI_EFFECT_MODIFY_STATE + * target: ctx->active_reqs list + * desc: Cancellable operations (reads, writes, polls) are added to the + * context's active_reqs list, enabling cancellation via io_cancel(). + * condition: Operation supports cancellation + * reversible: yes + * + * side-effect: KAPI_EFFECT_MODIFY_STATE + * target: iocb->aio_key field + * desc: The kernel writes KIOCB_KEY (0) to the aio_key field of each + * submitted iocb in userspace memory. This marks the iocb as submitted + * and is checked by io_cancel() to validate the iocb. + * reversible: no + * + * side-effect: KAPI_EFFECT_MODIFY_STATE + * target: file reference count + * desc: Increments the reference count of the file descriptor's struct file + * via fget() for each submitted operation. The reference is released + * when the operation completes (via fput() in iocb_destroy()). + * reversible: yes + * + * side-effect: KAPI_EFFECT_FILESYSTEM + * target: target file(s) + * desc: For write operations, the file content may be modified. For fsync + * operations, dirty data is flushed to storage. The actual I/O may + * complete synchronously or asynchronously depending on the filesystem. + * condition: IOCB_CMD_PWRITE, IOCB_CMD_PWRITEV, IOCB_CMD_FSYNC, IOCB_CMD_FDSYNC + * reversible: no + * + * side-effect: KAPI_EFFECT_SCHEDULE + * target: fsync work queue + * desc: FSYNC and FDSYNC operations are scheduled to run on a workqueue + * because vfs_fsync() can block. The operation runs asynchronously and + * completion is signaled via the ring buffer.
+ * condition: IOCB_CMD_FSYNC or IOCB_CMD_FDSYNC
+ * reversible: no
+ *
+ * state-trans: iocb state
+ * from: user-prepared iocb
+ * to: submitted (aio_key set to KIOCB_KEY)
+ * condition: successful submission of each iocb
+ * desc: Each successfully submitted iocb transitions from user-prepared
+ * state to submitted state, marked by the kernel writing KIOCB_KEY to
+ * aio_key. The iocb remains in submitted state until completion.
+ *
+ * state-trans: AIO context slot availability
+ * from: slots_available = N
+ * to: slots_available = N - submitted_count
+ * condition: successful submission
+ * desc: Available slots in the context decrease by the number of successfully
+ * submitted operations. Slots are reclaimed when io_getevents() consumes
+ * completion events.
+ *
+ * capability: CAP_SYS_ADMIN
+ * type: KAPI_CAP_GRANT_PERMISSION
+ * allows: Use of IOPRIO_CLASS_RT (real-time I/O priority class)
+ * without: Returns EPERM when attempting to use RT I/O priority
+ * condition: IOCB_FLAG_IOPRIO set and aio_reqprio specifies IOPRIO_CLASS_RT
+ *
+ * capability: CAP_SYS_NICE
+ * type: KAPI_CAP_GRANT_PERMISSION
+ * allows: Use of IOPRIO_CLASS_RT (alternative to CAP_SYS_ADMIN)
+ * without: Returns EPERM when attempting to use RT I/O priority
+ * condition: IOCB_FLAG_IOPRIO set and aio_reqprio specifies IOPRIO_CLASS_RT
+ *
+ * constraint: Ring buffer slot availability
+ * desc: There must be available slots in the completion ring buffer for
+ * each operation to be submitted. If all slots are occupied by pending
+ * completion events, submission fails with EAGAIN. The number of slots
+ * is determined by nr_events passed to io_setup(), though internal
+ * doubling means more slots may be available.
+ * expr: available_slots >= 1 for each submission
+ *
+ * constraint: Valid file descriptor per iocb
+ * desc: Each iocb must reference a valid, open file descriptor via
+ * aio_fildes. The file must be opened with an access mode appropriate
+ * for the requested operation (read access for PREAD, write access
+ * for PWRITE, etc.).
+ *
+ * constraint: File must support operation
+ * desc: For read/write operations, the underlying file must implement
+ * read_iter/write_iter file operations. For fsync, the file must
+ * implement fsync. For poll, the file must support vfs_poll().
+ *
+ * constraint: CONFIG_AIO required
+ * desc: The kernel must be compiled with CONFIG_AIO=y for this syscall
+ * to be available. If not configured, returns -ENOSYS.
+ *
+ * examples: struct iocb iocb, *iocbp = &iocb; io_submit(ctx, 1, &iocbp);
+ * struct iocb iocbs[10], *ptrs[10]; io_submit(ctx, 10, ptrs); // Batch submit
+ *
+ * notes: Unlike traditional synchronous I/O, errors from io_submit() indicate
+ * submission failures, not I/O errors. Actual I/O errors are reported via
+ * the res field of struct io_event when retrieved via io_getevents().
+ *
+ * The return value indicates how many iocbs were successfully submitted.
+ * If this is less than nr, the iocb at index return_value failed and was
+ * not queued. Its error code is not reported, but resubmitting the
+ * remaining iocbs starting at that index will typically return it directly.
+ * Operations submitted earlier in the batch remain queued.
+ *
+ * For vectored operations (PREADV/PWRITEV), aio_buf points to an array
+ * of struct iovec and aio_nbytes contains the iovec count. The maximum
+ * iovec count is UIO_MAXIOV (1024).
+ *
+ * Block layer plugging is automatically enabled for batches larger than
+ * 2 operations, improving I/O merging and reducing per-I/O overhead.
+ *
+ * The COMPAT_SYSCALL variant handles 32-bit userspace on 64-bit kernels,
+ * using compat_uptr_t for the iocbpp array elements.
+ *
+ * Historical note: commit d6b2615f7d31d ("aio: simplify - and fix - fget/fput
+ * for io_submit()") fixed file descriptor reference counting issues. Earlier
+ * kernels could leak file references on certain error paths.
+ *
+ * io_uring (since Linux 5.1) is a more modern and performant alternative.
+ * Consider using io_uring_enter() for new applications requiring async I/O.
+ *
+ * There is no glibc wrapper; use syscall(SYS_io_submit, ...) or the libaio
+ * library. The libaio wrapper io_submit() returns negative error numbers
+ * directly rather than returning -1 and setting errno.
+ *
+ * since-version: 2.5
  */
 SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
                 struct iocb __user * __user *, iocbpp)
-- 
2.51.0

From nobody Sat Feb 7 10:08:18 2026
From: Sasha Levin
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, tools@kernel.org, gpaoloni@redhat.com, Sasha Levin
Subject: [RFC PATCH v5 08/15] kernel/api: add API specification for io_cancel
Date: Thu, 18 Dec 2025 15:42:30 -0500
Message-ID: <20251218204239.4159453-9-sashal@kernel.org>
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Sasha Levin
---
 fs/aio.c | 246 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 237 insertions(+), 9 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index f6f1b3790c88b..710517c9a990d 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2843,15 +2843,243 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
 }
 #endif
 
-/* sys_io_cancel:
- * Attempts to cancel an iocb previously passed to io_submit. If
- * the operation is successfully cancelled, the resulting event is
- * copied into the memory pointed to by result without being placed
- * into the completion queue and 0 is returned. May fail with
- * -EFAULT if any of the data structures pointed to are invalid.
- * May fail with -EINVAL if aio_context specified by ctx_id is
- * invalid. May fail with -EAGAIN if the iocb specified was not
- * cancelled. Will fail with -ENOSYS if not implemented.
+/**
+ * sys_io_cancel - Attempt to cancel an outstanding asynchronous I/O operation
+ * @ctx_id: AIO context handle returned by io_setup
+ * @iocb: Pointer to the iocb structure that was previously submitted
+ * @result: Unused parameter (historically for result storage, now ignored)
+ *
+ * long-desc: Attempts to cancel an asynchronous I/O operation that was
+ * previously submitted via io_submit(). The syscall searches for the
+ * specified iocb in the context's active request list and invokes the
+ * operation-specific cancellation callback if found.
+ *
+ * The cancellation behavior depends on the type of I/O operation:
+ * - For poll operations (IOCB_CMD_POLL): The request is marked as cancelled
+ * and a work item is scheduled to complete the cancellation.
+ * - For USB gadget I/O: The USB endpoint dequeue function is called, which
+ * triggers the completion callback with -ECONNRESET status.
+ * - For most direct I/O operations: Cancellation is typically not supported
+ * as these operations do not register a cancel callback.
+ *
+ * If the iocb is found and has a registered cancellation callback, that
+ * callback is invoked and the iocb is removed from the active request list.
+ * The completion event is delivered via the ring buffer (not via the result
+ * parameter, which is now unused for this purpose).
+ *
+ * On successful cancellation initiation, the syscall returns -EINPROGRESS
+ * (not 0) to indicate that cancellation is in progress. This is because
+ * the actual completion may occur asynchronously via the cancel callback.
+ *
+ * Important limitations:
+ * - Most file I/O operations do not support cancellation
+ * - The iocb must still be pending (not yet completed)
+ * - The iocb must have been submitted via io_submit (aio_key == KIOCB_KEY)
+ * - Only operations that register a ki_cancel callback can be cancelled
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_ATOMIC
+ *
+ * param: ctx_id
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_CUSTOM
+ * constraint: Must be a valid AIO context handle previously returned by
+ * io_setup() for the current process. The context must not have been
+ * destroyed via io_destroy(). A value of 0 is always invalid. The handle
+ * is actually the virtual address of the ring buffer mapping, and must
+ * belong to the calling process's address space.
+ *
+ * param: iocb
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * size: sizeof(struct iocb)
+ * constraint-type: KAPI_CONSTRAINT_USER_PTR
+ * constraint: Must be a valid userspace pointer to a struct iocb that was
+ * previously submitted via io_submit(). The iocb's aio_key field must
+ * contain KIOCB_KEY (0), which is written by the kernel during io_submit.
+ * A NULL pointer will result in EFAULT. The iocb must still be pending
+ * (present in the context's active_reqs list) for cancellation to succeed.
+ *
+ * param: result
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL
+ * constraint-type: KAPI_CONSTRAINT_NONE
+ * constraint: This parameter is no longer used by the kernel. It was
+ * historically intended to receive the io_event result on successful
+ * cancellation, but completion events are now always delivered via the
+ * ring buffer. May be NULL.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_ERROR_CHECK
+ * success: -EINPROGRESS
+ * desc: Returns -EINPROGRESS when the cancellation callback was successfully
+ * invoked and the request is being cancelled. This is the expected return
+ * value on successful cancellation initiation. The completion event will
+ * be delivered via the ring buffer. Note that this differs from the man
+ * page, which states that 0 is returned on success.
+ *
+ * error: EFAULT, Cannot read iocb from userspace
+ * desc: Returned if the iocb pointer is invalid or points to memory that
+ * cannot be read. Specifically, the kernel attempts to read the aio_key
+ * field from the iocb via get_user() and returns EFAULT if this fails.
+ * A NULL iocb pointer will trigger this error.
+ *
+ * error: EINVAL, iocb not submitted via io_submit
+ * desc: Returned if the aio_key field of the iocb does not contain KIOCB_KEY
+ * (which is 0). The kernel sets aio_key to KIOCB_KEY when an iocb is
+ * successfully submitted via io_submit(). If aio_key contains a different
+ * value, it indicates the iocb was never successfully submitted, is
+ * corrupted, or the memory has been reused.
+ *
+ * error: EINVAL, Invalid AIO context
+ * desc: Returned if ctx_id does not refer to a valid AIO context. This can
+ * occur if: (1) the context was never created, (2) the context was
+ * destroyed via io_destroy(), (3) the ctx_id is 0, (4) the ring buffer
+ * header cannot be read from userspace, (5) the context belongs to a
+ * different process, or (6) the context's internal ID doesn't match.
+ *
+ * error: EINVAL, iocb not found or not cancellable
+ * desc: Returned if the specified iocb is not present in the context's
+ * active request list. This occurs when: (1) the operation has already
+ * completed and the completion event is in the ring buffer, (2) the
+ * operation was never submitted to this context, (3) the iocb pointer
+ * does not match any pending operation (comparison is by pointer value
+ * converted to u64), or (4) the operation never registered a cancellation
+ * callback, so it was never added to active_reqs and the default -EINVAL
+ * return value is used.
+ * Note: The man page documents EAGAIN for this case, but the actual
+ * implementation returns EINVAL.
+ *
+ * error: ENOSYS, AIO not implemented
+ * desc: Returned if the kernel was compiled without CONFIG_AIO support.
+ * This error is returned by the syscall dispatch mechanism before the
+ * io_cancel implementation is even reached.
+ *
+ * error: (driver-specific), Cancellation callback failed
+ * desc: If the iocb is found and its ki_cancel callback is invoked, the
+ * callback's return value is propagated to userspace if non-zero. For
+ * USB gadget operations, usb_ep_dequeue() may return various errors
+ * including EINVAL if the request wasn't queued. The aio_poll_cancel
+ * callback always returns 0. Driver-specific cancellation functions
+ * may return other error codes.
+ *
+ * lock: RCU read lock
+ * type: KAPI_LOCK_RCU
+ * desc: Acquired in lookup_ioctx() during context lookup. Protects against
+ * concurrent modification of the mm->ioctx_table while searching for the
+ * context. Released before any spinlocks are acquired.
+ *
+ * lock: ctx->ctx_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * desc: Per-context spinlock acquired with interrupts disabled via
+ * spin_lock_irq(). Held while iterating through the active_reqs list
+ * searching for the iocb, while invoking the ki_cancel callback, and
+ * while removing the iocb from the list. The cancel callback is invoked
+ * with this lock held, so callbacks must not sleep and must be IRQ-safe.
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: ctx->active_reqs list
+ * desc: If the iocb is found and its cancellation callback is invoked, the
+ * kiocb is removed from the context's active_reqs list via list_del_init().
+ * This prevents the iocb from being found by subsequent io_cancel calls.
+ * condition: iocb found and ki_cancel callback invoked
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: Pending I/O operation
+ * desc: The cancellation callback may modify the state of the underlying
+ * I/O operation. For poll operations, the cancelled flag is set. For USB
+ * operations, the USB request is dequeued, which triggers the completion
+ * callback. The completion event is delivered via the ring buffer.
+ * condition: ki_cancel callback is invoked
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_SCHEDULE
+ * target: aio_poll work queue
+ * desc: For poll operations (IOCB_CMD_POLL), the aio_poll_cancel callback
+ * schedules a work item via schedule_work() to complete the cancellation
+ * asynchronously. This work item will eventually deliver the completion
+ * event to the ring buffer.
+ * condition: Cancelling a poll operation
+ * reversible: no
+ *
+ * state-trans: kiocb state
+ * from: in_flight (in active_reqs list)
+ * to: cancelling (removed from list, cancel callback invoked)
+ * condition: iocb found and ki_cancel invoked
+ * desc: When the iocb is found in the active_reqs list and its cancellation
+ * callback is invoked, the kiocb transitions from in-flight to cancelling
+ * state. The kiocb is removed from the active_reqs list, preventing
+ * duplicate cancellation attempts. Final completion occurs asynchronously.
+ *
+ * state-trans: poll_iocb cancelled flag
+ * from: false
+ * to: true
+ * condition: aio_poll_cancel is invoked
+ * desc: For poll operations, the aio_poll_cancel callback sets the cancelled
+ * flag on the poll_iocb structure. This signals to the poll completion
+ * handler that the operation was cancelled rather than completed normally.
+ *
+ * constraint: Operation must support cancellation
+ * desc: Only operations that register a ki_cancel callback can be cancelled.
+ * Operations that don't set this callback (most direct I/O operations)
+ * will never appear in the active_reqs list and thus cannot be cancelled.
+ * Currently, only IOCB_CMD_POLL operations in the kernel AIO subsystem
+ * and USB gadget operations support cancellation.
+ *
+ * constraint: Timing window for cancellation
+ * desc: The iocb must still be pending at the time io_cancel is called.
+ * There is an inherent race condition: the operation may complete
+ * naturally between the time the application decides to cancel and when
+ * io_cancel is invoked. In this case, EINVAL is returned because the
+ * iocb is no longer in the active_reqs list.
+ *
+ * constraint: CONFIG_AIO required
+ * desc: The kernel must be compiled with CONFIG_AIO=y for this syscall
+ * to be available. If not configured, ENOSYS is returned.
+ *
+ * examples: io_cancel(ctx, &iocb, NULL); // Cancel with unused result param
+ * if (io_cancel(ctx, &iocb, NULL) == -EINPROGRESS) handle_cancellation();
+ * ret = io_cancel(ctx, &iocb, NULL); if (ret && ret != -EINPROGRESS) error();
+ *
+ * notes: The return value semantics are unusual: -EINPROGRESS indicates
+ * successful cancellation initiation, not an error. This is because the
+ * actual cancellation may complete asynchronously, with the completion
+ * event delivered via the ring buffer.
+ *
+ * The result parameter is completely ignored by current kernels. It was
+ * historically used to return the io_event directly, but since commit
+ * 28468cbed92e ("Revert 'fs/aio: Make io_cancel() generate completions
+ * again'"), completion events are always delivered via the ring buffer.
+ * Applications should use io_getevents() to retrieve the cancelled
+ * operation's completion event.
+ *
+ * The man page documents EAGAIN as a possible error when "the iocb specified
+ * was not cancelled", but code analysis shows that EINVAL is actually
+ * returned in this case. The man page is outdated in this regard.
+ *
+ * The aio_key field must equal KIOCB_KEY (0) because the kernel writes this
+ * value during io_submit. If an application attempts to cancel an iocb
+ * before submitting it, or after the memory has been reused, io_cancel()
+ * fails with EINVAL: either this check rejects a nonzero aio_key or, since
+ * a zeroed iocb passes the check, the active_reqs lookup finds no match.
+ *
+ * For poll operations specifically, the cancellation is marked but the
+ * actual completion may be delayed until a worker processes it. The
+ * -EINPROGRESS return value reflects this asynchronous completion model.
+ *
+ * USB gadget operations are an exception: when usb_ep_dequeue() is called,
+ * it typically completes the request synchronously with -ECONNRESET status
+ * in the completion callback.
+ *
+ * There is no glibc wrapper for this syscall. Applications must use
+ * syscall(SYS_io_cancel, ...) or the libaio library. The libaio wrapper
+ * returns negative error numbers directly rather than returning -1 and
+ * setting errno.
+ *
+ * io_uring (since Linux 5.1) provides a more capable and widely supported
+ * async I/O interface with better cancellation support via IORING_OP_ASYNC_CANCEL.
+ *
+ * since-version: 2.5
  */
 SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
                 struct io_event __user *, result)
-- 
2.51.0

From nobody Sat Feb 7 10:08:18 2026
From: Sasha Levin
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, tools@kernel.org, gpaoloni@redhat.com, Sasha Levin
Subject: [RFC PATCH v5 09/15] kernel/api: add API specification for setxattr
Date: Thu, 18 Dec 2025 15:42:31 -0500
Message-ID: <20251218204239.4159453-10-sashal@kernel.org>
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Sasha Levin
---
 fs/xattr.c | 310 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 310 insertions(+)

diff --git a/fs/xattr.c b/fs/xattr.c
index 32d445fb60aaf..02a946227129e 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -740,6 +740,316 @@ SYSCALL_DEFINE6(setxattrat, int, dfd, const char __user *, pathname, unsigned in
                             args.flags);
 }
 
+/**
+ * sys_setxattr - Set an extended attribute value on a file
+ * @pathname: Path to the file on which to set the extended attribute
+ * @name: Null-terminated name of the extended attribute (includes namespace prefix)
+ * @value: Buffer containing the attribute value to set
+ * @size: Size of the value buffer in bytes
+ * @flags: Flags controlling attribute creation/replacement behavior
+ *
+ * long-desc: Sets the value of an extended attribute identified by name on
+ * the file specified by pathname. Extended attributes are name:value pairs
+ * associated with inodes (files, directories, symbolic links, etc.) that
+ * extend the normal attributes (stat data) associated with all inodes.
+ *
+ * The attribute name must include a namespace prefix. Valid namespaces are:
+ * - "user." - User-defined attributes (regular files and directories only)
+ * - "trusted." - Trusted attributes (requires CAP_SYS_ADMIN)
+ * - "security." - Security module attributes (e.g., SELinux, Smack, capabilities)
+ * - "system." - System attributes (e.g., POSIX ACLs via system.posix_acl_access)
+ *
+ * The value can be arbitrary binary data or text. A zero-length value is
+ * permitted and creates an attribute with an empty value (different from
+ * removing the attribute).
+ *
+ * This syscall follows symbolic links. Use lsetxattr() to operate on the
+ * symbolic link itself, or fsetxattr() to operate on an open file descriptor.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: pathname
+ * type: KAPI_TYPE_PATH
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * constraint-type: KAPI_CONSTRAINT_USER_PATH
+ * constraint: Must be a valid null-terminated path string in user memory.
+ * The path is resolved following symbolic links. Maximum path length is
+ * PATH_MAX (4096 bytes). The file must exist and the caller must have
+ * appropriate permissions.
+ *
+ * param: name
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * constraint-type: KAPI_CONSTRAINT_USER_STRING
+ * range: 1, 255
+ * constraint: Must be a valid null-terminated string in user memory containing
+ * the extended attribute name with namespace prefix (e.g., "user.myattr").
+ * The name (including prefix) must be between 1 and XATTR_NAME_MAX (255)
+ * characters. An empty name returns ERANGE.
+ *
+ * param: value
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL
+ * constraint-type: KAPI_CONSTRAINT_CUSTOM
+ * constraint: Must be a valid pointer to user memory containing the attribute
+ * value, or NULL if size is 0. When size is non-zero, the pointer must be
+ * valid and accessible for size bytes.
+ *
+ * param: size
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_RANGE
+ * range: 0, 65536
+ * constraint: Size of the value in bytes. Must not exceed XATTR_SIZE_MAX
+ * (65536 bytes). Zero is permitted and creates an attribute with empty value.
+ * Filesystem-specific limits may be smaller (e.g., ext4 limits total xattr
+ * space to one filesystem block, typically 4KB).
+ *
+ * param: flags
+ * type: KAPI_TYPE_INT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_MASK
+ * valid-mask: XATTR_CREATE | XATTR_REPLACE
+ * constraint: Controls creation/replacement behavior. Valid values are 0,
+ * XATTR_CREATE (0x1), or XATTR_REPLACE (0x2). XATTR_CREATE fails if the
+ * attribute already exists. XATTR_REPLACE fails if the attribute does not
+ * exist. With flags=0, the attribute is created if it doesn't exist or
+ * replaced if it does. XATTR_CREATE and XATTR_REPLACE are mutually exclusive.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_ERROR_CHECK
+ * success: 0
+ * desc: Returns 0 on success. The extended attribute is set with the specified
+ * value. Any previous value for the attribute is replaced.
+ *
+ * error: ENOENT, File not found
+ * desc: The file specified by pathname does not exist, or a directory component
+ * in the path does not exist. Returned from path lookup (filename_lookup).
+ *
+ * error: EACCES, Permission denied
+ * desc: Permission denied during path resolution (search permission on a directory
+ * component) or write access to the file is denied based on DAC permissions.
+ *
+ * error: EPERM, Operation not permitted
+ * desc: Returned in several cases: (1) The file is marked immutable (chattr +i)
+ * or append-only (chattr +a). (2) For the trusted.* namespace, the caller lacks
+ * CAP_SYS_ADMIN in the initial user namespace (checked with capable()).
+ * (3) For the security.* namespace (except security.capability), the caller
+ * lacks CAP_SYS_ADMIN in the filesystem's user namespace. (4) For the user.*
+ * namespace, the target is neither a regular file nor a directory, or it is
+ * a sticky directory that the caller does not own and the caller lacks
+ * CAP_FOWNER. (5) The inode has an unmapped ID in an idmapped mount.
+ *
+ * error: ENODATA, Attribute not found
+ * desc: XATTR_REPLACE was specified but the named attribute does not exist on
+ * the file. (getxattr() also returns ENODATA when reading trusted.* without
+ * CAP_SYS_ADMIN; setxattr() returns EPERM in that case instead.)
+ *
+ * error: EEXIST, Attribute already exists
+ * desc: XATTR_CREATE was specified but the named attribute already exists on
+ * the file.
+ *
+ * error: ERANGE, Name out of range
+ * desc: The attribute name is empty (zero length) or exceeds XATTR_NAME_MAX
+ * (255 characters). Returned from import_xattr_name() via strncpy_from_user().
+ *
+ * error: E2BIG, Value too large
+ * desc: The size parameter exceeds XATTR_SIZE_MAX (65536 bytes). Returned from
+ * setxattr_copy() before attempting to copy the value from userspace.
+ *
+ * error: EINVAL, Invalid argument
+ * desc: The flags parameter contains bits other than XATTR_CREATE and
+ * XATTR_REPLACE. Also returned for malformed capability values when setting
+ * security.capability, or when the name consists only of a namespace prefix
+ * with an empty suffix (e.g., "user."). A name whose prefix matches no
+ * handler yields EOPNOTSUPP instead.
+ *
+ * error: EFAULT, Bad address
+ * desc: One of the user pointers (pathname, name, or value) is invalid or
+ * points to memory that cannot be accessed. Returned from strncpy_from_user()
+ * for pathname/name or vmemdup_user()/copy_from_user() for value.
+ *
+ * error: ENOMEM, Out of memory
+ * desc: Kernel could not allocate memory to copy the attribute value from
+ * userspace (via vmemdup_user), or for namespace capability conversion
+ * (cap_convert_nscap allocates memory for v3 capability format).
+ *
+ * error: EOPNOTSUPP, Operation not supported
+ * desc: The filesystem does not support extended attributes (IOP_XATTR not set),
+ * or no xattr handler exists for the given namespace prefix, or the handler
+ * does not implement the set operation. Also returned for POSIX ACL xattrs
+ * (system.posix_acl_*) when CONFIG_FS_POSIX_ACL is disabled.
+ *
+ * error: EROFS, Read-only filesystem
+ * desc: The filesystem containing the file is mounted read-only. Returned from
+ * mnt_want_write() before attempting any modification.
+ *
+ * error: EIO, I/O error
+ * desc: The inode is marked as bad (is_bad_inode), indicating filesystem
+ * corruption or I/O failure. May also be returned by filesystem-specific
+ * xattr handler operations.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ * desc: Storing the attribute would exceed the user's disk quota.
+ * Filesystem-specific error returned from the handler's set operation.
+ *
+ * error: ENOSPC, No space left on device
+ * desc: The filesystem has insufficient space to store the extended attribute.
+ * Filesystem-specific error from the handler's set operation.
+ *
+ * error: ELOOP, Too many symbolic links
+ * desc: Too many symbolic links were encountered during path resolution
+ * (more than MAXSYMLINKS, which is 40).
+ *
+ * error: ENAMETOOLONG, Filename too long
+ * desc: The pathname or a component of the pathname exceeds the system limit
+ * (PATH_MAX or NAME_MAX).
+ *
+ * error: ENOTDIR, Not a directory
+ * desc: A component of the path prefix is not a directory.
+ *
+ * error: ESTALE, Stale file handle
+ * desc: The file handle became stale during the operation (e.g., on NFS). The
+ * syscall automatically retries the lookup with LOOKUP_REVAL, so ESTALE is
+ * returned only if the retry also fails.
+ *
+ * lock: inode->i_rwsem
+ * type: KAPI_LOCK_MUTEX
+ * desc: The inode's read-write semaphore is acquired exclusively via inode_lock()
+ * before calling __vfs_setxattr_locked() and released via inode_unlock() afterwards.
+ * This serializes concurrent xattr modifications on the same inode.
+ *
+ * lock: sb->s_writers (superblock freeze protection)
+ * type: KAPI_LOCK_SEMAPHORE
+ * desc: Write access to the mount is acquired via mnt_want_write(), which calls
+ * sb_start_write(). This prevents filesystem freeze during the operation.
+ * Released via mnt_drop_write() after the operation completes.
+ *
+ * lock: file_rwsem (delegation breaking)
+ * type: KAPI_LOCK_SEMAPHORE
+ * desc: If the file has NFSv4 delegations, the percpu file_rwsem is acquired
+ * during delegation breaking in __break_lease(). The syscall may wait for
+ * delegation holders to acknowledge the break.
+ *
+ * signal: Any
+ * direction: KAPI_SIGNAL_RECEIVE
+ * action: KAPI_SIGNAL_ACTION_RESTART
+ * condition: Signal arrives during interruptible waits (delegation breaking)
+ * desc: The syscall may wait for NFSv4 delegation holders to release their
+ * delegations. A pending signal can interrupt this wait, after which the
+ * operation may be restarted. Most other blocking points in this syscall use
+ * non-interruptible waits.
+ * timing: KAPI_SIGNAL_TIME_DURING
+ * restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ * target: Kernel buffer for attribute value
+ * desc: The attribute value is copied from userspace to a kernel buffer
+ * allocated via vmemdup_user(). This memory is freed (kvfree) after the
+ * operation completes, regardless of success or failure.
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: File's extended attributes
+ * desc: On success, the specified extended attribute is created or modified.
+ * Whether the change reaches storage synchronously or asynchronously depends
+ * on the filesystem and mount options.
+ * reversible: yes
+ * condition: Operation succeeds
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: Inode flags (S_NOSEC)
+ * desc: When setting security.* attributes, the S_NOSEC flag is cleared from
+ * the inode. This flag is an optimization that indicates no security xattrs
+ * exist; clearing it ensures proper security checks on subsequent accesses.
+ * condition: Setting security.* namespace attribute
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: fsnotify event
+ * desc: On success, fsnotify_xattr() is called to notify any registered
+ * watchers (inotify, fanotify) of the extended attribute modification.
+ * This generates an IN_ATTRIB event.
+ * condition: Operation succeeds
+ *
+ * state-trans: extended attribute
+ * from: nonexistent or has old value
+ * to: has new value
+ * condition: Operation succeeds (flags=0, or XATTR_CREATE/XATTR_REPLACE with
+ * its precondition met)
+ * desc: The extended attribute transitions from not existing (or having its
+ * previous value) to containing the new value. With XATTR_CREATE, the
+ * attribute must not exist beforehand. With XATTR_REPLACE, it must exist.
+ *
+ * capability: CAP_SYS_ADMIN
+ * type: KAPI_CAP_GRANT_PERMISSION
+ * allows: Setting trusted.* namespace attributes and most security.* attributes
+ * without: Setting trusted.* or security.* (except security.capability)
+ * returns EPERM. For trusted.*, the VFS checks capable(CAP_SYS_ADMIN) in
+ * xattr_permission(); for security.*, the commoncap hook cap_inode_setxattr()
+ * checks ns_capable() against the filesystem's user namespace. An active LSM
+ * may apply its own check to its own attribute instead.
+ * condition: Attribute name starts with "trusted." or "security." (except
+ * security.capability)
+ *
+ * capability: CAP_SETFCAP
+ * type: KAPI_CAP_GRANT_PERMISSION
+ * allows: Setting the security.capability extended attribute
+ * without: Setting security.capability returns EPERM
+ * condition: Attribute name is "security.capability". Checked via
+ * capable_wrt_inode_uidgid(), which considers the inode's ownership.
+ *
+ * capability: CAP_FOWNER
+ * type: KAPI_CAP_BYPASS_CHECK
+ * allows: Bypassing the owner check for user.* on sticky directories
+ * without: Non-owners cannot set user.* attributes on a directory that has
+ * the sticky bit set
+ * condition: Setting a user.* namespace attribute on a directory with the
+ * sticky bit (S_ISVTX) set. The check applies to the sticky directory itself,
+ * not to files inside it.
+ *
+ * constraint: Filesystem support
+ * desc: The filesystem must support extended attributes (have the IOP_XATTR
+ * flag set and provide xattr handlers). Common filesystems supporting xattrs
+ * include ext4, XFS, Btrfs, and tmpfs. Some filesystems (e.g., FAT, or ext2
+ * built without CONFIG_EXT2_FS_XATTR) do not support extended attributes.
+ *
+ * constraint: Filesystem-specific size limits
+ * desc: While the VFS limit is 64KB (XATTR_SIZE_MAX), filesystems may impose
+ * smaller limits. For example, without the ea_inode feature ext4 requires all
+ * xattrs of an inode to fit in a single filesystem block (typically 4KB),
+ * while XFS supports the full 64KB. Exceeding filesystem limits returns ENOSPC
+ * or E2BIG.
+ *
+ * constraint: user.* namespace restrictions
+ * desc: The user.* namespace is only supported on regular files and directories.
+ * Attempting to set user.* attributes on other file types (symlinks, devices,
+ * sockets, FIFOs) returns EPERM (for write) or ENODATA (for read).
+ *
+ * constraint: LSM checks
+ * desc: Linux Security Modules (e.g., SELinux, Smack) may impose additional
+ * restrictions via the security_inode_setxattr() hook. These can return various
+ * error codes depending on the security policy. The LSM is called after the
+ * VFS permission checks but before the actual xattr modification.
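The namespace and capability rules above can be condensed into a small userspace model. This is a sketch for illustration only, not the kernel implementation: struct model_caller and model_xattr_write_check() are hypothetical names, the caller's capabilities are passed in as booleans rather than queried, and immutable inodes, idmapped mounts, DAC checks (EACCES) and LSM policy are deliberately left out.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical description of the caller; not a kernel structure. */
struct model_caller {
	bool is_owner;      /* fsuid matches the inode owner */
	bool cap_fowner;    /* holds CAP_FOWNER over the inode */
	bool cap_sys_admin; /* holds CAP_SYS_ADMIN in the relevant namespace */
	bool cap_setfcap;   /* holds CAP_SETFCAP over the inode */
};

static bool has_prefix(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Returns 0 if the namespace/capability rules allow the write, or -EPERM.
 * Everything not listed in the spec's capability/constraint entries is
 * outside this model.
 */
static int model_xattr_write_check(const char *name, mode_t mode,
				   const struct model_caller *c)
{
	assert(name != NULL && c != NULL);

	if (has_prefix(name, "trusted."))
		return c->cap_sys_admin ? 0 : -EPERM;

	if (has_prefix(name, "security.")) {
		if (strcmp(name, "security.capability") == 0)
			return c->cap_setfcap ? 0 : -EPERM;
		/* An active LSM may apply its own rule to its own attribute. */
		return c->cap_sys_admin ? 0 : -EPERM;
	}

	if (has_prefix(name, "user.")) {
		/* Only regular files and directories may carry user.* */
		if (!S_ISREG(mode) && !S_ISDIR(mode))
			return -EPERM;
		/* Sticky directory: only the owner or CAP_FOWNER may write. */
		if (S_ISDIR(mode) && (mode & S_ISVTX) &&
		    !c->is_owner && !c->cap_fowner)
			return -EPERM;
		return 0; /* the DAC permission check would follow here */
	}

	return 0; /* system.* and others: left to the filesystem handler */
}
```

Returning 0 for system.* only reflects that the VFS leaves those names to the filesystem handler; a real call can still fail there.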
+ *
+ * examples: setxattr("/path/file", "user.comment", "test", 4, 0); // Set user attr
+ * setxattr("/path/file", "user.new", "val", 3, XATTR_CREATE); // Create only
+ * setxattr("/path/file", "user.existing", "new", 3, XATTR_REPLACE); // Replace
+ *
+ * notes: Extended attributes provide a way to associate arbitrary metadata with
+ * files beyond the standard stat attributes. They are commonly used for:
+ * - SELinux security contexts (security.selinux)
+ * - File capabilities (security.capability)
+ * - POSIX ACLs (system.posix_acl_access, system.posix_acl_default)
+ * - User-defined metadata (user.* namespace)
+ *
+ * The trusted.* namespace lets privileged processes store data that
+ * unprivileged users can neither read nor modify; it is used, for example,
+ * by backup/restore tools and by overlayfs.
+ *
+ * NFSv4 delegation support means this syscall may need to wait for remote
+ * clients to release their delegations before the operation can complete.
+ * This can introduce unbounded delays in pathological cases.
+ *
+ * For security.capability specifically, the kernel may convert between v2
+ * (non-namespaced) and v3 (namespaced) capability formats depending on the
+ * filesystem's user namespace and the caller's capabilities.
+ *
+ * The setxattrat() syscall (added in Linux 6.13) provides more flexibility:
+ * it takes a directory file descriptor (or AT_FDCWD) plus AT_* flags such as
+ * AT_SYMLINK_NOFOLLOW for specifying the file location.
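The flag semantics in the examples can be exercised from userspace with the glibc setxattr()/getxattr() wrappers from <sys/xattr.h>. This is a sketch that assumes /tmp is writable and lives on a filesystem with user.* xattr support (otherwise the first call fails, e.g. with EOPNOTSUPP); demo_setxattr_flags() is a hypothetical helper, not part of any API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/*
 * Returns 0 if every step behaved as documented, a negative errno if the
 * environment does not allow the test (no temp file, no user.* support),
 * or 1 if the flag semantics differed from the spec.
 */
static int demo_setxattr_flags(void)
{
	char path[] = "/tmp/xattr-demo-XXXXXX";
	char buf[16];
	ssize_t len;
	int ret;
	int fd = mkstemp(path);

	if (fd < 0)
		return -errno;
	close(fd);

	/* flags = XATTR_CREATE: succeeds only while the name is absent. */
	if (setxattr(path, "user.comment", "test", 4, XATTR_CREATE) != 0) {
		ret = -errno;   /* e.g. -EOPNOTSUPP without user.* support */
		goto out;
	}

	ret = 1;                /* from here on, 1 means "not as documented" */

	/* A second XATTR_CREATE on the same name must fail with EEXIST. */
	if (setxattr(path, "user.comment", "x", 1, XATTR_CREATE) == 0 ||
	    errno != EEXIST)
		goto out;

	/* XATTR_REPLACE on a missing name must fail with ENODATA. */
	if (setxattr(path, "user.missing", "x", 1, XATTR_REPLACE) == 0 ||
	    errno != ENODATA)
		goto out;

	/* XATTR_REPLACE on an existing name overwrites the value. */
	if (setxattr(path, "user.comment", "new", 3, XATTR_REPLACE) != 0)
		goto out;

	len = getxattr(path, "user.comment", buf, sizeof(buf));
	if (len == 3 && memcmp(buf, "new", 3) == 0)
		ret = 0;
out:
	unlink(path);
	return ret;
}
```

Distinguishing "environment lacks support" (negative) from "semantics differ" (1) keeps the demo meaningful on filesystems without user.* xattrs.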
+ *
+ * since-version: 2.4
+ */
 SYSCALL_DEFINE5(setxattr, const char __user *, pathname,
 		const char __user *, name, const void __user *, value,
 		size_t, size, int, flags)
-- 
2.51.0

From nobody Sat Feb 7 10:08:18 2026
From: Sasha Levin
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, tools@kernel.org, gpaoloni@redhat.com, Sasha Levin
Subject: [RFC PATCH v5 10/15] kernel/api: add API specification for lsetxattr
Date: Thu, 18 Dec 2025 15:42:32 -0500
Message-ID: <20251218204239.4159453-11-sashal@kernel.org>
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>

Signed-off-by: Sasha Levin
---
 fs/xattr.c | 327 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 327 insertions(+)

diff --git a/fs/xattr.c b/fs/xattr.c
index 02a946227129e..466dcaf7ba83e 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -1057,6 +1057,333 @@ SYSCALL_DEFINE5(setxattr, const char __user *, pathname,
 	return path_setxattrat(AT_FDCWD, pathname, 0, name, value, size, flags);
 }
 
+/**
+ * sys_lsetxattr - Set an extended attribute value on a symbolic link
+ * @pathname: Path to the file or symbolic link on which to set the attribute
+ * @name: Null-terminated name of the extended attribute (includes namespace prefix)
+ * @value: Buffer containing the attribute value to set
+ * @size: Size of the value buffer in bytes
+ * @flags: Flags controlling attribute creation/replacement behavior
+ *
+ * long-desc: Sets the value of an extended attribute identified by name on
+ * the file specified by pathname.
+ * Unlike setxattr(), this syscall does not follow a symbolic link in the
+ * final path component: if pathname refers to a symbolic link, the
+ * extended attribute is set on the link itself, not on the file it refers to.
+ *
+ * Extended attributes are name:value pairs associated with inodes (files,
+ * directories, symbolic links, etc.) that extend the normal attributes
+ * (stat data) associated with all inodes.
+ *
+ * The attribute name must include a namespace prefix. Valid namespaces are:
+ * - "user." - User-defined attributes (regular files and directories only)
+ * - "trusted." - Trusted attributes (requires CAP_SYS_ADMIN)
+ * - "security." - Security module attributes (e.g., SELinux, Smack, capabilities)
+ * - "system." - System attributes (e.g., POSIX ACLs via system.posix_acl_access)
+ *
+ * The value can be arbitrary binary data or text. A zero-length value is
+ * permitted and creates an attribute with an empty value (different from
+ * removing the attribute).
+ *
+ * Note that not all filesystems support extended attributes on symbolic links.
+ * Additionally, the user.* namespace is not available on symbolic links since
+ * they are not regular files or directories.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: pathname
+ * type: KAPI_TYPE_PATH
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * constraint-type: KAPI_CONSTRAINT_USER_PATH
+ * constraint: Must be a valid null-terminated path string in user memory.
+ * Symbolic links in directory components are followed, but the final
+ * component is not: if it is a symbolic link, the operation applies to the
+ * link itself. Maximum path length is PATH_MAX (4096 bytes). The file or link
+ * must exist and the caller must have appropriate permissions.
+ *
+ * param: name
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * constraint-type: KAPI_CONSTRAINT_USER_STRING
+ * range: 1, 255
+ * constraint: Must be a valid null-terminated string in user memory containing
+ * the extended attribute name with namespace prefix (e.g., "security.selinux").
+ * The name (including prefix) must be between 1 and XATTR_NAME_MAX (255)
+ * characters. An empty name returns ERANGE. Note that the user.* namespace is
+ * not supported on symbolic links.
+ *
+ * param: value
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL
+ * constraint-type: KAPI_CONSTRAINT_CUSTOM
+ * constraint: Must be a valid pointer to user memory containing the attribute
+ * value, or NULL if size is 0. When size is non-zero, the pointer must be
+ * valid and accessible for size bytes.
+ *
+ * param: size
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_RANGE
+ * range: 0, 65536
+ * constraint: Size of the value in bytes. Must not exceed XATTR_SIZE_MAX
+ * (65536 bytes). Zero is permitted and creates an attribute with an empty value.
+ * Filesystem-specific limits may be smaller (e.g., without the ea_inode feature
+ * ext4 limits total xattr space to one filesystem block, typically 4KB).
+ *
+ * param: flags
+ * type: KAPI_TYPE_INT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_MASK
+ * valid-mask: XATTR_CREATE | XATTR_REPLACE
+ * constraint: Controls creation/replacement behavior. Valid values are 0,
+ * XATTR_CREATE (0x1), or XATTR_REPLACE (0x2). XATTR_CREATE fails if the
+ * attribute already exists. XATTR_REPLACE fails if the attribute does not
+ * exist. With flags=0, the attribute is created if it doesn't exist or
+ * replaced if it does. XATTR_CREATE and XATTR_REPLACE are mutually exclusive.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_ERROR_CHECK
+ * success: 0
+ * desc: Returns 0 on success.
+ * The extended attribute is set with the specified value on the inode named
+ * by pathname (the symbolic link itself, if it is one). Any previous value
+ * for the attribute is replaced.
+ *
+ * error: ENOENT, File or symlink not found
+ * desc: The file or symbolic link specified by pathname does not exist, or a
+ * directory component in the path does not exist. Returned from path lookup.
+ *
+ * error: EACCES, Permission denied
+ * desc: Permission denied during path resolution (search permission on a directory
+ * component) or write access to the file is denied based on DAC permissions.
+ *
+ * error: EPERM, Operation not permitted
+ * desc: Returned in several cases: (1) The file is marked immutable (chattr +i)
+ * or append-only (chattr +a). (2) For trusted.* namespace, caller lacks
+ * CAP_SYS_ADMIN (checked with capable(), i.e. in the initial user namespace).
+ * (3) For security.* namespace (except security.capability), caller lacks
+ * CAP_SYS_ADMIN in the filesystem's user namespace.
+ * (4) For user.* namespace on sticky directories, caller is not the owner
+ * and lacks CAP_FOWNER. (5) The inode has an unmapped ID in an idmapped mount.
+ * (6) Attempting to set a user.* attribute on a symbolic link (not supported).
+ *
+ * error: ENODATA, Attribute not found
+ * desc: XATTR_REPLACE was specified but the named attribute does not exist on
+ * the symbolic link.
+ *
+ * error: EEXIST, Attribute already exists
+ * desc: XATTR_CREATE was specified but the named attribute already exists on
+ * the symbolic link.
+ *
+ * error: ERANGE, Name out of range
+ * desc: The attribute name is empty (zero length) or exceeds XATTR_NAME_MAX
+ * (255 characters). Returned from import_xattr_name() via strncpy_from_user().
+ *
+ * error: E2BIG, Value too large
+ * desc: The size parameter exceeds XATTR_SIZE_MAX (65536 bytes). Returned from
+ * setxattr_copy() before attempting to copy the value from userspace.
+ *
+ * error: EINVAL, Invalid argument
+ * desc: The flags parameter contains bits other than XATTR_CREATE and
+ * XATTR_REPLACE.
+ * Also returned for malformed capability values when setting
+ * security.capability, or when the name consists only of a handler prefix
+ * with no suffix (e.g., "trusted.").
+ *
+ * error: EFAULT, Bad address
+ * desc: One of the user pointers (pathname, name, or value) is invalid or
+ * points to memory that cannot be accessed. Returned from strncpy_from_user()
+ * for pathname/name or vmemdup_user()/copy_from_user() for value.
+ *
+ * error: ENOMEM, Out of memory
+ * desc: Kernel could not allocate memory to copy the attribute value from
+ * userspace (via vmemdup_user), or for namespace capability conversion
+ * (cap_convert_nscap allocates memory for v3 capability format).
+ *
+ * error: EOPNOTSUPP, Operation not supported
+ * desc: The filesystem does not support extended attributes on symbolic links,
+ * or no xattr handler exists for the given namespace prefix, or the handler
+ * does not implement the set operation.
+ *
+ * error: EROFS, Read-only filesystem
+ * desc: The filesystem containing the symbolic link is mounted read-only.
+ * Returned from mnt_want_write() before attempting any modification.
+ *
+ * error: EIO, I/O error
+ * desc: The inode is marked as bad (is_bad_inode), indicating filesystem
+ * corruption or I/O failure. It may also be returned by filesystem-specific
+ * xattr handler operations.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ * desc: The user's disk quota for extended attributes has been exceeded.
+ * Filesystem-specific error returned from the handler's set operation.
+ *
+ * error: ENOSPC, No space left on device
+ * desc: The filesystem has insufficient space to store the extended attribute.
+ * Filesystem-specific error from the handler's set operation.
+ *
+ * error: ELOOP, Too many symbolic links
+ * desc: Too many symbolic links were encountered during path resolution of
+ * directory components (more than MAXSYMLINKS, which is 40).
+ * Note that the final component (the target of the operation) is not followed.
+ *
+ * error: ENAMETOOLONG, Filename too long
+ * desc: The pathname or a component of the pathname exceeds the system limit
+ * (PATH_MAX or NAME_MAX).
+ *
+ * error: ENOTDIR, Not a directory
+ * desc: A component of the path prefix is not a directory.
+ *
+ * error: ESTALE, Stale file handle
+ * desc: A stale file handle (typically NFS) was encountered. The syscall
+ * retries once with LOOKUP_REVAL and returns ESTALE only if the retry also
+ * fails with ESTALE.
+ *
+ * lock: inode->i_rwsem
+ * type: KAPI_LOCK_MUTEX
+ * acquired: true
+ * released: true
+ * desc: The inode's read-write semaphore is acquired exclusively via inode_lock()
+ * before calling __vfs_setxattr_locked() and released via inode_unlock() after.
+ * This serializes concurrent xattr modifications on the same inode.
+ *
+ * lock: sb->s_writers (superblock freeze protection)
+ * type: KAPI_LOCK_SEMAPHORE
+ * acquired: true
+ * released: true
+ * desc: Write access to the mount is acquired via mnt_want_write(), which calls
+ * sb_start_write(). This prevents filesystem freeze during the operation.
+ * Released via mnt_drop_write() after the operation completes.
+ *
+ * lock: file_rwsem (delegation breaking)
+ * type: KAPI_LOCK_SEMAPHORE
+ * acquired: true
+ * released: true
+ * desc: If the file has NFSv4 delegations, the percpu file_rwsem is acquired
+ * during delegation breaking in __break_lease(). The syscall may wait for
+ * delegation holders to acknowledge the break.
+ *
+ * signal: Any
+ * direction: KAPI_SIGNAL_RECEIVE
+ * action: KAPI_SIGNAL_ACTION_RESTART
+ * condition: Signal arrives during interruptible waits (delegation breaking)
+ * desc: The syscall may wait for NFSv4 delegation holders to release their
+ * delegations. A pending signal can interrupt this wait, after which the
+ * operation may be restarted.
+ * Most other blocking points in this syscall use non-interruptible waits.
+ * timing: KAPI_SIGNAL_TIME_DURING
+ * restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ * target: Kernel buffer for attribute value
+ * desc: The attribute value is copied from userspace to a kernel buffer
+ * allocated via vmemdup_user(). This memory is freed (kvfree) after the
+ * operation completes, regardless of success or failure.
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: Symbolic link's extended attributes
+ * desc: On success, the specified extended attribute is created or modified
+ * on the symbolic link itself. Whether the change reaches storage
+ * synchronously or asynchronously depends on the filesystem and mount options.
+ * reversible: yes
+ * condition: Operation succeeds
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: Inode flags (S_NOSEC)
+ * desc: When setting security.* attributes, the S_NOSEC flag is cleared from
+ * the inode. This flag is an optimization that indicates no security xattrs
+ * exist; clearing it ensures proper security checks on subsequent accesses.
+ * condition: Setting security.* namespace attribute
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: fsnotify event
+ * desc: On success, fsnotify_xattr() is called to notify any registered
+ * watchers (inotify, fanotify) of the extended attribute modification.
+ * This generates an IN_ATTRIB event.
+ * condition: Operation succeeds
+ *
+ * state-trans: extended attribute
+ * from: nonexistent or has old value
+ * to: has new value
+ * condition: Operation succeeds (flags=0, or XATTR_CREATE/XATTR_REPLACE with
+ * its precondition met)
+ * desc: The extended attribute on the symbolic link transitions from not
+ * existing (or having its previous value) to containing the new value.
+ * With XATTR_CREATE, the attribute must not exist beforehand. With
+ * XATTR_REPLACE, it must exist.
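The argument checks and the state transition described above can be modelled as two pure functions. This is a sketch under stated assumptions: the MODEL_* constants copy the values given in this spec instead of including <sys/xattr.h>, the check order (flags, then name, then size) is assumed, and model_check_args() and model_apply_flags() are hypothetical names rather than kernel functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Values as stated in the spec; MODEL_ prefix avoids clashing with libc. */
#define MODEL_XATTR_CREATE   0x1
#define MODEL_XATTR_REPLACE  0x2
#define MODEL_XATTR_NAME_MAX 255
#define MODEL_XATTR_SIZE_MAX 65536

/* Early argument checks: EINVAL, ERANGE, E2BIG (assumed order). */
static int model_check_args(const char *name, size_t size, int flags)
{
	size_t len;

	assert(name != NULL);

	/* Unknown flag bits are rejected first. */
	if (flags & ~(MODEL_XATTR_CREATE | MODEL_XATTR_REPLACE))
		return -EINVAL;

	/* The name, prefix included, must be 1..255 bytes long. */
	len = strnlen(name, MODEL_XATTR_NAME_MAX + 1);
	if (len == 0 || len > MODEL_XATTR_NAME_MAX)
		return -ERANGE;

	/* The value may be empty but not larger than 64 KiB. */
	if (size > MODEL_XATTR_SIZE_MAX)
		return -E2BIG;

	return 0;
}

/* State transition: the outcome depends on whether the name exists. */
static int model_apply_flags(bool exists, int flags)
{
	if (exists && (flags & MODEL_XATTR_CREATE))
		return -EEXIST;
	if (!exists && (flags & MODEL_XATTR_REPLACE))
		return -ENODATA;
	return 0; /* the attribute now holds the new value */
}
```

With both flags set, model_apply_flags() fails whether or not the attribute exists, which is the practical meaning of "mutually exclusive" for these flags.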
+ *
+ * capability: CAP_SYS_ADMIN
+ * type: KAPI_CAP_GRANT_PERMISSION
+ * allows: Setting trusted.* namespace attributes and most security.* attributes
+ * without: Setting trusted.* or security.* (except security.capability)
+ * returns EPERM. For trusted.*, the VFS checks capable(CAP_SYS_ADMIN) in
+ * xattr_permission(); for security.*, the commoncap hook cap_inode_setxattr()
+ * checks ns_capable() against the filesystem's user namespace. An active LSM
+ * may apply its own check to its own attribute instead.
+ * condition: Attribute name starts with "trusted." or "security." (except
+ * security.capability)
+ *
+ * capability: CAP_SETFCAP
+ * type: KAPI_CAP_GRANT_PERMISSION
+ * allows: Setting the security.capability extended attribute
+ * without: Setting security.capability returns EPERM
+ * condition: Attribute name is "security.capability". Checked via
+ * capable_wrt_inode_uidgid(), which considers the inode's ownership.
+ *
+ * capability: CAP_FOWNER
+ * type: KAPI_CAP_BYPASS_CHECK
+ * allows: Bypassing the owner check for user.* on sticky directories
+ * without: Non-owners cannot set user.* attributes on a directory that has
+ * the sticky bit set
+ * condition: Setting a user.* namespace attribute on a directory with the
+ * sticky bit (S_ISVTX) set. The check applies to the sticky directory itself,
+ * not to files inside it.
+ *
+ * constraint: Filesystem support for symlinks
+ * desc: Not all filesystems support extended attributes on symbolic links;
+ * which namespaces are accepted depends on the xattr handlers the filesystem
+ * provides for symlink inodes. Independently of the filesystem, the VFS
+ * rejects the user.* namespace on symbolic links because they are not regular
+ * files or directories.
+ *
+ * constraint: Filesystem-specific size limits
+ * desc: While the VFS limit is 64KB (XATTR_SIZE_MAX), filesystems may impose
+ * smaller limits. For example, without the ea_inode feature ext4 requires all
+ * xattrs of an inode to fit in a single filesystem block (typically 4KB),
+ * while XFS supports the full 64KB. Exceeding filesystem limits returns ENOSPC
+ * or E2BIG.
+ *
+ * constraint: user.* namespace restrictions on symlinks
+ * desc: The user.* namespace is only supported on regular files and directories.
+ * Attempting to set user.* attributes on symbolic links returns EPERM.
+ * Access to user.* attributes is governed by the file permission bits, which
+ * are not used in access checks for symbolic links, so the VFS restricts
+ * user.* to regular files and directories.
+ *
+ * constraint: LSM checks
+ * desc: Linux Security Modules (e.g., SELinux, Smack) may impose additional
+ * restrictions via the security_inode_setxattr() hook. These can return various
+ * error codes depending on the security policy. The LSM is called after the
+ * VFS permission checks but before the actual xattr modification.
+ *
+ * examples: lsetxattr("/path/symlink", "security.selinux", ctx, len, 0); // Set SELinux context on link
+ * lsetxattr("/path/symlink", "trusted.overlay.opaque", "y", 1, XATTR_CREATE); // Set overlay attr
+ *
+ * notes: This syscall is primarily used for security labeling of symbolic links
+ * themselves (as opposed to their targets). Common use cases include:
+ * - SELinux security contexts on symbolic links (security.selinux)
+ * - Overlay filesystem metadata (trusted.overlay.*)
+ * - IMA/EVM integrity metadata (security.ima, security.evm)
+ *
+ * Unlike regular files and directories, symbolic links do not support the
+ * user.* xattr namespace: access to user.* attributes is controlled by the
+ * file permission bits, and those bits play no part in access checks for
+ * symbolic links.
+ *
+ * The trusted.* namespace on symbolic links requires CAP_SYS_ADMIN and is
+ * commonly used by overlay filesystems to store metadata about redirected
+ * or opaque directories.
+ *
+ * NFSv4 delegation support means this syscall may need to wait for remote
+ * clients to release their delegations before the operation can complete.
+ *
+ * This syscall was introduced in Linux 2.4 alongside setxattr(), fsetxattr(),
+ * and the corresponding get/list/remove variants. Its non-following behavior
+ * is what backup/restore tools and security labeling of links rely on.
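The user.* restriction on symbolic links can be observed directly from userspace. This is a sketch assuming a writable /tmp; demo_lsetxattr_user_on_symlink() is a hypothetical helper that returns the errno left by lsetxattr() (0 if the call unexpectedly succeeded), or a negative errno if the temporary directory or symlink could not be created. A dangling link is used on purpose, since lsetxattr() never consults the link target.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/xattr.h>
#include <unistd.h>

static int demo_lsetxattr_user_on_symlink(void)
{
	char dir[] = "/tmp/lsetxattr-demo-XXXXXX";
	char linkpath[64];
	int err;

	if (mkdtemp(dir) == NULL)
		return -errno;
	snprintf(linkpath, sizeof(linkpath), "%s/link", dir);

	/* The target need not exist: the final component is not followed. */
	if (symlink("missing-target", linkpath) != 0) {
		err = -errno;
		rmdir(dir);
		return err;
	}

	err = 0;
	if (lsetxattr(linkpath, "user.comment", "v", 1, 0) != 0)
		err = errno;    /* expected: EPERM, user.* is refused on symlinks */

	unlink(linkpath);
	rmdir(dir);
	return err;
}
```

Because the user.* type check runs in the VFS before any filesystem handler, the expected EPERM does not depend on which filesystem backs /tmp.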
+ *
+ * since-version: 2.4
+ */
 SYSCALL_DEFINE5(lsetxattr, const char __user *, pathname,
 		const char __user *, name, const void __user *, value,
 		size_t, size, int, flags)
-- 
2.51.0

From nobody Sat Feb 7 10:08:18 2026
From: Sasha Levin
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, tools@kernel.org, gpaoloni@redhat.com, Sasha Levin
Subject: [RFC PATCH v5 11/15] kernel/api: add API specification for fsetxattr
Date: Thu, 18 Dec 2025 15:42:33 -0500
Message-ID: <20251218204239.4159453-12-sashal@kernel.org>
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>

Signed-off-by: Sasha Levin
---
 fs/xattr.c | 322 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 322 insertions(+)

diff --git a/fs/xattr.c b/fs/xattr.c
index 466dcaf7ba83e..8a27c11905f7e 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -1392,6 +1392,328 @@ SYSCALL_DEFINE5(lsetxattr, const char __user *, pathname,
 		value, size, flags);
 }
 
+/**
+ * sys_fsetxattr - Set an extended attribute value on an open file descriptor
+ * @fd: File descriptor of the file on which to set the extended attribute
+ * @name: Null-terminated name of the extended attribute (includes namespace prefix)
+ * @value: Buffer containing the attribute value to set
+ * @size: Size of the value buffer in bytes
+ * @flags: Flags controlling attribute creation/replacement behavior
+ *
+ * long-desc: Sets the value of an extended attribute identified by name on
+ * the file referred to by the open file descriptor fd.
+ * Extended attributes are name:value pairs associated with inodes (files,
+ * directories, symbolic links, etc.) that extend the normal attributes (stat
+ * data) associated with all inodes.
+ *
+ * This syscall is similar to setxattr() but operates on an already-open file
+ * descriptor rather than a pathname. This is useful when the file is already
+ * open, when the caller wants to avoid races between resolving a pathname and
+ * setting the attribute (the path could be replaced in between), or when
+ * operating on file descriptors that cannot easily be reopened.
+ *
+ * The attribute name must include a namespace prefix. Valid namespaces are:
+ * - "user." - User-defined attributes (regular files and directories only)
+ * - "trusted." - Trusted attributes (requires CAP_SYS_ADMIN)
+ * - "security." - Security module attributes (e.g., SELinux, Smack, capabilities)
+ * - "system." - System attributes (e.g., POSIX ACLs via system.posix_acl_access)
+ *
+ * The value can be arbitrary binary data or text. A zero-length value is
+ * permitted and creates an attribute with an empty value (different from
+ * removing the attribute).
+ *
+ * The file descriptor does not need to be open for writing: as with
+ * setxattr(), permission is checked against the inode, not against the
+ * descriptor's open mode. O_PATH file descriptors are rejected with EBADF.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: fd
+ * type: KAPI_TYPE_FD
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_CUSTOM
+ * constraint: Must be a valid file descriptor returned by open(), creat(),
+ * or similar syscalls. The file descriptor cannot be an O_PATH file
+ * descriptor. The file must be on a filesystem that is not mounted
+ * read-only. AT_FDCWD (-100) is NOT valid for this syscall as it operates
+ * on file descriptors, not directory handles.
+ *
+ * param: name
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ * constraint-type: KAPI_CONSTRAINT_USER_STRING
+ * range: 1, 255
+ * constraint: Must be a valid null-terminated string in user memory containing
+ * the extended attribute name with namespace prefix (e.g., "user.myattr").
+ * The name (including prefix) must be between 1 and XATTR_NAME_MAX (255)
+ * characters. An empty name returns ERANGE.
+ *
+ * param: value
+ * type: KAPI_TYPE_USER_PTR
+ * flags: KAPI_PARAM_IN | KAPI_PARAM_USER | KAPI_PARAM_OPTIONAL
+ * constraint-type: KAPI_CONSTRAINT_CUSTOM
+ * constraint: Must be a valid pointer to user memory containing the attribute
+ * value, or NULL if size is 0. When size is non-zero, the pointer must be
+ * valid and accessible for size bytes.
+ *
+ * param: size
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_RANGE
+ * range: 0, 65536
+ * constraint: Size of the value in bytes. Must not exceed XATTR_SIZE_MAX
+ * (65536 bytes). Zero is permitted and creates an attribute with an empty value.
+ * Filesystem-specific limits may be smaller (e.g., without the ea_inode feature
+ * ext4 limits total xattr space to one filesystem block, typically 4KB).
+ *
+ * param: flags
+ * type: KAPI_TYPE_INT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_MASK
+ * valid-mask: XATTR_CREATE | XATTR_REPLACE
+ * constraint: Controls creation/replacement behavior. Valid values are 0,
+ * XATTR_CREATE (0x1), or XATTR_REPLACE (0x2). XATTR_CREATE fails if the
+ * attribute already exists. XATTR_REPLACE fails if the attribute does not
+ * exist. With flags=0, the attribute is created if it doesn't exist or
+ * replaced if it does. XATTR_CREATE and XATTR_REPLACE are mutually exclusive.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_ERROR_CHECK
+ * success: 0
+ * desc: Returns 0 on success. The extended attribute is set with the specified
+ * value.
+ * Any previous value for the attribute is replaced.
+ *
+ * error: EBADF, Bad file descriptor
+ * desc: fd is not a valid open file descriptor, or it was opened with O_PATH.
+ * Returned by the fd class lookup at syscall entry; the file's open mode
+ * (read-only or read-write) does not matter.
+ *
+ * error: EPERM, Operation not permitted
+ * desc: Returned when: (1) the file is immutable or append-only; (2) a
+ * trusted.* name is set without CAP_SYS_ADMIN; (3) a security.* name other
+ * than security.capability is set without CAP_SYS_ADMIN (commoncap policy);
+ * (4) security.capability is set without CAP_SETFCAP; (5) a user.* name is
+ * set on a sticky directory by a caller that neither owns it nor has
+ * CAP_FOWNER; (6) the inode has an ID that is unmapped in an idmapped mount;
+ * (7) a user.* name is set on a file that is neither a regular file nor a
+ * directory.
+ *
+ * error: ENODATA, Attribute not found
+ * desc: XATTR_REPLACE was specified but the named attribute does not exist on
+ * the file. (Reading trusted.* without CAP_SYS_ADMIN also yields ENODATA in
+ * the getxattr() path; for fsetxattr() that condition yields EPERM.)
+ *
+ * error: EEXIST, Attribute already exists
+ * desc: XATTR_CREATE was specified but the named attribute already exists on
+ * the file.
+ *
+ * error: ERANGE, Name out of range
+ * desc: The attribute name is empty (zero length) or exceeds XATTR_NAME_MAX
+ * (255 bytes). Returned by import_xattr_name() based on the result of
+ * strncpy_from_user().
+ *
+ * error: E2BIG, Value too large
+ * desc: The size parameter exceeds XATTR_SIZE_MAX (65536 bytes). Returned from
+ * setxattr_copy() before attempting to copy the value from userspace.
+ *
+ * error: EINVAL, Invalid argument
+ * desc: The flags parameter contains bits other than XATTR_CREATE and
+ * XATTR_REPLACE. Also returned for malformed capability values when setting
+ * security.capability (invalid header format, invalid rootid mapping), or
+ * when the name consists of a handler prefix alone (e.g., "user.") with no
+ * suffix. A name that matches no handler prefix yields EOPNOTSUPP instead.
+ *
+ * error: EFAULT, Bad address
+ * desc: One of the user pointers (name or value) is invalid or points to
+ * memory that cannot be accessed. Returned from strncpy_from_user() for
+ * name or vmemdup_user()/copy_from_user() for value.
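The argument errors above that the spec attributes to setxattr_copy() and import_xattr_name() can be modelled as a pure userspace pre-check. This is a hypothetical helper, precheck_fsetxattr_args() (not a kernel or libc function), that follows the order described here: flags, then name length, then value size. It deliberately ignores EBADF, EFAULT and all namespace or permission checks, and it redefines the two limits with the values quoted in the spec in case the userspace headers do not expose them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/xattr.h>           /* XATTR_CREATE, XATTR_REPLACE */

#ifndef XATTR_NAME_MAX
#define XATTR_NAME_MAX 255       /* value quoted in the spec */
#endif
#ifndef XATTR_SIZE_MAX
#define XATTR_SIZE_MAX 65536     /* value quoted in the spec */
#endif

/* Returns 0 if the arguments pass these early checks, else a negative errno. */
static int precheck_fsetxattr_args(const char *name, size_t size, int flags)
{
	size_t len = 0;

	/* 1. Unknown flag bits -> EINVAL, checked before anything else. */
	if (flags & ~(XATTR_CREATE | XATTR_REPLACE))
		return -EINVAL;

	/* 2. Name must be 1..XATTR_NAME_MAX bytes, otherwise ERANGE.
	 *    Stop counting as soon as the name is known to be too long. */
	while (len <= XATTR_NAME_MAX && name[len] != '\0')
		len++;
	if (len == 0 || len > XATTR_NAME_MAX)
		return -ERANGE;

	/* 3. Value larger than XATTR_SIZE_MAX -> E2BIG. */
	if (size > XATTR_SIZE_MAX)
		return -E2BIG;

	return 0;
}
```

Note that passing both XATTR_CREATE and XATTR_REPLACE passes this mask check; per the flags description, such a call can still never succeed, since one of the two preconditions always fails.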
+ *
+ * error: ENOMEM, Out of memory
+ * desc: The kernel could not allocate memory to copy the attribute value from
+ * userspace (via vmemdup_user()), or for namespaced capability conversion
+ * (cap_convert_nscap() allocates memory for the v3 capability format).
+ *
+ * error: EOPNOTSUPP, Operation not supported
+ * desc: The filesystem does not support extended attributes (IOP_XATTR not
+ * set), no xattr handler exists for the given namespace prefix, or the
+ * handler does not implement the set operation. Also returned for POSIX ACL
+ * xattrs (system.posix_acl_*) when CONFIG_FS_POSIX_ACL is disabled.
+ *
+ * error: EROFS, Read-only filesystem
+ * desc: The filesystem containing the file is mounted read-only. Returned from
+ * mnt_want_write_file() before attempting any modification.
+ *
+ * error: EIO, I/O error
+ * desc: The inode is marked as bad (is_bad_inode()), indicating filesystem
+ * corruption or an earlier I/O failure. May also be returned by
+ * filesystem-specific xattr handler operations.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ * desc: The caller's disk quota was exhausted while allocating space for the
+ * attribute. Filesystem-specific error returned from the handler's set
+ * operation.
+ *
+ * error: ENOSPC, No space left on device
+ * desc: The filesystem has insufficient space to store the extended attribute.
+ * Filesystem-specific error returned from the handler's set operation.
+ *
+ * error: EACCES, Permission denied
+ * desc: The caller lacks write permission on the inode under DAC rules
+ * (inode_permission() with MAY_WRITE). The VFS applies this check to user.*
+ * names; security.* and system.* are left to the LSM and the filesystem.
+ *
+ * lock: inode->i_rwsem
+ * type: KAPI_LOCK_MUTEX
+ * acquired: true
+ * released: true
+ * desc: The inode's read-write semaphore is acquired exclusively via
+ * inode_lock() before calling __vfs_setxattr_locked() and released via
+ * inode_unlock() afterwards. This serializes concurrent xattr modifications
+ * on the same inode.
+ *
+ * lock: sb->s_writers (superblock freeze protection)
+ * type: KAPI_LOCK_SEMAPHORE
+ * acquired: true
+ * released: true
+ * desc: Write access to the mount is acquired via mnt_want_write_file(), which
+ * calls sb_start_write(). This prevents a filesystem freeze during the
+ * operation. Released via mnt_drop_write_file() after the operation completes.
+ *
+ * lock: file_rwsem (delegation breaking)
+ * type: KAPI_LOCK_SEMAPHORE
+ * acquired: true
+ * released: true
+ * desc: If the file has NFSv4 delegations, the percpu file_rwsem is acquired
+ * during delegation breaking in __break_lease(). The syscall may wait for
+ * delegation holders to acknowledge the break.
+ *
+ * signal: Any
+ * direction: KAPI_SIGNAL_RECEIVE
+ * action: KAPI_SIGNAL_ACTION_RESTART
+ * condition: Signal arrives during interruptible wait for delegation breaking
+ * desc: The syscall may wait for NFSv4 delegation holders to release their
+ * delegations via wait_event_interruptible_timeout() in __break_lease().
+ * A pending signal interrupts this wait and the syscall returns
+ * -ERESTARTSYS. The call is restarted transparently unless a signal handler
+ * installed without SA_RESTART runs, in which case userspace sees EINTR.
+ * error: -ERESTARTSYS
+ * timing: KAPI_SIGNAL_TIME_DURING
+ * restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_ALLOC_MEMORY
+ * target: Kernel buffer for attribute value
+ * desc: The attribute value is copied from userspace to a kernel buffer
+ * allocated via vmemdup_user(). This memory is freed (kvfree()) after the
+ * operation completes, regardless of success or failure.
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: File's extended attributes
+ * desc: On success, the specified extended attribute is created or modified.
+ * Whether the change reaches stable storage immediately or later depends on
+ * the filesystem and mount options.
+ * reversible: yes
+ * condition: Operation succeeds
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: Inode flags (S_NOSEC)
+ * desc: When a security.* attribute is set, the S_NOSEC flag is cleared from
+ * the inode. S_NOSEC caches the fact that the inode has no setuid/setgid
+ * bits or security attributes that must be removed when the file is
+ * written; clearing it makes the next write re-check this.
+ * condition: Setting security.* namespace attribute
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ * target: fsnotify event
+ * desc: On success, fsnotify_xattr() is called to notify any registered
+ * watchers (inotify, fanotify) of the extended attribute modification.
+ * inotify watchers see this as an IN_ATTRIB event.
+ * condition: Operation succeeds
+ *
+ * state-trans: extended attribute
+ * from: nonexistent or has old value
+ * to: has new value
+ * condition: Operation succeeds (flags=0, or XATTR_CREATE/XATTR_REPLACE with
+ * their precondition met)
+ * desc: The extended attribute transitions from not existing (or having its
+ * previous value) to containing the new value. With XATTR_CREATE, the
+ * attribute must not exist beforehand. With XATTR_REPLACE, it must exist.
+ *
+ * capability: CAP_SYS_ADMIN
+ * type: KAPI_CAP_GRANT_PERMISSION
+ * allows: Setting trusted.* namespace attributes and most security.* attributes
+ * without: Setting trusted.* returns EPERM. Setting security.* (except
+ * security.capability) returns EPERM. For trusted.* the VFS checks
+ * capable() in the initial user namespace; for security.* the commoncap LSM
+ * checks ns_capable() against the filesystem's user namespace.
+ * condition: Attribute name starts with "trusted." or "security." (except
+ * security.capability)
+ *
+ * capability: CAP_SETFCAP
+ * type: KAPI_CAP_GRANT_PERMISSION
+ * allows: Setting the security.capability extended attribute
+ * without: Setting security.capability returns EPERM
+ * condition: Attribute name is "security.capability". Checked via
+ * capable_wrt_inode_uidgid(), which also requires the inode's owner and
+ * group to be mapped in the caller's user namespace.
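The capability entries above reduce to a small lookup on the attribute name. The helper below, required_cap_for_set(), is a hypothetical summary that assumes the default commoncap policy; an active LSM such as SELinux may apply different rules to security.* names, and user.* on a sticky directory additionally needs ownership or CAP_FOWNER, which this lookup does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int has_prefix(const char *name, const char *prefix)
{
	return strncmp(name, prefix, strlen(prefix)) == 0;
}

/*
 * Capability needed to *set* the named attribute under commoncap, or NULL
 * when only ordinary permission checks apply (user.*, system.*).
 */
static const char *required_cap_for_set(const char *name)
{
	if (strcmp(name, "security.capability") == 0)
		return "CAP_SETFCAP";
	if (has_prefix(name, "trusted.") || has_prefix(name, "security."))
		return "CAP_SYS_ADMIN";
	return NULL;
}
```

The exact-match test for security.capability comes first because that name is excluded from the general security.* rule.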
+ *
+ * capability: CAP_FOWNER
+ * type: KAPI_CAP_BYPASS_CHECK
+ * allows: Bypassing the owner check for user.* on sticky directories
+ * without: Non-owners cannot set user.* attributes on a directory that has
+ * the sticky bit set
+ * condition: Setting a user.* namespace attribute on a sticky directory
+ *
+ * constraint: Filesystem support
+ * desc: The filesystem must support extended attributes (have the IOP_XATTR
+ * flag set and provide xattr handlers). Common filesystems supporting xattrs
+ * include ext4, XFS, Btrfs, and tmpfs. Some filesystems (e.g., FAT/vfat, or
+ * ext2 built without CONFIG_EXT2_FS_XATTR) do not support extended attributes.
+ *
+ * constraint: Filesystem-specific size limits
+ * desc: While the VFS limit is 64KB (XATTR_SIZE_MAX), filesystems may impose
+ * smaller limits. For example, without the ea_inode feature ext4 requires all
+ * xattrs of an inode to fit in a single filesystem block (typically 4KB),
+ * while XFS supports the full 64KB. Exceeding filesystem limits returns
+ * ENOSPC or E2BIG.
+ *
+ * constraint: user.* namespace restrictions
+ * desc: The user.* namespace is only supported on regular files and
+ * directories. Attempting to set user.* attributes on other file types
+ * (symlinks, devices, sockets, FIFOs) returns EPERM (for write) or ENODATA
+ * (for read).
+ *
+ * constraint: LSM checks
+ * desc: Linux Security Modules (SELinux, Smack, AppArmor) may impose
+ * additional restrictions via the security_inode_setxattr() hook. These can
+ * return various error codes depending on the security policy. The LSM is
+ * called after the VFS permission checks but before the actual xattr
+ * modification.
+ *
+ * constraint: File descriptor must not be O_PATH
+ * desc: The file descriptor must be a regular file descriptor, not one opened
+ * with O_PATH. O_PATH file descriptors do not provide access to the file
+ * contents or to metadata modification operations; fsetxattr() rejects them
+ * with EBADF.
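The user.* restrictions listed above (file type, plus ownership or CAP_FOWNER on sticky directories) can be sketched as a predicate on the target's mode. user_xattr_write_check() is a hypothetical name; the sketch models only these two VFS rules for writes and returns 0 when they pass, leaving the later DAC and LSM checks aside.

```c
#define _GNU_SOURCE              /* S_ISVTX, S_IFLNK with strict -std= */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>

/*
 * st_mode: mode of the target inode.
 * owner_or_fowner: caller owns the inode or has CAP_FOWNER.
 */
static int user_xattr_write_check(mode_t st_mode, bool owner_or_fowner)
{
	/* Only regular files and directories may carry user.* attributes. */
	if (!S_ISREG(st_mode) && !S_ISDIR(st_mode))
		return -EPERM;
	/* Sticky directory: only the owner (or CAP_FOWNER) may write them. */
	if (S_ISDIR(st_mode) && (st_mode & S_ISVTX) && !owner_or_fowner)
		return -EPERM;
	return 0;
}
```

For a world-writable sticky directory such as /tmp, an unprivileged non-owner would therefore get EPERM, while the same caller could still set user.* attributes on a regular file it can write.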
+ *
+ * examples: fsetxattr(fd, "user.comment", "test", 4, 0); // Set user attr
+ * fsetxattr(fd, "user.new", "val", 3, XATTR_CREATE); // Create only, fail if exists
+ * fsetxattr(fd, "user.existing", "new", 3, XATTR_REPLACE); // Replace only
+ * fsetxattr(fd, "user.empty", "", 0, 0); // Create attribute with empty value
+ *
+ * notes: Extended attributes provide a way to associate arbitrary metadata with
+ * files beyond the standard stat attributes. They are commonly used for:
+ * - SELinux security contexts (security.selinux)
+ * - File capabilities (security.capability)
+ * - POSIX ACLs (system.posix_acl_access, system.posix_acl_default)
+ * - User-defined metadata (user.* namespace)
+ *
+ * Using fsetxattr() with an already-open file descriptor avoids potential
+ * TOCTOU (time-of-check to time-of-use) races that can occur with setxattr(),
+ * where the pathname may be made to refer to a different file between the
+ * caller's earlier check or open and the path lookup done by setxattr().
+ *
+ * The trusted.* namespace is intended for privileged processes that need to
+ * store data unprivileged users must not access (e.g., for backup/restore
+ * operations).
+ *
+ * NFSv4 delegation support means this syscall may need to wait for remote
+ * clients to release their delegations before the operation can complete.
+ * This can introduce unbounded delays in pathological cases.
+ *
+ * For security.capability specifically, the kernel may convert between v2
+ * (non-namespaced) and v3 (namespaced) capability formats depending on the
+ * filesystem's user namespace and the caller's capabilities.
+ *
+ * Unlike setxattr() and lsetxattr(), fsetxattr() does not involve path
+ * resolution, so errors related to path traversal (ENOENT, ENOTDIR,
+ * ENAMETOOLONG, ELOOP, ESTALE) are not possible.
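The one-line examples above can be exercised end to end. This sketch assumes /tmp is writable and backed by a filesystem with user.* xattr support; on filesystems without it (for example tmpfs before Linux 6.6) it reports -EOPNOTSUPP instead. The function name xattr_flags_demo() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/xattr.h>
#include <unistd.h>

/*
 * Walk through the flag semantics on a scratch file created under /tmp.
 * Returns 0 if every step behaved as documented, -EOPNOTSUPP if the
 * filesystem has no user.* xattr support, -1 otherwise.
 */
static int xattr_flags_demo(void)
{
	char path[] = "/tmp/fsetxattr-demo-XXXXXX";
	int fd = mkstemp(path);
	int ret = -1;

	if (fd < 0)
		return -1;

	/* flags == 0: create the attribute (or replace it if present). */
	if (fsetxattr(fd, "user.demo", "v1", 2, 0) != 0) {
		if (errno == EOPNOTSUPP)
			ret = -EOPNOTSUPP;
		goto out;
	}
	/* XATTR_CREATE on an existing name must fail with EEXIST. */
	if (fsetxattr(fd, "user.demo", "v2", 2, XATTR_CREATE) == 0 ||
	    errno != EEXIST)
		goto out;
	/* XATTR_REPLACE on an existing name succeeds. */
	if (fsetxattr(fd, "user.demo", "v3", 2, XATTR_REPLACE) != 0)
		goto out;
	/* XATTR_REPLACE on a missing name must fail with ENODATA. */
	if (fsetxattr(fd, "user.missing", "x", 1, XATTR_REPLACE) == 0 ||
	    errno != ENODATA)
		goto out;
	/* A zero-length value creates an attribute with an empty value. */
	if (fsetxattr(fd, "user.empty", "", 0, 0) != 0)
		goto out;
	ret = 0;
out:
	close(fd);
	unlink(path);
	return ret;
}
```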
+ *
+ * since-version: 2.4
+ */
 SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
 		const void __user *,value, size_t, size, int, flags)
 {
-- 
2.51.0

From nobody Sat Feb 7 10:08:18 2026
From: Sasha Levin
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, tools@kernel.org, gpaoloni@redhat.com, Sasha Levin
Subject: [RFC PATCH v5 12/15] kernel/api: add API specification for sys_open
Date: Thu, 18 Dec 2025 15:42:34 -0500
Message-ID: <20251218204239.4159453-13-sashal@kernel.org>
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>

Signed-off-by: Sasha Levin
---
 fs/open.c | 318 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 318 insertions(+)

diff --git a/fs/open.c b/fs/open.c
index f328622061c56..343e6d3798ec3 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1437,6 +1437,324 @@ int do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
 }
 
 
+/**
+ * sys_open - Open or create a file
+ * @filename: Pathname of the file to open or create
+ * @flags: File access mode and behavior flags (O_RDONLY, O_WRONLY, O_RDWR, etc.)
+ * @mode: File permission bits for newly created files (only with O_CREAT/O_TMPFILE)
+ *
+ * long-desc: Opens the file specified by pathname. If O_CREAT is specified in
+ * flags, the file is created if it does not exist; with O_TMPFILE, an unnamed
+ * temporary file is created in the directory named by pathname. A newly
+ * created file's mode is set from the mode parameter, modified by the
+ * process's umask.
+ *
+ * The flags argument must include one of the following access modes: O_RDONLY
+ * (read-only), O_WRONLY (write-only), or O_RDWR (read/write). These are the
These are= the + * low-order two bits of flags. In addition, zero or more file creation = and + * file status flags can be bitwise-ORed in flags. + * + * File creation flags: O_CREAT, O_EXCL, O_NOCTTY, O_TRUNC, O_DIRECTORY, + * O_NOFOLLOW, O_CLOEXEC, O_TMPFILE. These flags affect open behavior. + * + * File status flags: O_APPEND, O_ASYNC, O_DIRECT, O_DSYNC, O_LARGEFILE, + * O_NOATIME, O_NONBLOCK (O_NDELAY), O_PATH, O_SYNC. These become part o= f the + * file's open file description and can be retrieved/modified with fcntl= (). + * + * The return value is a file descriptor, a small nonnegative integer us= ed in + * subsequent system calls (read, write, lseek, fcntl, etc.) to refer to= the + * open file. The file descriptor returned by a successful open is the l= owest- + * numbered file descriptor not currently open for the process. + * + * On 64-bit systems, O_LARGEFILE is automatically added to the flags. O= n 32-bit + * systems, files larger than 2GB require O_LARGEFILE to be explicitly s= et. + * + * This syscall is a legacy interface. Modern code should prefer openat(= ) for + * relative path operations and openat2() for additional control via res= olve + * flags. The open() call is equivalent to openat(AT_FDCWD, pathname, fl= ags). + * + * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE + * + * param: filename + * type: KAPI_TYPE_PATH + * flags: KAPI_PARAM_IN | KAPI_PARAM_USER + * constraint-type: KAPI_CONSTRAINT_USER_PATH + * constraint: Must be a valid null-terminated path string in user memor= y. + * Maximum path length is PATH_MAX (4096 bytes) including null termina= tor. + * For relative paths, resolution starts from current working director= y. + * The path is followed (symlinks resolved) unless O_NOFOLLOW is speci= fied. 
+ *
+ * param: flags
+ * type: KAPI_TYPE_INT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_MASK
+ * valid-mask: O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY |
+ * O_TRUNC | O_APPEND | O_NONBLOCK | O_DSYNC | O_SYNC | FASYNC |
+ * O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | O_NOATIME |
+ * O_CLOEXEC | O_PATH | O_TMPFILE
+ * constraint: Must include exactly one of O_RDONLY (0), O_WRONLY (1), or
+ * O_RDWR (2) as the access mode. Additional flags may be ORed. Invalid flag
+ * combinations (e.g., O_DIRECTORY|O_CREAT, O_TMPFILE without O_DIRECTORY,
+ * O_TMPFILE with O_CREAT, O_TMPFILE with read-only mode) return EINVAL.
+ * Unknown flags are silently ignored for backward compatibility, and when
+ * O_PATH is set, all flags other than O_DIRECTORY, O_NOFOLLOW and O_CLOEXEC
+ * are silently dropped (openat2() rejects both cases with EINVAL).
+ *
+ * param: mode
+ * type: KAPI_TYPE_UINT
+ * flags: KAPI_PARAM_IN
+ * constraint-type: KAPI_CONSTRAINT_MASK
+ * valid-mask: S_ISUID | S_ISGID | S_ISVTX | S_IRWXU | S_IRWXG | S_IRWXO
+ * constraint: Only meaningful when O_CREAT or O_TMPFILE is specified in
+ * flags. Specifies the file mode bits (permissions and setuid/setgid/sticky
+ * bits) for a newly created file. In the absence of a default POSIX ACL on
+ * the parent directory, the effective mode is (mode & ~umask). When
+ * O_CREAT/O_TMPFILE is not set, mode is ignored. Bits outside S_IALLUGO
+ * (07777) are masked off.
+ *
+ * return:
+ * type: KAPI_TYPE_INT
+ * check-type: KAPI_RETURN_FD
+ * success: >= 0
+ * desc: On success, returns a new file descriptor (non-negative integer).
+ * The returned file descriptor is the lowest-numbered descriptor not
+ * currently open for the process. On error, the syscall returns a negative
+ * error code; the C library wrapper converts this into a return value of -1
+ * with errno set.
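The lowest-numbered-descriptor rule from the return description can be demonstrated in a single-threaded process; with other threads opening files concurrently, one of them could take the freed number first. reuses_lowest_fd() is a hypothetical helper and assumes /dev/null is available.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open two descriptors, close the first, open again: the new descriptor
 * should reuse the freed (lowest available) number. Returns 1 if it does.
 */
static int reuses_lowest_fd(void)
{
	int a = open("/dev/null", O_RDONLY | O_CLOEXEC);
	int b = open("/dev/null", O_RDONLY | O_CLOEXEC);
	int c, ok = 0;

	if (a >= 0 && b >= 0) {
		close(a);
		c = open("/dev/null", O_RDONLY | O_CLOEXEC);
		ok = (c == a);
		if (c >= 0)
			close(c);
	} else if (a >= 0) {
		close(a);
	}
	if (b >= 0)
		close(b);
	return ok;
}
```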
+ *
+ * error: EACCES, Permission denied
+ * desc: The requested access to the file is not allowed, or search permission
+ * is denied for one of the directories in the path prefix of pathname, or
+ * the file did not exist yet and write access to the parent directory is
+ * not allowed, or O_TRUNC is specified but write permission is denied.
+ *
+ * error: EBUSY, Device or resource busy
+ * desc: O_EXCL was specified in flags and pathname refers to a block device
+ * that is in use by the system (e.g., it is mounted).
+ *
+ * error: EDQUOT, Disk quota exceeded
+ * desc: O_CREAT is specified, the file does not exist, and the user's quota
+ * of disk blocks or inodes on the filesystem has been exhausted.
+ *
+ * error: EEXIST, File exists
+ * desc: O_CREAT and O_EXCL were specified in flags, but pathname already
+ * exists. In this context a symbolic link counts as existing, even if it is
+ * dangling. Because the existence check and the creation are a single atomic
+ * step, this combination avoids TOCTOU races when creating files.
+ *
+ * error: EFAULT, Bad address
+ * desc: pathname points outside the process's accessible address space.
+ *
+ * error: EINTR, Interrupted system call
+ * desc: The call was interrupted by a signal handler while blocked, for
+ * example while waiting to complete the open of a slow device such as a
+ * FIFO, or while waiting for a lease to be broken.
+ *
+ * error: EINVAL, Invalid argument
+ * desc: Returned for several conditions: (1) Invalid O_* flag combinations
+ * (O_DIRECTORY|O_CREAT, O_TMPFILE without O_DIRECTORY or with O_CREAT,
+ * O_TMPFILE with read-only access). (2) O_DIRECT was requested but the
+ * filesystem does not support it. (3) The filesystem does not support O_SYNC
+ * or O_DSYNC. Unlike openat2(), open() does not return EINVAL for unknown
+ * flags, for mode bits outside S_IALLUGO, or for O_PATH combined with other
+ * flags; those are silently masked off.
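The EEXIST entry above is what makes O_CREAT | O_EXCL usable as an atomic create. A minimal sketch, assuming a writable /tmp for a scratch directory; second_excl_create_errno() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create a file exclusively twice. The first open creates it; the second
 * must fail. Returns the errno of the second open (expected EEXIST), 0 if
 * it unexpectedly succeeded, or -1 on setup failure.
 */
static int second_excl_create_errno(void)
{
	char dir[] = "/tmp/open-excl-XXXXXX";
	char path[64];
	int fd1, fd2, err;

	if (!mkdtemp(dir))
		return -1;
	snprintf(path, sizeof(path), "%s/lock", dir);

	fd1 = open(path, O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, 0600);
	if (fd1 < 0) {
		rmdir(dir);
		return -1;
	}
	fd2 = open(path, O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, 0600);
	err = (fd2 < 0) ? errno : 0;   /* save errno before cleanup calls */
	if (fd2 >= 0)
		close(fd2);
	close(fd1);
	unlink(path);
	rmdir(dir);
	return err;
}
```

Two cooperating processes can use the same pattern as a lock file: whichever open() succeeds first owns the lock, and the other sees EEXIST without any separate existence check.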
+ *
+ * error: EISDIR, Is a directory
+ * desc: pathname refers to a directory and the access requested involved
+ * writing (O_WRONLY, O_RDWR, or O_TRUNC), or O_CREAT was given and pathname
+ * names an existing directory. Also returned by kernels without O_TMPFILE
+ * support (before Linux 3.11) when O_TMPFILE is combined with O_WRONLY or
+ * O_RDWR, since such kernels see only the O_DIRECTORY bit.
+ *
+ * error: ELOOP, Too many symbolic links
+ * desc: Too many symbolic links were encountered in resolving pathname, or
+ * O_NOFOLLOW was specified (without O_PATH) and the final component of
+ * pathname is a symbolic link.
+ *
+ * error: EMFILE, Too many open files
+ * desc: The per-process limit on the number of open file descriptors has been
+ * reached. This limit is RLIMIT_NOFILE (default soft limit typically 1024;
+ * it cannot be raised above /proc/sys/fs/nr_open).
+ *
+ * error: ENAMETOOLONG, File name too long
+ * desc: pathname was too long, exceeding PATH_MAX (4096) bytes, or a single
+ * path component exceeded NAME_MAX (usually 255) bytes.
+ *
+ * error: ENFILE, Too many open files in system
+ * desc: The system-wide limit on the total number of open files has been
+ * reached (/proc/sys/fs/file-max). Processes with CAP_SYS_ADMIN can exceed
+ * this limit.
+ *
+ * error: ENODEV, No such device
+ * desc: pathname refers to a special file that has no corresponding device, or
+ * the file's inode has no file operations assigned.
+ *
+ * error: ENOENT, No such file or directory
+ * desc: A directory component in pathname does not exist or is a dangling
+ * symbolic link, O_CREAT is not set and the named file does not exist, or
+ * pathname is an empty string.
+ *
+ * error: ENOMEM, Out of memory
+ * desc: The kernel could not allocate sufficient memory for the file
+ * structure, path lookup structures, or the filename buffer.
+ *
+ * error: ENOSPC, No space left on device
+ * desc: O_CREAT was specified, the file does not exist, and the directory or
+ * filesystem containing the file has no room for a new file entry.
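The first EISDIR case above is easy to reproduce. The sketch assumes only that "/" exists; dir_write_open_errno() is a hypothetical helper. Because the kernel's directory check in the open path runs before the generic permission check, EISDIR rather than EACCES should be reported even for unprivileged callers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open "/" read-only with O_DIRECTORY (should succeed), then for writing
 * (should fail). Returns the errno of the write attempt, 0 if it
 * unexpectedly succeeded, or -1 if the read-only open failed.
 */
static int dir_write_open_errno(void)
{
	int fd = open("/", O_RDONLY | O_DIRECTORY | O_CLOEXEC);

	if (fd < 0)
		return -1;
	close(fd);

	fd = open("/", O_WRONLY | O_CLOEXEC);
	if (fd >= 0) {
		close(fd);
		return 0;
	}
	return errno;
}
```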
+ *
+ * error: ENOTDIR, Not a directory
+ * desc: A component used as a directory in pathname is not actually a
+ * directory, or O_DIRECTORY was specified and pathname was not a directory.
+ *
+ * error: ENXIO, No such device or address
+ * desc: O_NONBLOCK | O_WRONLY is set, the named file is a FIFO, and no
+ * process has the FIFO open for reading. Also returned when opening a device
+ * special file for which no corresponding device exists, or a UNIX domain
+ * socket.
+ *
+ * error: EOPNOTSUPP, Operation not supported
+ * desc: The filesystem containing pathname does not support O_TMPFILE.
+ *
+ * error: EOVERFLOW, Value too large for defined data type
+ * desc: pathname refers to a regular file that is too large to be opened.
+ * This occurs on 32-bit systems without O_LARGEFILE when the file size
+ * exceeds 2^31 - 1 bytes.
+ *
+ * error: EPERM, Operation not permitted
+ * desc: Returned when: (1) O_NOATIME was specified, the caller does not own
+ * the file, and the caller lacks CAP_FOWNER; (2) the file is append-only and
+ * either O_TRUNC was specified or write access was requested without
+ * O_APPEND; (3) the file is immutable and write access was requested; or
+ * (4) a file seal prevents the operation.
+ *
+ * error: EROFS, Read-only file system
+ * desc: pathname refers to a file on a read-only filesystem and write access
+ * was requested.
+ *
+ * error: ETXTBSY, Text file busy
+ * desc: pathname refers to an executable image which is currently being
+ * executed, or to a swap file, and write access or truncation was requested.
+ *
+ * error: EWOULDBLOCK, Resource temporarily unavailable
+ * desc: O_NONBLOCK was specified and an incompatible lease is held on the file.
+ *
+ * lock: files->file_lock
+ * type: KAPI_LOCK_SPINLOCK
+ * acquired: true
+ * released: true
+ * desc: Acquired when allocating a file descriptor slot. Held briefly during
+ * fd allocation in alloc_fd() and released before alloc_fd() returns.
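The FIFO case of the ENXIO entry above can be reproduced without a second process. This sketch assumes a writable /tmp for a scratch directory holding the FIFO; fifo_no_reader_errno() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * A non-blocking write-only open of a FIFO with no reader fails instead of
 * blocking. Returns the errno of that open (expected ENXIO), 0 if it
 * unexpectedly succeeded, or -1 on setup failure.
 */
static int fifo_no_reader_errno(void)
{
	char dir[] = "/tmp/open-fifo-XXXXXX";
	char path[64];
	int fd, err = 0;

	if (!mkdtemp(dir))
		return -1;
	snprintf(path, sizeof(path), "%s/fifo", dir);
	if (mkfifo(path, 0600) != 0) {
		rmdir(dir);
		return -1;
	}
	fd = open(path, O_WRONLY | O_NONBLOCK | O_CLOEXEC);
	if (fd < 0)
		err = errno;
	else
		close(fd);
	unlink(path);
	rmdir(dir);
	return err;
}
```

Without O_NONBLOCK the same open would block until a reader appears, which is also the situation in which the EINTR entry above can occur.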
+ *
+ * lock: inode->i_rwsem (parent directory)
+ * type: KAPI_LOCK_RWLOCK
+ * acquired: conditional
+ * released: true
+ * desc: Taken on the parent directory by open_last_lookups() when the final
+ * component must be looked up or created under the lock: exclusively
+ * (inode_lock()) for O_CREAT, shared for other opens. Opens whose final
+ * component is found in the dcache may not take it at all.
+ *
+ * lock: RCU read-side
+ * type: KAPI_LOCK_RCU
+ * acquired: true
+ * released: true
+ * desc: Path lookup uses RCU mode initially for performance. If RCU lookup
+ * fails (returns -ECHILD), it falls back to reference-based lookup.
+ *
+ * signal: Any signal
+ * direction: KAPI_SIGNAL_RECEIVE
+ * action: KAPI_SIGNAL_ACTION_RETURN
+ * condition: When blocked on interruptible or killable operations
+ * desc: The syscall may be interrupted during path lookup, lock acquisition,
+ * or lease breaking. Fatal signals (SIGKILL, etc.) will interrupt killable
+ * operations. Non-fatal signals may interrupt interruptible operations.
+ * error: -EINTR
+ * timing: KAPI_SIGNAL_TIME_DURING
+ * restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_CREATE | KAPI_EFFECT_ALLOC_MEMORY
+ * target: file descriptor, file structure, dentry cache
+ * desc: Allocates a new file descriptor in the process's fd table. Allocates
+ * a struct file from the filp slab cache. May allocate dentries and inodes
+ * during path lookup. The system-wide file count (nr_files) is incremented.
+ * reversible: yes
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: filesystem, inode
+ * condition: When O_CREAT is specified and file doesn't exist
+ * desc: Creates a new file on the filesystem: allocates a new inode, adds a
+ * directory entry for it in the parent directory, and updates the parent
+ * directory's mtime and ctime.
+ * reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ * target: file content
+ * condition: When O_TRUNC is specified for existing file
+ * desc: Truncates the file to zero length, releasing data blocks. Updates
Updat= es + * file mtime and ctime. May trigger notifications to lease holders. + * reversible: no + * + * side-effect: KAPI_EFFECT_MODIFY_STATE + * target: inode timestamps + * condition: Unless O_NOATIME is specified + * desc: Opens for reading may update inode access time (atime) unless m= ounted + * with noatime/relatime or O_NOATIME is specified. Opens for writing = that + * truncate or create update mtime and ctime. + * + * capability: CAP_DAC_OVERRIDE + * type: KAPI_CAP_BYPASS_CHECK + * allows: Bypass file read, write, and execute permission checks + * without: Standard DAC (discretionary access control) checks are appli= ed + * condition: Checked when file permission would otherwise deny access + * + * capability: CAP_DAC_READ_SEARCH + * type: KAPI_CAP_BYPASS_CHECK + * allows: Bypass read permission on files and search permission on dire= ctories + * without: Must have read permission on file or search permission on di= rectory + * condition: Checked during path traversal and file open + * + * capability: CAP_FOWNER + * type: KAPI_CAP_BYPASS_CHECK + * allows: Use O_NOATIME on files not owned by caller + * without: O_NOATIME returns EPERM if caller is not file owner + * condition: Checked when O_NOATIME is specified and caller is not owner + * + * capability: CAP_SYS_ADMIN + * type: KAPI_CAP_INCREASE_LIMIT + * allows: Exceed the system-wide file limit (file-max) + * without: Returns ENFILE when system limit is reached + * condition: Checked in alloc_empty_file() when nr_files >=3D max_files + * + * constraint: RLIMIT_NOFILE (per-process fd limit) + * desc: The returned file descriptor must be less than the process's + * RLIMIT_NOFILE limit. Default is typically 1024, maximum is controll= ed + * by /proc/sys/fs/nr_open (default 1048576). Exceeding returns EMFILE. + * expr: fd < rlimit(RLIMIT_NOFILE) + * + * constraint: file-max (system-wide limit) + * desc: System-wide limit on open files in /proc/sys/fs/file-max. 
Proce= sses + * without CAP_SYS_ADMIN receive ENFILE when this limit is reached. The + * limit is computed based on system memory at boot time. + * expr: nr_files < files_stat.max_files || capable(CAP_SYS_ADMIN) + * + * constraint: PATH_MAX + * desc: Maximum length of pathname including null terminator is PATH_MAX + * (4096 bytes). Individual path components must not exceed NAME_MAX (= 255). + * + * examples: fd =3D open("/etc/passwd", O_RDONLY); // Read existing file + * fd =3D open("/tmp/newfile", O_WRONLY | O_CREAT | O_TRUNC, 0644); // = Create/truncate + * fd =3D open("/tmp/lockfile", O_WRONLY | O_CREAT | O_EXCL, 0600); // = Exclusive create + * fd =3D open("/dev/null", O_RDWR); // Open device + * fd =3D open("/tmp", O_RDONLY | O_DIRECTORY); // Open directory + * fd =3D open("/tmp", O_TMPFILE | O_RDWR, 0600); // Anonymous temp file + * + * notes: The distinction between O_RDONLY, O_WRONLY, and O_RDWR is critic= al. + * O_RDONLY is defined as 0, so (flags & O_RDONLY) will be true for all = flags. + * Test access mode using (flags & O_ACCMODE) =3D=3D O_RDONLY. + * + * When O_CREAT is specified without O_EXCL, there is a race condition b= etween + * testing for file existence and creating it. Use O_CREAT | O_EXCL for = atomic + * exclusive file creation. + * + * O_CLOEXEC should be used in multithreaded programs to prevent file de= scriptor + * leaks to child processes between fork() and execve(). + * + * O_DIRECT has alignment requirements that vary by filesystem. Use stat= x() + * with STATX_DIOALIGN (Linux 6.1+) to query requirements. Unaligned I/O= may + * fail with EINVAL or fall back to buffered I/O. + * + * O_PATH opens a file descriptor that can be used only for certain oper= ations + * (fstat, dup, fcntl, close, fchdir on directories, as dirfd for *at() = calls). + * I/O operations will fail with EBADF. 
+ * + * since-version: 1.0 + */ SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, = mode) { if (force_o_largefile()) --=20 2.51.0 From nobody Sat Feb 7 10:08:18 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 29D5032573E; Thu, 18 Dec 2025 20:42:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1766090578; cv=none; b=C7aFOdq8aA5AtDfrGZ/z0suvXhE+TZtImsEmcvjji1GzmOmfTuDG10wpm678+c5WryUhEgvnQHWn+HFE+vLqkzL+2/28X3RwdTXMcnindG/mp9EYLWjNJmjrLzkJZJVNfyLgc/oFjaSSdhJZMYn8TKUeSl5U5MS5UkcuCN7Yx5g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1766090578; c=relaxed/simple; bh=6SDsm5fa4pPqv3CmHlvFDJLLRTXdJ+yKSKEOq2Xv97E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pUh8iXPzEJFT/HvrI+HK0Wd26L/QfR0XRUwu/ZudOgsGuKd3AwsR/HOrsrRu0+IYyf/OXOWHjBcKclGtIaviNeHoSyusUcMRmjEmH9qWc2O98EqC6cC+TG9/vZ6ebHzq5CVJZKSsUFeVR1U42JQ8a8tP5FvtJi7y82RCyV+smBk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=FjwuDJ3K; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="FjwuDJ3K" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 61D65C4CEFB; Thu, 18 Dec 2025 20:42:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1766090578; bh=6SDsm5fa4pPqv3CmHlvFDJLLRTXdJ+yKSKEOq2Xv97E=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=FjwuDJ3KjYo/uHWrF0yaqJ1AjEOPb8wLbKcHMM6tzivdxRy0KvHCJOdIWigf3qMdZ 
From: Sasha Levin
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, tools@kernel.org, gpaoloni@redhat.com, Sasha Levin
Subject: [RFC PATCH v5 13/15] kernel/api: add API specification for sys_close
Date: Thu, 18 Dec 2025 15:42:35 -0500
Message-ID: <20251218204239.4159453-14-sashal@kernel.org>
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>

Signed-off-by: Sasha Levin
---
 fs/open.c | 247 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 243 insertions(+), 4 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 343e6d3798ec3..26d8ee8336405 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1868,10 +1868,249 @@ int filp_close(struct file *filp, fl_owner_t id)
 }
 EXPORT_SYMBOL(filp_close);
 
-/*
- * Careful here! We test whether the file pointer is NULL before
- * releasing the fd. This ensures that one clone task can't release
- * an fd while another clone is opening it.
+/**
+ * sys_close - Close a file descriptor
+ * @fd: The file descriptor to close
+ *
+ * long-desc: Terminates access to an open file descriptor, releasing the file
+ *   descriptor for reuse by subsequent open(), dup(), or similar syscalls.
+ *   POSIX advisory record locks held by the calling process on the associated
+ *   file are released; OFD locks and flock() locks are released only when the
+ *   last file descriptor referring to the open file description is closed.
+ *   When this is the last file descriptor referring to the underlying open
+ *   file description, associated resources are freed. If the file was
+ *   previously unlinked, the file itself is deleted when the last reference
+ *   is closed.
+ *
+ *   CRITICAL: The file descriptor is ALWAYS closed, even when close() returns
+ *   an error. POSIX leaves the state of the file descriptor unspecified after
+ *   EINTR; Linux releases the fd early in close() processing, before the
+ *   flush operations that may fail. Therefore, retrying close() after an
+ *   error return is DANGEROUS and may close an unrelated file descriptor that
+ *   was assigned to another thread.
+ *
+ *   Errors returned from close() (EIO, ENOSPC, EDQUOT) indicate that the final
+ *   flush of buffered data failed. These errors commonly occur on network
+ *   filesystems like NFS when write errors are deferred to close time. A
+ *   successful return from close() does NOT guarantee that data has been
+ *   written to disk, because the kernel defers writes through the page cache.
+ *   To ensure data persistence, call fsync() before close().
+ *
+ *   On close, the following cleanup operations are performed: the file is
+ *   flushed if the file operations define a flush callback, dnotify
+ *   registrations are cleaned up, POSIX advisory locks are removed, and the
+ *   file reference is released. If this was the last reference, additional
+ *   cleanup includes: fsnotify close notification, epoll cleanup, flock and
+ *   lease removal, FASYNC cleanup, the file's release callback invocation,
+ *   and the file structure deallocation.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: fd
+ *   type: KAPI_TYPE_FD
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 0, INT_MAX
+ *   constraint: Must be a valid, open file descriptor for the current process.
+ *     The value 0, 1, or 2 (stdin, stdout, stderr) may be closed like any
+ *     other fd, though this is unusual and may cause issues with libraries
+ *     that assume these descriptors are valid. The parameter is unsigned int
+ *     to match kernel file descriptor table indexing, but values above
+ *     INT_MAX can never refer to an open descriptor and fail with EBADF.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_EXACT
+ *   success: 0
+ *   desc: Returns 0 on success. On error, returns a negative error code
+ *     (userspace sees -1 with errno set).
+ *     IMPORTANT: Even when an error is returned, the file descriptor is still
+ *     closed and must not be used again. The error indicates a problem with
+ *     the final flush operation, not that the fd remains open.
+ *
+ * error: EBADF, Bad file descriptor
+ *   desc: fd is not a valid open file descriptor: it is out of range, has no
+ *     file assigned, or was already closed. This is the only error that
+ *     indicates the fd was NOT closed by this call (because it was not open
+ *     to begin with).
+ *
+ * error: EINTR, Interrupted system call
+ *   desc: The flush operation was interrupted by a signal before completion.
+ *     This occurs when a file's flush callback (e.g., NFS) performs an
+ *     interruptible wait that receives a signal. IMPORTANT: Despite this
+ *     error, the file descriptor IS closed and must not be used again. This
+ *     error is generated by converting kernel-internal restart codes
+ *     (ERESTARTSYS, ERESTARTNOINTR, ERESTARTNOHAND, ERESTART_RESTARTBLOCK) to
+ *     EINTR because restarting the syscall would be incorrect once the fd is
+ *     freed.
+ *
+ * error: EIO, I/O error
+ *   desc: An I/O error occurred during the flush of buffered data to the
+ *     underlying storage. This typically indicates a hardware error, network
+ *     failure on NFS, or other storage system error. The file descriptor is
+ *     still closed. Previously buffered write data may have been lost.
+ *
+ * error: ENOSPC, No space left on device
+ *   desc: There was insufficient space on the storage device to flush
+ *     buffered writes. This is common on NFS when the server runs out of
+ *     space between write() and close(). The file descriptor is still closed.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ *   desc: The user's disk quota was exceeded while attempting to flush
+ *     buffered writes. Common on NFS when quota is exceeded between write()
+ *     and close(). The file descriptor is still closed.
+ *
+ * lock: files->file_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   acquired: true
+ *   released: true
+ *   desc: Acquired via file_close_fd() to atomically look up and remove the
+ *     fd from the file descriptor table. Held only during the table
+ *     manipulation; released before flush and final cleanup operations. This
+ *     makes the lookup and removal atomic with respect to concurrent fd
+ *     allocation, but once the lock is dropped another thread may be given
+ *     the same fd number, even before close() returns.
+ *
+ * lock: file->f_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   acquired: true
+ *   released: true
+ *   desc: Acquired during epoll cleanup (eventpoll_release_file) and dnotify
+ *     cleanup to safely unlink the file from monitoring structures. May also
+ *     be acquired during lock context operations.
+ *
+ * lock: ep->mtx
+ *   type: KAPI_LOCK_MUTEX
+ *   acquired: true
+ *   released: true
+ *   desc: Acquired during epoll cleanup if the file was monitored by epoll.
+ *     Used to safely remove the file from epoll interest lists.
+ *
+ * lock: flc_lock
+ *   type: KAPI_LOCK_SPINLOCK
+ *   acquired: true
+ *   released: true
+ *   desc: File lock context spinlock, acquired during locks_remove_file() to
+ *     safely remove POSIX, flock, and lease locks associated with the file.
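The error entries above reduce to a simple userspace rule: report close() errors, but never call close() a second time on the same number. A minimal sketch of that rule, assuming Linux and a POSIX.1-2008 C library; close_once() and fsync_then_close() are hypothetical helper names, not libc or kernel APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: close fd exactly once. Returns 0 or the errno value
 * reported by close(). The call is never repeated: on Linux the descriptor
 * is already released when close() reports EINTR, EIO, ENOSPC or EDQUOT,
 * and a second close() could hit a number another thread has just reused. */
static int close_once(int fd)
{
	return close(fd) == 0 ? 0 : errno;
}

/* Hypothetical helper: fsync() before closing so that write-back errors
 * are reported while the descriptor still exists. Returns 0 or the first
 * errno value seen; fd is closed in every case. */
static int fsync_then_close(int fd)
{
	int err = fsync(fd) == 0 ? 0 : errno;
	int cerr = close_once(fd);
	return err != 0 ? err : cerr;
}
```

Returning the errno value instead of retrying keeps the caller free to log or propagate the failure, which matches the spec's guidance that the error describes lost data, not an fd that is still open.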
+ *
+ * signal: pending_signals
+ *   direction: KAPI_SIGNAL_RECEIVE
+ *   action: KAPI_SIGNAL_ACTION_RETURN
+ *   condition: When flush callback performs interruptible wait
+ *   desc: If the file's flush callback (e.g., nfs_file_flush) performs an
+ *     interruptible wait and a signal is pending, the wait is interrupted.
+ *     Any kernel restart codes are converted to EINTR since close cannot be
+ *     restarted after the fd is freed.
+ *   error: -EINTR
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *   restartable: no
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_DESTROY | KAPI_EFFECT_IRREVERSIBLE
+ *   target: File descriptor table entry
+ *   desc: The file descriptor is removed from the process's file descriptor
+ *     table, making the fd number available for reuse by subsequent open(),
+ *     dup(), or similar calls. This occurs BEFORE any flush or cleanup that
+ *     might fail, making the operation irreversible regardless of return value.
+ *   condition: Always (when fd is valid)
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_LOCK_RELEASE
+ *   target: POSIX advisory locks, OFD locks, flock locks
+ *   desc: Advisory locks on the file are removed in two stages. POSIX locks
+ *     held by this process are removed via locks_remove_posix() during
+ *     filp_flush() on every close. All lock types (POSIX, OFD, flock) are
+ *     removed via locks_remove_file() during __fput() when this is the last
+ *     reference.
+ *   condition: File has FMODE_OPENED and !(FMODE_PATH)
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_RESOURCE_DESTROY
+ *   target: File leases
+ *   desc: Any file leases held on the file are removed during
+ *     locks_remove_file() when this is the last reference to the open file
+ *     description.
+ *   condition: File had leases and this is the last close
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: dnotify registrations
+ *   desc: Directory notification (dnotify) registrations associated with this
+ *     file are cleaned up via dnotify_flush(). This only applies to directories.
+ *   condition: File is a directory with dnotify registrations
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: epoll interest lists
+ *   desc: If the file was being monitored by epoll instances, it is removed
+ *     from those interest lists via eventpoll_release().
+ *   condition: File was added to epoll instances
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ *   target: Buffered data
+ *   desc: The file's flush callback is invoked if defined (e.g., NFS calls
+ *     nfs_file_flush). This attempts to write any buffered data to storage
+ *     and may return errors (EIO, ENOSPC, EDQUOT) if the flush fails. A 0
+ *     return does NOT guarantee that the data reached stable storage; use
+ *     fsync() before close() to ensure data persistence.
+ *   condition: File has a flush callback and was opened for writing
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FREE_MEMORY
+ *   target: struct file and related structures
+ *   desc: When this is the last reference to the file, __fput() is called
+ *     synchronously (fput_close_sync), which frees the file structure,
+ *     releases the dentry and mount references, and invokes the file's
+ *     release callback.
+ *   condition: This is the last reference to the file
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_FILESYSTEM
+ *   target: Unlinked file deletion
+ *   desc: If the file was previously unlinked (deleted) but kept open,
+ *     closing the last reference causes the actual file data to be removed
+ *     from the filesystem and the inode to be freed.
+ *   condition: File was unlinked and this is the last reference
+ *   reversible: no
+ *
+ * state-trans: file_descriptor
+ *   from: open
+ *   to: closed/free
+ *   condition: Valid fd passed to close
+ *   desc: The file descriptor transitions from open (usable) to closed
+ *     (invalid). The fd number becomes available for reuse. This transition
+ *     occurs early in close() processing, before any operations that might
+ *     fail.
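The early release of the fd number is observable from userspace. Because open() returns the lowest-numbered free descriptor, a single-threaded process that closes a descriptor and opens another file gets the same number back, which is exactly why a retried close() can hit an unrelated file. A minimal sketch, assuming Linux and a readable /dev/null; fd_number_reused() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens /dev/null, closes it, and opens it again.
 * open() returns the lowest-numbered free descriptor, so in a
 * single-threaded process the second open() receives the number that
 * close() just released. Returns 1 if reused, 0 if not, -1 on error. */
static int fd_number_reused(void)
{
	int first = open("/dev/null", O_RDONLY);
	if (first < 0)
		return -1;
	close(first);
	int second = open("/dev/null", O_RDONLY);
	if (second < 0)
		return -1;
	int reused = (second == first);
	/* A stale, retried close(first) at this point would close "second". */
	close(second);
	return reused;
}
```

In a multithreaded program the reuse can happen in another thread between the close() and its return, so the same hazard exists without any visible open() in the closing thread.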
+ *
+ * state-trans: file_reference_count
+ *   from: n
+ *   to: n-1 (or freed if n was 1)
+ *   condition: Always on successful fd lookup
+ *   desc: The file's reference count is decremented. If this was the last
+ *     reference, the file is fully cleaned up and freed.
+ *
+ * constraint: File Descriptor Reuse Race
+ *   desc: Because the fd is freed early in close() processing, another thread
+ *     may receive the same fd number from a concurrent open() before close()
+ *     returns. Applications must not retry close() after an error return, as
+ *     this could close an unrelated file opened by another thread.
+ *   expr: After close(fd) returns (even with error), fd is invalid
+ *
+ * examples: close(fd);  // Basic usage - ignore errors (common but not ideal)
+ *   if (close(fd) == -1) perror("close");  // Log errors for debugging
+ *   fsync(fd); close(fd);  // Ensure data persistence before closing
+ *
+ * notes: This syscall has subtle semantics that POSIX leaves open: the fd is
+ *   ALWAYS closed regardless of the return value. POSIX specifies that on
+ *   EINTR the state of the fd is unspecified, but Linux always closes it.
+ *   HP-UX requires retrying close() on EINTR, but doing so on Linux may close
+ *   an unrelated fd that was reassigned by another thread. For portable code,
+ *   the safest approach is to check for errors but never retry close().
+ *
+ *   Error codes from the flush callback (EIO, ENOSPC, EDQUOT) indicate that
+ *   previously written data may have been lost. These errors are particularly
+ *   common on NFS where write errors are often deferred to close time.
+ *
+ *   Errors returned by a driver's release() callback are ignored by the
+ *   kernel, so device driver cleanup errors are not propagated to userspace.
+ *
+ *   Calling close() on a file descriptor while another thread is using it
+ *   (e.g., in a blocking read() or write()) has implementation-defined
+ *   behavior.
+ *   On Linux, the blocked operation continues on the underlying file and may
+ *   complete even after close() returns.
+ *
+ * since-version: 1.0
  */
 SYSCALL_DEFINE1(close, unsigned int, fd)
 {
-- 
2.51.0

From nobody Sat Feb 7 10:08:18 2026
From: Sasha Levin
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, tools@kernel.org, gpaoloni@redhat.com, Sasha Levin
Subject: [RFC PATCH v5 14/15] kernel/api: add API specification for sys_read
Date: Thu, 18 Dec 2025 15:42:36 -0500
Message-ID: <20251218204239.4159453-15-sashal@kernel.org>
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>

Signed-off-by: Sasha Levin
---
 fs/read_write.c | 287 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 287 insertions(+)

diff --git a/fs/read_write.c b/fs/read_write.c
index 833bae068770a..422046a666b1d 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -719,6 +719,293 @@ ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count)
 	return ret;
 }
 
+/**
+ * sys_read - Read data from a file descriptor
+ * @fd: File descriptor to read from
+ * @buf: User-space buffer to read data into
+ * @count: Maximum number of bytes to read
+ *
+ * long-desc: Attempts to read up to count bytes from file descriptor fd into
+ *   the buffer starting at buf. For seekable files (regular files, block
+ *   devices), the read begins at the current file offset, and the file offset
+ *   is advanced by the number of bytes read. For non-seekable files (pipes,
+ *   FIFOs, sockets, and many character devices), the file offset is not used.
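The offset behavior just described can be checked from userspace with lseek(fd, 0, SEEK_CUR), which reports the current offset without moving it. A minimal sketch assuming Linux and a POSIX.1-2008 C library; read_then_offset() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads up to n bytes from fd and returns the
 * resulting file offset, or -1 on error. For a seekable file the offset
 * has advanced by the number of bytes read; for a pipe lseek() fails
 * with ESPIPE, so -1 is returned even though the read succeeded. */
static off_t read_then_offset(int fd, char *buf, size_t n)
{
	if (read(fd, buf, n) < 0)
		return -1;
	return lseek(fd, 0, SEEK_CUR);
}
```

On an 11-byte regular file, a 5-byte read leaves the offset at 5 and a following 64-byte request is a short read that stops at 11.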
+ *
+ *   If count is zero and fd refers to a regular file, read() may detect
+ *   errors as described below. In the absence of errors, or if read() does
+ *   not check for errors, a read() with a count of 0 returns zero and has no
+ *   other effects.
+ *
+ *   On success, the number of bytes read is returned (zero indicates end of
+ *   file for regular files). It is not an error if this number is smaller
+ *   than the number of bytes requested; this may happen because fewer bytes
+ *   are actually available right now (for example, because the offset is
+ *   close to end-of-file, or because the data comes from a pipe, socket, or
+ *   terminal), or because read() was interrupted by a signal.
+ *
+ *   On Linux, read() transfers at most MAX_RW_COUNT bytes per call
+ *   (0x7ffff000 on systems with 4 KiB pages, i.e. 2 GiB minus 4 KiB),
+ *   regardless of whether the filesystem would allow more. This keeps the
+ *   count within INT_MAX and avoids signed arithmetic overflow in internal
+ *   calculations.
+ *
+ *   POSIX allows reads that are interrupted after reading some data to either
+ *   return -1 (with errno set to EINTR) or return the number of bytes already
+ *   read. Linux follows the latter behavior: if data has been read before a
+ *   signal arrives, the call returns the bytes read rather than failing.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: fd
+ *   type: KAPI_TYPE_FD
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 0, INT_MAX
+ *   constraint: Must be a valid, open file descriptor with read permission.
+ *     The file must have been opened with O_RDONLY or O_RDWR. Special values
+ *     like AT_FDCWD are not valid. File descriptors for directories return
+ *     EISDIR. Standard file descriptors 0 (stdin), 1 (stdout), 2 (stderr) are
+ *     valid if open and readable.
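The fd constraints above can be probed from userspace: a directory descriptor fails with EISDIR and a write-only descriptor fails with EBADF, while /dev/null opened for reading simply reports end of file. A minimal sketch assuming Linux (where directory reads go through generic_read_dir()); read_errno() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: attempts a one-byte read() and returns 0 on
 * success (including EOF) or the errno value on failure, e.g. EISDIR
 * for a directory or EBADF for a descriptor not opened for reading. */
static int read_errno(int fd)
{
	char c;
	return read(fd, &c, 1) < 0 ? errno : 0;
}
```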
+ *
+ * param: buf
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_OUT | KAPI_PARAM_USER
+ *   constraint-type: KAPI_CONSTRAINT_CUSTOM
+ *   constraint: Must point to a valid, writable user-space memory region of
+ *     at least count bytes. The range is checked with access_ok(), which only
+ *     verifies that it lies in user space, before any read operation; an
+ *     unmapped buffer such as NULL fails with EFAULT when data is copied into
+ *     it. The buffer may be partially written if an error occurs mid-read.
+ *     For O_DIRECT reads, the buffer may need to be aligned to the
+ *     filesystem's block size (varies by filesystem, check via statx() with
+ *     STATX_DIOALIGN).
+ *
+ * param: count
+ *   type: KAPI_TYPE_UINT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 0, SIZE_MAX
+ *   constraint: Maximum number of bytes to read. Clamped internally to
+ *     MAX_RW_COUNT (INT_MAX & PAGE_MASK, i.e. 0x7ffff000 bytes with 4 KiB
+ *     pages) to prevent signed overflow issues. A count of 0 returns
+ *     immediately with 0 without accessing the file (but may still detect
+ *     errors). Values up to SSIZE_MAX are not errors but are clamped; a count
+ *     that is negative when cast to ssize_t is rejected.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_RANGE
+ *   success: >= 0
+ *   desc: On success, returns the number of bytes read (non-negative). Zero
+ *     indicates end-of-file (EOF) for regular files, or no data available
+ *     from a device that does not block. The return value may be less than
+ *     count if fewer bytes were available (short read). Partial reads are
+ *     not errors. On error, returns a negative error code.
+ *
+ * error: EBADF, Bad file descriptor
+ *   desc: fd is not a valid file descriptor, or fd was not opened for
+ *     reading. This includes file descriptors opened with O_WRONLY, O_PATH,
+ *     or file descriptors that have been closed. Also returned if the file
+ *     structure does not have FMODE_READ set.
+ *
+ * error: EFAULT, Bad address
+ *   desc: buf points outside the accessible address space.
+ *     The buffer address failed access_ok() validation. Can also occur if a
+ *     fault happens while data is copied to user space (e.g., in
+ *     copy_to_user()).
+ *
+ * error: EINVAL, Invalid argument
+ *   desc: Returned in several cases: (1) The file descriptor refers to an
+ *     object that is not suitable for reading (no read or read_iter method).
+ *     (2) The file was opened with O_DIRECT and the buffer alignment, offset,
+ *     or count does not meet the filesystem's alignment requirements. (3) For
+ *     timerfd file descriptors, the buffer is smaller than 8 bytes. (4) The
+ *     count argument, when cast to ssize_t, is negative.
+ *
+ * error: EISDIR, Is a directory
+ *   desc: fd refers to a directory. Directories cannot be read using read();
+ *     use getdents64() instead. This error is returned by the
+ *     generic_read_dir() handler installed for directory file operations.
+ *
+ * error: EAGAIN, Resource temporarily unavailable
+ *   desc: fd refers to a file (pipe, socket, device) that is marked
+ *     non-blocking (O_NONBLOCK) and the read would block. Also returned with
+ *     IOCB_NOWAIT when data is not immediately available. Equivalent to
+ *     EWOULDBLOCK. The application should retry the read later or use
+ *     select/poll/epoll.
+ *
+ * error: EINTR, Interrupted system call
+ *   desc: The call was interrupted by a signal before any data was read.
+ *     This only occurs if no data has been transferred; if some data was read
+ *     before the signal, the call returns the number of bytes read. The
+ *     caller should typically restart the read.
+ *
+ * error: EIO, Input/output error
+ *   desc: A low-level I/O error occurred. For regular files, this typically
+ *     indicates a hardware error on the storage device, a filesystem error,
+ *     or a network filesystem timeout. For terminals, it is returned when a
+ *     process in a background process group reads from its controlling
+ *     terminal while ignoring or blocking SIGTTIN, or when its process group
+ *     is orphaned.
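Because read() may return fewer bytes than requested and reports EINTR only when nothing was transferred, callers that need an exact byte count usually loop. A minimal sketch of such a loop, assuming a POSIX.1-2008 environment; read_full() is a hypothetical helper, not a kernel or libc API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: reads until n bytes have arrived, EOF, or a real
 * error. EINTR (reported only when no bytes were transferred) is retried,
 * and short reads simply continue. Returns the number of bytes read, which
 * is less than n only at EOF, or -1 on error (bytes already read before
 * the error are not reported). */
static ssize_t read_full(int fd, void *buf, size_t n)
{
	size_t done = 0;
	while (done < n) {
		ssize_t r = read(fd, (char *)buf + done, n - done);
		if (r < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (r == 0)
			break; /* end of file */
		done += (size_t)r;
	}
	return (ssize_t)done;
}
```

With O_NONBLOCK descriptors the loop would instead return -1 with EAGAIN, so event-driven callers generally keep their own partial-progress state rather than using a blocking helper like this.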
+ *
+ * error: EOVERFLOW, Value too large for defined data type
+ *   desc: The file position plus count would exceed LLONG_MAX. Also returned
+ *     when reading from certain files (e.g., some /proc files) where the file
+ *     position would overflow. For files without the FOP_UNSIGNED_OFFSET
+ *     flag, negative file positions are not allowed.
+ *
+ * error: ENOBUFS, No buffer space available
+ *   desc: Returned when reading from pipe-based watch queues
+ *     (CONFIG_WATCH_QUEUE) when the buffer is too small to hold a complete
+ *     notification, or when reading packets from pipes with
+ *     PIPE_BUF_FLAG_WHOLE set.
+ *
+ * error: ERESTARTSYS, Restart system call (internal)
+ *   desc: Internal error code indicating the syscall should be restarted.
+ *     This is typically translated to EINTR if SA_RESTART is not set on the
+ *     signal handler, or the syscall is transparently restarted if
+ *     SA_RESTART is set. User space should not see this error code directly.
+ *
+ * error: EACCES, Permission denied
+ *   desc: The security subsystem (LSM such as SELinux or AppArmor) denied
+ *     the read operation via security_file_permission(). This can occur even
+ *     if the file was successfully opened, as LSM policies may enforce
+ *     per-operation checks.
+ *
+ * error: EPERM, Operation not permitted
+ *   desc: Returned by fanotify permission events
+ *     (CONFIG_FANOTIFY_ACCESS_PERMISSIONS) when a user-space fanotify
+ *     listener denies the read operation via fsnotify_file_area_perm().
+ *
+ * lock: file->f_pos_lock
+ *   type: KAPI_LOCK_MUTEX
+ *   acquired: conditional
+ *   released: true
+ *   desc: For regular files that require atomic position updates
+ *     (FMODE_ATOMIC_POS), the f_pos_lock mutex is acquired by fdget_pos() at
+ *     syscall entry and released by fdput_pos() at syscall exit. This
+ *     serializes concurrent reads that share the same file description. Not
+ *     acquired for files opened with FMODE_STREAM (pipes, sockets) or when
+ *     the file is not shared.
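What f_pos_lock serializes is an offset shared through one open file description. From userspace, descriptors created with dup() share that offset, while a second open() of the same path gets an independent one. A minimal sketch assuming Linux and a POSIX.1-2008 C library; offset_after_read() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads n bytes (n <= 64) through reader, then
 * returns the offset seen through observer, or -1 on error. If observer
 * is a dup() of reader, both share one open file description and thus
 * one f_pos; a separate open() of the same path has its own offset. */
static off_t offset_after_read(int reader, int observer, size_t n)
{
	char buf[64];
	if (n > sizeof buf || read(reader, buf, n) < 0)
		return -1;
	return lseek(observer, 0, SEEK_CUR);
}
```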
+ *
+ * lock: Filesystem-specific locks
+ *   type: KAPI_LOCK_CUSTOM
+ *   acquired: conditional
+ *   released: true
+ *   desc: The filesystem's read_iter or read method may acquire additional
+ *     locks. For regular files, this typically includes the inode's i_rwsem
+ *     for certain operations. For pipes, the pipe->mutex is acquired. For
+ *     sockets, the socket lock is acquired. These are internal to the file
+ *     operation and released before return.
+ *
+ * lock: RCU read-side
+ *   type: KAPI_LOCK_RCU
+ *   acquired: conditional
+ *   released: true
+ *   desc: Used during file descriptor lookup via fdget(). The RCU read lock
+ *     protects access to the file descriptor table only for the duration of
+ *     the lookup and is released before fdget() returns; fdput() at syscall
+ *     exit drops the file reference (if one was taken), not the RCU lock.
+ *
+ * signal: Any signal
+ *   direction: KAPI_SIGNAL_RECEIVE
+ *   action: KAPI_SIGNAL_ACTION_RETURN
+ *   condition: When blocked waiting for data on interruptible operations
+ *   desc: The syscall may be interrupted by signals while waiting for data
+ *     to become available (pipes, sockets, terminals) or waiting for locks.
+ *     If interrupted before any data is read, the call fails with -EINTR, or
+ *     is transparently restarted if the handler was installed with
+ *     SA_RESTART (signalled internally by -ERESTARTSYS). If data has already
+ *     been read, returns the number of bytes read.
+ *   error: -EINTR
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *   restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_FILE_POSITION
+ *   target: file->f_pos
+ *   condition: For seekable files when read succeeds (returns > 0)
+ *   desc: The file offset (f_pos) is advanced by the number of bytes read.
+ *     For stream files (FMODE_STREAM such as pipes and sockets), the offset
+ *     is not used or modified. The offset update is protected by f_pos_lock
+ *     when the file is shared between threads/processes.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: inode access time (atime)
+ *   condition: When read succeeds and O_NOATIME is not set
+ *   desc: Updates the file's access time (atime) via touch_atime().
+ *     The update may be suppressed by mount options (noatime, relatime), the
+ *     O_NOATIME flag, or if the filesystem does not support atime. Relatime
+ *     only updates atime if it is older than mtime or ctime, or more than a
+ *     day old.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: task I/O accounting
+ *   condition: After the FMODE, access_ok() and rw_verify_area() checks in
+ *     vfs_read() pass
+ *   desc: Updates the current task's I/O accounting statistics. The syscr
+ *     field (syscall read count) is incremented via inc_syscr() whether or
+ *     not the read method succeeds; the rchar field (read characters) is
+ *     incremented by the number of bytes read via add_rchar() only when the
+ *     read returns > 0. These statistics are visible in /proc/[pid]/io.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: fsnotify events
+ *   condition: When read returns > 0
+ *   desc: Generates an FS_ACCESS fsnotify event via fsnotify_access(),
+ *     allowing inotify, fanotify, and dnotify watchers to be notified of the
+ *     read. This occurs after data transfer completes successfully.
+ *   reversible: no
+ *
+ * capability: CAP_DAC_OVERRIDE
+ *   type: KAPI_CAP_BYPASS_CHECK
+ *   allows: Bypass discretionary access control on read permission
+ *   without: Standard DAC checks are enforced
+ *   condition: Evaluated by the permission checks at open() time; read()
+ *     does not repeat the DAC check (rw_verify_area() only consults LSM hooks
+ *     via security_file_permission() and fanotify permission events)
+ *
+ * capability: CAP_DAC_READ_SEARCH
+ *   type: KAPI_CAP_BYPASS_CHECK
+ *   allows: Bypass read permission checks on regular files
+ *   without: Must have read permission on file
+ *   condition: Evaluated when the file is opened, not during read()
+ *
+ * constraint: MAX_RW_COUNT
+ *   desc: The count parameter is silently clamped to MAX_RW_COUNT (INT_MAX &
+ *     PAGE_MASK, i.e. 2 GiB minus one page) to prevent integer overflow in
+ *     internal calculations. This is transparent to the caller; the syscall
+ *     succeeds but reads at most MAX_RW_COUNT bytes.
+ *   expr: actual_count = min(count, MAX_RW_COUNT)
+ *
+ * constraint: File must be open for reading
+ *   desc: The file descriptor must have been opened with O_RDONLY or O_RDWR.
+ *     Files opened with O_WRONLY or O_PATH cannot be read and return EBADF.
+ *     The file must have both FMODE_READ and FMODE_CAN_READ flags set; a file
+ *     with FMODE_READ but without FMODE_CAN_READ (no read method) fails with
+ *     EINVAL instead.
+ *   expr: (file->f_mode & FMODE_READ) && (file->f_mode & FMODE_CAN_READ)
+ *
+ * examples: n = read(fd, buf, sizeof(buf));  // Basic read
+ *   n = read(STDIN_FILENO, buf, 1024);  // Read from stdin
+ *   while ((n = read(fd, buf, 4096)) > 0) { process(buf, n); }  // Read loop
+ *   if (read(fd, buf, count) == 0) { handle_eof(); }  // Check for EOF
+ *
+ * notes: The behavior of read() varies significantly depending on the type
+ *   of file descriptor:
+ *
+ *   - Regular files: Reads from current position, advances position, returns
+ *     0 at EOF. Short reads are rare but possible near EOF or on signal.
+ *
+ *   - Pipes and FIFOs: Blocking by default. Returns available data (up to
+ *     count) or blocks until data is available. Returns 0 when all writers
+ *     have closed. O_NONBLOCK returns EAGAIN when empty instead of blocking.
+ *
+ *   - Sockets: Similar to pipes. Specific behavior depends on socket type and
+ *     protocol. MSG_* flags can be specified via recv() for more control.
+ *
+ *   - Terminals: Line-buffered in canonical mode; read returns when a newline
+ *     is entered or the buffer is full. In noncanonical (raw) mode, when read
+ *     returns is governed by the VMIN and VTIME settings. Special handling
+ *     for signals (SIGINT on Ctrl+C, etc.).
+ *
+ *   - Device special files: Behavior is device-specific. Some devices support
+ *     seeking, others do not. Read size may be constrained by device.
+ *
+ *   Race condition: Concurrent reads from the same file description (not
+ *   just file descriptor) can race on the file position. Linux 3.14+
+ *   provides atomic position updates for regular files via f_pos_lock, but
+ *   applications should use pread() for concurrent positioned reads.
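For the pread() recommendation above, a positioned-read helper can loop over short reads while leaving the shared file offset untouched. A minimal sketch assuming POSIX.1-2008; pread_full() is a hypothetical name, not a libc function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads up to n bytes starting at offset off using
 * pread(), which neither uses nor updates the file offset, so threads
 * sharing one file description need no coordination on f_pos. Loops over
 * short reads and EINTR; returns bytes read (fewer than n only at EOF)
 * or -1 on error. */
static ssize_t pread_full(int fd, void *buf, size_t n, off_t off)
{
	size_t done = 0;
	while (done < n) {
		ssize_t r = pread(fd, (char *)buf + done, n - done,
				  off + (off_t)done);
		if (r < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (r == 0)
			break; /* end of file */
		done += (size_t)r;
	}
	return (ssize_t)done;
}
```

Each thread can pass its own offset, so disjoint ranges of the same file can be read concurrently without the races on f_pos described in the notes.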
+ *
+ *   O_DIRECT reads bypass the page cache and typically require aligned
+ *   buffers and positions. Alignment requirements are filesystem-specific;
+ *   use statx() with STATX_DIOALIGN (Linux 6.1+) to query them. Unaligned
+ *   O_DIRECT reads fail with EINVAL on most filesystems.
+ *
+ *   For zero-copy transfers between file descriptors, consider splice(),
+ *   sendfile(), or copy_file_range() instead of read() followed by write().
+ *
+ * since-version: 1.0
+ */
 SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
 {
 	return ksys_read(fd, buf, count);
-- 
2.51.0

From nobody Sat Feb 7 10:08:18 2026
From: Sasha Levin
To: linux-api@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, tools@kernel.org, gpaoloni@redhat.com, Sasha Levin
Subject: [RFC PATCH v5 15/15] kernel/api: add API specification for sys_write
Date: Thu, 18 Dec 2025 15:42:37 -0500
Message-ID: <20251218204239.4159453-16-sashal@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251218204239.4159453-1-sashal@kernel.org>
References: <20251218204239.4159453-1-sashal@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Sasha Levin
---
 fs/read_write.c | 377 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 377 insertions(+)

diff --git a/fs/read_write.c b/fs/read_write.c
index 422046a666b1d..685bf6b9bd3b1 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1030,6 +1030,383 @@ ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
 	return ret;
 }
 
+/**
+ * sys_write - Write data to a file descriptor
+ * @fd: File descriptor to write to
+ * @buf: User-space buffer containing data to write
+ * @count: Maximum number of bytes to write
+ *
+ * long-desc: Attempts to write up to count bytes from the buffer starting at
+ *   buf to the file referred to by the file descriptor fd. For seekable files
+ *   (regular files, block devices), the write begins at the current file
+ *   offset, and the offset is advanced by the number of bytes written. If the
+ *   file was opened with O_APPEND, the offset is first set to the end of the
+ *   file before writing. For non-seekable files (pipes, FIFOs, sockets, many
+ *   character devices), the file offset is not used and the data is written
+ *   at the current position as defined by the device.
+ *
+ *   The number of bytes written may be less than count if, for example, there
+ *   is insufficient space on the underlying physical medium, the RLIMIT_FSIZE
+ *   resource limit is reached, or the call was interrupted by a signal handler
+ *   after having written less than count bytes. After a successful partial
+ *   write, the caller should call write() again to transfer the remaining
+ *   bytes. This behavior is called a "short write."
+ *
+ *   On Linux, write() transfers at most MAX_RW_COUNT bytes per call
+ *   (0x7ffff000 with 4 KiB pages, i.e. 2 GiB minus one page), regardless of
+ *   whether the file or filesystem would allow more. This prevents signed
+ *   arithmetic overflow.
+ *
+ *   For regular files, a successful write() does not guarantee that the data
+ *   has been committed to disk. Use fsync(2) or fdatasync(2) if durability is
+ *   required. For files opened with O_SYNC or O_DSYNC, the kernel syncs the
+ *   written data before write() returns.
+ *
+ *   POSIX permits a write that is interrupted after a partial transfer either
+ *   to return -1 with errno set to EINTR or to return the count of bytes
+ *   already written. Linux implements the latter: if some data has been
+ *   written before a signal arrives, write() returns the number of bytes
+ *   written rather than failing with EINTR.
+ *
+ * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
+ *
+ * param: fd
+ *   type: KAPI_TYPE_FD
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 0, INT_MAX
+ *   constraint: Must be a valid, open file descriptor with write permission.
+ *     The file must have been opened with O_WRONLY or O_RDWR. File
+ *     descriptors opened with O_RDONLY or O_PATH, or that have been closed,
+ *     return EBADF. Standard file descriptors 0 (stdin), 1 (stdout), and
+ *     2 (stderr) are valid if open and writable. AT_FDCWD and other special
+ *     values are not valid.
+ *
+ * param: buf
+ *   type: KAPI_TYPE_USER_PTR
+ *   flags: KAPI_PARAM_IN | KAPI_PARAM_USER
+ *   constraint-type: KAPI_CONSTRAINT_CUSTOM
+ *   constraint: Must point to a valid, readable user-space memory region of
+ *     at least count bytes. The range is validated with access_ok() before
+ *     any write operation, and faults during the copy from user space are
+ *     also reported as EFAULT, so a NULL buf with a non-zero count fails with
+ *     EFAULT. For O_DIRECT writes, the buffer may need to be aligned to the
+ *     filesystem's block size (this varies by filesystem; query it with
+ *     statx() using STATX_DIOALIGN on Linux 6.1+).
+ *
+ * param: count
+ *   type: KAPI_TYPE_UINT
+ *   flags: KAPI_PARAM_IN
+ *   constraint-type: KAPI_CONSTRAINT_RANGE
+ *   range: 0, SIZE_MAX
+ *   constraint: Maximum number of bytes to write. Clamped internally to
+ *     MAX_RW_COUNT (INT_MAX & PAGE_MASK, 0x7ffff000 bytes with 4 KiB pages)
+ *     to prevent signed overflow. The value must not be negative when cast
+ *     to ssize_t. A count of 0 still undergoes the descriptor and permission
+ *     checks; for a regular file it then returns 0 without writing anything,
+ *     while for other file types the result is file-type specific.
+ *
+ * return:
+ *   type: KAPI_TYPE_INT
+ *   check-type: KAPI_RETURN_RANGE
+ *   success: >= 0
+ *   desc: On success, returns the number of bytes written (non-negative).
+ *     Zero means that nothing was written, normally because count was 0; a
+ *     non-blocking write that cannot make progress fails with EAGAIN rather
+ *     than returning 0. The return value may be less than count due to
+ *     resource limits, signal interruption, or device constraints (short
+ *     write). On error, returns a negative error code.
+ *
+ * error: EBADF, Bad file descriptor
+ *   desc: fd is not a valid file descriptor, or fd was not opened for
+ *     writing. This includes file descriptors opened with O_RDONLY or O_PATH
+ *     and file descriptors that have been closed. Also returned if the file
+ *     structure does not have FMODE_WRITE set (a missing FMODE_CAN_WRITE
+ *     yields EINVAL instead).
+ *
+ * error: EFAULT, Bad address
+ *   desc: buf points outside the accessible address space. Either the buffer
+ *     address failed access_ok() validation, or a fault occurred during
+ *     copy_from_user() while reading the data from user space.
+ *
+ * error: EINVAL, Invalid argument
+ *   desc: Returned in several cases: (1) the file descriptor refers to an
+ *     object that is not suitable for writing (no write or write_iter
+ *     method); (2) the file was opened with O_DIRECT and the buffer
+ *     alignment, offset, or count does not meet the filesystem's alignment
+ *     requirements; (3) the count argument, when cast to ssize_t, is
+ *     negative; (4) for IOCB_NOWAIT operations on non-O_DIRECT files that do
+ *     not support WASYNC (buffered asynchronous writes).
+ *
+ * error: EAGAIN, Resource temporarily unavailable
+ *   desc: fd refers to a file (pipe, socket, device) that is marked
+ *     non-blocking (O_NONBLOCK), and the write would block because the buffer
+ *     is full. Also returned with IOCB_NOWAIT when data cannot be written
+ *     immediately. Equivalent to EWOULDBLOCK. The application should retry
+ *     later or use select/poll/epoll to wait for writability.
+ *
+ * error: EINTR, Interrupted system call
+ *   desc: The call was interrupted by a signal before any data was written.
+ *     This only occurs if no data has been transferred; if some data was
+ *     written before the signal, the call returns the number of bytes
+ *     written. The caller should typically restart the write.
+ *
+ * error: EPIPE, Broken pipe
+ *   desc: fd refers to a pipe or socket whose reading end has been closed.
+ *     When this condition occurs, the calling process also receives a
+ *     SIGPIPE signal. write() has no flag to suppress it: MSG_NOSIGNAL is
+ *     available only through send(2)-family calls, and IOCB_NOSIGNAL cannot
+ *     be requested through write(). If the signal is caught, ignored, or
+ *     blocked, EPIPE is still returned.
+ *
+ * error: EFBIG, File too large
+ *   desc: An attempt was made to write at a position at or beyond the
+ *     implementation-defined maximum file size or the file size limit
+ *     (RLIMIT_FSIZE) of the process; a write that starts below the limit but
+ *     would cross it is shortened instead. When RLIMIT_FSIZE is the cause,
+ *     the process also receives SIGXFSZ. For files not opened with
+ *     O_LARGEFILE on 32-bit systems, the limit is 2 GiB.
+ *
+ * error: ENOSPC, No space left on device
+ *   desc: The device containing the file has no room for the data. This can
+ *     occur partway through a write, resulting in a short write followed by
+ *     ENOSPC on retry.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ *   desc: The user's quota of disk blocks on the filesystem has been
+ *     exhausted. Like ENOSPC, this can result in a short write.
+ *
+ * error: EIO, Input/output error
+ *   desc: A low-level I/O error occurred while modifying the inode or writing
+ *     data. This typically indicates hardware failure, filesystem corruption,
+ *     or a network filesystem timeout. Some data may have been written.
+ *
+ * error: EPERM, Operation not permitted
+ *   desc: The operation was prevented (1) by a file seal (F_SEAL_WRITE or
+ *     F_SEAL_FUTURE_WRITE on memfd/shmem, or F_SEAL_GROW for writes that
+ *     would extend the file), (2) because the inode is immutable
+ *     (IS_IMMUTABLE), (3) by an LSM hook denying the operation, or (4) by a
+ *     fanotify permission event denying the write.
+ *
+ * error: EOVERFLOW, Value too large for defined data type
+ *   desc: The file position plus count would exceed LLONG_MAX. Also returned
+ *     when the offset would exceed filesystem limits after the write.
+ *
+ * error: EDESTADDRREQ, Destination address required
+ *   desc: fd is a datagram socket for which no peer address has been set
+ *     using connect(2). Use sendto(2) to specify the destination address.
+ *
+ * error: ETXTBSY, Text file busy
+ *   desc: The file is being used as a swap file (IS_SWAPFILE).
+ *
+ * error: EXDEV, Cross-device link
+ *   desc: When writing to a pipe that has been configured as a watch queue
+ *     (CONFIG_WATCH_QUEUE), direct write() calls are not supported.
+ *
+ * error: ENOMEM, Out of memory
+ *   desc: Insufficient kernel memory was available for the write operation.
+ *     For pipes, this occurs when allocating pages for the pipe buffer.
+ *
+ * error: ERESTARTSYS, Restart system call (internal)
+ *   desc: Internal error code indicating that the syscall should be
+ *     restarted. It is converted to EINTR if SA_RESTART is not set on the
+ *     signal handler; if SA_RESTART is set, the syscall is transparently
+ *     restarted. User space should not see this error code directly.
+ *
+ * error: EACCES, Permission denied
+ *   desc: The security subsystem (an LSM such as SELinux or AppArmor) denied
+ *     the write operation via security_file_permission(). This can occur
+ *     even if the file was successfully opened.
+ *
+ * lock: file->f_pos_lock
+ *   type: KAPI_LOCK_MUTEX
+ *   acquired: conditional
+ *   released: true
+ *   desc: For regular files that require atomic position updates
+ *     (FMODE_ATOMIC_POS), the f_pos_lock mutex is acquired by fdget_pos() at
+ *     syscall entry and released by fdput_pos() at syscall exit. This
+ *     serializes concurrent writes sharing the same file description. It is
+ *     not acquired for stream files (FMODE_STREAM, such as pipes and
+ *     sockets) or when the file is not shared.
+ *
+ * lock: sb->s_writers (freeze protection)
+ *   type: KAPI_LOCK_CUSTOM
+ *   acquired: conditional
+ *   released: true
+ *   desc: For regular files, file_start_write() acquires freeze protection
+ *     on the superblock via sb_start_write() before the write, and
+ *     file_end_write() releases it afterwards. This prevents writes while the
+ *     filesystem is frozen. Not acquired for non-regular files (pipes,
+ *     sockets, devices).
+ *
+ * lock: inode->i_rwsem
+ *   type: KAPI_LOCK_RWLOCK
+ *   acquired: conditional
+ *   released: true
+ *   desc: For regular files using generic_file_write_iter(), the inode's
+ *     i_rwsem is acquired in write mode before the file data is modified.
+ *     The lock is taken inside the filesystem and released before return.
+ *     Not all filesystems use this pattern.
+ *
+ * lock: pipe->mutex
+ *   type: KAPI_LOCK_MUTEX
+ *   acquired: conditional
+ *   released: true
+ *   desc: For pipes and FIFOs, the pipe's mutex is held while the pipe
+ *     buffers are modified. It is released temporarily while waiting for
+ *     space, then reacquired.
+ *
+ * lock: RCU read-side
+ *   type: KAPI_LOCK_RCU
+ *   acquired: conditional
+ *   released: true
+ *   desc: Used during file descriptor lookup in fdget() to protect access to
+ *     the file descriptor table. The RCU read lock is taken and released
+ *     within the lookup itself and is not held across the write; fdput() at
+ *     syscall exit only drops the file reference, if one was taken.
+ *
+ * signal: SIGPIPE
+ *   direction: KAPI_SIGNAL_SEND
+ *   action: KAPI_SIGNAL_ACTION_TERMINATE
+ *   condition: Writing to a pipe or socket with no readers
+ *   desc: When writing to a pipe whose read end is closed, or to a socket
+ *     whose peer has closed, SIGPIPE is sent to the calling process. The
+ *     default action terminates the process. To survive it, ignore or block
+ *     the signal (for example, signal(SIGPIPE, SIG_IGN)); MSG_NOSIGNAL
+ *     suppresses it only for send(2)-family calls. EPIPE is returned
+ *     regardless.
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *
+ * signal: SIGXFSZ
+ *   direction: KAPI_SIGNAL_SEND
+ *   action: KAPI_SIGNAL_ACTION_COREDUMP
+ *   condition: Writing at or beyond RLIMIT_FSIZE
+ *   desc: When a write starts at or beyond the soft file size limit
+ *     (RLIMIT_FSIZE), SIGXFSZ is sent and the write fails with EFBIG. The
+ *     default action terminates the process with a core dump. A write that
+ *     starts below the limit is truncated to end at the limit instead. If
+ *     RLIMIT_FSIZE is RLIM_INFINITY, no signal is sent.
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *
+ * signal: Any signal
+ *   direction: KAPI_SIGNAL_RECEIVE
+ *   action: KAPI_SIGNAL_ACTION_RETURN
+ *   condition: While blocked waiting for space (pipes, sockets)
+ *   desc: The syscall may be interrupted by signals while waiting for buffer
+ *     space to become available. If interrupted before any data is written,
+ *     it returns -EINTR or -ERESTARTSYS. If data was already written, it
+ *     returns the byte count. The call is restartable if SA_RESTART is set
+ *     and no data was written.
+ *   error: -EINTR
+ *   timing: KAPI_SIGNAL_TIME_DURING
+ *   restartable: yes
+ *
+ * side-effect: KAPI_EFFECT_FILE_POSITION
+ *   target: file->f_pos
+ *   condition: For seekable files when write succeeds (returns > 0)
+ *   desc: The file offset (f_pos) is advanced by the number of bytes
+ *     written. For files opened with O_APPEND, f_pos is first set to the file
+ *     size. For stream files (FMODE_STREAM, such as pipes and sockets), the
+ *     offset is neither used nor modified. Position updates are protected by
+ *     f_pos_lock when the file description is shared.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: inode timestamps (mtime, ctime)
+ *   condition: When write succeeds (returns > 0)
+ *   desc: Updates the file's modification time (mtime) and change time
+ *     (ctime) via file_update_time(). The timestamp granularity depends on
+ *     the filesystem (fine-grained timestamps apply to multigrain inodes).
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: SUID/SGID bits (mode)
+ *   condition: When writing to a setuid/setgid file
+ *   desc: The SUID bit is cleared when a process without CAP_FSETID writes
+ *     to a regular file with the bit set. The SGID bit may also be cleared
+ *     (for example, when group-execute permission is set). This is a
+ *     security feature that prevents privilege escalation via modified
+ *     setuid binaries. It is done via file_remove_privs().
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: file data
+ *   condition: When write succeeds (returns > 0)
+ *   desc: Modifies the file's data content. For regular files, data is
+ *     written to the page cache (buffered I/O) or directly to storage
+ *     (O_DIRECT). Data may not be persistent until fsync() is called;
+ *     closing the file does not guarantee persistence.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: task I/O accounting
+ *   condition: Always
+ *   desc: Updates the current task's I/O accounting statistics. The wchar
+ *     field (characters written) is incremented by the number of bytes
+ *     written via add_wchar(), and the syscw field (write syscall count) is
+ *     incremented via inc_syscw(). These statistics are visible in
+ *     /proc/[pid]/io.
+ *   reversible: no
+ *
+ * side-effect: KAPI_EFFECT_MODIFY_STATE
+ *   target: fsnotify events
+ *   condition: When write returns > 0
+ *   desc: Generates an FS_MODIFY fsnotify event via fsnotify_modify(),
+ *     allowing inotify, fanotify, and dnotify watchers to be notified of the
+ *     write.
+ *
+ * capability: CAP_DAC_OVERRIDE
+ *   type: KAPI_CAP_BYPASS_CHECK
+ *   allows: Bypass discretionary access control on write permission
+ *   without: Standard DAC checks are enforced
+ *   condition: Checked when the file is opened for writing (inode_permission()
+ *     during open(2)); write() itself only tests FMODE_WRITE, and the
+ *     security_file_permission() call in rw_verify_area() runs LSM hooks
+ *     rather than DAC checks
+ *
+ * capability: CAP_FSETID
+ *   type: KAPI_CAP_BYPASS_CHECK
+ *   allows: Keep the SUID/SGID bits when writing to the file
+ *   without: SUID/SGID bits are cleared on write
+ *   condition: Checked during file_remove_privs()
+ *
+ * constraint: MAX_RW_COUNT
+ *   desc: The count parameter is silently clamped to MAX_RW_COUNT
+ *     (INT_MAX & PAGE_MASK, i.e. 2 GiB minus one page) to prevent integer
+ *     overflow in internal calculations. The caller observes the clamp only
+ *     as a short write.
+ *   expr: actual_count = min(count, MAX_RW_COUNT)
+ *
+ * constraint: File must be open for writing
+ *   desc: The file descriptor must have been opened with O_WRONLY or O_RDWR.
+ *     Files opened with O_RDONLY or O_PATH cannot be written and return
+ *     EBADF. The file must have both FMODE_WRITE and FMODE_CAN_WRITE set; a
+ *     file without FMODE_CAN_WRITE (no write or write_iter method) fails with
+ *     EINVAL.
+ *   expr: (file->f_mode & FMODE_WRITE) && (file->f_mode & FMODE_CAN_WRITE)
+ *
+ * constraint: RLIMIT_FSIZE
+ *   desc: For regular files, writes are constrained by the RLIMIT_FSIZE
+ *     resource limit. A write that starts at or beyond the limit fails with
+ *     EFBIG and raises SIGXFSZ; a write that starts below the limit but would
+ *     cross it is truncated to end at the limit, producing a short write.
+ *   expr: rlimit(RLIMIT_FSIZE) == RLIM_INFINITY || pos < rlimit(RLIMIT_FSIZE)
+ *
+ * constraint: File seals
+ *   desc: For memfd or shmem files with the F_SEAL_WRITE or
+ *     F_SEAL_FUTURE_WRITE seal applied, all write operations fail with EPERM.
+ *     With F_SEAL_GROW, writes that would extend the file size fail with
+ *     EPERM.
+ *
+ * examples: n = write(fd, buf, sizeof(buf));  // Basic write
+ *   n = write(STDOUT_FILENO, msg, strlen(msg));  // Write to stdout
+ *   while (total < len) {  // Handle short writes, retry on EINTR
+ *     n = write(fd, buf + total, len - total);
+ *     if (n < 0) { if (errno == EINTR) continue; break; }
+ *     total += n;
+ *   }
+ *   // Pipe error handling (reached only if SIGPIPE is ignored or blocked)
+ *   if (write(pipefd[1], &byte, 1) < 0 && errno == EPIPE) { handle_broken_pipe(); }
+ *
+ * notes: The behavior of write() varies significantly depending on the type
+ *   of file descriptor:
+ *
+ *   - Regular files: Writes go to the page cache (buffered) or directly to
+ *     storage (O_DIRECT). Short writes are rare except near RLIMIT_FSIZE or
+ *     when the disk is full. With O_APPEND, positioning at end of file and
+ *     writing are performed as one atomic step.
+ *
+ *   - Pipes and FIFOs: Blocking by default. Writes of up to PIPE_BUF bytes
+ *     (4096 on Linux) are guaranteed to be atomic; larger writes may be
+ *     interleaved with writes from other processes. A write blocks if the
+ *     pipe is full, or returns EAGAIN with O_NONBLOCK. SIGPIPE/EPIPE if there
+ *     are no readers.
+ *
+ *   - Sockets: Behavior depends on socket type and protocol. Stream sockets
+ *     (TCP) may return partial writes. Datagram sockets (UDP) typically write
+ *     complete messages or fail. Broken connections raise SIGPIPE/EPIPE
+ *     (MSG_NOSIGNAL, which suppresses the signal, is only available via
+ *     send(2)). Unconnected datagram sockets return EDESTADDRREQ.
+ *
+ *   - Terminals: May block on flow control. Canonical vs. raw mode affects
+ *     behavior, and special characters may be interpreted.
+ *
+ *   - Device special files: Behavior is device-specific. Block devices
+ *     behave similarly to regular files; character device behavior varies.
+ *
+ *   Race condition considerations: Concurrent writes from threads sharing a
+ *   file description race on the file position. Linux 3.14+ provides atomic
+ *   position updates via f_pos_lock for regular files (FMODE_ATOMIC_POS), but
+ *   for maximum safety, use pwrite() for concurrent positioned writes.
+ *
+ *   O_DIRECT writes bypass the page cache and typically require buffer and
+ *   offset alignment to the filesystem block size. Query the requirements via
+ *   statx() with STATX_DIOALIGN (Linux 6.1+). Unaligned O_DIRECT writes
+ *   return EINVAL on most filesystems.
+ *
+ *   For zero-copy writes, consider splice(2), sendfile(2), or vmsplice(2)
+ *   instead of copying data through user-space buffers with write().
+ *
+ *   Partial writes (short writes) must be handled by application code:
+ *   applications should loop until all data is written or an error occurs.
+ *
+ * since-version: 1.0
+ */
 SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
 		size_t, count)
 {
-- 
2.51.0
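The sys_write notes stress that short writes must be handled by application code. Below is a minimal user-space sketch of such a loop. write_all() is a hypothetical helper name and is not part of this patch, the kernel, or any libc. The loop retries on EINTR, which per the spec can only happen before any byte is transferred. It resumes after partial writes, including those caused by the MAX_RW_COUNT clamp or RLIMIT_FSIZE truncation. Every other error (EAGAIN, EPIPE, ENOSPC, ...) is returned to the caller. Treating a zero return for a non-zero count as ENOSPC is an assumption of this sketch, made so the loop cannot spin forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_all() - hypothetical helper, not a kernel or libc API.
 * Writes exactly len bytes from buf to fd, looping over short writes.
 * Returns len on success, or -1 with errno set on the first hard error;
 * in that case some prefix of buf may already have been written.
 */
static ssize_t write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t done = 0;

	assert(buf != NULL || len == 0);
	assert(len <= (size_t)SSIZE_MAX);	/* result must fit in ssize_t */

	while (done < len) {
		ssize_t n = write(fd, p + done, len - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* nothing written by this call: retry */
			return -1;	/* EAGAIN, EPIPE, ENOSPC, ...: caller decides */
		}
		if (n == 0) {
			errno = ENOSPC;	/* assumption: no progress is treated as an error */
			return -1;
		}
		done += (size_t)n;	/* short write: continue with the remainder */
	}
	return (ssize_t)done;
}
```

A caller writing to a pipe or socket would typically ignore SIGPIPE first (signal(SIGPIPE, SIG_IGN)). A closed reader then surfaces as write_all() returning -1 with errno == EPIPE, instead of the default SIGPIPE action terminating the process.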