From: Sasha Levin
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, linux-api@vger.kernel.org, workflows@vger.kernel.org, tools@kernel.org, Sasha Levin
Subject: [RFC v2 21/22] net/socket: add API specification for socket()
Date: Tue, 24 Jun 2025 14:07:41 -0400
Message-Id: <20250624180742.5795-22-sashal@kernel.org>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20250624180742.5795-1-sashal@kernel.org>
References: <20250624180742.5795-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add kernel API specification for the socket() system call, documenting
all aspects of socket creation.

Signed-off-by: Sasha Levin
---
 net/socket.c | 489 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 489 insertions(+)

diff --git a/net/socket.c b/net/socket.c
index 9a0e720f08598..fa42497d72af2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -78,6 +78,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -89,6 +90,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -1692,6 +1694,493 @@ int __sys_socket(int family, int type, int protocol)
 	return sock_map_fd(sock, flags & (O_CLOEXEC | O_NONBLOCK));
 }
 
+DEFINE_KERNEL_API_SPEC(sys_socket)
+	KAPI_DESCRIPTION("Create an endpoint for communication")
+	KAPI_LONG_DESC("Creates an endpoint for communication and returns a file descriptor "
+		"that refers to that endpoint. The file descriptor returned by a successful "
+		"call will be the lowest-numbered file descriptor not currently open for "
+		"the process. The socket has the indicated type, which specifies the "
+		"communication semantics. The socket() system call is the foundation of "
+		"all network programming in Linux, providing access to various network "
+		"protocols and communication mechanisms.")
+	KAPI_CONTEXT(KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE)
+
+	KAPI_PARAM(0, "family", "int", "Protocol/address family (domain)")
+		KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+		KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+		KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_RANGE)
+		KAPI_PARAM_RANGE(0, 45) /* AF_UNSPEC to AF_MCTP */
+		KAPI_PARAM_CONSTRAINT("Common families: AF_UNIX (1), AF_INET (2), AF_INET6 (10), "
+			"AF_NETLINK (16), AF_PACKET (17). Others: AF_BLUETOOTH (31), AF_CAN (29), "
+			"AF_TIPC (30), AF_VSOCK (40), AF_XDP (44). Range: 0-45 (AF_MCTP). "
+			"PF_* are aliases. Negative or >= 46 returns EAFNOSUPPORT.")
+	KAPI_PARAM_END
+
+	KAPI_PARAM(1, "type", "int", "Socket type with optional flags")
+		KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+		KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+		KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_MASK)
+		KAPI_PARAM_VALID_MASK(SOCK_TYPE_MASK | SOCK_CLOEXEC | SOCK_NONBLOCK)
+		KAPI_PARAM_CONSTRAINT("Types: SOCK_STREAM (1), SOCK_DGRAM (2), SOCK_RAW (3), "
+			"SOCK_RDM (4), SOCK_SEQPACKET (5), SOCK_DCCP (6), SOCK_PACKET (10-obsolete). "
+			"Flags (since 2.6.27): SOCK_NONBLOCK, SOCK_CLOEXEC. Range: 0-10.")
+	KAPI_PARAM_END
+
+	KAPI_PARAM(2, "protocol", "int", "Protocol within the family")
+		KAPI_PARAM_FLAGS(KAPI_PARAM_IN)
+		KAPI_PARAM_TYPE(KAPI_TYPE_INT)
+		KAPI_PARAM_CONSTRAINT_TYPE(KAPI_CONSTRAINT_NONE)
+		KAPI_PARAM_CONSTRAINT("Usually 0 to select the default protocol for the given family and type. "
+			"For AF_INET/AF_INET6: IPPROTO_TCP (6), IPPROTO_UDP (17), IPPROTO_ICMP (1), "
+			"IPPROTO_RAW (255), etc. Must be >= 0 and < IPPROTO_MAX. "
+			"For AF_UNIX: only 0 or PF_UNIX (1) accepted. "
+			"For AF_PACKET: network byte order Ethernet protocol (e.g., ETH_P_IP). "
+			"For AF_NETLINK: NETLINK_ROUTE, NETLINK_AUDIT, etc. (0-31). "
+			"Protocol value passed through update_socket_protocol() BPF hook which may modify it.")
+	KAPI_PARAM_END
+
+	KAPI_RETURN("long", "File descriptor on success; negative error code on failure. "
+		"On success, returns the lowest available file descriptor. "
+		"The descriptor is automatically placed in the process's file descriptor table. "
+		"If SOCK_CLOEXEC is set, FD_CLOEXEC is set on the descriptor. "
+		"If SOCK_NONBLOCK is set, O_NONBLOCK is set on the file.")
+		KAPI_RETURN_TYPE(KAPI_TYPE_FD)
+		KAPI_RETURN_CHECK_TYPE(KAPI_RETURN_ERROR_CHECK)
+		KAPI_RETURN_SUCCESS(0)
+	KAPI_RETURN_END
+
+	/* Core error codes from __sock_create() and __sys_socket() */
+	KAPI_ERROR(0, -EAFNOSUPPORT, "EAFNOSUPPORT", "Address family not supported",
+		"The implementation does not support the specified address family. "
+		"Returned when: family < 0 || family >= NPROTO (46); "
+		"protocol family not registered in net_families[]; "
+		"protocol family module cannot be loaded; "
+		"try_module_get() fails on protocol family owner.")
+	KAPI_ERROR(1, -EINVAL, "EINVAL", "Invalid argument",
+		"Invalid argument specified. Returned when: "
+		"type < 0 || type >= SOCK_MAX (11); "
+		"invalid flags in type ((type & ~SOCK_TYPE_MASK) & ~(SOCK_CLOEXEC | SOCK_NONBLOCK)); "
+		"other protocol-specific validation failures.")
+	KAPI_ERROR(2, -ENFILE, "ENFILE", "File table overflow",
+		"The system-wide limit on the total number of open files has been reached. "
+		"Returned when sock_alloc() fails due to new_inode_pseudo() failure.")
+	KAPI_ERROR(3, -EMFILE, "EMFILE", "Too many open files",
+		"The per-process limit on the number of open file descriptors has been reached. "
+		"Returned when sock_map_fd() cannot allocate a new file descriptor.")
+	KAPI_ERROR(4, -ENOMEM, "ENOMEM", "Out of memory",
+		"Insufficient kernel memory available. Can occur in: "
+		"sk_alloc() when allocating sock structure; "
+		"protocol-specific init functions; "
+		"security_sk_alloc() in LSM hooks; "
+		"various kmalloc()/kmem_cache_alloc() calls.")
+	KAPI_ERROR(5, -ENOBUFS, "ENOBUFS", "No buffer space available",
+		"Insufficient resources to create socket. Similar to ENOMEM but used by "
+		"some protocol families (e.g., AF_PACKET) to indicate resource exhaustion.")
+	KAPI_ERROR(6, -EPROTONOSUPPORT, "EPROTONOSUPPORT", "Protocol not supported",
+		"The protocol is not supported within this domain. Returned when: "
+		"AF_UNIX: protocol != 0 && protocol != PF_UNIX; "
+		"AF_INET/AF_INET6: protocol not found in inetsw[] array; "
+		"AF_NETLINK: protocol < 0 || protocol >= MAX_LINKS (32).")
+	KAPI_ERROR(7, -ESOCKTNOSUPPORT, "ESOCKTNOSUPPORT", "Socket type not supported",
+		"The socket type is not supported within this domain. Returned when: "
+		"AF_UNIX: type not in {STREAM, DGRAM, SEQPACKET, RAW}; "
+		"AF_INET/AF_INET6: no matching (type, protocol) in inetsw[]; "
+		"AF_PACKET: type not in {DGRAM, RAW, PACKET}; "
+		"AF_NETLINK: type not in {RAW, DGRAM}.")
+	KAPI_ERROR(8, -EPERM, "EPERM", "Operation not permitted",
+		"Permission denied due to insufficient privileges. Returned when: "
+		"AF_INET/AF_INET6 with SOCK_RAW: missing CAP_NET_RAW; "
+		"AF_PACKET: missing CAP_NET_RAW; "
+		"Some protocol families may have additional restrictions.")
+	KAPI_ERROR(9, -EACCES, "EACCES", "Permission denied",
+		"Permission denied by Linux Security Module (SELinux, AppArmor, etc.). "
+		"Returned by security_socket_create() or security_socket_post_create() hooks.")
+	KAPI_ERROR(10, -EAGAIN, "EAGAIN", "Resource temporarily unavailable",
+		"Transient resource shortage. Can be returned by some protocol families "
+		"during initialization when resources are temporarily exhausted.")
+	KAPI_ERROR(11, -EINTR, "EINTR", "Interrupted system call",
+		"Operation interrupted by signal. Rare for socket() but possible if "
+		"module loading is interrupted or during memory allocation with GFP_KERNEL.")
+	KAPI_ERROR(12, -EFAULT, "EFAULT", "Bad address",
+		"Not directly returned by socket() since all parameters are values, not pointers. "
+		"Listed for completeness as it appears in documentation.")
+	KAPI_ERROR(13, -ENOSYS, "ENOSYS", "Function not implemented",
+		"Can occur in containers using alt-syscall where socket() is not whitelisted, "
+		"or on architectures where socket() is not implemented.")
+
+	KAPI_ERROR_COUNT(14)
+	KAPI_PARAM_COUNT(3)
+	KAPI_SINCE_VERSION("4.2BSD")
+
+	KAPI_EXAMPLES("/* Create a TCP socket */\n"
+		"int tcp_sock = socket(AF_INET, SOCK_STREAM, 0);\n"
+		"if (tcp_sock < 0) {\n"
+		" perror(\"socket\");\n"
+		" exit(EXIT_FAILURE);\n"
+		"}\n\n"
+		"/* Create a non-blocking UDP socket with close-on-exec */\n"
+		"int udp_sock = socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);\n\n"
+		"/* Create a raw ICMP socket (requires CAP_NET_RAW) */\n"
+		"int raw_sock = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);\n\n"
+		"/* Create a Unix domain datagram socket */\n"
+		"int unix_sock = socket(AF_UNIX, SOCK_DGRAM, 0);\n\n"
+		"/* Create a netlink socket for routing information */\n"
+		"int nl_sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);\n\n"
+		"/* Create a packet socket for raw Ethernet frames (requires CAP_NET_RAW) */\n"
+		"int packet_sock = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));\n\n"
+		"/* Create a Bluetooth L2CAP socket */\n"
+		"int bt_sock = socket(AF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_L2CAP);")
+
+	KAPI_NOTES("Implementation details:\n"
+		"- Uses RCU to safely access net_families[] array\n"
+		"- May trigger automatic module loading via request_module(\"net-pf-%d\", family)\n"
+		"- Allocates inode from sock_inode_cache via new_inode_pseudo()\n"
+		"- Each protocol family registers via sock_register() with unique family number\n"
+		"- Socket creation involves: sock_alloc() -> pf->create() -> sock_map_fd()\n"
+		"- The update_socket_protocol() BPF hook can modify the protocol parameter\n"
+		"- LSM hooks called: security_socket_create() and security_socket_post_create()\n"
+		"- Creates struct socket (VFS layer) and struct sock (network layer)\n"
+		"- Socket state initialized to SS_UNCONNECTED\n"
+		"- File operations set to socket_file_ops\n"
+		"- The (PF_INET, SOCK_PACKET) combination is deprecated since Linux 2.0\n"
+		"Build-time checks ensure SOCK_CLOEXEC == O_CLOEXEC and flag consistency")
+
+	/* Lock specifications */
+	KAPI_LOCK(0, "rcu_read_lock", KAPI_LOCK_RCU)
+		KAPI_LOCK_ACQUIRED
+		KAPI_LOCK_RELEASED
+		KAPI_LOCK_DESC("Protects net_families[] array access during protocol family lookup. "
+			"Acquired before rcu_dereference(net_families[family]), "
+			"released after pf->create() call or on error path.")
+	KAPI_LOCK_END
+
+	KAPI_LOCK(1, "pf->owner module refcount", KAPI_LOCK_CUSTOM)
+		KAPI_LOCK_ACQUIRED
+		KAPI_LOCK_RELEASED
+		KAPI_LOCK_DESC("Prevents protocol family module unload during socket creation. "
+			"try_module_get(pf->owner) before pf->create(), "
+			"module_put(pf->owner) after completion.")
+	KAPI_LOCK_END
+
+	KAPI_LOCK(2, "sock->ops->owner module refcount", KAPI_LOCK_CUSTOM)
+		KAPI_LOCK_ACQUIRED
+		KAPI_LOCK_DESC("Prevents socket operations module unload during socket lifetime. "
+			"try_module_get(sock->ops->owner) after successful creation, "
+			"released only on sock_release() when socket is closed.")
+	KAPI_LOCK_END
+
+	KAPI_LOCK_COUNT(3)
+
+	/* Signal handling */
+	KAPI_SIGNAL(0, 0, "Module loading", KAPI_SIGNAL_RECEIVE, KAPI_SIGNAL_ACTION_RESTART)
+		KAPI_SIGNAL_CONDITION("CONFIG_MODULES && request_module() called")
+		KAPI_SIGNAL_DESC("Module loading via request_module() is interruptible. "
+			"Signal delivery causes -EINTR from modprobe execution.")
+		KAPI_SIGNAL_TIMING(KAPI_SIGNAL_TIME_DURING)
+		KAPI_SIGNAL_INTERRUPTIBLE
+	KAPI_SIGNAL_END
+
+	KAPI_SIGNAL_COUNT(1)
+
+	/* Side effects */
+	KAPI_SIDE_EFFECT(0, KAPI_EFFECT_ALLOC_MEMORY | KAPI_EFFECT_RESOURCE_CREATE,
+		"socket structures",
+		"Allocates struct socket (VFS), struct sock (network), and protocol-specific data. "
+		"Memory from: sock_inode_cache, protocol's slab cache, and general kmalloc.")
+		KAPI_EFFECT_CONDITION("Always occurs on successful socket creation")
+		KAPI_EFFECT_REVERSIBLE
+	KAPI_SIDE_EFFECT_END
+
+	KAPI_SIDE_EFFECT(1, KAPI_EFFECT_RESOURCE_CREATE,
+		"file descriptor",
+		"Allocates new file descriptor at lowest available index. "
+		"Creates struct file with socket_file_ops. Sets up file->private_data = socket.")
+		KAPI_EFFECT_CONDITION("Always occurs on successful socket creation")
+		KAPI_EFFECT_REVERSIBLE
+	KAPI_SIDE_EFFECT_END
+
+	KAPI_SIDE_EFFECT(2, KAPI_EFFECT_FILESYSTEM,
+		"protocol module",
+		"May trigger request_module(\"net-pf-%d\", family) to load protocol module. "
+		"Executes /sbin/modprobe in userspace context.")
+		KAPI_EFFECT_CONDITION("CONFIG_MODULES=y && !net_families[family] && first attempt")
+	KAPI_SIDE_EFFECT_END
+
+	KAPI_SIDE_EFFECT(3, KAPI_EFFECT_MODIFY_STATE,
+		"LSM and audit",
+		"Calls security_socket_create() pre-creation and security_socket_post_create() "
+		"post-creation. May generate audit events. SELinux/AppArmor may deny.")
+		KAPI_EFFECT_CONDITION("CONFIG_SECURITY=y or CONFIG_AUDIT=y")
+	KAPI_SIDE_EFFECT_END
+
+	KAPI_SIDE_EFFECT(4, KAPI_EFFECT_MODIFY_STATE,
+		"BPF programs",
+		"update_socket_protocol() hook can modify protocol parameter. "
+		"BPF_CGROUP_RUN_PROG_INET_SOCK() may run for AF_INET/AF_INET6.")
+		KAPI_EFFECT_CONDITION("BPF programs attached to cgroup or socket hooks")
+	KAPI_SIDE_EFFECT_END
+
+	KAPI_SIDE_EFFECT(5, KAPI_EFFECT_NETWORK | KAPI_EFFECT_HARDWARE,
+		"network stack",
+		"Initializes protocol-specific state. May interact with network hardware "
+		"(e.g., AF_PACKET binds to network interface).")
+		KAPI_EFFECT_CONDITION("Protocol family specific")
+	KAPI_SIDE_EFFECT_END
+
+	KAPI_SIDE_EFFECT(6, KAPI_EFFECT_MODIFY_STATE,
+		"resource accounting",
+		"Updates task and memory cgroup accounting. Charges socket memory to owner. "
+		"Increments global socket counters.")
+		KAPI_EFFECT_CONDITION("CONFIG_MEMCG=y or other accounting enabled")
+	KAPI_SIDE_EFFECT_END
+
+	KAPI_SIDE_EFFECT_COUNT(7)
+
+	/* State transitions */
+	KAPI_STATE_TRANS(0, "file descriptor table",
+		"n open descriptors", "n+1 open descriptors",
+		"New fd allocated at min(available). Updates current->files->fd_array[]")
+	KAPI_STATE_TRANS_END
+
+	KAPI_STATE_TRANS(1, "socket state machine",
+		"non-existent", "SS_UNCONNECTED",
+		"Socket created in unconnected state, ready for bind() or connect()")
+	KAPI_STATE_TRANS_END
+
+	KAPI_STATE_TRANS(2, "network namespace",
+		"no socket", "socket registered",
+		"Socket associated with current->nsproxy->net_ns network namespace")
+	KAPI_STATE_TRANS_END
+
+	KAPI_STATE_TRANS(3, "memory accounting",
+		"pre-allocation", "memory charged",
+		"Socket memory charged to owner's memcg and rlimits")
+	KAPI_STATE_TRANS_END
+
+	KAPI_STATE_TRANS_COUNT(4)
+
+	/* Networking-specific specifications */
+
+	/* Socket state specification */
+	KAPI_SOCKET_STATE_REQ(KAPI_SOCK_STATE_UNSPEC)
+	KAPI_SOCKET_STATE_RESULT(KAPI_SOCK_STATE_OPEN)
+	KAPI_SOCKET_STATE_COND("Successful socket creation")
+	KAPI_SOCKET_STATE_PROTOS(KAPI_PROTO_ALL)
+	KAPI_SOCKET_STATE_END
+
+	/* Protocol-specific behaviors - detailed specifications */
+	KAPI_PROTOCOL_BEHAVIOR(0, KAPI_PROTO_TCP,
+		"TCP (Transmission Control Protocol) creates reliable, ordered, connection-oriented "
+		"byte streams. Features: 3-way handshake connection establishment; sequence numbers "
+		"for ordering; acknowledgments and retransmissions for reliability; flow control "
+		"via sliding window; congestion control (Reno/CUBIC/BBR); Nagle algorithm for "
+		"small packet aggregation; keep-alive probes; urgent data via MSG_OOB. "
+		"Socket combines (AF_INET/AF_INET6, SOCK_STREAM, IPPROTO_TCP).")
+		KAPI_PROTOCOL_FLAGS(0, "TCP-specific socket options via SOL_TCP level")
+	KAPI_PROTOCOL_BEHAVIOR_END
+
+	KAPI_PROTOCOL_BEHAVIOR(1, KAPI_PROTO_UDP,
+		"UDP (User Datagram Protocol) creates unreliable, connectionless datagram service. "
+		"Features: no connection establishment; best-effort delivery; message boundaries "
+		"preserved; no flow/congestion control; optional checksums; multicast/broadcast "
+		"capable; lower overhead than TCP. Maximum datagram size 65507 bytes (65535 - "
+		"IP header - UDP header). connect() on UDP socket sets default destination. "
+		"Socket combines (AF_INET/AF_INET6, SOCK_DGRAM, IPPROTO_UDP).")
+		KAPI_PROTOCOL_FLAGS(0, "UDP-specific options like UDP_CORK via SOL_UDP")
+	KAPI_PROTOCOL_BEHAVIOR_END
+
+	KAPI_PROTOCOL_BEHAVIOR(2, KAPI_PROTO_UNIX,
+		"Unix domain sockets provide high-performance local IPC with filesystem-based "
+		"addressing or Linux abstract namespace. Features: reliable delivery; in-order "
+		"semantics for SOCK_STREAM; message boundaries for SOCK_DGRAM/SOCK_SEQPACKET; "
+		"credential passing via SCM_CREDENTIALS; file descriptor passing via SCM_RIGHTS; "
+		"no network overhead; kernel-only data path. SOCK_RAW mapped to SOCK_DGRAM. "
+		"Maximum datagram size 130688 bytes by default (net.core.wmem_max).")
+		KAPI_PROTOCOL_FLAGS(0, "No Unix-specific socket level; uses SOL_SOCKET only")
+	KAPI_PROTOCOL_BEHAVIOR_END
+
+	KAPI_PROTOCOL_BEHAVIOR(3, KAPI_PROTO_RAW,
+		"Raw sockets provide direct access to network layer (IP) or link layer (Ethernet). "
+		"Features: receive/send raw IP packets; implement custom protocols; packet "
+		"sniffing; bypass transport layer. IP header included based on IP_HDRINCL option. "
+		"Protocol field specifies which protocol to receive (IPPROTO_ICMP, etc.) or "
+		"IPPROTO_RAW to send any. Link layer access via AF_PACKET. Requires CAP_NET_RAW "
+		"capability. Used by ping, traceroute, nmap, tcpdump.")
+		KAPI_PROTOCOL_FLAGS(0, "IP_HDRINCL and raw-specific options via SOL_RAW")
+	KAPI_PROTOCOL_BEHAVIOR_END
+
+	KAPI_PROTOCOL_BEHAVIOR(4, KAPI_PROTO_PACKET,
+		"Packet sockets provide direct access to link layer (Layer 2). Features: "
+		"send/receive raw Ethernet frames; implement network protocols in userspace; "
+		"packet capture and injection; access to all packets on interface. SOCK_RAW "
+		"provides full Layer 2 header; SOCK_DGRAM provides cooked packets without "
+		"Layer 2 header. Protocol specifies Ethernet protocol (ETH_P_IP, ETH_P_ALL). "
+		"High-performance variants: PACKET_MMAP, PACKET_FANOUT. Requires CAP_NET_RAW.")
+		KAPI_PROTOCOL_FLAGS(0, "Extensive options via SOL_PACKET level")
+	KAPI_PROTOCOL_BEHAVIOR_END
+
+	KAPI_PROTOCOL_BEHAVIOR(5, KAPI_PROTO_NETLINK,
+		"Netlink sockets provide kernel/user-space communication interface. Features: "
+		"reliable datagram service; multicast groups; message-based; TLV attributes; "
+		"async notifications; used for routing, netfilter, audit, SELinux, etc. "
+		"Protocol specifies subsystem: NETLINK_ROUTE (routing/link), NETLINK_FIREWALL, "
+		"NETLINK_NETFILTER, NETLINK_AUDIT, etc. No special capabilities for most "
+		"protocols except administrative operations.")
+		KAPI_PROTOCOL_FLAGS(0, "Netlink-specific options and attributes")
+	KAPI_PROTOCOL_BEHAVIOR_END
+
+	KAPI_PROTOCOL_BEHAVIOR(6, KAPI_PROTO_SCTP,
+		"SCTP (Stream Control Transmission Protocol) provides reliable, message-oriented "
+		"service with multi-streaming and multi-homing. Features: message boundaries; "
+		"ordered/unordered delivery; multi-streaming prevents head-of-line blocking; "
+		"multi-homing for redundancy; heartbeats; partial reliability extension. "
+		"4-way handshake with cookie mechanism prevents SYN floods. "
+		"Socket combines (AF_INET/AF_INET6, SOCK_STREAM/SOCK_SEQPACKET, IPPROTO_SCTP).")
+		KAPI_PROTOCOL_FLAGS(0, "SCTP-specific options via SOL_SCTP level")
+	KAPI_PROTOCOL_BEHAVIOR_END
+
+	KAPI_PROTOCOL_BEHAVIOR_COUNT(7)
+
+	/* Buffer specification - not applicable for socket creation */
+	KAPI_BUFFER_SPEC(0)
+		KAPI_BUFFER_SIZE(0, 0, 0)
+	KAPI_BUFFER_END
+
+	/* Async specification - socket creation is synchronous */
+	KAPI_ASYNC_SPEC(KAPI_ASYNC_BLOCK, 0)
+	KAPI_ASYNC_END
+
+	/* Network-specific errors are already covered in main error list */
+
+	/* Address families supported - comprehensive list */
+	KAPI_ADDR_FAMILY(0, AF_UNIX, "AF_UNIX/AF_LOCAL", sizeof(struct sockaddr_un), 2, 110)
+		KAPI_ADDR_FORMAT("struct sockaddr_un { sa_family_t sun_family; char sun_path[108]; }")
+		KAPI_ADDR_FEATURES(false, false, false)
+		KAPI_ADDR_SPECIAL("Abstract namespace: sun_path[0] == '\\0'; "
+			"Autobind: empty sun_path gets random abstract address; "
+			"Filesystem: normal paths follow filesystem permissions")
+		KAPI_ADDR_PORTS(0, 0) /* No port concept */
+	KAPI_ADDR_FAMILY_END
+
+	KAPI_ADDR_FAMILY(1, AF_INET, "AF_INET", sizeof(struct sockaddr_in), 16, 16)
+		KAPI_ADDR_FORMAT("struct sockaddr_in { sa_family_t sin_family; __be16 sin_port; "
+			"struct in_addr sin_addr; char sin_zero[8]; }")
+		KAPI_ADDR_FEATURES(true, true, true)
+		KAPI_ADDR_SPECIAL("INADDR_ANY (0.0.0.0) - wildcard; "
+			"INADDR_LOOPBACK (127.0.0.1) - loopback; "
+			"INADDR_BROADCAST (255.255.255.255) - broadcast; "
+			"224.0.0.0/4 - multicast range")
+		KAPI_ADDR_PORTS(0, 65535) /* 0 = ephemeral port assignment */
+	KAPI_ADDR_FAMILY_END
+
+	KAPI_ADDR_FAMILY(2, AF_INET6, "AF_INET6", sizeof(struct sockaddr_in6), 28, 28)
+		KAPI_ADDR_FORMAT("struct sockaddr_in6 { sa_family_t sin6_family; __be16 sin6_port; "
+			"__be32 sin6_flowinfo; struct in6_addr sin6_addr; __u32 sin6_scope_id; }")
+		KAPI_ADDR_FEATURES(true, true, false) /* No broadcast in IPv6 */
+		KAPI_ADDR_SPECIAL("in6addr_any (::) - wildcard; "
+			"in6addr_loopback (::1) - loopback; "
+			"ff00::/8 - multicast range; "
+			"fe80::/10 - link-local; "
+			"::ffff:0:0/96 - IPv4-mapped addresses")
+		KAPI_ADDR_PORTS(0, 65535)
+	KAPI_ADDR_FAMILY_END
+
+	KAPI_ADDR_FAMILY(3, AF_NETLINK, "AF_NETLINK", sizeof(struct sockaddr_nl), 12, 12)
+		KAPI_ADDR_FORMAT("struct sockaddr_nl { sa_family_t nl_family; __u16 nl_pad; "
+			"__u32 nl_pid; __u32 nl_groups; }")
+		KAPI_ADDR_FEATURES(false, true, false) /* Multicast via nl_groups */
+		KAPI_ADDR_SPECIAL("nl_pid: 0 = kernel; getpid() = this process; "
+			"nl_groups: bitmask of multicast groups")
+		KAPI_ADDR_PORTS(0, 0) /* Uses nl_pid instead */
+	KAPI_ADDR_FAMILY_END
+
+	KAPI_ADDR_FAMILY(4, AF_PACKET, "AF_PACKET", sizeof(struct sockaddr_ll), 20, 20)
+		KAPI_ADDR_FORMAT("struct sockaddr_ll { sa_family_t sll_family; __be16 sll_protocol; "
+			"int sll_ifindex; __u16 sll_hatype; __u8 sll_pkttype; "
+			"__u8 sll_halen; __u8 sll_addr[8]; }")
+		KAPI_ADDR_FEATURES(true, true, true) /* Via sll_pkttype */
+		KAPI_ADDR_SPECIAL("sll_ifindex: 0 = any interface; "
+			"sll_protocol: ETH_P_ALL = all protocols; "
+			"sll_pkttype: PACKET_HOST/BROADCAST/MULTICAST/OTHERHOST")
+		KAPI_ADDR_PORTS(0, 0) /* Layer 2, no ports */
+	KAPI_ADDR_FAMILY_END
+
+	KAPI_ADDR_FAMILY(5, AF_BLUETOOTH, "AF_BLUETOOTH", sizeof(struct sockaddr), 14, 258)
+		KAPI_ADDR_FORMAT("Varies by protocol: sockaddr_l2 (L2CAP), sockaddr_rc (RFCOMM), "
+			"sockaddr_hci (HCI), sockaddr_sco (SCO)")
+		KAPI_ADDR_FEATURES(false, false, false)
+		KAPI_ADDR_SPECIAL("BDADDR_ANY (00:00:00:00:00:00) - any device; "
+			"BDADDR_LOCAL (00:00:00:ff:ff:ff) - local adapter")
+		KAPI_ADDR_PORTS(1, 30) /* PSM for L2CAP, channel for RFCOMM */
+	KAPI_ADDR_FAMILY_END
+
+	KAPI_ADDR_FAMILY_COUNT(6)
+
+	/* Security specification - use existing capability mechanism */
+	KAPI_CAPABILITY(0, CAP_NET_RAW, "CAP_NET_RAW", KAPI_CAP_GRANT_PERMISSION)
+		KAPI_CAP_CONDITION("family == AF_PACKET || type == SOCK_RAW")
+		KAPI_CAP_ALLOWS("Raw socket creation and packet injection")
+		KAPI_CAP_WITHOUT("Permission denied (EPERM)")
+	KAPI_CAPABILITY_END
+
+	KAPI_CAPABILITY_COUNT(1)
+
+	/* Operation characteristics */
+	.is_connection_oriented = false,
+	.is_message_oriented = false,
+	.supports_oob_data = false,
+	.supports_peek = false,
+	.supports_select_poll = false,
+	.is_reentrant = true,
+
+	/* Semantic descriptions */
+	KAPI_NET_DATA_TRANSFER("Not applicable - socket() only creates the endpoint")
+
+	/* Additional constraints and validation rules */
+	KAPI_CONSTRAINT(0, "Protocol/Type Compatibility",
+		"Not all (family, type, protocol) combinations are valid. "
+		"Common valid combinations: "
+		"(AF_INET, SOCK_STREAM, IPPROTO_TCP); "
+		"(AF_INET, SOCK_DGRAM, IPPROTO_UDP); "
+		"(AF_INET, SOCK_RAW, IPPROTO_ICMP); "
+		"(AF_UNIX, SOCK_STREAM, 0); "
+		"(AF_UNIX, SOCK_DGRAM, 0); "
+		"(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)); "
+		"(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)")
+	KAPI_CONSTRAINT_END
+
+	KAPI_CONSTRAINT(1, "Module Loading",
+		"If protocol family module not loaded, socket() may block during "
+		"request_module() execution. This is interruptible and may take "
+		"significant time. Modules loaded: net-pf-N where N is family number.")
+	KAPI_CONSTRAINT_END
+
+	KAPI_CONSTRAINT(2, "Capability Requirements",
+		"CAP_NET_RAW required for: "
+		"- AF_INET/AF_INET6 with SOCK_RAW "
+		"- AF_PACKET with any socket type "
+		"- Some AF_NETLINK operations require CAP_NET_ADMIN "
+		"- AF_BLUETOOTH may require CAP_NET_ADMIN for some operations")
+	KAPI_CONSTRAINT_END
+
+	KAPI_CONSTRAINT(3, "Network Namespace",
+		"Socket is created in current->nsproxy->net_ns network namespace. "
+		"Socket is bound to this namespace for its lifetime. "
+		"Different namespaces have independent network stacks.")
+	KAPI_CONSTRAINT_END
+
+	KAPI_CONSTRAINT(4, "Memory Limits",
+		"Socket creation respects: "
+		"- RLIMIT_NOFILE for file descriptor limits "
+		"- Memory cgroup limits for socket memory "
+		"- System-wide socket memory limits (net.core.somaxconn, etc.) "
+		"- Per-protocol memory limits")
+	KAPI_CONSTRAINT_END
+
+	KAPI_CONSTRAINT_COUNT(5)
+
+KAPI_END_SPEC;
+
 SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol)
 {
 	return __sys_socket(family, type, protocol);
-- 
2.39.5