From nobody Tue Dec 2 03:00:11 2025 Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com [209.85.214.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D80033446C4 for ; Mon, 17 Nov 2025 22:47:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763419643; cv=none; b=ZD6/egW0UxIdvVLRgLOfE1yAZJzFF+ksPwNf5L/Jwb6U0EZKIzkICilws2GyG04MIzShYyXk3AYdPqLsbPjqfF7uH+2OeEYCN6YlcRsw3x4kdMuUElhAM4YaeGsykzCiN47c8piC7SRFg81GBGXZyVnYIobjGOqWaQnAjOwEbZ0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763419643; c=relaxed/simple; bh=rm96RSIZb4LTBvdrh7YmVYysakvIeRYfKKwfcyCMsaM=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=squLeOYuzCc3QlDwpFzPBFDunrNpS+yBJoxys9MGsL2TZwCOF518Mlun8qEp6R8L3kFS84oFL8KSCppdmUCBBV1IpVowfl4KoNVQwH4OUTxXT/tp0xQEtlCZeWlioREnUFLt/Z9FGyi7Jp0jcrLNeaE0Gq6VTylrF0A5q4lEQOk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--ackerleytng.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=c0Fotxn3; arc=none smtp.client-ip=209.85.214.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--ackerleytng.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="c0Fotxn3" Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-2955555f73dso53331765ad.0 for ; Mon, 17 Nov 2025 14:47:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1763419640; x=1764024440; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=h49/4ez8nB53RwngFZBAQKZZnMeHtus1TaIAc+Tmfyk=; b=c0Fotxn336osBdfMC25aWx0+mzKseoAmoIBsxrkeZMfxrC/n8NnOVangr+fOi5K3F2 2E5CcegaucEoDd+htJcqn4nc/Tgh4pTqA24ecaNrOdi/xayAmbcKrxylx6u95ePkSJkO ucYYAxKJRFHhex2yKwUZeSqnsKs9KAXFCrrqR3w94YFAhOkRQkrOkhGcx8yG6YZUaNSy ujN6oulyt9LM3KDQHBkmUWbcpDp84/TBF4xxnuGTRwHUGWXMyTGW4avwHgaWl4dy5pi3 sphe1LP5e4mdP4DznX/wcKxAgFcD5ATuCmU50A3KogbEUvE0W2EBP2pExe1u41sOLlsM HBMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1763419640; x=1764024440; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=h49/4ez8nB53RwngFZBAQKZZnMeHtus1TaIAc+Tmfyk=; b=tpBfNuma9bwRk1a3/gKyT1SQimo2v8DaJmkoZ1aYYXM84H14HMG3Kwexvrn45Q8Qik ojh5Z9M51iT7zNSyek4pbuOmJEPecALwgnrjbYqs9OrFj6l0K8bn9Vt/PS0XE4oz7goz rJGxkhbT8ubngTY5Tq4Z7XIAaZZoZxeXqwWtUD6I5vpmKy7E3KF+nBfOwqaqLXLoyEVQ aMFf8/lIn/i9Q9lUyl65gZOeuJOV1icNS/b9pqc3Cy1rrYPY6Qa8AU6UTxGUFSaSEHNW F2OlTUVweWKVcoQxwx2dvcud9ssmLZLzzoLkYMVziJztueduz8LQBPXnuw33z304xIBK sWxA== X-Forwarded-Encrypted: i=1; AJvYcCVMgcvWF/JuN7iIl0fHJlEODb/kQjVzQTmXpzQMLtX/71/tKzOgM9Psivdr6WE0+IFvIOEXsjOSOetD8LM=@vger.kernel.org X-Gm-Message-State: AOJu0YzGlpipQYOeYHar/BwbqD9ereO3R6Aah6dHLuygqamKnLc9DXW1 o/idtzfybycTDfS5negsTvlKX4l6Wz8M+Ti1PJq0GXo5M9IGCh6WUv5dMsa96hNbDm4eX7rth6U TjBkil6YPaxz+IVYGF1tuiQM49w== X-Google-Smtp-Source: AGHT+IHWA3n4fRLuQQDbvqes2g4fdvBQx26NqKHJhSXAuqe2xxxf8ttHYK65SO+WREJuuBGajBPt+OclghSCG+uvxQ== X-Received: from plok14.prod.google.com ([2002:a17:903:3bce:b0:297:ecd0:777a]) (user=ackerleytng job=prod-delivery.src-stubby-dispatcher) by 2002:a17:902:f691:b0:295:6a9:cb62 with SMTP id d9443c01a7336-2986a73b4a7mr170440805ad.35.1763419639915; Mon, 17 Nov 2025 14:47:19 -0800 (PST) Date: Mon, 17 Nov 2025 14:47:00 -0800 In-Reply-To: <20251117224701.1279139-1-ackerleytng@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20251117224701.1279139-1-ackerleytng@google.com> X-Mailer: git-send-email 2.52.0.rc1.455.g30608eb744-goog Message-ID: <20251117224701.1279139-4-ackerleytng@google.com> Subject: [RFC PATCH 3/4] XArray: Support splitting for arbitrarily large entries From: Ackerley Tng To: willy@infradead.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: david@redhat.com, michael.roth@amd.com, vannapurve@google.com, Ackerley Tng Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The existing xas_split() function primarily supports splitting an entry into multiple siblings within the current node, or creating child nodes for one level below. It does not handle scenarios where the requested split order requires wiring up multiple levels of new intermediate nodes to form a deeper subtree. This limits the xas_split() from splitting arbitrarily large entries. This commit extends xas_split() to build subtrees of XArray nodes and then set these subtrees as children of the node where the split is requested. Signed-off-by: Ackerley Tng --- I had a recursive implementation which I believe was easier to understand, = but I went with a iterative implementation using a stack because I was concerned = about stack depths in the kernel. Let me know if the recursive implementation is preferred! Feedback is always appreciated, and I'd specifically like feedback on: + RCU-related handling + Handling of xas_update() calls + Use of node_set_marks() - can/should that be refactored to be node-focuse= d, rather than in some cases also updating the child? + Can/should xas_split_alloc() read entry using xas_load(), rather than rel= y on void *entry passed as an argument? lib/xarray.c | 158 +++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 121 insertions(+), 37 deletions(-) diff --git a/lib/xarray.c b/lib/xarray.c index b7c44a75bb03..6fdace2e73df 100644 --- a/lib/xarray.c +++ b/lib/xarray.c @@ -1105,52 +1105,136 @@ EXPORT_SYMBOL_GPL(xas_split_alloc); void xas_split(struct xa_state *xas, void *entry, unsigned int order) { unsigned int sibs =3D (1 << (order % XA_CHUNK_SHIFT)) - 1; - unsigned int offset, marks; + struct xa_node *node_to_split; + unsigned int offset_to_split; + struct xa_node *stack; struct xa_node *node; - void *curr =3D xas_load(xas); - int values =3D 0; + unsigned int offset; + unsigned int marks; + unsigned int i; + void *sibling; - node =3D xas->xa_node; - if (xas_top(node)) + xas_load(xas); + node_to_split =3D xas->xa_node; + if (xas_top(node_to_split)) return; - marks =3D node_get_marks(node, xas->xa_offset); + marks =3D node_get_marks(node_to_split, xas->xa_offset); - offset =3D xas->xa_offset + sibs; - do { - if (xas->xa_shift < node->shift) { - struct xa_node *child =3D xas->xa_alloc; - - xas->xa_alloc =3D rcu_dereference_raw(child->parent); - __xas_init_node_for_split(xas, child, entry); - child->shift =3D node->shift - XA_CHUNK_SHIFT; - child->offset =3D offset; - child->count =3D XA_CHUNK_SIZE; - child->nr_values =3D xa_is_value(entry) ? - XA_CHUNK_SIZE : 0; - RCU_INIT_POINTER(child->parent, node); - node_set_marks(node, offset, child, xas->xa_sibs, - marks); - rcu_assign_pointer(node->slots[offset], - xa_mk_node(child)); - if (xa_is_value(curr)) - values--; - xas_update(xas, child); + /* Horizontal split: just fill in values in existing node. */ + if (node_to_split->shift =3D=3D xas->xa_shift) { + offset =3D xas->xa_offset; + + for (i =3D offset; i < offset + sibs + 1; i++) { + if ((i & xas->xa_sibs) =3D=3D 0) { + node_set_marks(node_to_split, i, NULL, 0, marks); + rcu_assign_pointer(node_to_split->slots[i], entry); + + sibling =3D xa_mk_sibling(i); + } else { + rcu_assign_pointer(node_to_split->slots[i], sibling); + } + } + + xas_update(xas, node_to_split); + return; + } + + /* + * Vertical split: build tree bottom-up, so that on any RCU lookup (on + * the original tree), the tree is consistent. + */ + offset_to_split =3D get_offset(xas->xa_index, node_to_split); + stack =3D NULL; + offset =3D 0; + for (;;) { + unsigned int next_offset; + + if (stack && + stack->shift =3D=3D node_to_split->shift - XA_CHUNK_SHIFT && + stack->offset =3D=3D offset_to_split + sibs) + break; + + if (stack && stack->offset =3D=3D XA_CHUNK_SIZE - 1) { + unsigned int child_shift; + + node =3D xas->xa_alloc; + xas->xa_alloc =3D rcu_dereference_raw(node->parent); + + child_shift =3D stack->shift; + for (i =3D 0; i < XA_CHUNK_SIZE; i++) { + struct xa_node *child =3D stack; + + stack =3D child->parent; + + node_set_marks(node, child->offset, NULL, 0, marks); + + RCU_INIT_POINTER(child->parent, node); + RCU_INIT_POINTER(node->slots[child->offset], xa_mk_node(child)); + } + + node->array =3D xas->xa; + node->count =3D XA_CHUNK_SIZE; + node->nr_values =3D 0; + node->shift =3D child_shift + XA_CHUNK_SHIFT; + node->offset =3D 0; + if (node->shift =3D=3D node_to_split->shift - XA_CHUNK_SHIFT) + node->offset =3D offset_to_split; + if (stack && stack->shift =3D=3D node->shift) + node->offset =3D stack->offset + 1; + + next_offset =3D 0; + + xas_update(xas, node); } else { - unsigned int canon =3D offset - xas->xa_sibs; + node =3D xas->xa_alloc; + xas->xa_alloc =3D rcu_dereference_raw(node->parent); - node_set_marks(node, canon, NULL, 0, marks); - rcu_assign_pointer(node->slots[canon], entry); - while (offset > canon) - rcu_assign_pointer(node->slots[offset--], - xa_mk_sibling(canon)); - values +=3D (xa_is_value(entry) - xa_is_value(curr)) * - (xas->xa_sibs + 1); + for (i =3D 0; i < XA_CHUNK_SIZE; i++) { + if ((i & xas->xa_sibs) =3D=3D 0) { + node_set_marks(node, i, NULL, 0, marks); + RCU_INIT_POINTER(node->slots[i], entry); + + sibling =3D xa_mk_sibling(i); + } else { + RCU_INIT_POINTER(node->slots[i], sibling); + } + } + + node->array =3D xas->xa; + node->count =3D XA_CHUNK_SIZE; + node->nr_values =3D xa_is_value(entry) ? XA_CHUNK_SIZE : 0; + node->shift =3D xas->xa_shift; + node->offset =3D offset; + if (node->shift =3D=3D node_to_split->shift - XA_CHUNK_SHIFT) + node->offset =3D offset_to_split; + if (stack && stack->shift =3D=3D node->shift) + node->offset =3D stack->offset + 1; + + next_offset =3D offset + 1; } - } while (offset-- > xas->xa_offset); - node->nr_values +=3D values; - xas_update(xas, node); + node->parent =3D stack; + stack =3D node; + + offset =3D next_offset; + } + + /* Combine all the new nodes on the stack into node_to_split. */ + for (i =3D 0; i < sibs + 1; i++) { + node =3D stack; + stack =3D node->parent; + + node_set_marks(node_to_split, node->offset, NULL, 0, marks); + + rcu_assign_pointer(node->parent, node_to_split); + rcu_assign_pointer(node_to_split->slots[node->offset], xa_mk_node(node)); + } + + WARN_ON(stack); + + node_to_split->nr_values -=3D xa_is_value(entry) ? sibs + 1 : 0; + xas_update(xas, node_to_split); } EXPORT_SYMBOL_GPL(xas_split); -- 2.52.0.rc1.455.g30608eb744-goog