[PATCH v6 6/6] rust_binder: use bitmap for allocation of handles
Posted by Alice Ryhl 6 days, 10 hours ago
To find an unused Binder handle, Rust Binder currently iterates the
red/black tree from the beginning until it finds a gap in the keys. This
is extremely slow.

To improve the performance, add a bitmap that keeps track of which
indices are actually in use. This allows us to quickly find an unused
key in the red/black tree.

As a benchmark, the numbers below were obtained by modifying
binderThroughputTest to send a node with each transaction and stash it
in the server. This results in the number of nodes
increasing by one for every transaction sent. I got the following table
of roundtrip latencies (in µs):

Transaction Range │ Baseline (Rust) │ Bitmap (Rust) │ Comparison (C)
0 - 10,000        │          176.88 │         92.93 │          99.41
10,000 - 20,000   │          437.37 │         87.74 │          98.55
20,000 - 30,000   │          677.49 │         76.24 │          96.37
30,000 - 40,000   │          901.76 │         83.39 │          96.73
40,000 - 50,000   │         1126.62 │        100.44 │          94.57
50,000 - 60,000   │         1288.98 │         94.38 │          96.64
60,000 - 70,000   │         1588.74 │         88.27 │          96.36
70,000 - 80,000   │         1812.97 │         93.97 │          91.24
80,000 - 90,000   │         2062.95 │         92.22 │         102.01
90,000 - 100,000  │         2330.03 │         97.18 │         100.31

It should be clear that the current Rust code becomes linearly slower
per insertion, since the number of rb_next() calls per transaction
grows with the number of handles already allocated. After this change,
the time to find an ID number appears
constant. (Technically it is not constant-time as both insertion and
removal scan the entire bitmap. However, quick napkin math shows that
scanning the entire bitmap with N=100k takes ~1.5µs, which is
negligible in a benchmark where the roundtrip latency is 100µs.)
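For reference, 100,000 bits is about 1,563 64-bit words, so that
estimate works out to roughly 1 ns per word scanned.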

I've included a comparison to the C driver, which uses the same bitmap
algorithm as this patch since commit 15d9da3f818c ("binder: use bitmap
for faster descriptor lookup").

This currently checks if the bitmap should be shrunk after every
removal. One potential future change is introducing a shrinker to make
this operation O(1), but based on the benchmark above this does not seem
required at this time.
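
As a rough userspace illustration of the scheme (a standalone toy using
only std; the names and the shrink heuristic are illustrative and not
the kernel IdPool API used in the diff below):

    // Toy bitmap ID pool: bit i set means handle i is in use.
    struct ToyIdPool {
        words: Vec<u64>,
    }

    impl ToyIdPool {
        fn new() -> Self {
            ToyIdPool { words: vec![0] }
        }

        fn capacity(&self) -> usize {
            self.words.len() * 64
        }

        // Lowest unused id >= start, or None if the map is full.
        fn find_unused_id(&self, start: usize) -> Option<usize> {
            (start..self.capacity())
                .find(|&id| (self.words[id / 64] & (1u64 << (id % 64))) == 0)
        }

        fn acquire(&mut self, id: usize) {
            self.words[id / 64] |= 1u64 << (id % 64);
        }

        fn release_id(&mut self, id: usize) {
            self.words[id / 64] &= !(1u64 << (id % 64));
        }

        // Double the map when it is full. (The driver additionally drops
        // its lock around the allocation; this toy is single-threaded.)
        fn grow(&mut self) {
            self.words.resize(self.words.len() * 2, 0);
        }

        // One possible shrink heuristic: drop the top half if it is empty.
        fn shrink_if_possible(&mut self) {
            let half = self.words.len() / 2;
            if half > 0 && self.words[half..].iter().all(|&w| w == 0) {
                self.words.truncate(half);
            }
        }
    }

    fn alloc_handle(pool: &mut ToyIdPool, is_manager: bool) -> usize {
        // ID 0 may only be used by the context manager.
        let start = if is_manager { 0 } else { 1 };
        loop {
            if let Some(id) = pool.find_unused_id(start) {
                pool.acquire(id);
                return id;
            }
            pool.grow();
        }
    }

    fn main() {
        let mut pool = ToyIdPool::new();
        let a = alloc_handle(&mut pool, false);
        let b = alloc_handle(&mut pool, false);
        assert_eq!((a, b), (1, 2));
        pool.release_id(a);
        pool.shrink_if_possible();
        // The lowest released id is handed out again.
        assert_eq!(alloc_handle(&mut pool, false), a);
    }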

Reviewed-by: Burak Emir <bqe@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
 drivers/android/binder/process.rs | 64 ++++++++++++++++++++++++++++-----------
 1 file changed, 47 insertions(+), 17 deletions(-)

diff --git a/drivers/android/binder/process.rs b/drivers/android/binder/process.rs
index f13a747e784c84a0fb09cbf47442712106eba07c..9264961fd92b33c07fcd5353740cc0b1ec978afd 100644
--- a/drivers/android/binder/process.rs
+++ b/drivers/android/binder/process.rs
@@ -19,6 +19,7 @@
     cred::Credential,
     error::Error,
     fs::file::{self, File},
+    id_pool::IdPool,
     list::{List, ListArc, ListArcField, ListLinks},
     mm,
     prelude::*,
@@ -367,6 +368,8 @@ impl ListItem<{Self::LIST_NODE}> for NodeRefInfo {
 struct ProcessNodeRefs {
     /// Used to look up nodes using the 32-bit id that this process knows it by.
     by_handle: RBTree<u32, ListArc<NodeRefInfo, { NodeRefInfo::LIST_PROC }>>,
+    /// Used to quickly find unused ids in `by_handle`.
+    handle_is_present: IdPool,
     /// Used to look up nodes without knowing their local 32-bit id. The usize is the address of
     /// the underlying `Node` struct as returned by `Node::global_id`.
     by_node: RBTree<usize, u32>,
@@ -381,6 +384,7 @@ impl ProcessNodeRefs {
     fn new() -> Self {
         Self {
             by_handle: RBTree::new(),
+            handle_is_present: IdPool::new(),
             by_node: RBTree::new(),
             freeze_listeners: RBTree::new(),
         }
@@ -775,7 +779,7 @@ pub(crate) fn get_node(
     pub(crate) fn insert_or_update_handle(
         self: ArcBorrow<'_, Process>,
         node_ref: NodeRef,
-        is_mananger: bool,
+        is_manager: bool,
     ) -> Result<u32> {
         {
             let mut refs = self.node_refs.lock();
@@ -794,7 +798,33 @@ pub(crate) fn insert_or_update_handle(
         let reserve2 = RBTreeNodeReservation::new(GFP_KERNEL)?;
         let info = UniqueArc::new_uninit(GFP_KERNEL)?;
 
-        let mut refs = self.node_refs.lock();
+        let mut refs_lock = self.node_refs.lock();
+        let mut refs = &mut *refs_lock;
+
+        let (unused_id, by_handle_slot) = loop {
+            // ID 0 may only be used by the manager.
+            let start = if is_manager { 0 } else { 1 };
+
+            if let Some(res) = refs.handle_is_present.find_unused_id(start) {
+                match refs.by_handle.entry(res.as_u32()) {
+                    rbtree::Entry::Vacant(entry) => break (res, entry),
+                    rbtree::Entry::Occupied(_) => {
+                        pr_err!("Detected mismatch between handle_is_present and by_handle");
+                        res.acquire();
+                        kernel::warn_on!(true);
+                        return Err(EINVAL);
+                    }
+                }
+            }
+
+            let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
+            drop(refs_lock);
+            let resizer = grow_request.realloc(GFP_KERNEL)?;
+            refs_lock = self.node_refs.lock();
+            refs = &mut *refs_lock;
+            refs.handle_is_present.grow(resizer);
+        };
+        let handle = unused_id.as_u32();
 
         // Do a lookup again as node may have been inserted before the lock was reacquired.
         if let Some(handle_ref) = refs.by_node.get(&node_ref.node.global_id()) {
@@ -804,20 +834,9 @@ pub(crate) fn insert_or_update_handle(
             return Ok(handle);
         }
 
-        // Find id.
-        let mut target: u32 = if is_mananger { 0 } else { 1 };
-        for handle in refs.by_handle.keys() {
-            if *handle > target {
-                break;
-            }
-            if *handle == target {
-                target = target.checked_add(1).ok_or(ENOMEM)?;
-            }
-        }
-
         let gid = node_ref.node.global_id();
         let (info_proc, info_node) = {
-            let info_init = NodeRefInfo::new(node_ref, target, self.into());
+            let info_init = NodeRefInfo::new(node_ref, handle, self.into());
             match info.pin_init_with(info_init) {
                 Ok(info) => ListArc::pair_from_pin_unique(info),
                 // error is infallible
@@ -838,9 +857,10 @@ pub(crate) fn insert_or_update_handle(
         // `info_node` into the right node's `refs` list.
         unsafe { info_proc.node_ref2().node.insert_node_info(info_node) };
 
-        refs.by_node.insert(reserve1.into_node(gid, target));
-        refs.by_handle.insert(reserve2.into_node(target, info_proc));
-        Ok(target)
+        refs.by_node.insert(reserve1.into_node(gid, handle));
+        by_handle_slot.insert(info_proc, reserve2);
+        unused_id.acquire();
+        Ok(handle)
     }
 
     pub(crate) fn get_transaction_node(&self, handle: u32) -> BinderResult<NodeRef> {
@@ -905,6 +925,16 @@ pub(crate) fn update_ref(
                 let id = info.node_ref().node.global_id();
                 refs.by_handle.remove(&handle);
                 refs.by_node.remove(&id);
+                refs.handle_is_present.release_id(handle as usize);
+
+                if let Some(shrink) = refs.handle_is_present.shrink_request() {
+                    drop(refs);
+                    // This intentionally ignores allocation failures.
+                    if let Ok(new_bitmap) = shrink.realloc(GFP_KERNEL) {
+                        refs = self.node_refs.lock();
+                        refs.handle_is_present.shrink(new_bitmap);
+                    }
+                }
             }
         } else {
             // All refs are cleared in process exit, so this warning is expected in that case.

-- 
2.52.0.460.gd25c4c69ec-goog
Re: [PATCH v6 6/6] rust_binder: use bitmap for allocation of handles
Posted by Yury Norov 5 days, 21 hours ago
On Tue, Nov 25, 2025 at 01:59:42PM +0000, Alice Ryhl wrote:
> To find an unused Binder handle, Rust Binder currently iterates the
> red/black tree from the beginning until it finds a gap in the keys. This
> is extremely slow.
> 
> To improve the performance, add a bitmap that keeps track of which
> indices are actually in use. This allows us to quickly find an unused
> key in the red/black tree.
> 
> For a benchmark, please see the below numbers that were obtained from
> modifying binderThroughputTest to send a node with each transaction and
> stashing it in the server. This results in the number of nodes
> increasing by one for every transaction sent. I got the following table
> of roundtrip latencies (in µs):
> 
> Transaction Range │ Baseline (Rust) │ Bitmap (Rust) │ Comparison (C)
> 0 - 10,000        │          176.88 │         92.93 │          99.41
> 10,000 - 20,000   │          437.37 │         87.74 │          98.55
> 20,000 - 30,000   │          677.49 │         76.24 │          96.37
> 30,000 - 40,000   │          901.76 │         83.39 │          96.73
> 40,000 - 50,000   │         1126.62 │        100.44 │          94.57
> 50,000 - 60,000   │         1288.98 │         94.38 │          96.64
> 60,000 - 70,000   │         1588.74 │         88.27 │          96.36
> 70,000 - 80,000   │         1812.97 │         93.97 │          91.24
> 80,000 - 90,000   │         2062.95 │         92.22 │         102.01
> 90,000 - 100,000  │         2330.03 │         97.18 │         100.31
> 
> It should be clear that the current Rust code becomes linearly slower
> per insertion as the number of calls to rb_next() per transaction
> increases. After this change, the time to find an ID number appears
> constant. (Technically it is not constant-time as both insertion and
> removal scan the entire bitmap. However, quick napkin math shows that
> scanning the entire bitmap with N=100k takes ~1.5µs, which is neglible
> in a benchmark where the rountrip latency is 100µs.)
> 
> I've included a comparison to the C driver, which uses the same bitmap
> algorithm as this patch since commit 15d9da3f818c ("binder: use bitmap
> for faster descriptor lookup").

Thanks for the solid numbers!

> This currently checks if the bitmap should be shrunk after every
> removal. One potential future change is introducing a shrinker to make
> this operation O(1), but based on the benchmark above this does not seem
> required at this time.
> 
> Reviewed-by: Burak Emir <bqe@google.com>
> Acked-by: Carlos Llamas <cmllamas@google.com>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
>  drivers/android/binder/process.rs | 64 ++++++++++++++++++++++++++++-----------
>  1 file changed, 47 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/android/binder/process.rs b/drivers/android/binder/process.rs
> index f13a747e784c84a0fb09cbf47442712106eba07c..9264961fd92b33c07fcd5353740cc0b1ec978afd 100644
> --- a/drivers/android/binder/process.rs
> +++ b/drivers/android/binder/process.rs
> @@ -19,6 +19,7 @@
>      cred::Credential,
>      error::Error,
>      fs::file::{self, File},
> +    id_pool::IdPool,
>      list::{List, ListArc, ListArcField, ListLinks},
>      mm,
>      prelude::*,
> @@ -367,6 +368,8 @@ impl ListItem<{Self::LIST_NODE}> for NodeRefInfo {
>  struct ProcessNodeRefs {
>      /// Used to look up nodes using the 32-bit id that this process knows it by.
>      by_handle: RBTree<u32, ListArc<NodeRefInfo, { NodeRefInfo::LIST_PROC }>>,
> +    /// Used to quickly find unused ids in `by_handle`.
> +    handle_is_present: IdPool,
>      /// Used to look up nodes without knowing their local 32-bit id. The usize is the address of
>      /// the underlying `Node` struct as returned by `Node::global_id`.
>      by_node: RBTree<usize, u32>,
> @@ -381,6 +384,7 @@ impl ProcessNodeRefs {
>      fn new() -> Self {
>          Self {
>              by_handle: RBTree::new(),
> +            handle_is_present: IdPool::new(),
>              by_node: RBTree::new(),
>              freeze_listeners: RBTree::new(),
>          }
> @@ -775,7 +779,7 @@ pub(crate) fn get_node(
>      pub(crate) fn insert_or_update_handle(
>          self: ArcBorrow<'_, Process>,
>          node_ref: NodeRef,
> -        is_mananger: bool,
> +        is_manager: bool,
>      ) -> Result<u32> {
>          {
>              let mut refs = self.node_refs.lock();
> @@ -794,7 +798,33 @@ pub(crate) fn insert_or_update_handle(
>          let reserve2 = RBTreeNodeReservation::new(GFP_KERNEL)?;
>          let info = UniqueArc::new_uninit(GFP_KERNEL)?;
>  
> -        let mut refs = self.node_refs.lock();
> +        let mut refs_lock = self.node_refs.lock();
> +        let mut refs = &mut *refs_lock;
> +
> +        let (unused_id, by_handle_slot) = loop {
> +            // ID 0 may only be used by the manager.
> +            let start = if is_manager { 0 } else { 1 };
> +
> +            if let Some(res) = refs.handle_is_present.find_unused_id(start) {
> +                match refs.by_handle.entry(res.as_u32()) {
> +                    rbtree::Entry::Vacant(entry) => break (res, entry),
> +                    rbtree::Entry::Occupied(_) => {
> +                        pr_err!("Detected mismatch between handle_is_present and by_handle");
> +                        res.acquire();
> +                        kernel::warn_on!(true);
> +                        return Err(EINVAL);

EINVAL means that the user provided a wrong parameter. What we have
here is data corruption. Maybe EFAULT?

> +                    }
> +                }
> +            }
> +
> +            let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
> +            drop(refs_lock);
> +            let resizer = grow_request.realloc(GFP_KERNEL)?;
> +            refs_lock = self.node_refs.lock();
> +            refs = &mut *refs_lock;
> +            refs.handle_is_present.grow(resizer);

This continues to puzzle me. refs_lock protects refs, and the spec
says:

    a reference’s scope starts from where it is introduced and
    continues through the last time that reference is used.

https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html

The last use of refs is at the .grow_request() line, because later it
is reassigned with a new value.

If my reading of the spec is correct, after dropping refs_lock you may
get rescheduled, and another thread may follow the same path. Because
refs_lock is dropped explicitly and refs implicitly, the concurrent
thread can grab both and proceed to resize the id map.

When your first thread gets back, you'll end up resizing the already
resized map.

I asked your AI, and it says that this race is indeed possible for
exactly that reason. But it doesn't break memory safety, so the
compiler is happy about it...

> +        };
> +        let handle = unused_id.as_u32();
>  
>          // Do a lookup again as node may have been inserted before the lock was reacquired.
>          if let Some(handle_ref) = refs.by_node.get(&node_ref.node.global_id()) {
> @@ -804,20 +834,9 @@ pub(crate) fn insert_or_update_handle(
>              return Ok(handle);
>          }
>  
> -        // Find id.
> -        let mut target: u32 = if is_mananger { 0 } else { 1 };
> -        for handle in refs.by_handle.keys() {
> -            if *handle > target {
> -                break;
> -            }
> -            if *handle == target {
> -                target = target.checked_add(1).ok_or(ENOMEM)?;
> -            }
> -        }
> -
>          let gid = node_ref.node.global_id();
>          let (info_proc, info_node) = {
> -            let info_init = NodeRefInfo::new(node_ref, target, self.into());
> +            let info_init = NodeRefInfo::new(node_ref, handle, self.into());
>              match info.pin_init_with(info_init) {
>                  Ok(info) => ListArc::pair_from_pin_unique(info),
>                  // error is infallible
> @@ -838,9 +857,10 @@ pub(crate) fn insert_or_update_handle(
>          // `info_node` into the right node's `refs` list.
>          unsafe { info_proc.node_ref2().node.insert_node_info(info_node) };
>  
> -        refs.by_node.insert(reserve1.into_node(gid, target));
> -        refs.by_handle.insert(reserve2.into_node(target, info_proc));
> -        Ok(target)
> +        refs.by_node.insert(reserve1.into_node(gid, handle));
> +        by_handle_slot.insert(info_proc, reserve2);
> +        unused_id.acquire();
> +        Ok(handle)
>      }
>  
>      pub(crate) fn get_transaction_node(&self, handle: u32) -> BinderResult<NodeRef> {
> @@ -905,6 +925,16 @@ pub(crate) fn update_ref(
>                  let id = info.node_ref().node.global_id();
>                  refs.by_handle.remove(&handle);
>                  refs.by_node.remove(&id);
> +                refs.handle_is_present.release_id(handle as usize);
> +
> +                if let Some(shrink) = refs.handle_is_present.shrink_request() {
> +                    drop(refs);
> +                    // This intentionally ignores allocation failures.
> +                    if let Ok(new_bitmap) = shrink.realloc(GFP_KERNEL) {
> +                        refs = self.node_refs.lock();
> +                        refs.handle_is_present.shrink(new_bitmap);
> +                    }
> +                }
>              }
>          } else {
>              // All refs are cleared in process exit, so this warning is expected in that case.
> 
> -- 
> 2.52.0.460.gd25c4c69ec-goog
Re: [PATCH v6 6/6] rust_binder: use bitmap for allocation of handles
Posted by Alice Ryhl 5 days, 15 hours ago
On Tue, Nov 25, 2025 at 09:26:46PM -0500, Yury Norov wrote:
> On Tue, Nov 25, 2025 at 01:59:42PM +0000, Alice Ryhl wrote:
> > To find an unused Binder handle, Rust Binder currently iterates the
> > red/black tree from the beginning until it finds a gap in the keys. This
> > is extremely slow.
> > 
> > To improve the performance, add a bitmap that keeps track of which
> > indices are actually in use. This allows us to quickly find an unused
> > key in the red/black tree.
> > 
> > For a benchmark, please see the below numbers that were obtained from
> > modifying binderThroughputTest to send a node with each transaction and
> > stashing it in the server. This results in the number of nodes
> > increasing by one for every transaction sent. I got the following table
> > of roundtrip latencies (in µs):
> > 
> > Transaction Range │ Baseline (Rust) │ Bitmap (Rust) │ Comparison (C)
> > 0 - 10,000        │          176.88 │         92.93 │          99.41
> > 10,000 - 20,000   │          437.37 │         87.74 │          98.55
> > 20,000 - 30,000   │          677.49 │         76.24 │          96.37
> > 30,000 - 40,000   │          901.76 │         83.39 │          96.73
> > 40,000 - 50,000   │         1126.62 │        100.44 │          94.57
> > 50,000 - 60,000   │         1288.98 │         94.38 │          96.64
> > 60,000 - 70,000   │         1588.74 │         88.27 │          96.36
> > 70,000 - 80,000   │         1812.97 │         93.97 │          91.24
> > 80,000 - 90,000   │         2062.95 │         92.22 │         102.01
> > 90,000 - 100,000  │         2330.03 │         97.18 │         100.31
> > 
> > It should be clear that the current Rust code becomes linearly slower
> > per insertion as the number of calls to rb_next() per transaction
> > increases. After this change, the time to find an ID number appears
> > constant. (Technically it is not constant-time as both insertion and
> > removal scan the entire bitmap. However, quick napkin math shows that
> > scanning the entire bitmap with N=100k takes ~1.5µs, which is neglible
> > in a benchmark where the rountrip latency is 100µs.)
> > 
> > I've included a comparison to the C driver, which uses the same bitmap
> > algorithm as this patch since commit 15d9da3f818c ("binder: use bitmap
> > for faster descriptor lookup").
> 
> Thanks for the solid numbers!
> 
> > This currently checks if the bitmap should be shrunk after every
> > removal. One potential future change is introducing a shrinker to make
> > this operation O(1), but based on the benchmark above this does not seem
> > required at this time.
> > 
> > Reviewed-by: Burak Emir <bqe@google.com>
> > Acked-by: Carlos Llamas <cmllamas@google.com>
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > ---
> >  drivers/android/binder/process.rs | 64 ++++++++++++++++++++++++++++-----------
> >  1 file changed, 47 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/android/binder/process.rs b/drivers/android/binder/process.rs
> > index f13a747e784c84a0fb09cbf47442712106eba07c..9264961fd92b33c07fcd5353740cc0b1ec978afd 100644
> > --- a/drivers/android/binder/process.rs
> > +++ b/drivers/android/binder/process.rs
> > @@ -19,6 +19,7 @@
> >      cred::Credential,
> >      error::Error,
> >      fs::file::{self, File},
> > +    id_pool::IdPool,
> >      list::{List, ListArc, ListArcField, ListLinks},
> >      mm,
> >      prelude::*,
> > @@ -367,6 +368,8 @@ impl ListItem<{Self::LIST_NODE}> for NodeRefInfo {
> >  struct ProcessNodeRefs {
> >      /// Used to look up nodes using the 32-bit id that this process knows it by.
> >      by_handle: RBTree<u32, ListArc<NodeRefInfo, { NodeRefInfo::LIST_PROC }>>,
> > +    /// Used to quickly find unused ids in `by_handle`.
> > +    handle_is_present: IdPool,
> >      /// Used to look up nodes without knowing their local 32-bit id. The usize is the address of
> >      /// the underlying `Node` struct as returned by `Node::global_id`.
> >      by_node: RBTree<usize, u32>,
> > @@ -381,6 +384,7 @@ impl ProcessNodeRefs {
> >      fn new() -> Self {
> >          Self {
> >              by_handle: RBTree::new(),
> > +            handle_is_present: IdPool::new(),
> >              by_node: RBTree::new(),
> >              freeze_listeners: RBTree::new(),
> >          }
> > @@ -775,7 +779,7 @@ pub(crate) fn get_node(
> >      pub(crate) fn insert_or_update_handle(
> >          self: ArcBorrow<'_, Process>,
> >          node_ref: NodeRef,
> > -        is_mananger: bool,
> > +        is_manager: bool,
> >      ) -> Result<u32> {
> >          {
> >              let mut refs = self.node_refs.lock();
> > @@ -794,7 +798,33 @@ pub(crate) fn insert_or_update_handle(
> >          let reserve2 = RBTreeNodeReservation::new(GFP_KERNEL)?;
> >          let info = UniqueArc::new_uninit(GFP_KERNEL)?;
> >  
> > -        let mut refs = self.node_refs.lock();
> > +        let mut refs_lock = self.node_refs.lock();
> > +        let mut refs = &mut *refs_lock;
> > +
> > +        let (unused_id, by_handle_slot) = loop {
> > +            // ID 0 may only be used by the manager.
> > +            let start = if is_manager { 0 } else { 1 };
> > +
> > +            if let Some(res) = refs.handle_is_present.find_unused_id(start) {
> > +                match refs.by_handle.entry(res.as_u32()) {
> > +                    rbtree::Entry::Vacant(entry) => break (res, entry),
> > +                    rbtree::Entry::Occupied(_) => {
> > +                        pr_err!("Detected mismatch between handle_is_present and by_handle");
> > +                        res.acquire();
> > +                        kernel::warn_on!(true);
> > +                        return Err(EINVAL);
> 
> EINVAL means that user provides a wrong parameter. Here's a data
> corruption. Maybe EFAULT?

Hmm ... EFAULT is used when the user passes a dangling pointer that
copy_from_user() refuses to use. I would like to avoid confusion with
that as well. Is there a third option?

> > +                    }
> > +                }
> > +            }
> > +
> > +            let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
> > +            drop(refs_lock);
> > +            let resizer = grow_request.realloc(GFP_KERNEL)?;
> > +            refs_lock = self.node_refs.lock();
> > +            refs = &mut *refs_lock;
> > +            refs.handle_is_present.grow(resizer);
> 
> This continues puzzling me. Refs_lock protects refs, and the spec
> says:
> 
>     a reference’s scope starts from where it is introduced and
>     continues through the last time that reference is used.
> 
> https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html
> 
> The last usage of refs is at .grow_request() line, because later it's
> reused with the new value. 
> 
> If my reading of the spec is correct, after dropping the refs_lock,
> you may get rescheduled, and another thread may follow the same path.
> Because refs_lock is dropped explicitly and refs - implicitly, the
> concurrent thread can grab both and follow with resizing the id map.
> 
> When your first thread will get back, you'll end up resizing the
> already resized map.
> 
> I asked your AI, and it says that this race is indeed possible for
> exactly that reason. But it doesn't break memory safety, so the
> compiler is happy about it...

You're right that since we release the mutex, someone else may have
resized the map in the meantime. That's why we implemented grow() to do
nothing if it was already resized:

	pub fn grow(&mut self, mut resizer: PoolResizer) {
	    // Between request to grow that led to allocation of `resizer` and now,
	    // another thread may have already grown the capacity.
	    // In this case, do nothing, drop `resizer` and move on.
	    if resizer.new.len() <= self.capacity() {
	        return;
	    }
	
	    resizer.new.copy_and_extend(&self.map);
	    self.map = resizer.new;
	}

Alice
Re: [PATCH v6 6/6] rust_binder: use bitmap for allocation of handles
Posted by Yury Norov 5 days, 8 hours ago
On Wed, Nov 26, 2025 at 08:35:55AM +0000, Alice Ryhl wrote:
> On Tue, Nov 25, 2025 at 09:26:46PM -0500, Yury Norov wrote:
> > On Tue, Nov 25, 2025 at 01:59:42PM +0000, Alice Ryhl wrote:
> > > To find an unused Binder handle, Rust Binder currently iterates the
> > > red/black tree from the beginning until it finds a gap in the keys. This
> > > is extremely slow.
> > > 
> > > To improve the performance, add a bitmap that keeps track of which
> > > indices are actually in use. This allows us to quickly find an unused
> > > key in the red/black tree.
> > > 
> > > For a benchmark, please see the below numbers that were obtained from
> > > modifying binderThroughputTest to send a node with each transaction and
> > > stashing it in the server. This results in the number of nodes
> > > increasing by one for every transaction sent. I got the following table
> > > of roundtrip latencies (in µs):
> > > 
> > > Transaction Range │ Baseline (Rust) │ Bitmap (Rust) │ Comparison (C)
> > > 0 - 10,000        │          176.88 │         92.93 │          99.41
> > > 10,000 - 20,000   │          437.37 │         87.74 │          98.55
> > > 20,000 - 30,000   │          677.49 │         76.24 │          96.37
> > > 30,000 - 40,000   │          901.76 │         83.39 │          96.73
> > > 40,000 - 50,000   │         1126.62 │        100.44 │          94.57
> > > 50,000 - 60,000   │         1288.98 │         94.38 │          96.64
> > > 60,000 - 70,000   │         1588.74 │         88.27 │          96.36
> > > 70,000 - 80,000   │         1812.97 │         93.97 │          91.24
> > > 80,000 - 90,000   │         2062.95 │         92.22 │         102.01
> > > 90,000 - 100,000  │         2330.03 │         97.18 │         100.31
> > > 
> > > It should be clear that the current Rust code becomes linearly slower
> > > per insertion as the number of calls to rb_next() per transaction
> > > increases. After this change, the time to find an ID number appears
> > > constant. (Technically it is not constant-time as both insertion and
> > > removal scan the entire bitmap. However, quick napkin math shows that
> > > scanning the entire bitmap with N=100k takes ~1.5µs, which is neglible
> > > in a benchmark where the rountrip latency is 100µs.)
> > > 
> > > I've included a comparison to the C driver, which uses the same bitmap
> > > algorithm as this patch since commit 15d9da3f818c ("binder: use bitmap
> > > for faster descriptor lookup").
> > 
> > Thanks for the solid numbers!
> > 
> > > This currently checks if the bitmap should be shrunk after every
> > > removal. One potential future change is introducing a shrinker to make
> > > this operation O(1), but based on the benchmark above this does not seem
> > > required at this time.
> > > 
> > > Reviewed-by: Burak Emir <bqe@google.com>
> > > Acked-by: Carlos Llamas <cmllamas@google.com>
> > > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > > ---
> > >  drivers/android/binder/process.rs | 64 ++++++++++++++++++++++++++++-----------
> > >  1 file changed, 47 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/drivers/android/binder/process.rs b/drivers/android/binder/process.rs
> > > index f13a747e784c84a0fb09cbf47442712106eba07c..9264961fd92b33c07fcd5353740cc0b1ec978afd 100644
> > > --- a/drivers/android/binder/process.rs
> > > +++ b/drivers/android/binder/process.rs
> > > @@ -19,6 +19,7 @@
> > >      cred::Credential,
> > >      error::Error,
> > >      fs::file::{self, File},
> > > +    id_pool::IdPool,
> > >      list::{List, ListArc, ListArcField, ListLinks},
> > >      mm,
> > >      prelude::*,
> > > @@ -367,6 +368,8 @@ impl ListItem<{Self::LIST_NODE}> for NodeRefInfo {
> > >  struct ProcessNodeRefs {
> > >      /// Used to look up nodes using the 32-bit id that this process knows it by.
> > >      by_handle: RBTree<u32, ListArc<NodeRefInfo, { NodeRefInfo::LIST_PROC }>>,
> > > +    /// Used to quickly find unused ids in `by_handle`.
> > > +    handle_is_present: IdPool,
> > >      /// Used to look up nodes without knowing their local 32-bit id. The usize is the address of
> > >      /// the underlying `Node` struct as returned by `Node::global_id`.
> > >      by_node: RBTree<usize, u32>,
> > > @@ -381,6 +384,7 @@ impl ProcessNodeRefs {
> > >      fn new() -> Self {
> > >          Self {
> > >              by_handle: RBTree::new(),
> > > +            handle_is_present: IdPool::new(),
> > >              by_node: RBTree::new(),
> > >              freeze_listeners: RBTree::new(),
> > >          }
> > > @@ -775,7 +779,7 @@ pub(crate) fn get_node(
> > >      pub(crate) fn insert_or_update_handle(
> > >          self: ArcBorrow<'_, Process>,
> > >          node_ref: NodeRef,
> > > -        is_mananger: bool,
> > > +        is_manager: bool,
> > >      ) -> Result<u32> {
> > >          {
> > >              let mut refs = self.node_refs.lock();
> > > @@ -794,7 +798,33 @@ pub(crate) fn insert_or_update_handle(
> > >          let reserve2 = RBTreeNodeReservation::new(GFP_KERNEL)?;
> > >          let info = UniqueArc::new_uninit(GFP_KERNEL)?;
> > >  
> > > -        let mut refs = self.node_refs.lock();
> > > +        let mut refs_lock = self.node_refs.lock();
> > > +        let mut refs = &mut *refs_lock;
> > > +
> > > +        let (unused_id, by_handle_slot) = loop {
> > > +            // ID 0 may only be used by the manager.
> > > +            let start = if is_manager { 0 } else { 1 };
> > > +
> > > +            if let Some(res) = refs.handle_is_present.find_unused_id(start) {
> > > +                match refs.by_handle.entry(res.as_u32()) {
> > > +                    rbtree::Entry::Vacant(entry) => break (res, entry),
> > > +                    rbtree::Entry::Occupied(_) => {
> > > +                        pr_err!("Detected mismatch between handle_is_present and by_handle");
> > > +                        res.acquire();
> > > +                        kernel::warn_on!(true);
> > > +                        return Err(EINVAL);
> > 
> > EINVAL means that user provides a wrong parameter. Here's a data
> > corruption. Maybe EFAULT?
> 
> Hmm ... EFAULT is used when the user passes a dangling pointer that
> copy_from_user() refuses to use. I would like to avoid confusion with
> that as well. Is there a third option?
> 
> > > +                    }
> > > +                }
> > > +            }
> > > +
> > > +            let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
> > > +            drop(refs_lock);
> > > +            let resizer = grow_request.realloc(GFP_KERNEL)?;
> > > +            refs_lock = self.node_refs.lock();
> > > +            refs = &mut *refs_lock;
> > > +            refs.handle_is_present.grow(resizer);
> > 
> > This continues puzzling me. Refs_lock protects refs, and the spec
> > says:
> > 
> >     a reference’s scope starts from where it is introduced and
> >     continues through the last time that reference is used.
> > 
> > https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html
> > 
> > The last usage of refs is at .grow_request() line, because later it's
> > reused with the new value. 
> > 
> > If my reading of the spec is correct, after dropping the refs_lock,
> > you may get rescheduled, and another thread may follow the same path.
> > Because refs_lock is dropped explicitly and refs - implicitly, the
> > concurrent thread can grab both and follow with resizing the id map.
> > 
> > When your first thread will get back, you'll end up resizing the
> > already resized map.
> > 
> > I asked your AI, and it says that this race is indeed possible for
> > exactly that reason. But it doesn't break memory safety, so the
> > compiler is happy about it...
> 
> You're right that since we release the mutex, someone else may have
> resized the map in the meantime. That's why we implemented grow() to do
> nothing if it was already resized:
> 
> 	pub fn grow(&mut self, mut resizer: PoolResizer) {
> 	    // Between request to grow that led to allocation of `resizer` and now,
> 	    // another thread may have already grown the capacity.
> 	    // In this case, do nothing, drop `resizer` and move on.
> 	    if resizer.new.len() <= self.capacity() {
> 	        return;
> 	    }
> 	
> 	    resizer.new.copy_and_extend(&self.map);
> 	    self.map = resizer.new;
> 	}

OK, I see...

I expected that you'd switch the pool state to 'growing' before
dropping refs_lock. This way, a concurrent thread would be able to go
straight to the next iteration.

Something like:
        
        if let Some(res) = refs.handle_is_present.find_unused_id(start) {
                ...
        }

        if refs.is_growing() {
                drop(refs_lock);
                continue;
        }
        let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
        refs.set_growing()
        drop(refs_lock);
        let resizer = grow_request.realloc(GFP_KERNEL)?;
        ...
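
Fleshed out as a standalone userspace toy (std Mutex and threads; the
capacity-only Pool and the flag handling are stand-ins rather than an
existing IdPool API), the idea might look like:

    use std::sync::{Arc, Mutex};
    use std::thread;

    // Toy pool: only the capacity and the in-flight "growing" flag matter.
    struct Pool {
        capacity: usize,
        growing: bool,
    }

    // Grow the pool to at least `want` ids, letting exactly one thread do
    // the slow allocation outside the lock while any losers simply retry.
    fn ensure_capacity(pool: &Arc<Mutex<Pool>>, want: usize) {
        loop {
            let mut p = pool.lock().unwrap();
            if p.capacity >= want {
                return;
            }
            if p.growing {
                // A concurrent thread is already allocating a bigger map:
                // drop the lock and check again once it is done.
                drop(p);
                thread::yield_now();
                continue;
            }
            p.growing = true;
            let new_cap = p.capacity * 2;
            drop(p);

            // Slow part, done without holding the lock (stands in for
            // grow_request().realloc(GFP_KERNEL) in the driver).
            let new_words = vec![0u64; new_cap.div_ceil(64)];

            let mut p = pool.lock().unwrap();
            p.capacity = new_words.len() * 64;
            p.growing = false;
        }
    }

    fn main() {
        let pool = Arc::new(Mutex::new(Pool { capacity: 64, growing: false }));
        let workers: Vec<_> = (0..4)
            .map(|_| {
                let pool = Arc::clone(&pool);
                thread::spawn(move || ensure_capacity(&pool, 1024))
            })
            .collect();
        for w in workers {
            w.join().unwrap();
        }
        assert!(pool.lock().unwrap().capacity >= 1024);
    }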

Your approach also works, assuming you're OK with a useless alloc/free
of the bitmap in the case of a collision.

So, I can take this series as is, or I can wait for v7 if you prefer.

Please advise.

Thanks,
Yury
Re: [PATCH v6 6/6] rust_binder: use bitmap for allocation of handles
Posted by Alice Ryhl 5 days, 8 hours ago
On Wed, Nov 26, 2025 at 10:31:03AM -0500, Yury Norov wrote:
> On Wed, Nov 26, 2025 at 08:35:55AM +0000, Alice Ryhl wrote:
> > On Tue, Nov 25, 2025 at 09:26:46PM -0500, Yury Norov wrote:
> > > On Tue, Nov 25, 2025 at 01:59:42PM +0000, Alice Ryhl wrote:
> > > > To find an unused Binder handle, Rust Binder currently iterates the
> > > > red/black tree from the beginning until it finds a gap in the keys. This
> > > > is extremely slow.
> > > > 
> > > > To improve the performance, add a bitmap that keeps track of which
> > > > indices are actually in use. This allows us to quickly find an unused
> > > > key in the red/black tree.
> > > > 
> > > > For a benchmark, please see the below numbers that were obtained from
> > > > modifying binderThroughputTest to send a node with each transaction and
> > > > stashing it in the server. This results in the number of nodes
> > > > increasing by one for every transaction sent. I got the following table
> > > > of roundtrip latencies (in µs):
> > > > 
> > > > Transaction Range │ Baseline (Rust) │ Bitmap (Rust) │ Comparison (C)
> > > > 0 - 10,000        │          176.88 │         92.93 │          99.41
> > > > 10,000 - 20,000   │          437.37 │         87.74 │          98.55
> > > > 20,000 - 30,000   │          677.49 │         76.24 │          96.37
> > > > 30,000 - 40,000   │          901.76 │         83.39 │          96.73
> > > > 40,000 - 50,000   │         1126.62 │        100.44 │          94.57
> > > > 50,000 - 60,000   │         1288.98 │         94.38 │          96.64
> > > > 60,000 - 70,000   │         1588.74 │         88.27 │          96.36
> > > > 70,000 - 80,000   │         1812.97 │         93.97 │          91.24
> > > > 80,000 - 90,000   │         2062.95 │         92.22 │         102.01
> > > > 90,000 - 100,000  │         2330.03 │         97.18 │         100.31
> > > > 
> > > > It should be clear that the current Rust code becomes linearly slower
> > > > per insertion as the number of calls to rb_next() per transaction
> > > > increases. After this change, the time to find an ID number appears
> > > > constant. (Technically it is not constant-time as both insertion and
> > > > removal scan the entire bitmap. However, quick napkin math shows that
> > > > scanning the entire bitmap with N=100k takes ~1.5µs, which is neglible
> > > > in a benchmark where the rountrip latency is 100µs.)
> > > > 
> > > > I've included a comparison to the C driver, which uses the same bitmap
> > > > algorithm as this patch since commit 15d9da3f818c ("binder: use bitmap
> > > > for faster descriptor lookup").
> > > 
> > > Thanks for the solid numbers!
> > > 
> > > > This currently checks if the bitmap should be shrunk after every
> > > > removal. One potential future change is introducing a shrinker to make
> > > > this operation O(1), but based on the benchmark above this does not seem
> > > > required at this time.
> > > > 
> > > > Reviewed-by: Burak Emir <bqe@google.com>
> > > > Acked-by: Carlos Llamas <cmllamas@google.com>
> > > > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > > > ---
> > > >  drivers/android/binder/process.rs | 64 ++++++++++++++++++++++++++++-----------
> > > >  1 file changed, 47 insertions(+), 17 deletions(-)
> > > > 
> > > > diff --git a/drivers/android/binder/process.rs b/drivers/android/binder/process.rs
> > > > index f13a747e784c84a0fb09cbf47442712106eba07c..9264961fd92b33c07fcd5353740cc0b1ec978afd 100644
> > > > --- a/drivers/android/binder/process.rs
> > > > +++ b/drivers/android/binder/process.rs
> > > > @@ -19,6 +19,7 @@
> > > >      cred::Credential,
> > > >      error::Error,
> > > >      fs::file::{self, File},
> > > > +    id_pool::IdPool,
> > > >      list::{List, ListArc, ListArcField, ListLinks},
> > > >      mm,
> > > >      prelude::*,
> > > > @@ -367,6 +368,8 @@ impl ListItem<{Self::LIST_NODE}> for NodeRefInfo {
> > > >  struct ProcessNodeRefs {
> > > >      /// Used to look up nodes using the 32-bit id that this process knows it by.
> > > >      by_handle: RBTree<u32, ListArc<NodeRefInfo, { NodeRefInfo::LIST_PROC }>>,
> > > > +    /// Used to quickly find unused ids in `by_handle`.
> > > > +    handle_is_present: IdPool,
> > > >      /// Used to look up nodes without knowing their local 32-bit id. The usize is the address of
> > > >      /// the underlying `Node` struct as returned by `Node::global_id`.
> > > >      by_node: RBTree<usize, u32>,
> > > > @@ -381,6 +384,7 @@ impl ProcessNodeRefs {
> > > >      fn new() -> Self {
> > > >          Self {
> > > >              by_handle: RBTree::new(),
> > > > +            handle_is_present: IdPool::new(),
> > > >              by_node: RBTree::new(),
> > > >              freeze_listeners: RBTree::new(),
> > > >          }
> > > > @@ -775,7 +779,7 @@ pub(crate) fn get_node(
> > > >      pub(crate) fn insert_or_update_handle(
> > > >          self: ArcBorrow<'_, Process>,
> > > >          node_ref: NodeRef,
> > > > -        is_mananger: bool,
> > > > +        is_manager: bool,
> > > >      ) -> Result<u32> {
> > > >          {
> > > >              let mut refs = self.node_refs.lock();
> > > > @@ -794,7 +798,33 @@ pub(crate) fn insert_or_update_handle(
> > > >          let reserve2 = RBTreeNodeReservation::new(GFP_KERNEL)?;
> > > >          let info = UniqueArc::new_uninit(GFP_KERNEL)?;
> > > >  
> > > > -        let mut refs = self.node_refs.lock();
> > > > +        let mut refs_lock = self.node_refs.lock();
> > > > +        let mut refs = &mut *refs_lock;
> > > > +
> > > > +        let (unused_id, by_handle_slot) = loop {
> > > > +            // ID 0 may only be used by the manager.
> > > > +            let start = if is_manager { 0 } else { 1 };
> > > > +
> > > > +            if let Some(res) = refs.handle_is_present.find_unused_id(start) {
> > > > +                match refs.by_handle.entry(res.as_u32()) {
> > > > +                    rbtree::Entry::Vacant(entry) => break (res, entry),
> > > > +                    rbtree::Entry::Occupied(_) => {
> > > > +                        pr_err!("Detected mismatch between handle_is_present and by_handle");
> > > > +                        res.acquire();
> > > > +                        kernel::warn_on!(true);
> > > > +                        return Err(EINVAL);
> > > 
> > > EINVAL means that user provides a wrong parameter. Here's a data
> > > corruption. Maybe EFAULT?
> > 
> > Hmm ... EFAULT is used when the user passes a dangling pointer that
> > copy_from_user() refuses to use. I would like to avoid confusion with
> > that as well. Is there a third option?
> > 
> > > > +                    }
> > > > +                }
> > > > +            }
> > > > +
> > > > +            let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
> > > > +            drop(refs_lock);
> > > > +            let resizer = grow_request.realloc(GFP_KERNEL)?;
> > > > +            refs_lock = self.node_refs.lock();
> > > > +            refs = &mut *refs_lock;
> > > > +            refs.handle_is_present.grow(resizer);
> > > 
> > > This continues puzzling me. Refs_lock protects refs, and the spec
> > > says:
> > > 
> > >     a reference’s scope starts from where it is introduced and
> > >     continues through the last time that reference is used.
> > > 
> > > https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html
> > > 
> > > The last usage of refs is at .grow_request() line, because later it's
> > > reused with the new value. 
> > > 
> > > If my reading of the spec is correct, after dropping the refs_lock,
> > > you may get rescheduled, and another thread may follow the same path.
> > > Because refs_lock is dropped explicitly and refs - implicitly, the
> > > concurrent thread can grab both and follow with resizing the id map.
> > > 
> > > When your first thread will get back, you'll end up resizing the
> > > already resized map.
> > > 
> > > I asked your AI, and it says that this race is indeed possible for
> > > exactly that reason. But it doesn't break memory safety, so the
> > > compiler is happy about it...
> > 
> > You're right that since we release the mutex, someone else may have
> > resized the map in the meantime. That's why we implemented grow() to do
> > nothing if it was already resized:
> > 
> > 	pub fn grow(&mut self, mut resizer: PoolResizer) {
> > 	    // Between request to grow that led to allocation of `resizer` and now,
> > 	    // another thread may have already grown the capacity.
> > 	    // In this case, do nothing, drop `resizer` and move on.
> > 	    if resizer.new.len() <= self.capacity() {
> > 	        return;
> > 	    }
> > 	
> > 	    resizer.new.copy_and_extend(&self.map);
> > 	    self.map = resizer.new;
> > 	}
> 
> OK, I see...
> 
> I expected that you'll switch the pool state to 'growing' before
> dropping the ref_lock. This way, the concurrent thread would be able
> to go straight to the new iteration.
> 
> Something like:
>         
>         if let Some(res) = refs.handle_is_present.find_unused_id(start) {
>                 ...
>         }
> 
>         if refs.is_growing() {
>                 drop(refs_lock);
>                 continue;
>         }
>         let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
>         refs.set_growing()
>         drop(refs_lock);
>         let resizer = grow_request.realloc(GFP_KERNEL)?;
>         ...
> 
> Your approach also works, assuming you're OK with a useless alloc/free
> bitmaps in the case of collision.

Hmm, having the extra ones just repeatedly lock/unlock also isn't great.
I guess one alternate option could be to have a separate mutex that you
take while allocating, but ehh I think the current logic is fine.

> So, I can take this series as is, or I can wait for v7 if you prefer.
> 
> Please advise.

Sure, go ahead! And thanks for the reviews!

Alice
Re: [PATCH v6 6/6] rust_binder: use bitmap for allocation of handles
Posted by Yury Norov 5 days, 7 hours ago
On Wed, Nov 26, 2025 at 03:56:17PM +0000, Alice Ryhl wrote:
> On Wed, Nov 26, 2025 at 10:31:03AM -0500, Yury Norov wrote:
> > On Wed, Nov 26, 2025 at 08:35:55AM +0000, Alice Ryhl wrote:
> > > On Tue, Nov 25, 2025 at 09:26:46PM -0500, Yury Norov wrote:
> > > > On Tue, Nov 25, 2025 at 01:59:42PM +0000, Alice Ryhl wrote:
> > > > > To find an unused Binder handle, Rust Binder currently iterates the
> > > > > red/black tree from the beginning until it finds a gap in the keys. This
> > > > > is extremely slow.
> > > > > 
> > > > > To improve the performance, add a bitmap that keeps track of which
> > > > > indices are actually in use. This allows us to quickly find an unused
> > > > > key in the red/black tree.
> > > > > 
> > > > > For a benchmark, please see the below numbers that were obtained from
> > > > > modifying binderThroughputTest to send a node with each transaction and
> > > > > stashing it in the server. This results in the number of nodes
> > > > > increasing by one for every transaction sent. I got the following table
> > > > > of roundtrip latencies (in µs):
> > > > > 
> > > > > Transaction Range │ Baseline (Rust) │ Bitmap (Rust) │ Comparison (C)
> > > > > 0 - 10,000        │          176.88 │         92.93 │          99.41
> > > > > 10,000 - 20,000   │          437.37 │         87.74 │          98.55
> > > > > 20,000 - 30,000   │          677.49 │         76.24 │          96.37
> > > > > 30,000 - 40,000   │          901.76 │         83.39 │          96.73
> > > > > 40,000 - 50,000   │         1126.62 │        100.44 │          94.57
> > > > > 50,000 - 60,000   │         1288.98 │         94.38 │          96.64
> > > > > 60,000 - 70,000   │         1588.74 │         88.27 │          96.36
> > > > > 70,000 - 80,000   │         1812.97 │         93.97 │          91.24
> > > > > 80,000 - 90,000   │         2062.95 │         92.22 │         102.01
> > > > > 90,000 - 100,000  │         2330.03 │         97.18 │         100.31
> > > > > 
> > > > > It should be clear that the current Rust code becomes linearly slower
> > > > > per insertion as the number of calls to rb_next() per transaction
> > > > > increases. After this change, the time to find an ID number appears
> > > > > constant. (Technically it is not constant-time as both insertion and
> > > > > removal scan the entire bitmap. However, quick napkin math shows that
> > > > > scanning the entire bitmap with N=100k takes ~1.5µs, which is neglible
> > > > > in a benchmark where the rountrip latency is 100µs.)
> > > > > 
> > > > > I've included a comparison to the C driver, which uses the same bitmap
> > > > > algorithm as this patch since commit 15d9da3f818c ("binder: use bitmap
> > > > > for faster descriptor lookup").
> > > > 
> > > > Thanks for the solid numbers!
> > > > 
> > > > > This currently checks if the bitmap should be shrunk after every
> > > > > removal. One potential future change is introducing a shrinker to make
> > > > > this operation O(1), but based on the benchmark above this does not seem
> > > > > required at this time.
> > > > > 
> > > > > Reviewed-by: Burak Emir <bqe@google.com>
> > > > > Acked-by: Carlos Llamas <cmllamas@google.com>
> > > > > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > > > > ---
> > > > >  drivers/android/binder/process.rs | 64 ++++++++++++++++++++++++++++-----------
> > > > >  1 file changed, 47 insertions(+), 17 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/android/binder/process.rs b/drivers/android/binder/process.rs
> > > > > index f13a747e784c84a0fb09cbf47442712106eba07c..9264961fd92b33c07fcd5353740cc0b1ec978afd 100644
> > > > > --- a/drivers/android/binder/process.rs
> > > > > +++ b/drivers/android/binder/process.rs
> > > > > @@ -19,6 +19,7 @@
> > > > >      cred::Credential,
> > > > >      error::Error,
> > > > >      fs::file::{self, File},
> > > > > +    id_pool::IdPool,
> > > > >      list::{List, ListArc, ListArcField, ListLinks},
> > > > >      mm,
> > > > >      prelude::*,
> > > > > @@ -367,6 +368,8 @@ impl ListItem<{Self::LIST_NODE}> for NodeRefInfo {
> > > > >  struct ProcessNodeRefs {
> > > > >      /// Used to look up nodes using the 32-bit id that this process knows it by.
> > > > >      by_handle: RBTree<u32, ListArc<NodeRefInfo, { NodeRefInfo::LIST_PROC }>>,
> > > > > +    /// Used to quickly find unused ids in `by_handle`.
> > > > > +    handle_is_present: IdPool,
> > > > >      /// Used to look up nodes without knowing their local 32-bit id. The usize is the address of
> > > > >      /// the underlying `Node` struct as returned by `Node::global_id`.
> > > > >      by_node: RBTree<usize, u32>,
> > > > > @@ -381,6 +384,7 @@ impl ProcessNodeRefs {
> > > > >      fn new() -> Self {
> > > > >          Self {
> > > > >              by_handle: RBTree::new(),
> > > > > +            handle_is_present: IdPool::new(),
> > > > >              by_node: RBTree::new(),
> > > > >              freeze_listeners: RBTree::new(),
> > > > >          }
> > > > > @@ -775,7 +779,7 @@ pub(crate) fn get_node(
> > > > >      pub(crate) fn insert_or_update_handle(
> > > > >          self: ArcBorrow<'_, Process>,
> > > > >          node_ref: NodeRef,
> > > > > -        is_mananger: bool,
> > > > > +        is_manager: bool,
> > > > >      ) -> Result<u32> {
> > > > >          {
> > > > >              let mut refs = self.node_refs.lock();
> > > > > @@ -794,7 +798,33 @@ pub(crate) fn insert_or_update_handle(
> > > > >          let reserve2 = RBTreeNodeReservation::new(GFP_KERNEL)?;
> > > > >          let info = UniqueArc::new_uninit(GFP_KERNEL)?;
> > > > >  
> > > > > -        let mut refs = self.node_refs.lock();
> > > > > +        let mut refs_lock = self.node_refs.lock();
> > > > > +        let mut refs = &mut *refs_lock;
> > > > > +
> > > > > +        let (unused_id, by_handle_slot) = loop {
> > > > > +            // ID 0 may only be used by the manager.
> > > > > +            let start = if is_manager { 0 } else { 1 };
> > > > > +
> > > > > +            if let Some(res) = refs.handle_is_present.find_unused_id(start) {
> > > > > +                match refs.by_handle.entry(res.as_u32()) {
> > > > > +                    rbtree::Entry::Vacant(entry) => break (res, entry),
> > > > > +                    rbtree::Entry::Occupied(_) => {
> > > > > +                        pr_err!("Detected mismatch between handle_is_present and by_handle");
> > > > > +                        res.acquire();
> > > > > +                        kernel::warn_on!(true);
> > > > > +                        return Err(EINVAL);
> > > > 
> > > > EINVAL means that user provides a wrong parameter. Here's a data
> > > > corruption. Maybe EFAULT?
> > > 
> > > Hmm ... EFAULT is used when the user passes a dangling pointer that
> > > copy_from_user() refuses to use. I would like to avoid confusion with
> > > that as well. Is there a third option?
> > > 
> > > > > +                    }
> > > > > +                }
> > > > > +            }
> > > > > +
> > > > > +            let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
> > > > > +            drop(refs_lock);
> > > > > +            let resizer = grow_request.realloc(GFP_KERNEL)?;
> > > > > +            refs_lock = self.node_refs.lock();
> > > > > +            refs = &mut *refs_lock;
> > > > > +            refs.handle_is_present.grow(resizer);
> > > > 
> > > > This continues puzzling me. Refs_lock protects refs, and the spec
> > > > says:
> > > > 
> > > >     a reference’s scope starts from where it is introduced and
> > > >     continues through the last time that reference is used.
> > > > 
> > > > https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html
> > > > 
> > > > The last usage of refs is at .grow_request() line, because later it's
> > > > reused with the new value. 
> > > > 
> > > > If my reading of the spec is correct, after dropping the refs_lock,
> > > > you may get rescheduled, and another thread may follow the same path.
> > > > Because refs_lock is dropped explicitly and refs - implicitly, the
> > > > concurrent thread can grab both and follow with resizing the id map.
> > > > 
> > > > When your first thread will get back, you'll end up resizing the
> > > > already resized map.
> > > > 
> > > > I asked your AI, and it says that this race is indeed possible for
> > > > exactly that reason. But it doesn't break memory safety, so the
> > > > compiler is happy about it...
> > > 
> > > You're right that since we release the mutex, someone else may have
> > > resized the map in the meantime. That's why we implemented grow() to do
> > > nothing if it was already resized:
> > > 
> > > 	pub fn grow(&mut self, mut resizer: PoolResizer) {
> > > 	    // Between request to grow that led to allocation of `resizer` and now,
> > > 	    // another thread may have already grown the capacity.
> > > 	    // In this case, do nothing, drop `resizer` and move on.
> > > 	    if resizer.new.len() <= self.capacity() {
> > > 	        return;
> > > 	    }
> > > 	
> > > 	    resizer.new.copy_and_extend(&self.map);
> > > 	    self.map = resizer.new;
> > > 	}
> > 
> > OK, I see...
> > 
> > I expected that you'll switch the pool state to 'growing' before
> > dropping the ref_lock. This way, the concurrent thread would be able
> > to go straight to the new iteration.
> > 
> > Something like:
> >         
> >         if let Some(res) = refs.handle_is_present.find_unused_id(start) {
> >                 ...
> >         }
> > 
> >         if refs.is_growing() {
> >                 drop(refs_lock);
> >                 continue;
> >         }
> >         let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
> >         refs.set_growing()
> >         drop(refs_lock);
> >         let resizer = grow_request.realloc(GFP_KERNEL)?;
> >         ...
> > 
> > Your approach also works, assuming you're OK with a useless alloc/free
> > bitmaps in the case of collision.
> 
> Hmm, having the extra ones just repeatedly lock/unlock also isn't great.
> I guess one alternate option could be to have a separate mutex that you
> take while allocating, but ehh I think the current logic is fine.

Not sure I understand... If you've found nothing and need to grow, you
have to lock and unlock anyway, at least to allocate a bigger bitmap.

But this doesn't look critical to me because the probability of such an
event decreases exponentially as the ID pool grows. So either approach
works.
 
> > So, I can take this series as is, or I can wait for v7 if you prefer.
> > 
> > Please advise.
> 
> Sure, go ahead! And thanks for the reviews!

OK, will do.

Thanks,
Yury
Re: [PATCH v6 6/6] rust_binder: use bitmap for allocation of handles
Posted by Alice Ryhl 5 days, 7 hours ago
On Wed, Nov 26, 2025 at 5:13 PM Yury Norov <yury.norov@gmail.com> wrote:
>
> On Wed, Nov 26, 2025 at 03:56:17PM +0000, Alice Ryhl wrote:
> > On Wed, Nov 26, 2025 at 10:31:03AM -0500, Yury Norov wrote:
> > > On Wed, Nov 26, 2025 at 08:35:55AM +0000, Alice Ryhl wrote:
> > > > On Tue, Nov 25, 2025 at 09:26:46PM -0500, Yury Norov wrote:
> > > > > On Tue, Nov 25, 2025 at 01:59:42PM +0000, Alice Ryhl wrote:
> > > > > > To find an unused Binder handle, Rust Binder currently iterates the
> > > > > > red/black tree from the beginning until it finds a gap in the keys. This
> > > > > > is extremely slow.
> > > > > >
> > > > > > To improve the performance, add a bitmap that keeps track of which
> > > > > > indices are actually in use. This allows us to quickly find an unused
> > > > > > key in the red/black tree.
> > > > > >
> > > > > > For a benchmark, please see the below numbers that were obtained from
> > > > > > modifying binderThroughputTest to send a node with each transaction and
> > > > > > stashing it in the server. This results in the number of nodes
> > > > > > increasing by one for every transaction sent. I got the following table
> > > > > > of roundtrip latencies (in µs):
> > > > > >
> > > > > > Transaction Range │ Baseline (Rust) │ Bitmap (Rust) │ Comparison (C)
> > > > > > 0 - 10,000        │          176.88 │         92.93 │          99.41
> > > > > > 10,000 - 20,000   │          437.37 │         87.74 │          98.55
> > > > > > 20,000 - 30,000   │          677.49 │         76.24 │          96.37
> > > > > > 30,000 - 40,000   │          901.76 │         83.39 │          96.73
> > > > > > 40,000 - 50,000   │         1126.62 │        100.44 │          94.57
> > > > > > 50,000 - 60,000   │         1288.98 │         94.38 │          96.64
> > > > > > 60,000 - 70,000   │         1588.74 │         88.27 │          96.36
> > > > > > 70,000 - 80,000   │         1812.97 │         93.97 │          91.24
> > > > > > 80,000 - 90,000   │         2062.95 │         92.22 │         102.01
> > > > > > 90,000 - 100,000  │         2330.03 │         97.18 │         100.31
> > > > > >
> > > > > > It should be clear that the current Rust code becomes linearly slower
> > > > > > per insertion as the number of calls to rb_next() per transaction
> > > > > > increases. After this change, the time to find an ID number appears
> > > > > > constant. (Technically it is not constant-time as both insertion and
> > > > > > removal scan the entire bitmap. However, quick napkin math shows that
> > > > > > scanning the entire bitmap with N=100k takes ~1.5µs, which is negligible
> > > > > > in a benchmark where the roundtrip latency is 100µs.)
> > > > > >
> > > > > > I've included a comparison to the C driver, which uses the same bitmap
> > > > > > algorithm as this patch since commit 15d9da3f818c ("binder: use bitmap
> > > > > > for faster descriptor lookup").
> > > > >
> > > > > Thanks for the solid numbers!
> > > > >
> > > > > > This currently checks if the bitmap should be shrunk after every
> > > > > > removal. One potential future change is introducing a shrinker to make
> > > > > > this operation O(1), but based on the benchmark above this does not seem
> > > > > > required at this time.
> > > > > >
> > > > > > Reviewed-by: Burak Emir <bqe@google.com>
> > > > > > Acked-by: Carlos Llamas <cmllamas@google.com>
> > > > > > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > > > > > ---
> > > > > >  drivers/android/binder/process.rs | 64 ++++++++++++++++++++++++++++-----------
> > > > > >  1 file changed, 47 insertions(+), 17 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/android/binder/process.rs b/drivers/android/binder/process.rs
> > > > > > index f13a747e784c84a0fb09cbf47442712106eba07c..9264961fd92b33c07fcd5353740cc0b1ec978afd 100644
> > > > > > --- a/drivers/android/binder/process.rs
> > > > > > +++ b/drivers/android/binder/process.rs
> > > > > > @@ -19,6 +19,7 @@
> > > > > >      cred::Credential,
> > > > > >      error::Error,
> > > > > >      fs::file::{self, File},
> > > > > > +    id_pool::IdPool,
> > > > > >      list::{List, ListArc, ListArcField, ListLinks},
> > > > > >      mm,
> > > > > >      prelude::*,
> > > > > > @@ -367,6 +368,8 @@ impl ListItem<{Self::LIST_NODE}> for NodeRefInfo {
> > > > > >  struct ProcessNodeRefs {
> > > > > >      /// Used to look up nodes using the 32-bit id that this process knows it by.
> > > > > >      by_handle: RBTree<u32, ListArc<NodeRefInfo, { NodeRefInfo::LIST_PROC }>>,
> > > > > > +    /// Used to quickly find unused ids in `by_handle`.
> > > > > > +    handle_is_present: IdPool,
> > > > > >      /// Used to look up nodes without knowing their local 32-bit id. The usize is the address of
> > > > > >      /// the underlying `Node` struct as returned by `Node::global_id`.
> > > > > >      by_node: RBTree<usize, u32>,
> > > > > > @@ -381,6 +384,7 @@ impl ProcessNodeRefs {
> > > > > >      fn new() -> Self {
> > > > > >          Self {
> > > > > >              by_handle: RBTree::new(),
> > > > > > +            handle_is_present: IdPool::new(),
> > > > > >              by_node: RBTree::new(),
> > > > > >              freeze_listeners: RBTree::new(),
> > > > > >          }
> > > > > > @@ -775,7 +779,7 @@ pub(crate) fn get_node(
> > > > > >      pub(crate) fn insert_or_update_handle(
> > > > > >          self: ArcBorrow<'_, Process>,
> > > > > >          node_ref: NodeRef,
> > > > > > -        is_mananger: bool,
> > > > > > +        is_manager: bool,
> > > > > >      ) -> Result<u32> {
> > > > > >          {
> > > > > >              let mut refs = self.node_refs.lock();
> > > > > > @@ -794,7 +798,33 @@ pub(crate) fn insert_or_update_handle(
> > > > > >          let reserve2 = RBTreeNodeReservation::new(GFP_KERNEL)?;
> > > > > >          let info = UniqueArc::new_uninit(GFP_KERNEL)?;
> > > > > >
> > > > > > -        let mut refs = self.node_refs.lock();
> > > > > > +        let mut refs_lock = self.node_refs.lock();
> > > > > > +        let mut refs = &mut *refs_lock;
> > > > > > +
> > > > > > +        let (unused_id, by_handle_slot) = loop {
> > > > > > +            // ID 0 may only be used by the manager.
> > > > > > +            let start = if is_manager { 0 } else { 1 };
> > > > > > +
> > > > > > +            if let Some(res) = refs.handle_is_present.find_unused_id(start) {
> > > > > > +                match refs.by_handle.entry(res.as_u32()) {
> > > > > > +                    rbtree::Entry::Vacant(entry) => break (res, entry),
> > > > > > +                    rbtree::Entry::Occupied(_) => {
> > > > > > +                        pr_err!("Detected mismatch between handle_is_present and by_handle");
> > > > > > +                        res.acquire();
> > > > > > +                        kernel::warn_on!(true);
> > > > > > +                        return Err(EINVAL);
> > > > >
> > > > > EINVAL means that user provides a wrong parameter. Here's a data
> > > > > corruption. Maybe EFAULT?
> > > >
> > > > Hmm ... EFAULT is used when the user passes a dangling pointer that
> > > > copy_from_user() refuses to use. I would like to avoid confusion with
> > > > that as well. Is there a third option?
> > > >
> > > > > > +                    }
> > > > > > +                }
> > > > > > +            }
> > > > > > +
> > > > > > +            let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
> > > > > > +            drop(refs_lock);
> > > > > > +            let resizer = grow_request.realloc(GFP_KERNEL)?;
> > > > > > +            refs_lock = self.node_refs.lock();
> > > > > > +            refs = &mut *refs_lock;
> > > > > > +            refs.handle_is_present.grow(resizer);
> > > > >
> > > > > This continues puzzling me. Refs_lock protects refs, and the spec
> > > > > says:
> > > > >
> > > > >     a reference’s scope starts from where it is introduced and
> > > > >     continues through the last time that reference is used.
> > > > >
> > > > > https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html
> > > > >
> > > > > The last usage of refs is at .grow_request() line, because later it's
> > > > > reused with the new value.
> > > > >
> > > > > If my reading of the spec is correct, after dropping the refs_lock,
> > > > > you may get rescheduled, and another thread may follow the same path.
> > > > > Because refs_lock is dropped explicitly and refs - implicitly, the
> > > > > concurrent thread can grab both and follow with resizing the id map.
> > > > >
> > > > > When your first thread gets back, you'll end up resizing the
> > > > > already resized map.
> > > > >
> > > > > I asked your AI, and it says that this race is indeed possible for
> > > > > exactly that reason. But it doesn't break memory safety, so the
> > > > > compiler is happy about it...
> > > >
> > > > You're right that since we release the mutex, someone else may have
> > > > resized the map in the meantime. That's why we implemented grow() to do
> > > > nothing if it was already resized:
> > > >
> > > >   pub fn grow(&mut self, mut resizer: PoolResizer) {
> > > >       // Between request to grow that led to allocation of `resizer` and now,
> > > >       // another thread may have already grown the capacity.
> > > >       // In this case, do nothing, drop `resizer` and move on.
> > > >       if resizer.new.len() <= self.capacity() {
> > > >           return;
> > > >       }
> > > >
> > > >       resizer.new.copy_and_extend(&self.map);
> > > >       self.map = resizer.new;
> > > >   }
> > >
> > > OK, I see...
> > >
> > > I expected that you'd switch the pool state to 'growing' before
> > > dropping the ref_lock. This way, the concurrent thread would be able
> > > to go straight to the new iteration.
> > >
> > > Something like:
> > >
> > >         if let Some(res) = refs.handle_is_present.find_unused_id(start) {
> > >                 ...
> > >         }
> > >
> > >         if refs.is_growing() {
> > >                 drop(refs_lock);
> > >                 continue;
> > >         }
> > >         let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
> > >         refs.set_growing();
> > >         drop(refs_lock);
> > >         let resizer = grow_request.realloc(GFP_KERNEL)?;
> > >         ...
> > >
> > > Your approach also works, assuming you're OK with a useless alloc/free
> > > of the bitmap in the case of a collision.
> >
> > Hmm, having the extra ones just repeatedly lock/unlock also isn't great.
> > I guess one alternate option could be to have a separate mutex that you
> > take while allocating, but ehh I think the current logic is fine.
>
> Not sure I understand... If you've found nothing and need to grow, you
> have to lock and unlock anyway, at least to allocate a bigger bitmap.

I'm mostly concerned that it's going to unlock/lock many many many
times while the other thread is busy allocating.
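
If that ever turns out to matter, the separate-mutex idea would look
roughly like this on the grow path (grow_lock would be a new Mutex<()>
on Process that this patch does not add, always taken before
node_refs):

    // After find_unused_id() failed:
    drop(refs_lock);
    // Only one thread at a time gets to allocate; the others sleep
    // here instead of repeatedly retrying the loop.
    let _grow_guard = self.grow_lock.lock();
    refs_lock = self.node_refs.lock();
    refs = &mut *refs_lock;
    // Whoever held grow_lock before us may already have grown the
    // pool, so allocate only if an id still can't be found.
    if refs.handle_is_present.find_unused_id(start).is_none() {
        let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
        drop(refs_lock);
        let resizer = grow_request.realloc(GFP_KERNEL)?;
        refs_lock = self.node_refs.lock();
        refs = &mut *refs_lock;
        refs.handle_is_present.grow(resizer);
    }
    // The outer loop then retries; _grow_guard is released at the end
    // of the iteration.

But as I said, I think the current logic is fine.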

Alice

> But this doesn't look critical to me because the probability of such an
> event decreases exponentially as the ID pool grows. So either approach
> works.
>
> > > So, I can take this series as is, or I can wait for v7 if you prefer.
> > >
> > > Please advise.
> >
> > Sure, go ahead! And thanks for the reviews!
>
> OK, will do.
>
> Thanks,
> Yury
Re: [PATCH v6 6/6] rust_binder: use bitmap for allocation of handles
Posted by Burak Emir 5 days, 15 hours ago
> > +
> > +            let grow_request = refs.handle_is_present.grow_request().ok_or(ENOMEM)?;
> > +            drop(refs_lock);
> > +            let resizer = grow_request.realloc(GFP_KERNEL)?;
> > +            refs_lock = self.node_refs.lock();
> > +            refs = &mut *refs_lock;
> > +            refs.handle_is_present.grow(resizer);
>
> This continues puzzling me. Refs_lock protects refs, and the spec
> says:
>
>     a reference’s scope starts from where it is introduced and
>     continues through the last time that reference is used.
>
> https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html
>
> The last usage of refs is at .grow_request() line, because later it's
> reused with the new value.
>
> If my reading of the spec is correct, after dropping the refs_lock,
> you may get rescheduled, and another thread may follow the same path.
> Because refs_lock is dropped explicitly and refs - implicitly, the
> concurrent thread can grab both and follow with resizing the id map.
>
> When your first thread gets back, you'll end up resizing the
> already resized map.
>
> I asked your AI, and it says that this race is indeed possible for
> exactly that reason. But it doesn't break memory safety, so the
> compiler is happy about it...

Yes - the writes to the map field are synchronized.
So there is no data race (i.e. no unsynchronized read or write
conflicting with a write) and corruption cannot happen.
Never mind that there is still a race involving data : )

The scenario you describe is why the id_pool grow() and shrink()
double-check before copying anything.
So the thread that loses the race has allocated a new array for
nothing, but things move on.
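
To make that concrete, here is a tiny stand-alone model of the pattern
in plain userspace Rust - nothing binder-specific, just the "allocate
outside the lock, double-check under the lock" shape:

    use std::sync::{Arc, Mutex};
    use std::thread;

    struct Pool {
        map: Vec<u64>, // stand-in for the bitmap
    }

    impl Pool {
        fn capacity(&self) -> usize {
            self.map.len() * 64
        }

        // Same shape as the grow() above: install the new buffer only
        // if nobody grew the pool while we were allocating.
        fn grow(&mut self, mut new: Vec<u64>) {
            if new.len() <= self.map.len() {
                return; // lost the race; `new` is dropped for nothing
            }
            new[..self.map.len()].copy_from_slice(&self.map);
            self.map = new;
        }
    }

    fn main() {
        let pool = Arc::new(Mutex::new(Pool { map: vec![0; 1] }));
        // Both threads saw the same full pool, so both request the same
        // doubled size, like two binder threads that both failed to
        // find a free handle.
        let new_len = pool.lock().unwrap().map.len() * 2;

        let threads: Vec<_> = (0..2)
            .map(|_| {
                let pool = Arc::clone(&pool);
                thread::spawn(move || {
                    // The allocation happens without holding the lock.
                    let new = vec![0u64; new_len];
                    // Retake the lock; only the winner's buffer is kept.
                    pool.lock().unwrap().grow(new);
                })
            })
            .collect();

        for t in threads {
            t.join().unwrap();
        }

        // The pool was grown exactly once, whichever thread won.
        assert_eq!(pool.lock().unwrap().capacity(), 128);
    }

Whichever thread loses just drops its Vec on that early return, which
is the "allocated a new array for nothing" case above.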